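【Title】: Join multiple tables with same structure but different data
【Posted】: 2017-11-15 01:04:25
【Question】:

I'm trying to join 8 tables, and because each table has more than 500,000 rows, the query is slow. What is the best way to join these tables?

All of the tables have this structure:

data_temperature: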

+--------+----------+-------+-----------------+
| ID_geo | NAME     | Value | Date            |
+--------+----------+-------+-----------------+
|  10005 | Madrid   |  32   | 2017-06-12 08:00|
|  10005 | Madrid   |  25   | 2017-06-12 09:00|
|  12701 | Paris    |  23   | 2017-06-12 08:00|
|  13006 | Tokyo    |  25   | 2017-06-12 11:00|
|  11132 | Sevilla  |  27   | 2017-06-12 16:00|
|  21333 | London   |  22   | 2017-06-12 17:00|
+--------+----------+-------+-----------------+

data_WeatherSimbol:

+--------+----------+-------+-----------------+
| ID_geo | NAME     | Value | Date            |
+--------+----------+-------+-----------------+
|  10005 | Madrid   |  A+   | 2017-06-12 08:00|
|  10005 | Madrid   |  A    | 2017-06-12 09:00|
|  12701 | Paris    |  A-   | 2017-06-12 08:00|
|  13006 | Tokyo    |  C-   | 2017-06-12 11:00|
|  11132 | Sevilla  |  I+   | 2017-06-12 16:00|
|  21333 | London   |  D-   | 2017-06-12 17:00|
+--------+----------+-------+-----------------+

I want to join them to get this result:

+--------+----------+-------------+----------+-----------------+
| ID_geo | NAME     | Temperature | Simboles |      Date       |
+--------+----------+-------------+----------+-----------------+
|  10005 | Madrid   |      32     |    A+    | 2017-06-12 08:00|
|  10005 | Madrid   |      25     |    A     | 2017-06-12 09:00|
|  12701 | Paris    |      23     |    A-    | 2017-06-12 08:00|
|  13006 | Tokyo    |      25     |    C-    | 2017-06-12 11:00|
|  11132 | Sevilla  |      27     |    I+    | 2017-06-12 16:00|
|  21333 | London   |      22     |    D-    | 2017-06-12 17:00|
+--------+----------+-------------+----------+-----------------+

Thanks.

Update with real data:

Execution plan: https://files.fm/u/b4besk27

This is the query:

SELECT
    cielo.data_value AS cielo,
    lluv.data_value AS lluvia,
    temp.data_value AS temp,
    vientos.data_value AS viento,
    tmin.data_value AS tempmin,
    tmax.data_value AS tempmax,
    cielo.data_date AS DiaPrev
FROM
    data_cielo AS cielo
INNER JOIN data_lluvia AS lluv ON cielo.data_geo = lluv.data_geo
INNER JOIN data_presion AS pres ON cielo.data_geo = pres.data_geo
INNER JOIN data_temp AS temp ON cielo.data_geo = temp.data_geo
LEFT JOIN data_tempmax AS tmax ON cielo.data_geo = tmax.data_geo
LEFT JOIN data_tempmin AS tmin ON cielo.data_geo = tmin.data_geo
INNER JOIN data_viento AS vientos ON cielo.data_geo = vientos.data_geo
WHERE
    cielo.data_date = lluv.data_date
AND pres.data_date = cielo.data_date
AND vientos.data_date = pres.data_date
AND temp.data_date = vientos.data_date
AND cielo.data_geo = 46
ORDER BY cielo.data_date;

And this is the result:

E+  0.0461028   29.6937088  S2  19.408  36.39   2017-06-13 12:00:00.000
E+  0.0461028   29.6937088  S2  21.422  36.39   2017-06-13 12:00:00.000
E+  0.0461028   29.6937088  S2  19.408  37.853  2017-06-13 12:00:00.000
E+  0.0461028   29.6937088  S2  21.422  37.853  2017-06-13 12:00:00.000
E+  0.0461028   30.7593854  S2  19.408  36.39   2017-06-13 13:00:00.000
E+  0.0461028   30.7593854  S2  21.422  36.39   2017-06-13 13:00:00.000
E+  0.0461028   30.7593854  S2  19.408  37.853  2017-06-13 13:00:00.000
E+  0.0461028   30.7593854  S2  21.422  37.853  2017-06-13 13:00:00.000
A+  0.0461028   31.6310774  SSW2    19.408  36.39   2017-06-13 14:00:00.000
A+  0.0461028   31.6310774  SSW2    21.422  36.39   2017-06-13 14:00:00.000
A+  0.0461028   31.6310774  SSW2    19.408  37.853  2017-06-13 14:00:00.000
A+  0.0461028   31.6310774  SSW2    21.422  37.853  2017-06-13 14:00:00.000
A   0.0461028   32.2647927  S2  19.408  36.39   2017-06-13 15:00:00.000
A   0.0461028   32.2647927  S2  21.422  36.39   2017-06-13 15:00:00.000
A   0.0461028   32.2647927  S2  19.408  37.853  2017-06-13 15:00:00.000

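It shouldn't look like this. I need a result like the one I described above: one row per hour with the values for temperature, pressure, precipitation, sky, and so on.

【Comments】:

  • IMO, this is a bad design with no normalization at all.
  • @PrabhatG That's because the data is bulk inserted from txt files into 8 tables (8 meteorological variables). I don't know why they designed it this way, but that's how it is. Any suggestions?
  • Try creating an index on ID_Geo. That will reduce the query execution time.
  • First create a clustered index on ID_Geo. Then simply join the 2 tables on ID_geo. For example: Select a.ID_geo, a.NAME, a.Value as Temperature, b.Value as Simboles, a.Date from data_temprature a inner join data_WeatherSimbol b on a.ID_geo = b.ID_geo
  • @Debabrata Good idea. On the view, or on each of the 8 tables?

Tags: sql sql-server join sql-server-2005 view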


【Answer 1】:

I think you can join on both geo and date:

select t.*, ws.Value as Simboles
from data_temperature t join
     data_WeatherSimbol ws
     on t.ID_geo = ws.ID_geo and t.date = ws.date;
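
Applied to the tables in the question's real query, the same idea might look like the sketch below. It reuses the table and column names from that query (data_geo, data_date, data_value) and rests on two assumptions that the thread does not confirm: that (data_geo, data_date) identifies at most one row in each table, and that data_tempmax and data_tempmin carry the same hourly timestamps as the other tables. If those two tables hold one value per day, their date condition would have to compare days rather than full timestamps.

```sql
-- Sketch: every join matches on BOTH the location and the timestamp,
-- so one hourly row from data_cielo pairs with at most one row per table
-- (assumption: (data_geo, data_date) is unique within each table).
SELECT
    cielo.data_value   AS cielo,
    lluv.data_value    AS lluvia,
    temp.data_value    AS temp,
    vientos.data_value AS viento,
    tmin.data_value    AS tempmin,
    tmax.data_value    AS tempmax,
    cielo.data_date    AS DiaPrev
FROM data_cielo AS cielo
INNER JOIN data_lluvia AS lluv
        ON lluv.data_geo = cielo.data_geo AND lluv.data_date = cielo.data_date
INNER JOIN data_presion AS pres
        ON pres.data_geo = cielo.data_geo AND pres.data_date = cielo.data_date
INNER JOIN data_temp AS temp
        ON temp.data_geo = cielo.data_geo AND temp.data_date = cielo.data_date
INNER JOIN data_viento AS vientos
        ON vientos.data_geo = cielo.data_geo AND vientos.data_date = cielo.data_date
-- LEFT JOINs keep the hourly row even when no max/min value exists for it.
LEFT JOIN data_tempmax AS tmax
        ON tmax.data_geo = cielo.data_geo AND tmax.data_date = cielo.data_date
LEFT JOIN data_tempmin AS tmin
        ON tmin.data_geo = cielo.data_geo AND tmin.data_date = cielo.data_date
WHERE cielo.data_geo = 46
ORDER BY cielo.data_date;
```

In the posted query, data_tempmax and data_tempmin are joined on data_geo alone, so every hourly row is combined with every max and min row for that location. That is consistent with the repeated rows in the posted result.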

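【Comments】:

  • The problem with this is that it will be super slow.
  • Why would the join be "super slow"?
  • I guess because there are a lot of joins. Wouldn't it be better to build a view over these tables? Or to handle it with a clustered index?
  • @AriaR. . . . I don't know what your database structure is. There is no reason to think JOINs would be "super slow". Views will not improve performance (in most databases; at least they will not make it worse).

【Answer 2】:

Try this: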

;With data_temprature(ID_geo,NAME,Value,[Date])
AS
(
SELECT  10005 , 'Madrid'   ,  32   , '2017-06-12 08:00' Union all
SELECT  10005 , 'Madrid'   ,  25   , '2017-06-12 09:00' Union all
SELECT  12701 , 'Paris'    ,  23   , '2017-06-12 08:00' Union all
SELECT  13006 , 'Tokyo'    ,  25   , '2017-06-12 11:00' Union all
SELECT  11132 , 'Sevilla'  ,  27   , '2017-06-12 16:00' Union all
SELECT  21333 , 'London'   ,  22   , '2017-06-12 17:00' 
)
,data_WeatherSimbol(ID_geo,NAME,Value,[Date])
AS
(
SELECT  10005 , 'Madrid'   ,  'A+'   , '2017-06-12 08:00' Union all
SELECT  10005 , 'Madrid'   ,  'A'   ,  '2017-06-12 09:00' Union all
SELECT  12701 , 'Paris'    ,  'A-'   , '2017-06-12 08:00' Union all
SELECT  13006 , 'Tokyo'    ,  'C-'   , '2017-06-12 11:00' Union all
SELECT  11132 , 'Sevilla'  ,  'I+'   , '2017-06-12 16:00' Union all
SELECT  21333 , 'London'   ,  'D-'   , '2017-06-12 17:00' 
)
SELECT ID_geo,
       NAME,
       Temperature,
       Symboles,
       [Date]  From 
(
SELECT t.ID_geo ,
        t.NAME ,
        t.Value AS Temperature,
        w.Value AS Symboles,t.[Date] ,
        ROW_NUMBER()OVER(PARTITION BY t.ID_geo,t.[Date] ORDER BY t.[Date]) AS Rno
 FROM data_temprature t
INNER join data_WeatherSimbol w
-- match on location and timestamp so each reading gets its own symbol
On t.ID_geo=w.ID_geo AND t.[Date]=w.[Date]
)Dt 
WHERE Dt.Rno=1
ORDER BY ID_geo 

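【Comments】:

【Answer 3】:

Neither [ID_geo] nor [Date] seems unique enough on its own to join on, so:

1. Create an index on both columns for all tables

  create index IX_data_temprature on data_temprature ([ID_geo], [Date])

2. Join all tables on [ID_geo] and [Date]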
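
Before relying on the pair of columns as a join key, it may be worth checking that the pair really is unique in each table. The query below is a minimal sketch for one table, using the table and column names from this answer; it would need to be repeated for the other tables.

```sql
-- List every (ID_geo, Date) pair that occurs more than once.
-- An empty result means the pair identifies a single row in this table.
SELECT ID_geo, [Date], COUNT(*) AS row_count
FROM data_temprature
GROUP BY ID_geo, [Date]
HAVING COUNT(*) > 1;
```

If the result is empty, the index from step 1 could be declared with CREATE UNIQUE INDEX instead, which also prevents later bulk inserts from introducing duplicates.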

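【Comments】:

【Answer 4】:

Most of the query's cost comes from RID lookups.

A RID lookup is used when the index does not cover the query (SQL Server has to fetch the values from the table because they are not included in the index), the index is nonclustered, and the table is a heap, that is, it has no clustered index.

Your query will probably be faster with a covering index; you probably have not included the value column in the index. For more information about included columns, see the Microsoft docs.

It may also help to change the nonclustered index to a clustered index.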
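
As a rough sketch of the two options, here is what they might look like for data_temp, one of the tables in the question's query. The index names are made up for illustration, and the key columns assume the tables are joined on data_geo and data_date. Only one of the two statements would normally be used per table, since a table can have just one clustered index and the second statement fails if one already exists.

```sql
-- Option 1: covering nonclustered index. The key columns serve the join and
-- the filter; INCLUDE stores data_value at the leaf level so the query can be
-- answered from the index without a lookup into the table.
CREATE NONCLUSTERED INDEX IX_data_temp_geo_date
    ON data_temp (data_geo, data_date)
    INCLUDE (data_value);

-- Option 2: clustered index on the same key. The table rows themselves are
-- then stored in (data_geo, data_date) order, so no separate lookup exists.
CREATE CLUSTERED INDEX CIX_data_temp_geo_date
    ON data_temp (data_geo, data_date);
```

The INCLUDE clause is available from SQL Server 2005 onward, which matches the version in the question's tags.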

【Comments】:
