[Question title]: How would I make this join on this statistic?
[Posted]: 2012-01-23 22:57:03
[Question]:

First off, sorry about the question title. I don't know the statistical terminology, or what this kind of join is even called.

I have a query* that basically generates three things: random_sex, random_first, and random_last. I'm now trying to join those against the census names using this method.

 random_sex |   random_first   |   random_last    
------------+------------------+------------------
 male       | 47.7101715711225 | 24.3833348881337
 male       | 72.8463141907472 | 28.3560050522089
 female     | 72.8617294209544 | 33.3203859277759
 male       | 39.3406164890062 | 26.3352867371729
 female     | 28.6855500966031 | 65.8870893270099
 female     | 35.5960198949557 | 83.1188118207422
 male       | 11.5711074977927 |  10.544433838184
 male       | 15.6900786811765 | 18.7324617852545
 male       | 24.9860797089245 | 8.98265511383023
 female     | 80.4563122882508 |  35.594445341751
(10 rows)

Basically, the census data is in a table like this...

    name    | freq  | cumfreq | rank | name_type 
------------+-------+---------+------+-----------
 SMITH      | 1.006 |   1.006 |    1 | LAST
 JOHNSON    |  0.81 |   1.816 |    2 | LAST
 WILLIAMS   | 0.699 |   2.515 |    3 | LAST
 JONES      | 0.621 |   3.136 |    4 | LAST
 BROWN      | 0.621 |   3.757 |    5 | LAST
 DAVIS      |  0.48 |   4.237 |    6 | LAST
 MILLER     | 0.424 |    4.66 |    7 | LAST
 WILSON     | 0.339 |       5 |    8 | LAST
 MOORE      | 0.312 |   5.312 |    9 | LAST
 TAYLOR     | 0.311 |   5.623 |   10 | LAST
 ANDERSON   | 0.311 |   5.934 |   11 | LAST
 THOMAS     | 0.311 |   6.245 |   12 | LAST
 JACKSON    |  0.31 |   6.554 |   13 | LAST
 WHITE      | 0.279 |   6.834 |   14 | LAST
 HARRIS     | 0.275 |   7.109 |   15 | LAST
 MARTIN     | 0.273 |   7.382 |   16 | LAST
 THOMPSON   | 0.269 |   7.651 |   17 | LAST
 GARCIA     | 0.254 |   7.905 |   18 | LAST
 MARTINEZ   | 0.234 |    8.14 |   19 | LAST

And, for a row like this one:

 random_sex |   random_first   |    random_last    
------------+------------------+------------------
 male       | 47.7101715711225 | 24.3833348881337

I want it to join like this (procedurally):

=# select * from census.names where cumfreq > 47.7101715711225 AND name_type = 'MALE_FIRST' order by cumfreq asc limit 1;
  name  | freq  | cumfreq | rank | name_type  
--------+-------+---------+------+------------
 SILVER | 0.009 |  47.717 | 1424 | MALE_FIRST

=# select * from census.names where cumfreq > 24.3833348881337 AND name_type = 'LAST' order by cumfreq asc limit 1;
  name  | freq  | cumfreq | rank | name_type 
--------+-------+---------+------+-----------
 HARPER | 0.054 |  24.408 |  185 | LAST

So this gentleman's name would be Silver Harper. I've never met one in my life, but they do exist.

Instead of the random numbers, I want the query above to return "Silver" and "Harper". How can I get it to work like this?
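Each of the two procedural lookups above is inverse-CDF sampling. It returns the first row whose cumfreq is strictly greater than the random percentage. Below is a minimal Python sketch of that search using the standard `bisect` module. It covers only the 19 LAST rows from the census excerpt above; the real table is much longer. The helper `pick_name` is a hypothetical name.

```python
from bisect import bisect_right

# The LAST rows from the census excerpt above, already sorted by cumfreq.
LAST_NAMES = [
    ("SMITH", 1.006), ("JOHNSON", 1.816), ("WILLIAMS", 2.515),
    ("JONES", 3.136), ("BROWN", 3.757), ("DAVIS", 4.237),
    ("MILLER", 4.66), ("WILSON", 5.0), ("MOORE", 5.312),
    ("TAYLOR", 5.623), ("ANDERSON", 5.934), ("THOMAS", 6.245),
    ("JACKSON", 6.554), ("WHITE", 6.834), ("HARRIS", 7.109),
    ("MARTIN", 7.382), ("THOMPSON", 7.651), ("GARCIA", 7.905),
    ("MARTINEZ", 8.14),
]
CUMFREQS = [cf for _, cf in LAST_NAMES]

def pick_name(x):
    """Hypothetical helper mirroring
    `WHERE cumfreq > x ORDER BY cumfreq LIMIT 1`:
    return the first name whose cumfreq is strictly greater than x,
    or None when x is past the end of the list (the query finds no row)."""
    i = bisect_right(CUMFREQS, x)  # index of the first cumfreq > x
    return LAST_NAMES[i][0] if i < len(LAST_NAMES) else None

print(pick_name(0.5))  # SMITH
print(pick_name(5.0))  # MOORE (WILSON's cumfreq is exactly 5 and > is strict)
print(pick_name(8.5))  # None
```

`bisect_right` skips past values equal to `x`, so a draw that lands exactly on a boundary moves on to the next name. This matches the strict `>` in the SQL.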


Footnote

*: Simplified, the query is:

SELECT
   CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS random_sex
   , RANDOM() * 90.020 AS random_first -- the dataset covers only the most popular ~90%
   , RANDOM() * 90.483 AS random_last
FROM generate_series(1,10,1);
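The multipliers in this query presumably match the largest cumfreq in each name list. If so, every draw stays below the top of the cumulative distribution, and the "first cumfreq greater than x" lookup always has a row to return. A draw past the end would find nothing: a scalar subquery would give NULL, and an inner join would drop the row. Below is a minimal Python sketch of the same generator, with the scales passed in rather than hard-coded. `random_draws` and its parameters are hypothetical names.

```python
import random

def random_draws(n, max_first, max_last, rng=None):
    """Hypothetical Python counterpart of the footnote query.

    max_first and max_last play the role of 90.020 and 90.483. Each draw is
    uniform in [0, max), so as long as max does not exceed the largest
    cumfreq of the list, some row always has cumfreq > draw.
    """
    rng = rng or random.Random()
    rows = []
    for _ in range(n):
        sex = "male" if rng.random() > 0.5 else "female"
        rows.append((sex, rng.random() * max_first, rng.random() * max_last))
    return rows

# A seeded generator makes the draws reproducible between runs.
for row in random_draws(3, 90.020, 90.483, random.Random(42)):
    print(row)
```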

    Tags: sql postgresql join postgresql-9.1


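    [Answer 1]:

    I don't really understand statistics either, but I think this is what you want.

    Let's call the table that returns the random columns Randoms: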

    WITH RANDOMS AS
    (
       SELECT
       CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS random_sex
       , RANDOM() * 90.020 AS random_first 
       , RANDOM() * 90.483 AS random_last
       FROM generate_series(1,10,1)
    )
    SELECT (
            SELECT A.NAME 
            FROM census.names A
            WHERE A.cumfreq > R.random_first
            AND A.name_type = UPPER(R.random_sex) || '_FIRST'
            order by A.cumfreq asc limit 1
           ) AS first_name,
           (
            SELECT A.NAME 
            FROM census.names A
            WHERE A.cumfreq > R.random_last
            AND A.name_type = 'LAST'
            order by A.cumfreq asc limit 1
           ) AS last_name
    FROM RANDOMS R ;
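This correlated scalar-subquery pattern can be tried without a PostgreSQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, since SQLite also accepts correlated subqueries with `ORDER BY ... LIMIT 1`. It loads only rows that appear in the question. The first-name type is derived from `random_sex` through `UPPER(random_sex) || '_FIRST'`. That part is an assumption: it relies on `name_type` values such as `MALE_FIRST` and `FEMALE_FIRST`, as used in the question. The second draw is hypothetical.

```python
import sqlite3

# Only rows that appear in the question; the real census.names is far larger.
ROWS = [
    ("SMITH", 1.006, "LAST"),
    ("WILSON", 5.0, "LAST"),
    ("MOORE", 5.312, "LAST"),
    ("MARTINEZ", 8.14, "LAST"),
    ("HARPER", 24.408, "LAST"),
    ("SILVER", 47.717, "MALE_FIRST"),
]

QUERY = """
WITH randoms(id, random_sex, random_first, random_last) AS (
    VALUES (1, 'male', 47.7101715711225, 24.3833348881337),
           (2, 'female', 10.0, 5.0)  -- hypothetical draw
)
SELECT
    (SELECT a.name FROM census.names a
      WHERE a.cumfreq > r.random_first
        AND a.name_type = UPPER(r.random_sex) || '_FIRST'
      ORDER BY a.cumfreq LIMIT 1) AS first_name,
    (SELECT a.name FROM census.names a
      WHERE a.cumfreq > r.random_last
        AND a.name_type = 'LAST'
      ORDER BY a.cumfreq LIMIT 1) AS last_name
FROM randoms r
ORDER BY r.id
"""

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database named "census" so census.names resolves.
conn.execute("ATTACH DATABASE ':memory:' AS census")
conn.execute("CREATE TABLE census.names (name TEXT, cumfreq REAL, name_type TEXT)")
conn.executemany("INSERT INTO census.names VALUES (?, ?, ?)", ROWS)

result = conn.execute(QUERY).fetchall()
print(result)  # [('SILVER', 'HARPER'), (None, 'MOORE')]
```

The female draw shows the failure mode. With no `FEMALE_FIRST` rows loaded, the scalar subquery yields NULL (`None` in Python) instead of dropping the row.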
    


      [Answer 2]:

      A correlated subquery?

      SELECT
        *
      FROM
        yourRandomTable
      INNER JOIN
        census.names         AS first_name
          ON  first_name.cumfreq = (SELECT MIN(cumfreq)
                                    FROM   census.names
                                    WHERE  cumfreq   > yourRandomTable.random_first
                                      AND  name_type = UPPER(yourRandomTable.random_sex) || '_FIRST')
          AND first_name.name_type = UPPER(yourRandomTable.random_sex) || '_FIRST'
      INNER JOIN
        census.names         AS last_name
          ON  last_name.cumfreq  = (SELECT MIN(cumfreq)
                                    FROM   census.names
                                    WHERE  cumfreq   > yourRandomTable.random_last
                                      AND  name_type = 'LAST')
          AND last_name.name_type  = 'LAST'
      

      You can vary this pattern quite a lot. Which variant to pick depends on how your indexes are set up.
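Each lookup here is an equality on `name_type` plus a range on `cumfreq`, so a composite index on `(name_type, cumfreq)` is a natural candidate. The sketch below again uses SQLite through Python's `sqlite3` as a stand-in for PostgreSQL, with a plain `names` table in place of `census.names`. The index name `names_type_cumfreq` is hypothetical. The sketch creates such an index and checks that the `MIN(cumfreq)` form picks the same row as `ORDER BY cumfreq LIMIT 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT, cumfreq REAL, name_type TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [("SMITH", 1.006, "LAST"), ("WILSON", 5.0, "LAST"),
     ("MOORE", 5.312, "LAST"), ("HARPER", 24.408, "LAST"),
     ("SILVER", 47.717, "MALE_FIRST")],
)
# Equality column first, range column second, matching the lookups' WHERE clauses.
conn.execute("CREATE INDEX names_type_cumfreq ON names (name_type, cumfreq)")

MIN_FORM = """
SELECT n.name FROM names n
WHERE n.name_type = :t
  AND n.cumfreq = (SELECT MIN(cumfreq) FROM names
                   WHERE name_type = :t AND cumfreq > :x)
"""
LIMIT_FORM = """
SELECT name FROM names
WHERE name_type = :t AND cumfreq > :x
ORDER BY cumfreq LIMIT 1
"""

def lookup(sql, name_type, x):
    """Run one of the two forms and return the first name, or None if no row."""
    row = conn.execute(sql, {"t": name_type, "x": x}).fetchone()
    return row[0] if row else None

for x in (0.5, 5.0, 24.3833348881337):
    # Both forms should print the same name for each draw.
    print(x, lookup(MIN_FORM, "LAST", x), lookup(LIMIT_FORM, "LAST", x))
```

One difference between the two forms: if two names of the same type share a cumfreq, the `MIN` form matches both rows. `LIMIT 1` always returns at most one.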


        [Answer 3]:

        SELECT
          r.sex
          , COALESCE(
            (SELECT name FROM census.names AS mf WHERE r.sex = 'male' AND mf.name_type = 'MALE_FIRST' AND mf.cumfreq > r.first ORDER BY cumfreq LIMIT 1)
            , (SELECT name FROM census.names AS ff WHERE r.sex = 'female' AND ff.name_type = 'FEMALE_FIRST' AND ff.cumfreq > r.first ORDER BY cumfreq LIMIT 1)
          ) AS first
          , (SELECT name FROM census.names AS l WHERE l.name_type = 'LAST' AND l.cumfreq > r.last ORDER BY cumfreq LIMIT 1) AS last
        FROM (
          SELECT
            RANDOM() * 90.020 AS first
            , RANDOM() * 90.483 AS last
            , CASE WHEN RANDOM() > 0.5 THEN 'male' ELSE 'female' END AS sex
          FROM generate_series(1,10,1)
        ) AS r;
        

        This is actually what I ended up with.
