【Posted】: 2015-11-11 11:15:02
【Question】:
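I have a patient database with a VITALS table. This table contains a patient ID (PATID) that is unique to each patient and a height variable (HT). A single patient may have more than one height recorded.
I am trying to return the count of unique PATIDs within and across height ranges (e.g., 68-72", 72-76", etc.). Each PATID should be counted *only once*. However, I have found that a patient with multiple recorded heights is counted once if all of those heights fall within one range, but twice if the heights span two ranges: once for each range.
For example, if a patient has heights recorded as 68, 72, and 73, they will be counted once in the 68-72 range and once in the 72-76 range. I can tell this is happening because we have 3054 unique PATIDs, but the counts returned by the query sum to >5000.
My code is: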
SELECT
CASE
when "HT" >0 and "HT" <=4 then '0-4'
when "HT" >4 and "HT" <=8 then '4-8'
when "HT" >8 and "HT" <=12 then '8-12'
when "HT" >12 and "HT" <=16 then '12-16'
when "HT" >16 and "HT" <=20 then '16-20'
when "HT" >20 and "HT" <=24 then '20-24'
when "HT" >24 and "HT" <=28 then '24-28'
when "HT" >28 and "HT" <=32 then '28-32'
when "HT" >32 and "HT" <=36 then '32-36'
when "HT" >36 and "HT" <=40 then '36-40'
when "HT" >40 and "HT" <=44 then '40-44'
when "HT" >44 and "HT" <=48 then '44-48'
when "HT" >48 and "HT" <=52 then '48-52'
when "HT" >52 and "HT" <=56 then '52-56'
when "HT" >56 and "HT" <=60 then '56-60'
when "HT" >60 and "HT" <=64 then '60-64'
when "HT" >64 and "HT" <=68 then '64-68'
when "HT" >68 and "HT" <=72 then '68-72'
when "HT" >72 and "HT" <=76 then '72-76'
when "HT" >76 and "HT" <=80 then '76-80'
when "HT" >80 and "HT" <=84 then '80-84'
when "HT" >84 and "HT" <=88 then '84-88'
when "HT" IS NULL then 'Null'
else '>88'
END AS "Height Range",
COUNT(DISTINCT vital."PATID") AS "Count"
FROM dbo."VITAL" vital
GROUP BY 1;
【Discussion】:
-
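When a patient falls into multiple HT ranges, why would one range be preferred over another? Both the problem definition and the query seem to be missing that rule. Perhaps you want
PATID, max(HT) GROUP BY 1, and then classify that value into ranges. -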
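The suggestion above can be sketched as follows. This is only one possible reading, assuming the tallest recorded height should represent each patient; the CTE name `per_patient` and the alias `ht` are made up for illustration. Instead of repeating all 22 `WHEN` branches, the range label is computed arithmetically, which should give the same 4-inch buckets as the original `CASE` for 0 < HT <= 88.

```sql
-- Step 1: collapse each patient to a single height (assumption: the maximum).
WITH per_patient AS (
    SELECT "PATID",
           MAX("HT") AS ht          -- NULL only if every HT for the patient is NULL
    FROM dbo."VITAL"
    GROUP BY "PATID"
)
-- Step 2: bucket that single height, so each patient lands in exactly one range.
SELECT
    CASE
        WHEN ht IS NULL THEN 'Null'
        WHEN ht > 0 AND ht <= 88 THEN
            -- lower bound is exclusive, upper bound inclusive, e.g. 72 -> '68-72', 73 -> '72-76'
            ((ceil(ht / 4.0)::int - 1) * 4)::text || '-' || (ceil(ht / 4.0)::int * 4)::text
        ELSE '>88'                  -- mirrors the original query, which also sends HT <= 0 here
    END AS "Height Range",
    COUNT(*) AS "Count"             -- one row per patient, so DISTINCT is no longer needed
FROM per_patient
GROUP BY 1;
```

Because every PATID contributes exactly one row to `per_patient`, the counts should now sum to the number of distinct PATIDs rather than exceeding it.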
if a patient has height recorded as 68, 72, and 73 ... Obviously, you have to define which row to pick. And always provide your Postgres version.
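If the rule is "pick one specific row per patient" rather than "take the maximum", PostgreSQL's `DISTINCT ON` is one way to express it. The sketch below assumes a hypothetical column `"MEASURE_DATE"` that records when each height was taken; the question does not say whether such a column exists, so substitute whatever actually defines the preferred row.

```sql
-- Report the server version, as the comment requests.
SELECT version();

-- Keep exactly one row per patient: the most recent measurement.
-- "MEASURE_DATE" is a hypothetical column, not confirmed by the question.
SELECT DISTINCT ON ("PATID")
       "PATID",
       "HT"
FROM dbo."VITAL"
ORDER BY "PATID", "MEASURE_DATE" DESC NULLS LAST;
```

The result has one row per PATID, so it can be used as the subquery that feeds the height-range `CASE` expression.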
Tags: sql postgresql count aggregate-functions greatest-n-per-group