postgresql percentage comparison of different rows in a separate column: answers

【Question title】: postgresql percentage comparison of different rows in a separate column
【Posted】: 2014-08-08 00:07:55
【Question】:

I have a table in PostgreSQL, which is actually a VIEW generated by a bunch of JOINs, and it ends up looking like this:

 test_type |  brand  | model  | band | firmware_version | avg_throughput
-----------+---------+--------+------+------------------+----------------
 1client   | Linksys | N600   | 5ghz | 1.5              |          66.94
 1client   | Linksys | N600   | 5ghz | 2.0              |          94.98
 1client   | Linksys | N600   | 5ghz | 2.11             |         132.40
 1client   | Linksys | EA6500 | 5ghz | 1.5              |         216.46
 1client   | Linksys | EA6500 | 5ghz | 2.0              |         176.79
 1client   | Linksys | EA6500 | 5ghz | 2.11             |         191.44

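What I want to do is create another column that compares the firmware versions of each model and shows the percentage difference in throughput between them.

More specifically, the query would take the throughput of the lowest firmware version and keep it as the baseline against which the throughputs of all other firmware versions are compared.

So if we take the Linksys N600, the lowest firmware version is 1.5 with a throughput of 66.94. We would keep that as the baseline, compare the other throughputs against that number, and show the percentage difference.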

The final result of the table would look like this:

 test_type |  brand  | model  | band | firmware_version | avg_throughput | comparison
-----------+---------+--------+------+------------------+----------------+------------
 1client   | Linksys | N600   | 5ghz | 1.5              |          66.94 | 0% (or empty)
 1client   | Linksys | N600   | 5ghz | 2.0              |          94.98 | +41.61%
 1client   | Linksys | N600   | 5ghz | 2.11             |         132.40 | +97.78%
 1client   | Linksys | EA6500 | 5ghz | 0.5              |         216.46 | 0% (or empty)
 1client   | Linksys | EA6500 | 5ghz | 1.2              |         176.79 | -18.32%
 1client   | Linksys | EA6500 | 5ghz | 2.5              |         191.44 | -11.55%
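As a quick check of the arithmetic described above, here is a minimal Python sketch that takes the first (lowest-version) row of each model as the baseline and computes (throughput / baseline - 1) * 100. The helper names `percent_change` and `compare_to_baseline` are hypothetical, written only for illustration, and the rows are the question's input table (versions 1.5, 2.0 and 2.11 for both models). Computed this way, the changes come out to +41.89% and +97.79% for the N600 and -18.33% and -11.56% for the EA6500, slightly different from the hand-typed figures in the table above.

```python
def percent_change(value, baseline):
    """Percentage difference of value relative to baseline, two decimals."""
    return round((value / baseline - 1) * 100, 2)


def compare_to_baseline(rows_by_model):
    """For each model, use the first (lowest-version) row as the baseline.

    rows_by_model maps a model name to (firmware_version, avg_throughput)
    pairs that are assumed to be sorted by firmware version already.
    """
    result = {}
    for model, rows in rows_by_model.items():
        baseline = rows[0][1]
        result[model] = [(version, percent_change(t, baseline)) for version, t in rows]
    return result


# Rows from the question's input table.
throughputs = {
    "N600": [("1.5", 66.94), ("2.0", 94.98), ("2.11", 132.40)],
    "EA6500": [("1.5", 216.46), ("2.0", 176.79), ("2.11", 191.44)],
}

comparison = compare_to_baseline(throughputs)
for model, rows in comparison.items():
    print(model, rows)
```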

Any ideas on how to do this?

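I like to keep logic separate. Right now I am not planning to do this calculation in my code; I would rather have the database do it and then just display the results, but I am open to suggestions if doing it otherwise makes sense.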

【Question discussion】:

Tags: postgresql

【Solution 1】:

This can easily be solved with window functions:

select test_type, 
       brand, 
       model, 
       band, 
       firmware_version, 
       avg_throughput,
       ((avg_throughput / first_value(avg_throughput) over (partition by brand, model order by firmware_version)) - 1) * 100 as diff_to_first_version
from temp_table
order by model desc, firmware_version;
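To try the query without a PostgreSQL server, here is a minimal Python sketch that runs the same `first_value()` window over the question's sample rows. It assumes an in-memory SQLite database (version 3.25 or later, which added window functions) as a stand-in for PostgreSQL, keeps the answer's table name `temp_table`, and wraps the expression in `ROUND(..., 2)` for readability; the output is a sketch of the behaviour, not a PostgreSQL run.

```python
import sqlite3

# Assumption: SQLite (3.25+, which added window functions) stands in for
# PostgreSQL; first_value() ... OVER (...) means the same thing in both.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_table (test_type TEXT, brand TEXT, model TEXT,"
    " band TEXT, firmware_version REAL, avg_throughput REAL)"
)
# Sample rows from the question (all are 1client / Linksys / 5ghz).
conn.executemany(
    "INSERT INTO temp_table VALUES ('1client', 'Linksys', ?, '5ghz', ?, ?)",
    [
        ("N600", 1.5, 66.94), ("N600", 2.0, 94.98), ("N600", 2.11, 132.40),
        ("EA6500", 1.5, 216.46), ("EA6500", 2.0, 176.79), ("EA6500", 2.11, 191.44),
    ],
)

# first_value() returns the throughput of the lowest firmware_version in
# each (brand, model) partition; every row is compared against it.
result = conn.execute(
    """
    SELECT model, firmware_version,
           ROUND((avg_throughput / first_value(avg_throughput) OVER (
                      PARTITION BY brand, model ORDER BY firmware_version
                  ) - 1) * 100, 2) AS diff_to_first_version
    FROM temp_table
    ORDER BY model DESC, firmware_version
    """
).fetchall()

for row in result:
    print(row)
```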

You can also add the difference to the previous version, rather than only to the first version, by using lag() instead of first_value():

select test_type, 
       brand, 
       model, 
       band, 
       firmware_version, 
       avg_throughput,
       ((avg_throughput / first_value(avg_throughput) over (partition by brand, model order by firmware_version)) - 1) * 100 as diff_to_first_version,
       ((avg_throughput / lag(avg_throughput) over (partition by brand, model order by firmware_version)) - 1) * 100 as diff_to_prev_version
from temp_table
order by model desc, firmware_version;

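SQLFiddle example: http://sqlfiddle.com/#!15/9746f/1

This will be faster than a solution that self-joins the table, since the table (or view) only has to be read once.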
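The `lag()` variant can be checked the same way. In this sketch (again assuming SQLite 3.25+ as a stand-in for PostgreSQL, with the question's sample rows), the first version of each model has no predecessor, so `lag()` returns NULL and the whole expression comes back as `None` in Python.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL; lag() behaves the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_table (test_type TEXT, brand TEXT, model TEXT,"
    " band TEXT, firmware_version REAL, avg_throughput REAL)"
)
conn.executemany(
    "INSERT INTO temp_table VALUES ('1client', 'Linksys', ?, '5ghz', ?, ?)",
    [
        ("N600", 1.5, 66.94), ("N600", 2.0, 94.98), ("N600", 2.11, 132.40),
        ("EA6500", 1.5, 216.46), ("EA6500", 2.0, 176.79), ("EA6500", 2.11, 191.44),
    ],
)

# lag() looks one row back within the partition, so each version is compared
# with the version just before it rather than with the first one.
result = conn.execute(
    """
    SELECT model, firmware_version,
           ROUND((avg_throughput / lag(avg_throughput) OVER (
                      PARTITION BY brand, model ORDER BY firmware_version
                  ) - 1) * 100, 2) AS diff_to_prev_version
    FROM temp_table
    ORDER BY model DESC, firmware_version
    """
).fetchall()

for row in result:
    print(row)
```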

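【Discussion】:

Thanks for the answer, this is by far the most concise query, and thanks for introducing me to PARTITION.

【Solution 2】:

Use a subquery on the view that returns, via a window function, the baseline throughput of each model's earliest firmware version, then join your view to it: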

select
  v.test_type, v.brand, v.model, v.band, v.firmware_version, v.avg_throughput,
  (100 * v.avg_throughput / b.avg_throughput)::decimal(8,2) - 100 as percent_gain
from myview v
join (select test_type, brand, model, band, avg_throughput,
             rank() OVER (PARTITION BY test_type, brand, model, band
                          order by firmware_version) as rank
      from myview) b
  on  v.test_type = b.test_type
  and v.brand = b.brand
  and v.model = b.model
  and v.band = b.band
  and b.rank = 1;

See SQLFiddle, which uses your sample data and produces your expected output.
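A runnable sketch of this join-based approach, assuming SQLite 3.25+ as a stand-in for PostgreSQL and a plain table `temp_table` (holding the question's sample rows) in place of the view `myview`. Two adaptations belong to this sketch, not to the answer: the PostgreSQL cast `::decimal(8,2)` is replaced by `ROUND(..., 2)`, and the `rank` alias is renamed `rnk`.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for PostgreSQL, temp_table for myview.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_table (test_type TEXT, brand TEXT, model TEXT,"
    " band TEXT, firmware_version REAL, avg_throughput REAL)"
)
conn.executemany(
    "INSERT INTO temp_table VALUES ('1client', 'Linksys', ?, '5ghz', ?, ?)",
    [
        ("N600", 1.5, 66.94), ("N600", 2.0, 94.98), ("N600", 2.11, 132.40),
        ("EA6500", 1.5, 216.46), ("EA6500", 2.0, 176.79), ("EA6500", 2.11, 191.44),
    ],
)

# The subquery b ranks versions inside each group; rnk = 1 is the earliest
# version, whose throughput becomes the baseline for every joined row.
result = conn.execute(
    """
    SELECT v.model, v.firmware_version,
           ROUND(100 * v.avg_throughput / b.avg_throughput - 100, 2) AS percent_gain
    FROM temp_table v
    JOIN (SELECT test_type, brand, model, band, avg_throughput,
                 rank() OVER (PARTITION BY test_type, brand, model, band
                              ORDER BY firmware_version) AS rnk
          FROM temp_table) b
      ON  v.test_type = b.test_type
      AND v.brand = b.brand
      AND v.model = b.model
      AND v.band = b.band
      AND b.rnk = 1
    ORDER BY v.model DESC, v.firmware_version
    """
).fetchall()

for row in result:
    print(row)
```

If two rows could share the same earliest firmware_version, `rank()` would give both the value 1 and the join would duplicate rows; `row_number()` picks exactly one.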

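You could do this with a correlated subquery instead of a join, but performance would likely be poor, because such a subquery has to be executed once for every row. With a join like this one, the query that finds each model's baseline throughput is executed only once.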
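For comparison, the correlated-subquery form described above might look like the following sketch (again with SQLite standing in for PostgreSQL and the question's sample rows). The inner SELECT refers to the outer row `v`, so it is conceptually evaluated once per row; it returns the same numbers as the join, and whether it is actually slower on your data is best confirmed with `EXPLAIN ANALYZE` in PostgreSQL.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; no window functions are needed here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_table (test_type TEXT, brand TEXT, model TEXT,"
    " band TEXT, firmware_version REAL, avg_throughput REAL)"
)
conn.executemany(
    "INSERT INTO temp_table VALUES ('1client', 'Linksys', ?, '5ghz', ?, ?)",
    [
        ("N600", 1.5, 66.94), ("N600", 2.0, 94.98), ("N600", 2.11, 132.40),
        ("EA6500", 1.5, 216.46), ("EA6500", 2.0, 176.79), ("EA6500", 2.11, 191.44),
    ],
)

# The scalar subquery is correlated with v: for every outer row it looks up
# the throughput of the lowest firmware_version in that row's group.
result = conn.execute(
    """
    SELECT v.model, v.firmware_version,
           ROUND(100 * v.avg_throughput / (
               SELECT b.avg_throughput
               FROM temp_table b
               WHERE b.test_type = v.test_type AND b.brand = v.brand
                 AND b.model = v.model AND b.band = v.band
               ORDER BY b.firmware_version
               LIMIT 1
           ) - 100, 2) AS percent_gain
    FROM temp_table v
    ORDER BY v.model DESC, v.firmware_version
    """
).fetchall()

for row in result:
    print(row)
```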

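【Discussion】:

Thanks a lot, this is very close to what I want to accomplish. What I actually need is to take the throughput of the earliest (lowest) firmware version and use it as the baseline. I have tried to work this out myself, but with no luck so far.
What data type is firmware_version, numeric or text? If text, are they all of the form n.nn, i.e. never n.n.nn? The problem is that 2.11 is numerically "earlier" than 2.2 but later "version-wise" (because 11 is greater than 2). Working out which firmware version is "earlier" is actually a big part of the challenge. Let me know what formats they can take.
It's numeric. The versions I gave here are just examples; in reality the firmware versions are always consistent, so simply doing min(firmware_version) is enough to get the earliest version. Sorry for the confusion.
But min(avg_throughput) would pick the lowest throughput, not the throughput of the first (lowest) firmware version.
@a_horse_with_no_name OK, sorted (fiddle included). At least I got to brush up on my window function syntax :) (and my iPhone thumb kung fu)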
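The ordering problem raised in the comments is easy to see in a small Python sketch; `version_key` is a hypothetical helper written for illustration. Sorting the strings as decimal numbers puts 2.11 before 2.2, while splitting on the dots and comparing the parts as integers puts 2.2 first. In PostgreSQL, one commonly used equivalent is `ORDER BY string_to_array(firmware_version, '.')::int[]`, which compares the arrays element by element; that assumes every part is a plain integer.

```python
def version_key(version):
    """Split 'major.minor[.patch...]' into a tuple of ints for ordering."""
    return tuple(int(part) for part in version.split("."))


versions = ["2.11", "2.2", "1.5"]

# Treating versions as decimal numbers: 2.11 < 2.2.
as_numbers = sorted(versions, key=float)
# Treating each dot-separated part as an integer: 2.2 < 2.11.
as_versions = sorted(versions, key=version_key)

print(as_numbers)
print(as_versions)
```

The tuple key also handles the n.n.nn case from the comments, for example ordering 1.2.9 before 1.2.10.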

【Solution 3】:

After some help from here, I came up with this query, which does what I need:

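SELECT
  v.test_type, v.brand, v.model, v.band, v.firmware_version, v.avg_throughput,
  -- NULLIF turns a zero baseline into NULL so the division cannot fail;
  -- the ::numeric cast lets ROUND(numeric, int) accept float columns too
  ROUND((100 * v.avg_throughput / NULLIF(b.base_avg, 0) - 100)::numeric, 2) AS percentage
FROM temp_table v
JOIN (
  -- DISTINCT ON keeps the first row per group under the ORDER BY below,
  -- i.e. the lowest firmware_version of each test_type/brand/model/band
  SELECT DISTINCT ON (test_type, brand, model, band)
         test_type, brand, model, band, firmware_version, avg_throughput AS base_avg
  FROM temp_table
  ORDER BY test_type, brand, model, band, firmware_version
) b
  ON  v.test_type = b.test_type
  AND v.brand     = b.brand
  AND v.model     = b.model
  AND v.band      = b.band;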
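`DISTINCT ON` is a PostgreSQL extension, so here is a plain-Python sketch of the selection the subquery performs: sort by the grouping columns (here the four join columns test_type, brand, model, band) plus firmware_version, keep the first row of each group as the baseline, and return `None` (standing in for NULL) instead of dividing by a zero baseline, which is what the zero check in the query is for. `distinct_on_first`, `group_key` and `percentage` are hypothetical helpers written for illustration; the rows are the question's sample data, deliberately shuffled.

```python
def distinct_on_first(rows, key, order):
    """Mimic SELECT DISTINCT ON (key...) ... ORDER BY key..., order...:
    sort the rows, then keep the first row seen for each key."""
    first = {}
    for row in sorted(rows, key=lambda r: (key(r), order(r))):
        first.setdefault(key(row), row)  # later rows of the same group are ignored
    return first


def percentage(value, base):
    """Like ROUND(100 * value / NULLIF(base, 0) - 100, 2); None plays NULL."""
    if base == 0:
        return None
    return round(100 * value / base - 100, 2)


def group_key(row):
    return row[:4]  # test_type, brand, model, band


# (test_type, brand, model, band, firmware_version, avg_throughput)
rows = [
    ("1client", "Linksys", "N600", "5ghz", 2.0, 94.98),
    ("1client", "Linksys", "N600", "5ghz", 1.5, 66.94),
    ("1client", "Linksys", "N600", "5ghz", 2.11, 132.40),
    ("1client", "Linksys", "EA6500", "5ghz", 2.11, 191.44),
    ("1client", "Linksys", "EA6500", "5ghz", 1.5, 216.46),
    ("1client", "Linksys", "EA6500", "5ghz", 2.0, 176.79),
]

baselines = distinct_on_first(rows, key=group_key, order=lambda r: r[4])

for r in rows:
    print(r[2], r[4], percentage(r[5], baselines[group_key(r)][5]))
```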

Thanks everyone for the help!

【Discussion】: