Merging hash values in an array of hashes based on key: answers

【Question title】: Merging hash values in an array of hashes based on key
【Posted】: 2015-02-03 02:11:31
【Question】:

I have an array of hashes like this:

[
  {"student": "a","scores": [{"subject": "math","quantity": 10},{"subject": "english", "quantity": 5}]},
  {"student": "b", "scores": [{"subject": "math","quantity": 1 }, {"subject": "english","quantity": 2 } ]},
  {"student": "a", "scores": [ { "subject": "math", "quantity": 2},{"subject": "science", "quantity": 5 } ] }
]

Other than looping through the array, finding duplicates, and combining them, is there a simpler way to get output like this?

[
  {"student": "a","scores": [{"subject": "math","quantity": 12},{"subject": "english", "quantity": 5},{"subject": "science", "quantity": 5 } ]},
  {"student": "b", "scores": [{"subject": "math","quantity": 1 }, {"subject": "english","quantity": 2 } ]}
]

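Rules for merging duplicate objects:

Students are merged when their "student" values match (e.g. student "a", student "b").
Scores for the same subject are added together (e.g. student a's math scores of 2 and 10 become 12 after merging).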

【Question comments】:

Tags: ruby arrays hash merge duplicates

【Solution 1】:

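There are two common ways to aggregate values in a case like this. The first is to use Enumerable#group_by, as @engineersmnky did in his answer. The second is to build a hash using the form of Hash#update (a.k.a. merge!) that takes a block to resolve the values of keys present in both of the hashes being merged. My solution uses the latter approach, not because I prefer it to group_by, but to show you a different way of doing it. (Had @engineersmnky used update, I would have gone with group_by.)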
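
As a minimal illustration of the block form of Hash#update, here is a sketch on made-up data (the hashes `totals` and `extra` are hypothetical and not part of the question's data). The block is called only for keys present in both hashes:

```ruby
# Hypothetical data, for illustration only.
totals = { "math" => 10, "english" => 5 }
extra  = { "math" => 2,  "science" => 5 }

# The block runs only for keys found in both hashes ("math" here);
# its return value becomes the new value stored under that key.
# Keys found only in extra ("science") are copied over unchanged.
totals.update(extra) { |_key, old, new| old + new }

p totals # math is now 12, english stays 5, science is added as 5
```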

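The particular data structure you are using complicates the problem somewhat. I found that the solution is simpler and easier to follow if the data is first converted to a different structure, the scores are updated, and the result is then converted back to your data structure. You may want to consider changing the data structure (if that is an option for you). I address this in the "Discussion" section.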

Code

def combine_scores(arr)
  reconstruct(update_scores(simplify(arr)))
end

def simplify(arr)
  arr.map do |h|
    hash = Hash[h[:scores].map { |g| g.values }]
    hash.default = 0
    { h[:student]=> hash }
  end
end

def update_scores(arr)
  arr.each_with_object({}) do |g,h|
    h.update(g) do |_, h_scores, g_scores|
      g_scores.each { |subject,score| h_scores[subject] += score }
      h_scores
    end
  end
end

def reconstruct(h)
  h.map { |k,v| { student: k, scores: v.map { |subject, score|
    { subject: subject, quantity: score } } } }
end

Example

arr = [
  { student: "a", scores: [{ subject: "math",    quantity: 10 },
                           { subject: "english", quantity:  5 }] },
  { student: "b", scores: [{ subject: "math",    quantity:  1 },
                           { subject: "english", quantity:  2 } ] },
  { student: "a", scores: [{ subject: "math",    quantity:  2 },
                           { subject: "science", quantity:  5 } ] }]
combine_scores(arr)
  #=> [{ :student=>"a",
  #      :scores=>[{ :subject=>"math",    :quantity=>12 },
  #                { :subject=>"english", :quantity=> 5 },
  #                { :subject=>"science", :quantity=> 5 }] },
  #    { :student=>"b",
  #      :scores=>[{ :subject=>"math",    :quantity=> 1 },
  #                { :subject=>"english", :quantity=> 2 }] }]

Explanation

First consider the two intermediate calculations:

a = simplify(arr)
  #=> [{ "a"=>{ "math"=>10, "english"=>5 } },
  #    { "b"=>{ "math"=> 1, "english"=>2 } },
  #    { "a"=>{ "math"=> 2, "science"=>5 } }]

h = update_scores(a)
  #=> {"a"=>{"math"=>12, "english"=>5, "science"=>5},
  #    "b"=>{"math"=> 1, "english"=>2}}

Then

reconstruct(h)

returns the result shown above.

+ simplify

arr.map do |h|
  hash = Hash[h[:scores].map { |g| g.values }]
  hash.default = 0
  { h[:student]=> hash }
end

This maps each hash to a simpler hash. For example, the first element of arr:

h = { student: "a", scores: [{ subject: "math",    quantity: 10 },
                             { subject: "english", quantity:  5 }] }

is mapped to:

{ "a"=>Hash[[{ subject: "math",    quantity: 10 },
             { subject: "english", quantity:  5 }].map { |g| g.values }] }
#=> { "a"=>Hash[[["math", 10], ["english", 5]]] }
#=> { "a"=>{"math"=>10, "english"=>5}}

Setting each hash's default value to zero simplifies the update step, as shown below.
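
Note that g.values depends on each score hash listing subject before quantity. If that order is not guaranteed, a variant that reads the keys explicitly may be safer. This is only a sketch: the name `simplify_by_key` is hypothetical, and it assumes symbol keys as in the example above. Unlike simplify, it would also add up a subject that appears twice within one record.

```ruby
# Hypothetical variant of simplify that does not rely on key order.
def simplify_by_key(arr)
  arr.map do |h|
    # Hash.new(0) gives missing subjects a default of zero,
    # matching the hash.default = 0 line in simplify.
    scores = h[:scores].each_with_object(Hash.new(0)) do |g, acc|
      acc[g[:subject]] += g[:quantity]
    end
    { h[:student] => scores }
  end
end

# Keys deliberately listed with quantity first, subject second.
sample = [{ student: "a", scores: [{ quantity: 10, subject: "math" }] }]
p simplify_by_key(sample)
```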

+ update_scores

For the array of hashes a returned by simplify, we compute:

a.each_with_object({}) do |g,h|
  h.update(g) do |_, h_scores, g_scores|
    g_scores.each { |subject,score| h_scores[subject] += score }
    h_scores
  end
end

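Each element of a (a hash) is merged into the initially empty hash h. Because update (the same as merge!) is used for the merge, h is modified in place. If both hashes have the same key (a student such as "a"), the block is called and each score in g_scores is added to the matching subject in h_scores; otherwise the student's entry is simply added to h.

Note that if h_scores does not have the key subject, then: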

h_scores[subject] += score
  #=> h_scores[subject] = h_scores[subject] + score
  #=> h_scores[subject] = 0 + score (because the default value is zero)
  #=> h_scores[subject] = score

That is, the key-value pair from g_scores is simply added to h_scores.
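
The effect of the default value can be checked in isolation. In this sketch the hash `h_scores` is built by hand rather than by simplify:

```ruby
h_scores = { "math" => 10 }
h_scores.default = 0          # the same setting simplify applies

# Reading a missing key returns the default but does not store it.
missing_value = h_scores["science"]      # 0
stored_before = h_scores.key?("science") # false

# += reads the default (0), adds the score, and stores the result.
h_scores["science"] += 5

p h_scores
```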

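I've replaced the block variable for the key (the student) with the placeholder _, to reduce the chance of errors and to tell the reader that it is not used in the block.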

+ reconstruct

The last step converts the hash returned by update_scores back to the original data structure, which is straightforward.

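Discussion

If you are able to change the data structure, and it meets your requirements, you might consider changing it to one like that produced by update_scores (a hash of hashes keyed by student):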

h = { "a"=>{ math: 10, english: 5 }, "b"=>{ math:  1, english: 2 } }

Then, to update the scores with:

g = { "a"=>{ math: 2, science: 5 }, "b"=>{ english: 3 }, "c"=>{ science: 4 } }

you only need to do the following:

h.merge(g) { |_,oh,nh| oh.merge(nh) { |_,ohv,nhv| ohv+nhv } }
  #=> { "a"=>{ :math=>12, :english=>5, :science=>5 },
  #     "b"=>{ :math=> 1, :english=>5 },
  #     "c"=>{ :science=>4 } }
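
Because merge returns a new hash and leaves its receiver untouched, the one-liner above can be wrapped in a small helper. This is a sketch only, and the method name `add_scores` is hypothetical:

```ruby
# Hypothetical helper: combines two student => { subject => score } hashes,
# summing the scores of subjects that both hashes contain.
def add_scores(current, incoming)
  current.merge(incoming) do |_student, old_scores, new_scores|
    old_scores.merge(new_scores) { |_subject, old, new| old + new }
  end
end

h = { "a" => { math: 10, english: 5 }, "b" => { math: 1, english: 2 } }
g = { "a" => { math: 2, science: 5 }, "b" => { english: 3 }, "c" => { science: 4 } }

combined = add_scores(h, g)
p combined
p h # unchanged, since merge is non-destructive
```

If the totals should instead be accumulated in place, merge! (update) could be used for the outer call.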

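【Comments】:

【Solution 2】:

Other than looping through the array, finding duplicates, and combining them, is there a simpler way to get output like this?

Not that I know of. If you explained where this data comes from, the answer might be different, but given only an Array of Hash objects, I think you will have to iterate and combine.

It is not elegant, but you could use a solution like this: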

arr = [
      {"student"=> "a","scores"=> [{"subject"=> "math","quantity"=> 10},{"subject"=> "english", "quantity"=> 5}]},
      {"student"=> "b", "scores"=> [{"subject"=> "math","quantity"=> 1 }, {"subject"=> "english","quantity"=> 2 } ]},
      {"student"=> "a", "scores"=> [ { "subject"=> "math", "quantity"=> 2},{"subject"=> "science", "quantity"=> 5 } ] }
    ]
#Group the array by student
arr.group_by{|student| student["student"]}.map do |student_name,student_values|
  {"student" => student_name,
  #combine all the scores and group by subject
  "scores" => student_values.map{|student| student["scores"]}.flatten.group_by{|score| score["subject"]}.map do |subject,subject_values|
    {"subject" => subject,
    #combine all the quantities into an array and reduce using `+`
    "quantity" => subject_values.map{|h| h["quantity"]}.reduce(:+)
    }
  end
  }
end
#=> [
#     {"student"=>"a", "scores"=>[
#                         {"subject"=>"math", "quantity"=>12},
#                         {"subject"=>"english", "quantity"=>5},
#                         {"subject"=>"science", "quantity"=>5}]},
#     {"student"=>"b", "scores"=>[
#                         {"subject"=>"math", "quantity"=>1},
#                         {"subject"=>"english", "quantity"=>2}]}
#     ]
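
The same grouping can be written a little more compactly with flat_map and sum. This is only a sketch: it assumes Ruby 2.4 or later (for Enumerable#sum) and the string-keyed arr defined above, repeated here so the snippet runs on its own. The variable names are illustrative.

```ruby
arr = [
  { "student" => "a", "scores" => [{ "subject" => "math", "quantity" => 10 }, { "subject" => "english", "quantity" => 5 }] },
  { "student" => "b", "scores" => [{ "subject" => "math", "quantity" => 1 }, { "subject" => "english", "quantity" => 2 }] },
  { "student" => "a", "scores" => [{ "subject" => "math", "quantity" => 2 }, { "subject" => "science", "quantity" => 5 }] }
]

result = arr.group_by { |rec| rec["student"] }.map do |name, recs|
  # Collect every score hash for this student, then group by subject.
  by_subject = recs.flat_map { |rec| rec["scores"] }
                   .group_by { |score| score["subject"] }
  # Sum the quantities within each subject group.
  scores = by_subject.map do |subject, entries|
    { "subject" => subject, "quantity" => entries.sum { |e| e["quantity"] } }
  end
  { "student" => name, "scores" => scores }
end

p result
```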

I know you specified the expected result, but I'd like to point out that making the output simpler makes the code simpler.

arr.map(&:dup).group_by{|a| a.delete("student")}.each_with_object({}) do |(student, scores),record|
  record[student] = scores.map(&:values).flatten.map(&:values).each_with_object(Hash.new(0)) do |(subject,score),obj|
    obj[subject] += score
  end
end
#=>{"a"=>{"math"=>12, "english"=>5, "science"=>5}, "b"=>{"math"=>1, "english"=>2}}

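With this structure, getting the students is as simple as calling .keys, and the scores are just as easy to get. I was thinking of something like: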

above_result.each do |student,scores|
  puts student
  scores.each do |subject,score|
    puts "  #{subject.capitalize}: #{score}"
  end
end

The console output would be:

a
  Math: 12
  English: 5
  Science: 5
b
  Math: 1
  English: 2

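【Comments】:

The data comes in from a web service request. I have to consolidate the data before moving on to the rest of the processing.
@user3075906 I've updated my answer with a simpler return structure that I think better suits the scenario described.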