How to group_by this array of hashes (answers)

【Question title】: How to group_by this array of hashes
【Posted】: 2018-03-12 11:42:43
【Question】:

I have read data in CSV format from a file into the following array:

arr = [
["company", "location", "region", "service", "price", "duration", "disabled"], 
["Google", "Berlin", "EU", "Design with HTML/CSS", "120", "30", "false"],
["Google", "San Francisco", "US", "Design with HTML/CSS", "120", "30", "false"],
["Google", "San Francisco", "US", "Restful API design", "1500", "120", "false"],
["IBM", "San Francisco", "US", "Design with HTML/CSS", "120", "30", "true"],
["Google<script>alert('hi')<script>", "Berlin", "EU", "Practical TDD", "300", "60", "false"],
["Œoogle", "San Francisco", "US", "Restful API design", "1500", "120", "false"],
["Apple", "Berlin", "EU", "Practical TDD", "300", "60", "true"],
["Apple", "London", "EU", "Advanced AngularJS", "1200", "180", "false"],
["Apple", "New York", "US", "Restful API design", "1500", "120", "false"]
]
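Because the first element of arr is the header row, the remaining rows can be turned into hashes keyed by column name before grouping. This is roughly what CSV.read(..., headers: true) gives you as CSV::Row objects. The sketch below uses only Array#zip and #to_h, and runs on a shortened excerpt of arr.

```ruby
# Excerpt of the question's arr: the header row followed by data rows.
arr = [
  ["company", "location", "region", "service", "price", "duration", "disabled"],
  ["Google", "Berlin", "EU", "Design with HTML/CSS", "120", "30", "false"],
  ["Google", "San Francisco", "US", "Design with HTML/CSS", "120", "30", "false"],
  ["Google", "San Francisco", "US", "Restful API design", "1500", "120", "false"]
]

# Split off the header, then pair each row's values with the column names.
header, *rows = arr
records = rows.map { |row| header.zip(row).to_h }

p records.first
```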

I want to import it into the database, based on the associations below:

# company.rb
  has_many :regions
  has_many :services

# region.rb
  has_many :branches
  belongs_to :company

# branch.rb
  belongs_to :region
  has_many :services

# service.rb
  belongs_to :company
  belongs_to :branch

Perhaps a hash like the one below could be used (I'm not sure; if possible, please suggest a good design):

{"Google" => [{
  :name => "Google",
  :regions_attributes => {
    :name => "US",
    :locations_attributes => {
      :name => "San Francisco"
    }
  },
  :services_attributes => [{
    :name => "Restful API design",
    ...
  },
  {
    :name => "Design with HTML/CSS",
    ...
  }]
}]}

My attempt:

require 'csv'

companies = []
CSV.foreach(csv_file, headers: true) do |row|
  company = {}
  company[:name]   = row['company']
  company[:regions_attributes] = {}
  company[:regions_attributes][:name] = row['region']
  company[:regions_attributes][:branches_attributes] = {}
  company[:regions_attributes][:branches_attributes][:name] = row['location']
  company[:services_attributes] = {}
  company[:services_attributes][:name] = row['service']
  company[:services_attributes][:price] = row['price']
  company[:services_attributes][:duration] = row['duration']
  company[:services_attributes][:disabled] = row['disabled']
  companies << company
end

companies.uniq! { |c| c.values }
companies = companies.group_by { |c| c[:name] }

This groups the rows by company name.

I also want to group the services offered within a region. In the example above, for instance, San Francisco (US) has two services.
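One way to collect the services per place is to group on the [region, location] pair and then keep the distinct service names. This minimal sketch assumes the rows are already hashes keyed by column name, and it uses a Google-only subset of the data.

```ruby
# Rows as hashes (a Google-only subset of the question's data).
rows = [
  { "company" => "Google", "location" => "Berlin",        "region" => "EU", "service" => "Design with HTML/CSS" },
  { "company" => "Google", "location" => "San Francisco", "region" => "US", "service" => "Design with HTML/CSS" },
  { "company" => "Google", "location" => "San Francisco", "region" => "US", "service" => "Restful API design" }
]

# Group by the [region, location] pair, then keep each group's distinct service names.
services_by_place = rows
  .group_by { |r| [r["region"], r["location"]] }
  .transform_values { |rs| rs.map { |r| r["service"] }.uniq }

p services_by_place
```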

Update

Starting from Cary Swoveland's solution, I was able to adapt the code to my requirements, but the associations don't behave the way I expected.

companies = CSV.read(csv_file, headers: true).group_by {|csv| csv["company"]}
final = []
companies.transform_values do |arr1|
  company = Company.new(name: arr1.pluck("company").first.encode(Encoding.find('ASCII'), encoding_options))
  services = arr1.map do |c|
    { name: c['service'], price: c['price'], duration: c['duration'], disabled: c['disabled'] }
  end.uniq
  company.services.build(services)
  regions = arr1.group_by { |csv| csv["region"] }.transform_values do |arr2|
    branches = []
    branches << arr2.pluck('location').uniq.map { |location| { name: location, services_attributes: services } }
    { name: arr2.pluck('region').uniq.first, branches_attributes: branches.flatten }
  end
  company.regions.build(regions.values)
  final << company
end

Company.import(final, recursive: true) #activerecord-import gem
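In the update code, every branch is built with `services_attributes: services`, which is the full list of the company's services, not only those offered at that location. That may be one reason the associations come out differently than expected. The plain-Ruby sketch below (no ActiveRecord) builds nested attribute hashes in which each branch gets only its own services. The `*_attributes` keys are an assumption based on the question's associations. They would only be accepted by `Company.new` if the models declare `accepts_nested_attributes_for`. Note also that services built through a branch would not automatically get a `company_id`.

```ruby
require 'csv'

# Hypothetical CSV text with the question's columns (a small excerpt).
csv_text = <<~TEXT
  company,location,region,service,price,duration,disabled
  Google,Berlin,EU,Design with HTML/CSS,120,30,false
  Google,San Francisco,US,Design with HTML/CSS,120,30,false
  Google,San Francisco,US,Restful API design,1500,120,false
  Apple,London,EU,Advanced AngularJS,1200,180,false
TEXT

rows = CSV.parse(csv_text, headers: true)

# company -> regions -> branches -> services, grouping at each level.
attributes = rows.group_by { |r| r["company"] }.map do |company, company_rows|
  regions = company_rows.group_by { |r| r["region"] }.map do |region, region_rows|
    branches = region_rows.group_by { |r| r["location"] }.map do |location, branch_rows|
      # Only the services offered at this location, de-duplicated.
      services = branch_rows.map do |r|
        { name: r["service"], price: r["price"], duration: r["duration"], disabled: r["disabled"] }
      end.uniq
      { name: location, services_attributes: services }
    end
    { name: region, branches_attributes: branches }
  end
  { name: company, regions_attributes: regions }
end
```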

【Discussion】:

Is Ruby the main language? I could help in JS.
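Could you provide the sample data as Ruby rather than as a spreadsheet screenshot? A code snippet that can serve as the basis for experiments is a huge help.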
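Uploading the file helps, but links tend to break, and readers may come across your question in the future. The first step is to read the file into an array with CSV.read, right? It would be better to start with "I have read data in CSV format from a file into the following array: arr = [...]", so that readers can cut and paste it. Be sure to include the variable name (e.g., arr) so that readers can refer to it directly in answers and comments without having to define it. I suggest you delete the question, edit it to make the changes, and then undelete it.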
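...and make the array as small as possible (while, of course, keeping all the essential elements needed to illustrate a solution to the problem).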
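To get a pasteable literal like the one suggested in the comments, you can parse the CSV and print the result with `p`. The sketch below uses CSV.parse on a short stand-in string. CSV.read(path) does the same parsing for a file on disk.

```ruby
require 'csv'

# Stand-in for the file contents; CSV.read(path) parses a file the same way.
text = "company,location,region\nGoogle,Berlin,EU\nApple,London,EU\n"

arr = CSV.parse(text)

# Printing with p gives a Ruby literal that can be pasted into a question.
p arr
# prints [["company", "location", "region"], ["Google", "Berlin", "EU"], ["Apple", "London", "EU"]]
```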
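The string "<script>alert('hi')<script>" should be removed from arr (and, cosmetically, arr[2] should be on its own line). I don't fully understand the structure of the hash you want returned. You could clarify it by adding an element to arr for "Google"/"EU"/"London" and showing the complete value of the key "Google" in the hash to be generated from arr.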

Tags: ruby-on-rails ruby csv enumerable

【Solution 1】:

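Consider changing the structure of the hash and building it with the following code. The file 'tmp.csv' contains roughly the first 20 lines of the CSV file the OP linked to. I have included its contents at the end.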

require 'csv'

CSV.read('tmp.csv', headers: true).group_by { |csv| csv["company"] }.
    transform_values do |arr1|
      arr1.group_by { |csv| csv["region"] }.
           transform_values do |arr2|
             arr2.group_by { |csv| csv["location"] }.
                  transform_values do |arr3|
                    arr3.map { |csv| csv["service"] }.uniq
                  end
           end
    end

  #=> {"Google"=>{
         "EU"=>{
           "Berlin"=>["Design with HTML/CSS","Advanced AngularJS","Restful API design"],
           "London"=>["Restful API design"]
         },
         "US"=>{
            "San Francisco"=>["Design with HTML/CSS", "Restful API design"]
         }
       },
       "Apple"=>{
         "EU"=>{
           "London"=>["Design with HTML/CSS"],
           "Berlin"=>["Restful API design"]
         },
         "US"=>{
           "San Francisco"=>["Design with HTML/CSS"]
         }
       },
       "IBM"=>{
         "US"=>{
           "San Francisco"=>["Design with HTML/CSS"]
         },
         "EU"=>{
           "Berlin"=>["Restful API design"],
           "London"=>["Restful API design"]
         }
      }
     }
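The three nested group_by/transform_values calls follow one pattern: group by a column, then recurse into each group. A sketch of that pattern as a recursive helper could look like the following. `nested_group` is a hypothetical name, and the rows are a hand-written IBM subset of the test data. The result matches the "IBM" entry above.

```ruby
# Hypothetical helper: group rows by each key in turn; at the deepest level,
# collect the distinct values of the leaf column.
def nested_group(rows, keys, leaf)
  return rows.map { |r| r[leaf] }.uniq if keys.empty?

  first, *rest = keys
  rows.group_by { |r| r[first] }
      .transform_values { |group| nested_group(group, rest, leaf) }
end

rows = [
  { "company" => "IBM", "region" => "US", "location" => "San Francisco", "service" => "Design with HTML/CSS" },
  { "company" => "IBM", "region" => "EU", "location" => "Berlin",        "service" => "Restful API design" },
  { "company" => "IBM", "region" => "EU", "location" => "London",        "service" => "Restful API design" },
  { "company" => "IBM", "region" => "EU", "location" => "Berlin",        "service" => "Restful API design" }
]

result = nested_group(rows, %w[company region location], "service")
```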

If this hash format is not suitable (but the content is what you need), it can easily be converted to a different format.
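As one example of such a conversion, the nested hash can be rebuilt as arrays of `{ name:, ..._attributes: }` hashes, close to the shape sketched in the question. Using `branches_attributes` and `services_attributes` is an assumption based on the question's associations. The input is a hand-trimmed piece of the "Google" entry above.

```ruby
# A trimmed piece of the nested hash above (company => region => location => services).
nested = {
  "Google" => {
    "EU" => { "Berlin" => ["Design with HTML/CSS", "Advanced AngularJS"] },
    "US" => { "San Francisco" => ["Design with HTML/CSS", "Restful API design"] }
  }
}

# Rebuild each level as an array of { name:, ..._attributes: } hashes.
companies = nested.map do |company, regions|
  {
    name: company,
    regions_attributes: regions.map { |region, locations|
      {
        name: region,
        branches_attributes: locations.map { |location, services|
          { name: location, services_attributes: services.map { |s| { name: s } } }
        }
      }
    }
  }
end
```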

See the documentation for CSV::read, CSV::Row#[], Enumerable#group_by and Hash#transform_values.

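I had to do some preprocessing on the linked CSV file. The company names were preceded by a UTF-8 "byte order mark" (search for "OK, figured it out" here). I removed those characters with the code given by Nathan Long [here]. The OP must either write the CSV file without these marks or strip them when reading the file.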
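To strip the mark at read time, Ruby can open the file with the "bom|utf-8" external encoding, which skips a leading BOM. The sketch below writes a throwaway temp file (hypothetical contents) that starts with U+FEFF, then reads it back through CSV.new. Without the BOM handling, the first header would come out as "\uFEFFcompany" instead of "company".

```ruby
require 'csv'
require 'tempfile'

# Create a small CSV file that starts with a UTF-8 byte order mark (U+FEFF).
file = Tempfile.new(['bom', '.csv'])
file.write("\uFEFFcompany,location\nGoogle,Berlin\n")
file.close

# The "bom|utf-8" encoding tells Ruby to skip a leading BOM when reading.
table = File.open(file.path, 'r:bom|utf-8') { |f| CSV.new(f, headers: true).read }

p table.headers
file.unlink
```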

The content of my reduced CSV test file is the following.

arr = ["company,location,region,service,price,duration,disabled\n",
       "Google,Berlin,EU,Design with HTML/CSS,120,30,FALSE\n",
       "Google,San Francisco,US,Design with HTML/CSS,120,30,FALSE\n",
       "Google,San Francisco,US,Restful API design,1500,120,FALSE\n",
       "Apple,London,EU,Design with HTML/CSS,120,30,FALSE\n",
       "Google,Berlin,EU,Design with HTML/CSS,120,30,FALSE\n",
       "Apple,Berlin,EU,Restful API design,1500,120,FALSE\n",
       "IBM,San Francisco,US,Design with HTML/CSS,120,30,TRUE\n",
       "Google,San Francisco,US,Design with HTML/CSS,120,30,FALSE\n",
       "IBM,Berlin,EU,Restful API design,1500,120,TRUE\n",
       "IBM,London,EU,Restful API design,1500,120,TRUE\n",
       "IBM,Berlin,EU,Restful API design,1500,120,TRUE\n",
       "IBM,London,EU,Restful API design,1500,120,TRUE\n",
       "IBM,San Francisco,US,Design with HTML/CSS,120,30,TRUE\n",
       "Google,Berlin,EU,Advanced AngularJS,1200,180,FALSE\n",
       "Google,Berlin,EU,Restful API design,1500,120,FALSE\n", 
       "Google,London,EU,Restful API design,1500,120,FALSE\n",
       "Apple,San Francisco,US,Design with HTML/CSS,120,30,FALSE\n",
       "Google,San Francisco,US,Restful API design,1500,120,FALSE\n",
       "IBM,Berlin,EU,Restful API design,1500,120,TRUE\n"]
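To reproduce the result above without creating tmp.csv, the strings can be joined and passed to CSV.parse, which accepts the same headers: option as CSV.read. The grouping chain is the answer's own code. The only change is the innermost block variable, renamed to arr3 so it doesn't shadow arr2.

```ruby
require 'csv'

# The reduced test data from above, one CSV line per string.
arr = ["company,location,region,service,price,duration,disabled\n",
       "Google,Berlin,EU,Design with HTML/CSS,120,30,FALSE\n",
       "Google,San Francisco,US,Design with HTML/CSS,120,30,FALSE\n",
       "Google,San Francisco,US,Restful API design,1500,120,FALSE\n",
       "Apple,London,EU,Design with HTML/CSS,120,30,FALSE\n",
       "Google,Berlin,EU,Design with HTML/CSS,120,30,FALSE\n",
       "Apple,Berlin,EU,Restful API design,1500,120,FALSE\n",
       "IBM,San Francisco,US,Design with HTML/CSS,120,30,TRUE\n",
       "Google,San Francisco,US,Design with HTML/CSS,120,30,FALSE\n",
       "IBM,Berlin,EU,Restful API design,1500,120,TRUE\n",
       "IBM,London,EU,Restful API design,1500,120,TRUE\n",
       "IBM,Berlin,EU,Restful API design,1500,120,TRUE\n",
       "IBM,London,EU,Restful API design,1500,120,TRUE\n",
       "IBM,San Francisco,US,Design with HTML/CSS,120,30,TRUE\n",
       "Google,Berlin,EU,Advanced AngularJS,1200,180,FALSE\n",
       "Google,Berlin,EU,Restful API design,1500,120,FALSE\n",
       "Google,London,EU,Restful API design,1500,120,FALSE\n",
       "Apple,San Francisco,US,Design with HTML/CSS,120,30,FALSE\n",
       "Google,San Francisco,US,Restful API design,1500,120,FALSE\n",
       "IBM,Berlin,EU,Restful API design,1500,120,TRUE\n"]

# company -> region -> location -> distinct services, parsed from the joined strings.
result = CSV.parse(arr.join, headers: true).group_by { |csv| csv["company"] }.
  transform_values do |arr1|
    arr1.group_by { |csv| csv["region"] }.
      transform_values do |arr2|
        arr2.group_by { |csv| csv["location"] }.
          transform_values { |arr3| arr3.map { |csv| csv["service"] }.uniq }
      end
  end
```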

【Discussion】:

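Thanks for the suggestions; I've made changes to the question. I also need to stick to the format I mentioned in the question.
OK, I'll edit my answer to build a hash in the required format. First, though, I need a fuller understanding of the structure of the hash you want to produce. Please see my new comment on the question.
You can now see the associations and my progress in the Update section.
I have updated the associations. Please take a look at the updated question.
I'm still confused. In the question you said, "Perhaps a hash like the one below could be used (I'm not sure; if possible, please suggest a good design)." In my answer I proposed what I consider a good design. Then you commented, "I also need to stick to the format I mentioned in the question," which seems contradictory. Also, the format you gave is incomplete, so I really don't know what format you want. That's why I suggested adding a row to arr for "Google/EU/London" and showing the value you want for the key "Google" in the hash.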