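【Question Title】: Ruby - Merge CSV duplicate columns with same SKU
【Posted】: 2019-12-13 09:44:18
【Question】:

I have created a CSV file for my eshop that contains multiple items with different SKUs. Some SKUs appear more than once because they can belong to several categories (but for a given SKU, the title and price will always be the same). Example: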

SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8

I would now like to create another CSV file from this one that has no duplicate SKUs and aggregates the "Category" attribute as follows:

SKU,Title,Category,Price
001,Soap,Bathroom/Kitchen,0.5
002,Water,Kitchen/Garage,0.4
003,Juice,Kitchen,0.8

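How can I do this?

【Comments】:

  • What have you tried so far? Do you have any code you can add to your question? See stackoverflow.com/help/how-to-ask
  • I tried the code from this question, but I couldn't get it to work for me: stackoverflow.com/questions/10973182/merge-rows-csv-by-id-ruby
  • Your example before the edit contained several commas preceded and/or followed by one or more spaces. There should be none. For example, if the first line were "SKU , Title,Category,Price", the first two fields would be read as "SKU " and " Title". The alternative is to strip the whitespace after parsing each line, which should not be necessary.

Tags: ruby csv merge duplicates sku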


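【Solution 1】:

As I understand it, you want to read a CSV file, perform some operations on the data, and then write the result to a new CSV file. You can do that as follows.

Code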

require 'csv'

# Groups the rows of csv_file_in by group_field, joins the values of
# aggregate_field with '/', and writes the result to csv_file_out.
def convert(csv_file_in, csv_file_out, group_field, aggregate_field)
  csv = CSV.read(csv_file_in, headers: true)
  headers = csv.headers
  arr = csv.group_by { |row| row[group_field] }.
            map do |_,a|
              headers.map { |h| h==aggregate_field ?
                (a.map { |row| row[aggregate_field] }.join('/')) : a.first[h] }
            end
  CSV.open(csv_file_out, "wb") do |out|
    out << headers
    arr.each { |row| out << row }
  end
end

Example

Let's create a CSV file containing the following data:

s =<<_
SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8
_

FNameIn  = 'testin.csv'
FNameOut = 'testout.csv'

IO.write(FNameIn, s)
  #=> 133

Now execute the method with these values:

convert(FNameIn, FNameOut, "SKU", "Category")

and confirm that FNameOut was written correctly:

puts IO.read(FNameOut)
SKU,Title,Category,Price
001,Soap,Bathroom/Kitchen,0.5
002,Water,Kitchen/Garage,0.4
003,Juice,Kitchen,0.8

Explanation

The steps are as follows:

csv_file_in = FNameIn
csv_file_out = FNameOut
group_field = "SKU"
aggregate_field = "Category"
csv = CSV.read(FNameIn, headers: true)

See CSV::read.

headers = csv.headers
  #=> ["SKU", "Title", "Category", "Price"] 
h = csv.group_by { |row| row[group_field] }
  #=> {"001"=>[
  #      #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Bathroom" "Price":"0.5">,
  #      #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Kitchen" "Price":"0.5">
  #    ],
  #    "002"=>[
  #      #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Kitchen" "Price":"0.4">,
  #      #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Garage" "Price":"0.4">
  #    ],
  #    "003"=>[
  #      #<CSV::Row "SKU":"003" "Title":"Juice" "Category":"Kitchen" "Price":"0.8">
  #    ]
  #   } 
arr = h.map do |_,a|
        headers.map { |h| h==aggregate_field ?
          (a.map { |row| row[aggregate_field] }.join('/')) : a.first[h] }
      end
   #=> [["001", "Soap", "Bathroom/Kitchen", "0.5"],
   #    ["002", "Water", "Kitchen/Garage", "0.4"],
   #    ["003", "Juice", "Kitchen", "0.8"]] 
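The group-then-map pattern used above can also be tried in isolation. The following is only a minimal sketch on plain hashes, with no CSV involved; the `rows` data is made up to mirror the question's file, and only the "SKU" and "Category" fields are kept to keep it short.

```ruby
# Plain-Ruby sketch of the group-then-merge step (no CSV involved).
# `rows` is made-up sample data mirroring the question's file.
rows = [
  { "SKU" => "001", "Category" => "Bathroom" },
  { "SKU" => "001", "Category" => "Kitchen" },
  { "SKU" => "003", "Category" => "Kitchen" }
]

# group_by builds a hash: key => array of the elements sharing that key.
groups = rows.group_by { |row| row["SKU"] }

# Mapping over the hash yields [key, members] pairs; join the categories per key.
merged = groups.map do |sku, members|
  [sku, members.map { |row| row["Category"] }.join("/")]
end

p merged
#=> [["001", "Bathroom/Kitchen"], ["003", "Kitchen"]]
```

Because Ruby hashes preserve insertion order, the merged rows come out in the order in which each SKU first appears in the input.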

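See CSV::Table#headers (CSV.read with headers: true returns a CSV::Table) and Enumerable#group_by, the latter being a commonly used method. Finally, write the output file: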

CSV.open(FNameOut, "wb") do |csv|
  csv << headers
  arr.each { |row| csv << row }
end

See CSV::open. Now let's return to the calculation of arr. It is most easily explained by inserting some puts statements and running the code.

arr = h.map do |_,a|
          puts "  _=#{_}"
          puts "  a=#{a}"
          headers.map do |h|
            puts "    header=#{h}"
            if h==aggregate_field
              a.map { |row| row[aggregate_field] }.join('/')
            else
              a.first[h]
            end.
            tap { |s| puts "    mapped to #{s}" }
          end
        end

See Object#tap. The following is displayed.

  _=001
  a=[#<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Bathroom" "Price":"0.5">,
     #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Kitchen" "Price":"0.5">]
    header=SKU
    mapped to 001
    header=Title
    mapped to Soap
    header=Category
    mapped to Bathroom/Kitchen
    header=Price
    mapped to 0.5

  _=002
  a=[#<CSV::Row "SKU":"002" "Title":"Water" "Category":"Kitchen" "Price":"0.4">,
     #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Garage" "Price":"0.4">]
    header=SKU
    mapped to 002
    header=Title
    mapped to Water
    header=Category
    mapped to Kitchen/Garage
    header=Price
    mapped to 0.4

  _=003
  a=[#<CSV::Row "SKU":"003" "Title":"Juice" "Category":"Kitchen" "Price":"0.8">]
    header=SKU
    mapped to 003
    header=Title
    mapped to Juice
    header=Category
    mapped to Kitchen
    header=Price
    mapped to 0.8
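For experimenting without touching the file system, the same logic can be applied to strings. This is a sketch only: `merge_csv_string` is a hypothetical helper that is not part of the answer above, it uses CSV.parse and CSV.generate instead of CSV.read and CSV.open, and it additionally assumes that a category repeated for the same SKU should appear only once (hence the `uniq`).

```ruby
require 'csv'

# Hypothetical helper (not part of the answer above): same grouping logic,
# but string in, string out, and repeated categories are collapsed with uniq.
def merge_csv_string(text, group_field, aggregate_field, separator = '/')
  table = CSV.parse(text, headers: true)
  headers = table.headers

  CSV.generate do |out|
    out << headers
    table.group_by { |row| row[group_field] }.each_value do |rows|
      joined = rows.map { |row| row[aggregate_field] }.uniq.join(separator)
      out << headers.map { |h| h == aggregate_field ? joined : rows.first[h] }
    end
  end
end

# Made-up input with one fully duplicated row to show the effect of uniq.
input = <<~CSV_TEXT
  SKU,Title,Category,Price
  001,Soap,Bathroom,0.5
  001,Soap,Kitchen,0.5
  001,Soap,Kitchen,0.5
  003,Juice,Kitchen,0.8
CSV_TEXT

puts merge_csv_string(input, "SKU", "Category")
# SKU,Title,Category,Price
# 001,Soap,Bathroom/Kitchen,0.5
# 003,Juice,Kitchen,0.8
```

Dropping the `uniq` call gives back the behaviour of convert, where every input row contributes its category.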

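【Comments】:

  • @CarySwoveland Yes, I got that wrong. Thanks.

【Solution 2】:

It seems that, to get this right, we have to assume that the title and price are always the same for a given SKU, as the question states. Since you know that the only key whose data needs to be merged is Category, you can do it like this.

Assume this is your test.csv, located in the same directory as the Ruby script: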

# test.csv
SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8

The Ruby script, located in the same directory as your test.csv file:

# fix_csv.rb
require 'csv'
rows = CSV.read('test.csv', headers: true)

# Group the rows by SKU; the hash keys are the unique SKUs.
merged = rows.group_by { |row| row['SKU'] }.map do |_sku, group|
  # Keep the first row of each group and join every category with '/'.
  new_data = group.first.to_h
  new_data['Category'] = group.map { |row| row['Category'] }.join('/')
  new_data
end

CSV.open('merged_data.csv', 'w') do |csv|
  csv << merged.first.keys # writes the header row
  merged.each do |hash|
    csv << hash.values
  end
end

puts 'see contents of merged_data.csv'
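
If the input is too large to load comfortably with CSV.read, the same merge can be done in a single pass. The sketch below is an assumption-laden variant, not the answer's own code: `merge_categories` and its `key:` and `field:` parameters are hypothetical names, rows are read one at a time with CSV.foreach, and the demo runs in a temporary directory with made-up data.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: a single pass over the input, keeping one hash per SKU.
def merge_categories(path_in, path_out, key: 'SKU', field: 'Category')
  merged = {} # insertion-ordered: key value => row hash
  CSV.foreach(path_in, headers: true) do |row|
    entry = merged[row[key]]
    if entry
      # Key seen before: append this row's category to the stored one.
      entry[field] = "#{entry[field]}/#{row[field]}"
    else
      merged[row[key]] = row.to_h
    end
  end

  return if merged.empty? # nothing to write for an empty input

  CSV.open(path_out, 'w') do |csv|
    csv << merged.values.first.keys # header row
    merged.each_value { |hash| csv << hash.values }
  end
end

# Demo in a temporary directory so no real files are touched.
output = Dir.mktmpdir do |dir|
  path_in  = File.join(dir, 'test.csv')
  path_out = File.join(dir, 'merged_data.csv')
  File.write(path_in, "SKU,Title,Category,Price\n001,Soap,Bathroom,0.5\n" \
                      "001,Soap,Kitchen,0.5\n003,Juice,Kitchen,0.8\n")
  merge_categories(path_in, path_out)
  File.read(path_out)
end

puts output
# SKU,Title,Category,Price
# 001,Soap,Bathroom/Kitchen,0.5
# 003,Juice,Kitchen,0.8
```

Only one hash per distinct SKU is held in memory, so the cost grows with the number of SKUs rather than with the number of rows.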
