【Question title】:Import spatial CSV data into Postgres/PostGIS database with a rake task
【Posted】:2013-03-10 03:20:15
【Question】:

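Hi, I'm trying to import CSV data into a spatially enabled Postgres database. The data is available here. I'm not sure where I'm going wrong, and any help is greatly appreciated! What I'm trying to do is visualize the data with D3.js, perhaps showing a heat-density map of the towns with the most libraries, or something along those lines.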

File: lib/tasks/import_incidents_csv.rake

require 'csv'

namespace :import_incidents_csv do

  task :create_incidents => :environment do

    csv_text = File.read('/home/mgmacri/data/PublicLibraryBranchLocations.csv')
    csv = CSV.parse(csv_text, :headers => true)

    csv.each do |row|
      row = row.to_hash.with_indifferent_access
      Moulding.create!(row.to_hash.symbolize_keys)
    end

  end

end


user@server:/spatial_project$: rake import_incidents_csv:create_incidents --trace
** Invoke import_incidents_csv:create_incidents (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute import_incidents_csv:create_incidents
rake aborted!
invalid byte sequence in UTF-8
/usr/lib/ruby/1.9.1/csv.rb:1855:in `sub!'
/usr/lib/ruby/1.9.1/csv.rb:1855:in `block in shift'
/usr/lib/ruby/1.9.1/csv.rb:1849:in `loop'
/usr/lib/ruby/1.9.1/csv.rb:1849:in `shift'
/usr/lib/ruby/1.9.1/csv.rb:1791:in `each'
/usr/lib/ruby/1.9.1/csv.rb:1805:in `to_a'
/usr/lib/ruby/1.9.1/csv.rb:1805:in `read'
/usr/lib/ruby/1.9.1/csv.rb:1379:in `parse'
/home/mgmacri/rails/mymap/lib/tasks/import_incidents_csv.rake:8:in `block (2 levels) in
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `call'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:228:in `block in execute'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `each'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:223:in `execute'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:166:in `block in invoke_with_call_chain'
/usr/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:159:in `invoke_with_call_chain'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/task.rb:152:in `invoke'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:143:in `invoke_task'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block (2 levels) in top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `each'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:101:in `block in top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:110:in `run_with_threads'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:95:in `top_level'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:73:in `block in run'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:160:in `standard_exception_handling'
/var/lib/gems/1.9.1/gems/rake-10.0.3/lib/rake/application.rb:70:in `run'
/var/lib/gems/1.9.1/gems/rake-10.0.3/bin/rake:33:in `<top (required)>'
/usr/local/bin/rake:19:in `load'
/usr/local/bin/rake:19:in `<main>'
Tasks: TOP => import_incidents_csv:create_incidents

【Discussion】:

    Tags: ruby-on-rails ruby postgresql import rake


    【Solution 1】:

    Excel encoded the file as ISO-8859-1, not UTF-8. So tell Ruby to open the file for reading as ISO-8859-1:

    file=File.open("input_file", "r:ISO-8859-1")
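Applied to the rake task from the question, the same idea might look like the sketch below. It assumes the file really is ISO-8859-1, as this answer says (the comments suggest "Windows-1252" may be more accurate for an Excel export). `read_library_csv` is a hypothetical helper name, not part of the original answer.

```ruby
require 'csv'

# Hypothetical helper: read a CSV saved in a legacy single-byte encoding
# and return parsed rows whose strings are valid UTF-8.
# source_encoding is an assumption: "ISO-8859-1" follows the answer,
# "Windows-1252" may fit files exported by Excel on Windows better.
def read_library_csv(path, source_encoding = "ISO-8859-1")
  # Tag the raw bytes with the source encoding instead of the default UTF-8...
  raw = File.read(path, encoding: source_encoding)
  # ...then transcode to UTF-8 so CSV (and later the database) sees valid text.
  CSV.parse(raw.encode("UTF-8"), headers: true)
end
```

In the rake task, `csv = read_library_csv('/home/mgmacri/data/PublicLibraryBranchLocations.csv')` would then replace the `File.read` and `CSV.parse` lines, leaving the `csv.each` loop unchanged.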
    

    【Discussion】:

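    • Excel uses the ANSI code page by default, not an ISO character set, and those are not exactly the same thing. For example, ISO-8859-1 is similar to, but not the same as, cp1252 (also known as Windows-1252). It's better to use the correct code page than to guess at a close-enough encoding; better still, use OpenOffice to save the Excel sheet as UTF-8 and keep your sanity. Note that Excel uses different code pages on different systems; a Central European user, for example, might send you cp1250 text. See stackoverflow.com/questions/508558/…
    • See also en.wikipedia.org/wiki/Windows-1252, which explains that Windows-1252 isn't even an ANSI standard, even though Windows calls it the "ANSI" code page.
    • @Craig: Thanks for the explanation.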
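The difference the comments describe is easy to see in Ruby: the same byte can decode to different characters under the two encodings. The byte 0x93 is used here purely as an illustration, and `decode_byte` is a hypothetical helper.

```ruby
# Hypothetical helper: decode one raw byte under a given single-byte
# encoding and return the resulting UTF-8 string.
def decode_byte(byte, encoding)
  byte.chr.force_encoding(encoding).encode("UTF-8")
end

# 0x93 is a left curly quote in Windows-1252...
puts format("Windows-1252: U+%04X", decode_byte(0x93, "Windows-1252").ord) # U+201C
# ...but an invisible C1 control character in ISO-8859-1.
puts format("ISO-8859-1:   U+%04X", decode_byte(0x93, "ISO-8859-1").ord)   # U+0093
```

Accented letters such as 0xE9 (é) decode identically in both, which is why a file can look fine under the wrong one of the two until a quote or dash appears.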
    【Solution 2】:

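    Using PostgreSQL's native CSV import (COPY) is orders of magnitude faster than creating records row by row through Ruby's CSV API, and with the right ENCODING option it avoids the same encoding problem.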

    For example:

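    namespace :import_incidents_csv do
      task :create_incidents => :environment do
        # The PostgreSQL server reads this path itself: the file must be on the DB host (superuser or equivalent file access).
        # HEADER skips the header row; ENCODING 'LATIN1' decodes the non-UTF-8 file. Rails pluralizes the table name.
        ActiveRecord::Base.connection.execute "COPY #{Moulding.table_name} (name, state, postcode, lat, long) " \
          "FROM '/home/mgmacri/data/PublicLibraryBranchLocations.csv' " \
          "WITH (FORMAT csv, DELIMITER ',', HEADER true, ENCODING 'LATIN1');"
      end
    end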
    

    More info: http://www.postgresql.org/docs/9.2/static/sql-copy.html
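Server-side `COPY ... FROM 'file'` is read by the PostgreSQL server process, so it only works when the CSV sits on the database host and the role has sufficient privileges. A common variation is to stream the file from the Rails host with `COPY ... FROM STDIN`. The sketch below assumes the `pg` gem, whose `PG::Connection#copy_data` and `#put_copy_data` perform this streaming, and reuses the answer's (guessed) column list; `copy_csv_from_client` is a hypothetical helper name.

```ruby
# Hypothetical helper: stream a local CSV file into PostgreSQL with
# COPY ... FROM STDIN over a pg-gem connection (PG::Connection).
# Table and column names are assumed to be trusted identifiers (not quoted).
def copy_csv_from_client(conn, table, columns, path, file_encoding: "LATIN1")
  sql = "COPY #{table} (#{columns.join(', ')}) FROM STDIN " \
        "WITH (FORMAT csv, HEADER true, ENCODING '#{file_encoding}')"
  conn.copy_data(sql) do
    # Send raw bytes in chunks; the ENCODING option tells the server how to decode them.
    File.open(path, "rb") do |file|
      while (chunk = file.read(64 * 1024))
        conn.put_copy_data(chunk)
      end
    end
  end
end
```

Inside the rake task, `conn` would be `ActiveRecord::Base.connection.raw_connection`, and the call might be `copy_csv_from_client(conn, Moulding.table_name, %w[name state postcode lat long], '/home/mgmacri/data/PublicLibraryBranchLocations.csv')`. Because the data travels over the client connection, no server-side file access is needed.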

    【Discussion】:
