【发布时间】:2010-11-29 23:32:05
【问题描述】:
我需要从 CSV 文件中提取一些数据。 CSV 是一个包含多条记录的 2 列文件。第一列是日期,第二列是需要提取的数据。 CSV 文件的第一行是列标题,因此可以跳过。而且我已经为提取的数据的csv文件创建了列标题,所以不需要,我将使用>>将数据导入其中。
这是 CSV 文件中的 1 条记录/行(多条):
"2009-09-20 00:12:37","a:2:{s:15:""info_buyRequest"";a:5:{s:4:""uenc"";s:116:""aHR0cDovL3N0b3JlLmZvcmdldGhhbmdvdmVycy5jb20vcGF0Y2hlcy9pbmRpdmlkdWFsLXBhdGNoZXMvZnJlZS1zYW1wbGUuaHRtbD9fX19TSUQ9VQ,,"";s:7:""product"";s:1:""1"";s:15:""related_product"";s:0:"""";s:7:""options"";a:13:{i:17;s:2:""59"";i:16;s:2:""50"";i:15;s:2:""49"";i:14;s:2:""47"";i:13;s:2:""41"";i:12;s:2:""34"";i:11;s:2:""25"";i:10;s:2:""23"";i:9;s:2:""19"";i:8;s:2:""17"";i:7;s:2:""12"";i:6;s:1:""9"";i:5;s:1:""5"";}s:3:""qty"";i:1;}s:7:""options"";a:13:{i:0;a:7:{s:5:""label"";s:25:""How did you hear about us"";s:5:""value"";s:22:""Friend / Family Member"";s:11:""print_value"";s:22:""Friend / Family Member"";s:9:""option_id"";s:2:""17"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""59"";s:11:""custom_view"";b:0;}i:1;a:7:{s:5:""label"";s:3:""Age"";s:5:""value"";s:5:""21-24"";s:11:""print_value"";s:5:""21-24"";s:9:""option_id"";s:2:""16"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""50"";s:11:""custom_view"";b:0;}i:2;a:7:{s:5:""label"";s:14:""Marital Status"";s:5:""value"";s:9:""UnMarried"";s:11:""print_value"";s:9:""UnMarried"";s:9:""option_id"";s:2:""15"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""49"";s:11:""custom_view"";b:0;}i:3;a:7:{s:5:""label"";s:3:""Sex"";s:5:""value"";s:6:""Female"";s:11:""print_value"";s:6:""Female"";s:9:""option_id"";s:2:""14"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""47"";s:11:""custom_view"";b:0;}i:4;a:7:{s:5:""label"";s:10:""Occupation"";s:5:""value"";s:7:""Student"";s:11:""print_value"";s:7:""Student"";s:9:""option_id"";s:2:""13"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""41"";s:11:""custom_view"";b:0;}i:5;a:7:{s:5:""label"";s:9:""Education"";s:5:""value"";s:16:""College Graduate"";s:11:""print_value"";s:16:""College Graduate"";s:9:""option_id"";s:2:""12"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""34"";s:11:""custom_view"";b:0;}i:6;a:7:{s:5:""label"";s:16:""Household Income"";s:5:""value"";s:7:""30K-50K"";s:11:""print_value"";s:7:""30K-50K"";s:9:""option_id"";s:2:""11"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""25"";s:11:""custom_view"";b:0;}i:7;a:7:{s:5:""label"";s:23:""Do You Take Supplements"";s:5:""value"";s:2:""No"";s:11:""print_value"";s:2:""No"";s:9:""option_id"";s:2:""10"";s:11:""option_type"";s:5:""radio"";s:12:""option_value"";s:2:""23"";s:11:""custom_view"";b:0;}i:8;a:7:{s:5:""label"";s:40:""How would you rank your typical hangover"";s:5:""value"";s:4:""Mild"";s:11:""print_value"";s:4:""Mild"";s:9:""option_id"";s:1:""9"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""19"";s:11:""custom_view"";b:0;}i:9;a:7:{s:5:""label"";s:51:""What type of establishments do you typically prefer"";s:5:""value"";s:10:""Nightclubs"";s:11:""print_value"";s:10:""Nightclubs"";s:9:""option_id"";s:1:""8"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""17"";s:11:""custom_view"";b:0;}i:10;a:7:{s:5:""label"";s:40:""How often do you usually go out per week"";s:5:""value"";s:3:""1-2"";s:11:""print_value"";s:3:""1-2"";s:9:""option_id"";s:1:""7"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:2:""12"";s:11:""custom_view"";b:0;}i:11;a:7:{s:5:""label"";s:49:""How many drinks do you typically consume per week"";s:5:""value"";s:3:""6-8"";s:11:""print_value"";s:3:""6-8"";s:9:""option_id"";s:1:""6"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:1:""9"";s:11:""custom_view"";b:0;}i:12;a:7:{s:5:""label"";s:53:""How would you prefer to buy our Products"";s:5:""value"";s:6:""Online"";s:11:""print_value"";s:6:""Online"";s:9:""option_id"";s:1:""5"";s:11:""option_type"";s:9:""drop_down"";s:12:""option_value"";s:1:""5"";s:11:""custom_view"";b:0;}}}"
输出应该是在这里找到的数据:
- ""print_value";s:?:""{DATA}""
? 是数字吗,{DATA} 是要提取的数据。
因此,例如这 1 条记录的输出将是:
"2009-09-20 00:12:37","Friend / Family Member","21-24","UnMarried","Female","Student","College Graduate","30K-50K","No","Mild","Nightclubs","1-2","6-8","Online"
我不精通 Sed、AWK 或 Grep,但我知道可以使用其中一种工具(如果不是全部使用三种工具)来完成。非常感谢您在正确方向上的任何帮助或推动。
【问题讨论】:
-
sed将是您选择的工具,但数据来自哪里?第二个字段在我看来就像一个 PHP 序列化数组,也许 PHP 会是更好的工作工具选择? -
您真正想要的是一种解析序列化 PHP 数据结构的方法,而不仅仅是普通的 CSV。
-
数据已从 Magento 商店的 MySQL 数据库中提取。有关我如何提取它的更多信息,请参阅我的最后一个问题:stackoverflow.com/questions/1452336/…
-
Bash 不是这项工作的正确工具——看看下面更接近完整的解决方案是多么复杂,并且不能保证它们能解决所有的极端情况。使用具有长期、QAed、同行评审的 CSV 库可用的语言(例如,Python 在其标准库中提供),而不是试图在没有的情况下一起破解某些东西。
-
@BassKozz 你试过使用
grep -o吗?grep -o '""print_value"";s:[0-9]*:""[^"]*""'