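【Posted】: 2014-03-04 16:49:30
【Question】:
For quite a while now I have been trying to format space-delimited data into a CSV structure.
Initial situation
The initial data looks like this: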
Dr. Arun Raykar MBBS, MS - ENT 9 years experience Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment
Dr. Hema Sanath C BHMS, CFN 0 years experience Homeopath Sankirana Homeopathic Clinic Kalyan Nagar, Bangalore INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM Book Appointment
Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment
It contains a lot of whitespace and unnecessary information. The information is laid out roughly like this:
Doctor's name | Degree | Years of experience | Specialization | Hospital name | Address | Fees | Schedule | and an unnecessary book appointment field.
I would like to convert it to the following format:
Doctor's name,Specialization,Hospital name,Address,Fees,Schedule
So the current data should end up looking like this:
Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist,SHAKTHI E.N.T CARE,Malleswaram,INR 250,MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath,Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250,MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist,V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM
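So far I have managed to remove the Book Appointment field.
Problem
However, I am having trouble isolating the hospital name, because the spacing around it varies so much. Is this problem solvable?
EDIT
The output of cat -A file is as follows: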
Dr. Arun Raykar MBBS, MS - ENT 9 years experience Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE ^I Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment $
Dr. Hema Sanath C BHMS, CFN 0 years experience Homeopath Sankirana Homeopathic Clinic ^I Kalyan Nagar, Bangalore INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM Book Appointment $
Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center ^I Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment
【Comments】:
-
It looks like your original file contains some tabs. Could you run cat -A file and update the question with the output?
-
I have added the output of cat -A in the EDIT section.
-
Is there any way to get some kind of separation between the specialization and the hospital name?
-
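The horizontal tab is another commonly used value separator; the comma is not the only character used to separate values. I am wondering whether you removed the delimiters yourself by replacing tabs with spaces. If the file were still tab-delimited, it would be easy to reformat it into a comma-separated CSV file containing the fields you want, in the order you want.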
-
A CSV file that uses tabs as the delimiter can be imported into a blank Microsoft Excel worksheet with Excel's Text Import Wizard.
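To make the point from the comments concrete, here is a minimal sketch in Python of how far the one surviving tab gets you. It assumes every line follows the layout in the cat -A output above: a single tab between the hospital name and the address, a literal "N years experience" on the left, and "INR <number>" followed by the schedule on the right. The names split_record, to_csv, LEFT and RIGHT are made up for illustration. Note what it does not do: with only single spaces left, nothing in the data separates the name from the degree, or the specialization from the hospital name, so those pairs are kept together rather than guessed at.

```python
import csv
import io
import re

# Assumption: the layout shown in the cat -A output, with one tab between
# the hospital name and the address.
LEFT = re.compile(
    r"^(?P<name_degree>.+?)\s+(?P<years>\d+) years experience\s+(?P<spec_hospital>.+?)\s*$"
)
RIGHT = re.compile(
    r"^\s*(?P<address>.+?)\s+INR\s+(?P<fee>\d+)\s+(?P<schedule>\S+)"
)


def split_record(line):
    """Return a dict of the fields that can be recovered, or None."""
    left, tab, right = line.rstrip("\n").partition("\t")
    if not tab:
        return None  # no tab: the one reliable delimiter is missing
    lm, rm = LEFT.match(left), RIGHT.match(right)
    if lm is None or rm is None:
        return None  # line does not follow the assumed layout
    return {
        # name and degree cannot be told apart without a real delimiter
        "name_degree": lm["name_degree"],
        # same for specialization and hospital name
        "spec_hospital": lm["spec_hospital"],
        # keep only the part before the first comma, dropping the city
        "area": rm["address"].split(",")[0].strip(),
        "fee": "INR " + rm["fee"],
        # \S+ stops at the next space, so "Book Appointment" is dropped
        "schedule": rm["schedule"],
    }


def to_csv(lines):
    """Write every parseable line as one CSV row and return the text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        rec = split_record(line)
        if rec is not None:
            writer.writerow(rec.values())
    return out.getvalue()
```

Calling to_csv(open("file")) would give one row per line that matches; the csv module quotes any field that itself contains a comma, such as the degree list. If the original export still has tabs (or runs of several spaces) between the other columns, splitting on those would recover the remaining fields as well.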
Tags: regex csv awk formatting pretty-print