【Question title】: How to parse a complicated CSV file
【Posted】: 2019-07-21 15:39:30
【Question】:

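I receive a CSV file that mixes plain string fields with curly-brace (JSON) fields, and I can't find a way to parse it correctly. Am I missing something obvious?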

CSV file:

"presentation_id","presentation_name","sectionId","sectionNumber","courseId","courseIdentifier","courseName","activity_id","activity_prompt","activity_content","solution","event_timestamp","answer_id","answer","isCorrect","userid","firstname","lastname","email","role"
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","62d059e8-9ab4-41d4-9eb8-00ba67d9fac9","A blow to which side of the knee might tear the medial collateral ligament?","{"choices":["medial","lateral"],"type":"MultipleChoice"}","{"solution":[1],"selectAll":false,"type":"MultipleChoice"}","2018-09-30 23:54:16.000","7b5048e5-7460-49f8-a64a-763b7f62d771","{"solution":[1],"type":"MultipleChoice"}","1","57ba970d-d02b-4a10-a64d-56f02336ee08","Student","One","student1@example.com","Student"
"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2","What is the name of this movement?","{"choices":["right rotation","left rotation","right lateral rotation","left lateral rotation"],"type":"MultipleChoice"}","{"solution":[1],"selectAll":false,"type":"MultipleChoice"}","2018-09-30 23:20:33.000","d6cce4d9-37ae-409e-afc5-54ad79f86226","{"solution":[3],"type":"MultipleChoice"}","0","921d1b9b-f550-4289-89f1-2a805b27eeb3","Student","Two","student2@example.com","Student"

The first line is the header; the remaining lines are data.

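import csv

# filepathcsv holds the path to the CSV file; newline="" is what the csv module expects
with open(filepathcsv, newline="") as csvfile:
    readCSV = csv.reader(csvfile)
    for row in readCSV:
        numcolumns = len(row)
        print(numcolumns, ": ", row)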

Output:

20 :  ['presentation_id', 'presentation_name', 'sectionId', 'sectionNumber', 'courseId', 'courseIdentifier', 'courseName', 'activity_id', 'activity_prompt', 'activity_content', 'solution', 'event_timestamp', 'answer_id', 'answer', 'isCorrect', 'userid', 'firstname', 'lastname', 'email', 'role']
25 :  ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', '62d059e8-9ab4-41d4-9eb8-00ba67d9fac9', 'A blow to which side of the knee might tear the medial collateral ligament?', '{choices":["medial"', 'lateral]', 'type:"MultipleChoice"}"', '{solution":[1]', 'selectAll:false', 'type:"MultipleChoice"}"', '2018-09-30 23:54:16.000', '7b5048e5-7460-49f8-a64a-763b7f62d771', '{solution":[1]', 'type:"MultipleChoice"}"', '1', '57ba970d-d02b-4a10-a64d-56f02336ee08', 'William', 'Muter', 'wmuter@umich.edu', 'Student']
27 :  ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', 'f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2', 'What is the name of this movement?', '{choices":["right rotation"', 'left rotation', 'right lateral rotation', 'left lateral rotation]', 'type:"MultipleChoice"}"', '{solution":[1]', 'selectAll:false', 'type:"MultipleChoice"}"', '2018-09-30 23:20:33.000', 'd6cce4d9-37ae-409e-afc5-54ad79f86226', '{solution":[3]', 'type:"MultipleChoice"}"', '0', '921d1b9b-f550-4289-89f1-2a805b27eeb3', 'Noah', 'Willett', 'willettn@umich.edu', 'Student']

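Because of the complex structure of the embedded curly-brace elements, csv.reader splits each data row into a different number of fields (25 and 27 above).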

...but I expect 20 elements per row, matching the 20 header columns.
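The splitting can be reproduced with a single cut-down row. This is a minimal sketch: the three-column header and the shortened prompt are made up for illustration, but the JSON field has the same shape as in the file above (wrapped in quotes, inner quotes not doubled).

```python
import csv
import io

# Hypothetical three-column excerpt of the file: the JSON field is wrapped in
# quotes, but the quotes inside it are not escaped.
header = '"activity_prompt","activity_content","isCorrect"\n'
broken = '"Which side?","{"choices":["medial","lateral"],"type":"MultipleChoice"}","1"\n'

rows = list(csv.reader(io.StringIO(header + broken)))
for row in rows:
    print(len(row), ":", row)
# The header gives 3 fields, the data row 5: the first inner quote ends the
# quoted field early, so the commas inside the JSON are treated as delimiters.
```

The data row comes back as 'Which side?', '{choices":["medial"', 'lateral]', 'type:"MultipleChoice"}"', '1', the same fragments that appear in the question's output.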

【Comments】:

  • The problem is caused by the embedded " characters.

Tags: python-3.x csv escaping


【Answer 1】:

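The problem is in the records, not in the code; your code works fine. To solve it, you need to fix the CSV file, because the fields with JSON content were not serialized correctly.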

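Escape every double quote inside a field value by doubling it: change each " to "". The quotes that enclose each field stay single.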
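If you control the program that produces the file, the doubling does not have to be done by hand: Python's csv.writer does it automatically, because doublequote=True is the default. A minimal sketch, assuming the JSON column is generated with json.dumps (the compact separators are chosen only to match the layout above, and the row is cut down to three columns):

```python
import csv
import io
import json

# Hypothetical three-column row; the JSON column is produced with json.dumps.
content = {"choices": ["medial", "lateral"], "type": "MultipleChoice"}
row = [
    "A blow to which side of the knee might tear the medial collateral ligament?",
    json.dumps(content, separators=(",", ":")),
    "1",
]

buf = io.StringIO()
# QUOTE_ALL wraps every field in quotes, like the file in the question;
# doublequote=True (the default) writes each embedded " as "".
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
line = buf.getvalue()
print(line)

# Reading the line back yields exactly the original three fields.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed == row)
```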

Here is an example of a fixed CSV row:

"26cc7957-5a6b-4bde-a996-dd823f54ece7","3-Axial Skeleton F18","937c47b0-cc66-4938-81de-1b1b58388499","001","3b5b5e49-1798-4eab-86d7-186cf59149b4","MOVESCI 230","Human Musculoskeletal Anatomy","f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2","What is the name of this movement?","{""choices"":[""right rotation"",""left rotation"",""right lateral rotation"",""left lateral rotation""],""type"":""MultipleChoice""}","{""solution"":[1],""selectAll"":false,""type"":""MultipleChoice""}","2018-09-30 23:20:33.000","d6cce4d9-37ae-409e-afc5-54ad79f86226","{""solution"":[3],""type"":""MultipleChoice""}","0","921d1b9b-f550-4289-89f1-2a805b27eeb3","Student","Two","student2@example.com","Student"

Output of your code after the fix:

20 :  ['26cc7957-5a6b-4bde-a996-dd823f54ece7', '3-Axial Skeleton F18', '937c47b0-cc66-4938-81de-1b1b58388499', '001', '3b5b5e49-1798-4eab-86d7-186cf59149b4', 'MOVESCI 230', 'Human Musculoskeletal Anatomy', 'f82cb32b-45ce-4d3a-aa74-b3fa1a1038a2', 'What is the name of this movement?', '{"choices":["right rotation","left rotation","right lateral rotation","left lateral rotation"],"type":"MultipleChoice"}', '{"solution":[1],"selectAll":false,"type":"MultipleChoice"}', '2018-09-30 23:20:33.000', 'd6cce4d9-37ae-409e-afc5-54ad79f86226', '{"solution":[3],"type":"MultipleChoice"}', '0', '921d1b9b-f550-4289-89f1-2a805b27eeb3', 'Student', 'Two', 'student2@example.com', 'Student']
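Once a row parses into 20 fields, the JSON columns are still plain strings. One way to turn them into Python objects is json.loads on each JSON column. This sketch assumes the column names from the question's header and uses a cut-down, hypothetical file with shortened values:

```python
import csv
import io
import json

# Hypothetical excerpt of the corrected file (inner quotes doubled).
fixed_csv = (
    '"activity_id","activity_content","answer","isCorrect"\n'
    '"f82cb32b","{""choices"":[""right rotation"",""left rotation""],""type"":""MultipleChoice""}",'
    '"{""solution"":[3],""type"":""MultipleChoice""}","0"\n'
)

# Columns whose values are JSON documents (names taken from the question's header).
JSON_COLUMNS = ("activity_content", "answer")

records = []
for record in csv.DictReader(io.StringIO(fixed_csv)):
    for col in JSON_COLUMNS:
        record[col] = json.loads(record[col])  # JSON text -> dict
    records.append(record)

for record in records:
    print(record["activity_content"]["choices"], record["answer"]["solution"])
```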

【Comments】:

    【Answer 2】:

    Thanks, everyone, for the suggestions!

    Also, my apologies for not including the original CSV file I was trying to parse (sample here):

    "b5ae18d3-b6dd-4d0a-84fe-7c43df472571"|"Climate_Rapid_Change_W18.pdf"|"18563b1e-a467-44b3-aed7-3607a1acd712"|"001"|"c86c8c8d-dca6-41cd-a310-a83e40d"|"CLIMATE 102"|"Extreme Weather"|"278c4561-c834-4343-a770-3f544966f633"|"Which European city is at the same latitude as Ann Arbor?"|"{"choices":["Stockholm, Sweden","Berlin, Germany","London, UK","Paris, France","Madrid, Spain"],"type":"MultipleChoice"}"|"{"solution":[4],"selectAll":false,"type":"MultipleChoice"}"|"2019-01-31 22:11:08.000"|"81392cd3-28e9-4e2e-8a33-018104b1f4d1"|"{"solution":[3,4],"type":"MultipleChoice"}"|"0"|"2db10c95-b507-4211-8244-394361148b22"|"Student"|"One"|"student1@umich.edu"|"Student"
    "ee73fdaf-a926-4899-b0f7-9b942f1b44ad"|"6-Elbow, Wrist, Hand W19"|"48539109-529e-4359-83b9-2ae81be0532c"|"001"|"3b5b5e49-1798-4eab-86d7-186cf591"|"MOVESCI 230"|"Human Musculoskeletal Anatomy"|"fcd7c673-d944-48c3-8a09-f458e03f8c44"|"What is the name of this movement?"|"{"choices":["first finger joint","first proximal interphalangeal joint","first distal interphalangeal joint","first interphalangeal joint"],"type":"MultipleChoice"}"|"{"solution":[3],"selectAll":false,"type":"MultipleChoice"}"|"2019-01-31 22:07:32.000"|"9016f36c-41f5-4e14-84a9-78eea682c802"|"{"solution":[3],"type":"MultipleChoice"}"|"1"|"7184708d-4dc7-42e0-b1ea-4aca51f00fcd"|"Student"|"Two"|"student2@umich.edu"|"Student"

    You are right: the problem was the format of the CSV file.

    1. I changed readCSV = csv.reader(csvfile) to readCSV = csv.reader(csvfile, delimiter="|", quotechar='|').
    2. I then took each resulting list and stripped the extraneous quotes from every element.
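The two steps above can be sketched as follows. With quotechar='|', the double quote becomes an ordinary character, which is why the asker's version works on this data; this sketch swaps in csv.QUOTE_NONE, which disables quote handling explicitly (recent Python versions may also reject a dialect that uses the same character as both delimiter and quote character). The sample row is a cut-down, hypothetical excerpt, and unquote is a hypothetical helper name.

```python
import csv
import io
import json

# Hypothetical cut-down sample in the asker's real format: "|" between fields,
# every field wrapped in quotes, inner quotes NOT doubled.
raw = (
    '"278c4561"|"Which European city is at the same latitude as Ann Arbor?"|'
    '"{"choices":["Stockholm, Sweden","Madrid, Spain"],"type":"MultipleChoice"}"|"0"\n'
)

def unquote(field: str) -> str:
    """Remove one pair of surrounding double quotes, if present (hypothetical helper)."""
    if len(field) >= 2 and field.startswith('"') and field.endswith('"'):
        return field[1:-1]
    return field

# QUOTE_NONE turns off quote handling, so the reader splits on "|" only and
# leaves the quotes in place; unquote() then strips the outer pair.
reader = csv.reader(io.StringIO(raw), delimiter="|", quoting=csv.QUOTE_NONE)
rows = [[unquote(f) for f in row] for row in reader]

content = json.loads(rows[0][2])
print(len(rows[0]), content["choices"])
```

This relies on "|" never appearing inside a field value, since nothing in the file escapes it.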

    The rest of the program now works correctly.

    【Comments】:
