【Posted】: 2026-02-12 16:45:02
【Question】:
I have a CSV file in which some columns contain newline characters. Here is a sample:
"A","B","C"
1,"This is csv with
newline","This is another column"
"This is newline
and another line","apple","cat"
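I can read the file in Spark, but a newline inside a column is treated as the start of a new row.
How can I read this as CSV so that text enclosed in double quotes stays together as a single field?
I read the file using only the Apache CSV plugin and Apache.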
alarms = sc.textFile(r"D:\Dataset\oneday\oneday.csv")
This gives me an RDD:
alarms.take(5)
[u'A,B,C', u'1,"This is csv with ', u'newline",This is another column', u'"This is newline', u'and another line",apple,cat']
Spark version: 1.4
【Discussion】:
- line.replace('\n', '') if line.count('"') % 2 == 1 and '"\n' not in line else line
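The quote-counting idea in the comment above can be sketched in plain Python (3.x syntax). This is only an illustration under assumptions, not a confirmed solution for Spark 1.4: the helper names `merge_quoted_lines` and `parse_record` are hypothetical, and the sketch assumes that a double quote inside a field is escaped by doubling it, so a complete record always contains an even number of quotes.

```python
import csv


def merge_quoted_lines(lines):
    """Join physical lines until the double quotes in the record balance.

    Assumes embedded quotes are escaped as "" so a complete record
    always contains an even number of quote characters.
    """
    buffer = []
    quote_count = 0
    for line in lines:
        buffer.append(line)
        quote_count += line.count('"')
        # An even running count means every quoted field has been closed.
        if quote_count % 2 == 0:
            yield "\n".join(buffer)
            buffer = []
            quote_count = 0
    if buffer:
        # Unbalanced trailing text: emit it as-is rather than drop it.
        yield "\n".join(buffer)


def parse_record(record):
    """Split one merged record into its column values."""
    return next(csv.reader([record]))


# The five physical lines from the sample in the question.
sample_lines = [
    '"A","B","C"',
    '1,"This is csv with',
    'newline","This is another column"',
    '"This is newline',
    'and another line","apple","cat"',
]
rows = [parse_record(r) for r in merge_quoted_lines(sample_lines)]
```

Note that this only works when all lines of a record are seen in order by the same piece of code. In Spark, one possible way to get that (for files small enough to be held in memory as a single string) is to read with `sc.wholeTextFiles(path)`, which yields (filename, content) pairs, and then apply the helpers to `content.splitlines()` inside a `flatMap`.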
Tags: python python-2.7 apache-spark pyspark