【Posted】: 2017-11-02 05:23:57
【Question】:
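Note that each cell of the CSV file may or may not contain multiple newlines, and every split file must itself be a valid CSV file.
I tried using split. However, if I split by line count, it does not take into account that a CSV field can contain newlines; and if I split by file size, it sometimes cuts the last line of a file in half, so the result is no longer a valid CSV file.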
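To illustrate why counting lines is not the same as counting records, here is a minimal Python sketch. The `sample` string is a made-up two-record file (not the test file from this question) whose second column spans two physical lines and uses CRLF line endings.

```python
import csv
import io

# Hypothetical sample: a header plus two records, each with a quoted
# field that contains an embedded CRLF line break.
sample = 'title,content\r\nA,"line 1\r\nline 2"\r\nB,"line 3\r\nline 4"\r\n'

# What a line-based tool sees: every "\n" ends a line.
physical_lines = sample.count("\n")

# What a CSV parser sees: newline="" leaves line endings untouched so the
# csv module can tell record separators from newlines inside quoted fields.
records = list(csv.reader(io.StringIO(sample, newline="")))

print(physical_lines)  # 5
print(len(records))    # 3 (header + 2 records)
```

A line-based cut after, say, line 2 would land in the middle of record A's quoted field, which is the failure described above.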
You can find a test file here: https://pastebin.com/raw/pw9PF9U1
It looks like this:
post_title,tax:wcpv_product_vendors,post_content
Product title 1,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Product title 2,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Product title 3,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Product title 4,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Product title 5,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Product title 6,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Product title 7,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Product title 8,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Product title 9,Sample,"<div class=""productdetails"">
<h2 style=""margin: 0px 0px 15px; line-height: 1.2; text-align: center;"">Title</h2>
<p style=""color: #333333; margin: 0px; font-size: 13px; line-height: 23.1111px; padding: 0px; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS';""><strong>Features:</strong></p>
<ul style=""padding: 0px 40px; margin: 0px; color: #333333; font-family: sans-serif, Arial, Verdana, 'Trebuchet MS'; font-size: 13px; line-height: 20.8px;"">
<li style=""list-style: none;"">Testing testing</li>
<li style=""list-style: none;"">One two three</li>
</ul>
</div>"
Also note that when I open the CSV in vim, every line ends with a ^M symbol. This might help with splitting it correctly.
【Discussion】:
-
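Please post a data sample that has "newlines inside fields", along with the expected output for that test data. Fewer than 10 records should be fine.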
-
Hello! I have added a sample file to the original question. You can view it here.
-
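^M is just how vim displays CRLF (DOS) line endings. It is useful to know because it tells you to run the file through dos2unix, not because it helps with splitting the file.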
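One possible approach, sketched here in Python rather than with split, is to let a CSV parser find the record boundaries and then write a fixed number of records per output chunk, repeating the header in each one. This is only a sketch under stated assumptions: the names `split_csv_records`, `render_chunk`, and `records_per_file` are made up for illustration, the first record is assumed to be a header, and the input is assumed to fit in memory.

```python
import csv
import io


def render_chunk(header, rows):
    """Serialize a header plus rows back into CSV text with LF record endings."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv_records(text, records_per_file):
    """Split CSV text into chunks of at most records_per_file data records.

    Each chunk repeats the header, so every chunk is a valid CSV document.
    """
    # newline="" keeps CRLF intact so the parser can handle quoted newlines.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    chunks = []
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == records_per_file:
            chunks.append(render_chunk(header, batch))
            batch = []
    if batch:  # leftover records that did not fill a whole chunk
        chunks.append(render_chunk(header, batch))
    return chunks
```

Record terminators in the output are LF, while line breaks inside quoted fields are written back exactly as they were read. When working with real files, the `csv` module documentation recommends opening them with `newline=""` for both reading and writing.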