【Question title】:Multiline values in a column when Spark reads a file
【Posted】:2019-09-20 07:37:28
【Question】:

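I have data like the line below and need to split it on ",", except where the comma is escaped with a backslash ("\,").

Input file: 1,2,4,371003\,5371022\,87200000\,U

The desired result is: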

a  b  c   d   e                           f
1  2  3   4   371003,5371022,87200000     U

This is what I tried:

val df = spark.read.option("inferSchema","true").option("escape","\\").option("delimiter",",").csv("/user/txt.csv")

【Question comments】:

Tags: apache-spark rdd


【Solution 1】:

Try this:

val df = spark.read.csv("/user/txt.csv")
df.show()

+---+---+---+-------+--------+---------+---+
|_c0|_c1|_c2|    _c3|     _c4|      _c5|_c6|
+---+---+---+-------+--------+---------+---+
|  1|  2|  4|371003\|5371022\|87200000\|  U|
+---+---+---+-------+--------+---------+---+



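import org.apache.spark.sql.functions.{concat_ws, regexp_replace}
import spark.implicits._  // enables the '_c0 column syntax (pre-imported in spark-shell)

// Rejoin the three split pieces with "," and strip the leftover backslashes.
df.select(
    '_c0, '_c1, '_c2,
    regexp_replace(concat_ws(",", '_c3, '_c4, '_c5), "\\\\", ""),
    '_c6
  ).toDF("a","b","c","e","f").show(false)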

+---+---+---+-----------------------+---+
|a  |b  |c  |e                      |f  |
+---+---+---+-----------------------+---+
|1  |2  |4  |371003,5371022,87200000|U  |
+---+---+---+-----------------------+---+
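
The approach above relies on the merged value always sitting in columns 3 to 5. As a rough illustration of what that does to a single line, and how it differs from treating "\," as a true escape sequence, here is a minimal plain-Scala sketch with no Spark dependency. The object name and both helper functions are hypothetical, written only for this example; the second helper does not handle an escaped backslash in front of a comma.

```scala
object EscapedCommaSketch {
  // Hypothetical helper mirroring the answer's logic on one line of text:
  // split on every comma, rejoin positions 3 to 5, strip the backslashes.
  def mergeByPosition(line: String): Seq[String] = {
    val parts = line.split(",", -1) // -1 keeps trailing empty fields
    val merged = parts.slice(3, 6).mkString(",").replace("\\", "")
    parts.take(3).toSeq ++ Seq(merged) ++ parts.drop(6).toSeq
  }

  // Hypothetical helper that treats "\," as an escaped comma instead.
  // The lookbehind (?<!\\) splits only on commas not preceded by a backslash.
  def splitOnUnescaped(line: String): Seq[String] =
    line.split("""(?<!\\),""", -1).toSeq.map(_.replace("\\,", ","))

  def main(args: Array[String]): Unit = {
    val line = """1,2,4,371003\,5371022\,87200000\,U"""
    println(mergeByPosition(line).mkString(" | "))
    // prints: 1 | 2 | 4 | 371003,5371022,87200000 | U
    println(splitOnUnescaped(line).mkString(" | "))
    // prints: 1 | 2 | 4 | 371003,5371022,87200000,U
  }
}
```

Note that in the sample line the comma before U is also escaped, so a strict escape-aware split keeps U inside the merged field. The positional merge gives the output asked for in the question only because it ignores that last backslash.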

【Comments】:
