Scala - Expanding an argument list in a pattern matching expression

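【Question title】: Scala - Expanding an argument list in a pattern matching expression
【Posted】: 2023-03-06 20:42:01
【Question】:

I am very new to Scala and am trying to use it as an interface to Spark. I am having trouble writing a generic CSV-to-DataFrame function. For example, I have a CSV with about 50 fields, the first of which are task, name, and id. I can get the following to work: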

val reader = new CSVReader(new StringReader(txt))

reader.readAll().map(_ match {
  case Array(task, name, id, _*) => Row(task, name, id)
  case unexpectedArrayForm =>
    throw new RuntimeException("Record did not have correct number of fields: "+ unexpectedArrayForm.mkString(","))
})

However, I would rather not hard-code the number of fields needed to create a Spark Row. I tried this:

val reader = new CSVReader(new StringReader(txt))

reader.readAll().map(_ match {
  case Array(args @ _*) => Row(args)
  case unexpectedArrayForm =>
    throw new RuntimeException("Record did not have correct number of fields: "+ unexpectedArrayForm.mkString(","))
})

But this just creates a Row object with a single element. How can I make it expand args in Row(args) so that, given an array of N elements, I get a Row with N elements?

【Comments】:

Tags: scala apache-spark pattern-matching

【Solution 1】:

Pass your sequence as variable-length arguments by appending the : _* ascription:

Row(args:_*)

This is what Row accepts, per its apply signature.

In fact, you don't need to do anything beyond passing the array to Row, since it is already the right sequence type:

reader.readAll().map(Row(_:_*))
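
To make the difference concrete, here is a minimal sketch that runs without Spark. The Row class below is a hypothetical stand-in, not Spark's implementation; it only mimics the varargs shape of Row's apply (values: Any*), and VarargsDemo and toRow are illustrative names.

```scala
// Stand-in for Spark's Row (an assumption for illustration only):
// just enough to show how a varargs `apply` treats its arguments.
final class Row private (val values: Seq[Any]) {
  def length: Int = values.length
}

object Row {
  // Same shape as a varargs factory: zero or more values of any type.
  def apply(values: Any*): Row = new Row(values.toSeq)
}

object VarargsDemo {
  // The pattern binds all elements to `fields`; `: _*` then spreads them
  // out as individual arguments instead of passing one sequence.
  def toRow(record: Array[String]): Row = record match {
    case Array(fields @ _*) => Row(fields: _*)
  }

  def main(args: Array[String]): Unit = {
    val record = Array("build", "alice", "42")

    val wrapped  = Row(record)      // the whole array is ONE argument
    val expanded = Row(record: _*)  // each element is its own argument

    println(wrapped.length)         // 1
    println(expanded.length)        // 3
    println(toRow(record).length)   // 3
  }
}
```

Without the ascription, the array itself becomes the single value stored in the row, which matches the one-element Row described in the question.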

【Comments】:

【Solution 2】:

This should do the trick:

val reader = new CSVReader(new StringReader(txt))

reader.readAll().map(_ match {
  case a: Array[String] => Row(a:_*)
  case unexpectedArrayForm =>
    throw new RuntimeException("Record did not have correct number of fields: "+ unexpectedArrayForm.mkString(","))
})

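Edited to add the Array type that was missing.

【Comments】:

I had to change it to case a: Array[String] => Row(a:_*) to get it to compile. However, I still get an error, java.lang.ArrayIndexOutOfBoundsException: 3, when trying to access the id variable in the Row, which makes me think it still isn't expanding the array into Row's arguments.
@SterlingParamore Make sure there are no rogue lines in the CSV data (e.g., an empty line at the end of the file). Maybe change the case statement to @987654327 @ and add another case statement that catches other arrays and does nothing.
My java.lang.ArrayIndexOutOfBoundsException error turned out to be unrelated to this. It's working now, thanks!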
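
The suggestion above about rogue lines could be sketched roughly as follows. This is an assumption about what the commenter had in mind, not their actual code: MinFields, usable, and the sample records are hypothetical, and a plain Seq[Array[String]] stands in for the parsed CSV records.

```scala
object SkipShortRecords {
  // Hypothetical threshold: the question's CSV starts with task, name, id.
  val MinFields = 3

  // `collect` keeps only records matched by the partial function;
  // anything shorter than MinFields is dropped without throwing.
  def usable(records: Seq[Array[String]]): Seq[Array[String]] =
    records.collect { case a if a.length >= MinFields => a }

  def main(args: Array[String]): Unit = {
    val records = Seq(
      Array("build", "alice", "1", "extra"),
      Array(""),                 // what a blank line might parse to (assumed)
      Array("test", "bob", "2")
    )
    // Prints the two records that have at least MinFields fields.
    usable(records).foreach(r => println(r.mkString(",")))
  }
}
```

With Spark on the classpath, each surviving array could then be passed along as Row(a: _*), as in the answers above.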