[Question title]: Join Multiple Lists in Scala
[Posted]: 2018-08-08 06:59:00
[Question]:

I have a series of lists (say the following 3), where the first element of each tuple is a primary key.

var A= List((1,"A"), (2,"B"), (3,"C"))
var B= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
var C= List((1,"AAA"), (3,"CCC"))

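I want to do a full join of them into a new list, as shown below. You can assume the number of items in each resulting tuple is fixed at 4 in advance.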

(1, "A", "AA", "AAA")
(2, "B", "BB", ""   )
(3, "C", "CC", "CCC")
(4, "" , "DD", ""   )

How can I achieve this in a functional way in Scala?

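[Comments]:

  • Hi :-) What have you tried so far?
  • Tuples in Scala are limited: they cannot have more than 22 elements, and abstracting over their length can be cumbersome. Can you relax this constraint?
  • They will not exceed 22. Let me update the question to bound the size. There are some answers here for inner joining two lists, but not n lists.

Tags: scala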


[Answer 1]:

Suppose you are given input lists such as

var A= List((1,"A"), (2,"B"), (3,"C"))
var B= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
var C= List((1,"AAA"), (3,"CCC"))

then applying the following function,

List(A,B,C).flatten.groupBy(_._1).map{
  case (k,v) => k :: v.map(_._2)
}

gives you the output

res0: scala.collection.immutable.Iterable[List[Any]] = List(List(2, B, BB), List(4, DD), List(1, A, AA, AAA), List(3, C, CC, CCC))

However, if you still want empty strings in the output, you can try the following

var A= List((1,"A"), (2,"B"), (3,"C"))
var B= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
var C= List((1,"AAA"), (3,"CCC"))

val intermediate = List(A,B,C).flatten.groupBy(_._1).map{
  case (k,v) => k :: v.map(_._2)
}

val maxSize = intermediate.map(_.size).max
intermediate.map{
  x =>  x.size== maxSize match {
    case true =>
      x
    case false =>
      x ::: List.fill(maxSize-x.size)("")
  }
}

This gives you the output

res0: scala.collection.immutable.Iterable[List[Any]] = List(List(2, "B", "BB", ), List(4, "DD", , ), List(1, "A", "AA", "AAA"), List(3, "C", "CC", "CCC"))

Tuples have performance limitations and are limited to 22 elements, so using lists is strongly recommended.
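
Note that the padding above appends the empty strings at the end of each row, so key 4 comes out as List(4, DD, , ) rather than with DD in the second value column. A minimal sketch of a position-preserving, still list-based variant follows. It assumes each key occurs at most once per input list, and the name fullJoin is hypothetical (it is not part of the original answer).

```scala
// Hypothetical helper: full outer join of any number of (key, value) lists.
// Assumes each key appears at most once per input list.
def fullJoin(lists: List[List[(Int, String)]], empty: String = ""): List[(Int, List[String])] = {
  // One lookup table per input list, kept in the original list order.
  val tables: List[Map[Int, String]] = lists.map(_.toMap)
  // Sorted union of all keys across the inputs.
  val keys: List[Int] = lists.flatMap(_.map(_._1)).distinct.sorted
  // For every key, take the value from each table, or `empty` where it is missing.
  keys.map(k => (k, tables.map(_.getOrElse(k, empty))))
}

val A = List((1, "A"), (2, "B"), (3, "C"))
val B = List((1, "AA"), (2, "BB"), (3, "CC"), (4, "DD"))
val C = List((1, "AAA"), (3, "CCC"))

val joined = fullJoin(List(A, B, C))
// joined == List(
//   (1, List("A", "AA", "AAA")),
//   (2, List("B", "BB", "")),
//   (3, List("C", "CC", "CCC")),
//   (4, List("", "DD", "")))
```

Converting each list to a Map up front keeps every lookup close to constant time, and sorting the key union makes the row order deterministic, which groupBy alone does not guarantee.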


    [Answer 2]:

    This can be solved with tail recursion.

    var a= List((1,"A"), (2,"B"), (3,"C"))
    var b= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
    var c= List((1,"AAA"), (3,"CCC"))
    
    val lst: List[List[(Int, String)]] = List(a, b, c)
    
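    import scala.annotation.tailrec

    def fun(input: List[List[(Int, String)]]): List[Any] = {
      // Assumes the keys are consecutive integers and at least one list is non-empty.
      val keys = input.flatten.map(_._1)

      @tailrec
      def itr(acc: List[Any], inp: List[List[(Int, String)]], key: Int, maxKey: Int): List[Any] = {
        key match {
          case x if x > maxKey => acc
          case _ =>
            // For the current key, take the value from each list, or "" if it is absent.
            itr(acc ::: List(key :: inp.map(itemLst => {
              itemLst.find(_._1 == key).map(_._2).getOrElse("")
            })), inp, key + 1, maxKey)
        }
      }
      // Walk from the smallest to the largest key (rather than up to the longest list's length).
      itr(List(), input, keys.min, keys.max)
    }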
    
    println(fun(lst))
    

    The output is

    List(List(1, A, AA, AAA), List(2, B, BB, ), List(3, C, CC, CCC), List(4, , DD, ))
    


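      [Answer 3]:

      As mentioned in the comments, tuples in Scala are limited, and abstracting over their arity can be cumbersome. If you want to do that, you may want to take a look at Shapeless.

      For a more direct (though not very clean) solution, the following will do (implementations for two different target arities):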

      val a = List((1,"A"), (2,"B"), (3,"C"))
      val b = List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
      val c = List((1,"AAA"), (3,"CCC"))
      
      def join4[K, V](empty: V)(pss: List[(K, V)]*): List[(K, V, V, V)] =
        pss.reduceOption(_ ++ _).fold(List.empty[(K, V, V, V)])(_.groupBy(_._1).mapValues(_.map(_._2)).collect {
          case (key, Nil) => (key, empty, empty, empty)
          case (key, List(a)) => (key, a, empty, empty)
          case (key, List(a, b)) => (key, a, b, empty)
          case (key, List(a, b, c)) => (key, a, b, c)
          case (key, list) => throw new RuntimeException(s"Group for $key is too long (${list.size} > 3)")
        }.toList)
      
      def join5[K, V](empty: V)(pss: List[(K, V)]*): List[(K, V, V, V, V)] =
        pss.reduceOption(_ ++ _).fold(List.empty[(K, V, V, V, V)])(_.groupBy(_._1).mapValues(_.map(_._2)).collect {
          case (key, Nil) => (key, empty, empty, empty, empty)
          case (key, List(a)) => (key, a, empty, empty, empty)
          case (key, List(a, b)) => (key, a, b, empty, empty)
          case (key, List(a, b, c)) => (key, a, b, c, empty)
          case (key, List(a, b, c, d)) => (key, a, b, c, d)
          case (key, list) => throw new RuntimeException(s"Group for $key is too long (${list.size} > 4)")
        }.toList)
      
      join4("")(a, b, c)
      join5("")(a, b, c)
      

      You can play with this code on Scastie.
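
Note that join4 fills the value slots from left to right, so a key that is missing from the first list (key 4 here) yields (4, DD, "", "") rather than (4, "", DD, ""). If the column positions matter, a sketch along the following lines may help. The name joinByPosition3 is hypothetical, and the sketch assumes exactly three input lists with at most one entry per key in each list.

```scala
// Hypothetical helper: positional full outer join of exactly three lists.
// Assumes each key appears at most once per list.
def joinByPosition3[K: Ordering, V](empty: V)(
    xs: List[(K, V)], ys: List[(K, V)], zs: List[(K, V)]
): List[(K, V, V, V)] = {
  val mx = xs.toMap
  val my = ys.toMap
  val mz = zs.toMap
  // Union of all keys, sorted so that the result order is deterministic.
  val keys = (mx.keySet ++ my.keySet ++ mz.keySet).toList.sorted
  // Each slot is filled from its own source list, or with `empty` if the key is missing there.
  keys.map { k =>
    (k, mx.getOrElse(k, empty), my.getOrElse(k, empty), mz.getOrElse(k, empty))
  }
}

val a = List((1, "A"), (2, "B"), (3, "C"))
val b = List((1, "AA"), (2, "BB"), (3, "CC"), (4, "DD"))
val c = List((1, "AAA"), (3, "CCC"))

val rows = joinByPosition3("")(a, b, c)
// rows == List((1,"A","AA","AAA"), (2,"B","BB",""), (3,"C","CC","CCC"), (4,"","DD",""))
```

As with join4 and join5, a separate function is still needed per target arity; the difference is only that each slot is tied to its source list.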


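        [Answer 4]:

        As the question states that "we can assume the number of items in the resulting tuple is predetermined to be 4", the following solution returns only the requested tuples. The given lists are: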

        var A= List((1,"A"), (2,"B"), (3,"C"))
        var B= List((1,"AA"), (2,"BB"), (3,"CC"), (4,"DD"))
        var C= List((1,"AAA"), (3,"CCC"))
        

        In the Scala REPL:

        scala> val list1 = List(A,B,C).flatten
        list1: List[(Int, String)] = List((1,A), (2,B), (3,C), (1,AA), (2,BB), (3,CC), (4,DD), (1,AAA), (3,CCC))
        
        scala> val list2 = List(A,B,C).flatten.map(x=>x._2.toArray).flatten.distinct
        list2: List[Char] = List(A, B, C, D)
        

        Then, using the two lists above, the desired resultList can be obtained as follows:

        scala> val resultList = 
                  list2.map(x=>list1.filter(y=>y._2.contains(x))).map{
                    case List() =>
                    case List((a,b)) => (a,b,"","")
                    case List((a,b),(_,c))=>(a,b,c,"")
                    case List((a,b),(_,c),(_,d)) =>(a,b,c,d)    
                }
        resultList: List[Any] = List((1,A,AA,AAA), (2,B,BB,""), (3,C,CC,CCC), (4,DD,"",""))
        

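        However, if we do care about the position of the empty string "" within each tuple, the code becomes somewhat verbose, because we have to cover every combination with case clauses guarded by if conditions in the pattern match, as follows: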

        scala> val resultList =
                   list2.map(x=>list1.filter(y=>y._2.contains(x))).map{
               case List() =>
               case List((a,b)) if(b.size==1) => (a,b,"","")
               case List((a,b)) if(b.size==2) => (a,"",b,"")
               case List((a,b)) if(b.size==3) => (a,"","",b)
               case List((a,b),(_,c)) if(b.size==1 && c.size==2)=>(a,b,c,"")
               case List((a,b),(_,c)) if(b.size==2 && c.size==1)=>(a,c,b,"")
               case List((a,b),(_,c)) if(b.size==1 && c.size==3)=>(a,b,"",c)
               case List((a,b),(_,c)) if(b.size==3 && c.size==1)=>(a,c,"",b)
               case List((a,b),(_,c)) if(b.size==2 && c.size==3)=>(a,"",b,c)
               case List((a,b),(_,c)) if(b.size==3 && c.size==2)=>(a,"",c,b)
               case List((a,b),(_,c),(_,d)) if(b.size==1&&c.size==2 && d.size==3)=> 
                    (a,b,c,d)
               case List((a,b),(_,c),(_,d)) if(b.size==1&&c.size==3 && d.size==2)=> 
                    (a,b,d,c)
               case List((a,b),(_,c),(_,d)) if(b.size==2&&c.size==1&& d.size==3)=>  
                    (a,c,b,d)
               case List((a,b),(_,c),(_,d)) if(b.size==2&&c.size==3&& d.size==1)=>  
                    (a,d,b,c)
               case List((a,b),(_,c),(_,d)) if(b.size==3&&c.size==1&& d.size==2)=>  
                    (a,c,d,b)
               case List((a,b),(_,c),(_,d)) if(b.size==3&&c.size==2&& d.size==1)=>  
                    (a,d,c,b)
        
               }
        resultList: List[Any] = List((1,A,AA,AAA), (2,B,BB,""), (3,C,CC,CCC), (4,"",DD,""))
        

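        It should be noted, however, that when tuples are used for this kind of operation the type information is lost, and the resulting list of tuples is hard to process further. Using another data structure such as List may be better. Still, given the requirements stated in the question, this solves it.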
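
One possible way to keep the type information is sketched below: a small case class with Option fields instead of a List[Any] of tuples. The names Row, first, second, third and joinRows are hypothetical, and the sketch assumes exactly three sources with at most one entry per key in each.

```scala
// Hypothetical row type: a missing value is None rather than "".
final case class Row(key: Int, first: Option[String], second: Option[String], third: Option[String])

// Hypothetical helper: positional full outer join of three lists into typed rows.
def joinRows(xs: List[(Int, String)], ys: List[(Int, String)], zs: List[(Int, String)]): List[Row] = {
  val mx = xs.toMap
  val my = ys.toMap
  val mz = zs.toMap
  // Sorted union of all keys.
  val keys = (mx.keySet ++ my.keySet ++ mz.keySet).toList.sorted
  keys.map(k => Row(k, mx.get(k), my.get(k), mz.get(k)))
}

val A = List((1, "A"), (2, "B"), (3, "C"))
val B = List((1, "AA"), (2, "BB"), (3, "CC"), (4, "DD"))
val C = List((1, "AAA"), (3, "CCC"))

val typedRows: List[Row] = joinRows(A, B, C)

// If the 4-tuple shape from the question is still needed, it can be derived at the end.
val tuples: List[(Int, String, String, String)] =
  typedRows.map(r => (r.key, r.first.getOrElse(""), r.second.getOrElse(""), r.third.getOrElse("")))
```

With this shape the compiler knows the element type of the result, and the lookup is by key rather than by the letters contained in the values.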
