【Question title】: Read data in Pyspark from Kinesis
【Posted】: 2020-01-16 13:40:46
【Question】:

I am trying to read data from Kinesis in Pyspark using KinesisUtils.createStream, but I get the following error.


  Spark Streaming's Kinesis libraries not found in class path. Try one of the following.

  1. Include the Kinesis library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl:2.4.4 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kinesis-asl-assembly, Version = 2.4.4.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kinesis-asl-assembly.jar> ...

________________________________________________________________________________________________


Traceback (most recent call last):
  File "/Users/ahmad.muhammad/Desktop/kinesis-reader.py", line 8, in <module>
    kinesisStream = KinesisUtils.createStream(ssc,'Ahmad-Kineses','twitter-stream','https://kinesis.us-east-1.amazonaws.com/','us-east-1',InitialPositionInStream.TRIM_HORIZON,20)
  File "/Users/Ahmad.Muhammad/opt/apache-spark/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kinesis.py", line 84, in createStream
TypeError: 'JavaPackage' object is not callable

【Comments on the question】:

    Tags: python apache-spark pyspark spark-streaming amazon-kinesis


    【Solution 1】:

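    Assuming you are running pyspark on your local machine, you can pass the Kinesis package to Spark through the PYSPARK_SUBMIT_ARGS environment variable. The package version should match your Spark version (2.4.4 in the traceback above), and the Scala suffix should match your Spark build (2.11 is assumed here). Note that the shell does not allow spaces around the = sign, and the value must be quoted. Try this in your terminal:

    export PYSPARK_SUBMIT_ARGS="--master local[2] --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.4 pyspark-shell"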
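
    The same variable can also be set from inside the script, as long as that happens before the SparkContext is created. The following is only a minimal sketch: build_submit_args and create_kinesis_stream are hypothetical helper names, the Scala suffix 2.11 and the 10-second batch interval are assumptions, and the stream arguments are copied from the traceback in the question.

```python
import os

# Assumed values: adjust to your own Spark and Scala build.
SPARK_VERSION = "2.4.4"  # taken from the path spark-2.4.4-bin-hadoop2.7 in the traceback
SCALA_VERSION = "2.11"   # assumption: Scala version your Spark binaries were built with


def build_submit_args(spark_version=SPARK_VERSION,
                      scala_version=SCALA_VERSION,
                      master="local[2]"):
    """Hypothetical helper: build the value for PYSPARK_SUBMIT_ARGS."""
    package = "org.apache.spark:spark-streaming-kinesis-asl_{}:{}".format(
        scala_version, spark_version)
    return "--master {} --packages {} pyspark-shell".format(master, package)


def create_kinesis_stream():
    """Hypothetical helper: set the variable, then create the stream."""
    # Must be set before the SparkContext starts its JVM,
    # otherwise the Kinesis classes are not on the class path.
    os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args()

    # Imported here so the module can be loaded without starting Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

    sc = SparkContext(appName="kinesis-reader")
    ssc = StreamingContext(sc, 10)  # 10-second batch interval (assumed)

    # Arguments copied from the question's traceback.
    stream = KinesisUtils.createStream(
        ssc, "Ahmad-Kineses", "twitter-stream",
        "https://kinesis.us-east-1.amazonaws.com/", "us-east-1",
        InitialPositionInStream.TRIM_HORIZON, 20)
    return ssc, stream
```

    Calling create_kinesis_stream() at the top of kinesis-reader.py, then ssc.start() and ssc.awaitTermination(), would replace the export step; the script can then be run with plain python instead of spark-submit.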
    

    Hope this solves your problem.

    【Comments】:
