【Question Title】: Spark 1.5.1 Create RDD from Cassandra (ClassNotFoundException: com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD)
【Posted】: 2015-12-07 23:32:57
【Question】:

I am trying to fetch records from Cassandra and create an RDD:

JavaRDD<Employee> rdd = javaFunctions(ctx).cassandraTable("keyspace1", "employee", mapRowTo(Employee.class));

I get this error when submitting the job on Spark 1.5.1:

Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/japi/rdd/CassandraTableScanJavaRDD
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.japi.rdd.CassandraTableScanJavaRDD
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Current dependencies:

  <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>1.5.1</version>
  </dependency>
  <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>1.5.1</version>
  </dependency>
  <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.1</version>
  </dependency>
  <dependency>
      <groupId>com.datastax.spark</groupId>
      <artifactId>spark-cassandra-connector-java_2.11</artifactId>
      <version>1.5.0-M2</version>
  </dependency>
  <dependency>
      <groupId>com.datastax.cassandra</groupId>
      <artifactId>cassandra-driver-core</artifactId>
      <version>3.0.0-alpha4</version>
  </dependency>

Java code:

import java.util.Date;

import com.tempTable.Employee;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.SparkConf;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

Long now = new Date().getTime();
SparkConf conf = new SparkConf(true)
    .setAppName("SparkSQLJob_" + now)
    .set("spark.cassandra.connection.host", "192.168.1.75")
    .set("spark.cassandra.connection.port", "9042");

SparkContext ctx = new SparkContext(conf);
JavaRDD<Employee> rdd = javaFunctions(ctx).cassandraTable("keyspace1", "employee", mapRowTo(Employee.class));
System.out.println("rdd count = " + rdd.count());

Is there a problem with the versions in the dependencies?
Please help me resolve this error. Thanks in advance.

【Comments】:

  • Are you using mvn package or an assembly?

标签: apache-spark spark-cassandra-connector


【Solution 1】:

You need to register your application jar with the SparkConf:

.setJars(Seq(System.getProperty("user.dir") + "/target/scala-2.10/sparktest.jar"))

See http://www.datastax.com/dev/blog/common-spark-troubleshooting for more information.
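Since the question uses the Java API, the same idea in Java would look roughly like the sketch below. This is not from the answer; the jar path and app name are assumptions, so point `setJars` at the jar your build actually produces:

```java
// Sketch (assumed jar path): register the application jar so executors
// can load the connector classes bundled inside it.
SparkConf conf = new SparkConf(true)
    .setAppName("SparkSQLJob")
    .set("spark.cassandra.connection.host", "192.168.1.75")
    .setJars(new String[] {
        System.getProperty("user.dir") + "/target/sparktest.jar"  // hypothetical path
    });
```

Alternatively, the connector can be supplied at submit time instead of in code, e.g. `spark-submit --jars <path-to-connector-jar> ...` or `spark-submit --packages com.datastax.spark:spark-cassandra-connector-java_2.11:1.5.0-M2 ...`.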

【Discussion】:

【Solution 2】:

The simple answer is:

You need all of your dependencies packaged inside the jar file.

The executor machines must have all of the jars you depend on available on their classpath.

A solution using Gradle to build a fat jar:

buildscript {
    dependencies {
        classpath 'com.github.jengelman.gradle.plugins:shadow:1.2.2'
    }
    repositories {
        jcenter()
    }
}

apply plugin: 'com.github.johnrengelman.shadow'

Then run "gradle shadowJar" to build your jar file. Submit your job with that jar and it should resolve your problem.
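The question's build uses Maven rather than Gradle; the usual Maven equivalent of a fat jar is the `maven-shade-plugin`. This is a minimal sketch of its common configuration, not taken from the answer, and the plugin version shown is an assumption:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.1</version> <!-- assumed version -->
      <executions>
        <execution>
          <!-- bind the shade goal to `mvn package` -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, `mvn package` produces a shaded jar that bundles the connector classes. The Spark dependencies themselves are typically marked `<scope>provided</scope>` so the cluster's own Spark jars are used instead of being bundled.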

【Discussion】:
