【Question title】: AWS Glue: Cassandra connection using SSL is not working
【Posted】: 2021-11-12 09:51:19
【Question】:

I want to connect to Cassandra from Spark. Connecting on the default port works, but when I try to connect over SSL the job fails. Here is the code:

val spark: SparkSession = SparkSession.builder()
        .config("spark.cassandra.connection.host","server.abc")
        .config("spark.cassandra.connection.port","9142")
        .config("spark.cassandra.connection.ssl.enabled",true)
        .config("spark.cassandra.connection.ssl.trustStore.path","s3:/dev-code/certs/trust.jks")
        .config("spark.cassandra.connection.ssl.trustStore.password","mypass")
        .config("spark.cassandra.auth.username","myuser")
        .config("spark.cassandra.auth.password","userpass")
        .appName("CassandraIntegration").getOrCreate()

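FYI: the job does have access to the S3 bucket, and I can read a CSV file from the same location. Both ports, 9042 and 9142, are enabled. If I close 9042 and keep only 9142 open, the error remains.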

Here is the error:

ERROR [main] glue.ProcessLauncher (Logging.scala:logError(94)): Exception in User Class
java.io.IOException: Failed to open native connection to Cassandra at {server.abc:9142} :: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
    at com.datastax.spark.connector.cql.CassandraConnector$.createSession(CassandraConnector.scala:173)
    at com.datastax.spark.connector.cql.CassandraConnector$.$anonfun$sessionCache$1(CassandraConnector.scala:161)
    at com.datastax.spark.connector.cql.RefCountedCache.createNewValueAndKeys(RefCountedCache.scala:32)
    at com.datastax.spark.connector.cql.RefCountedCache.syncAcquire(RefCountedCache.scala:69)
    at com.datastax.spark.connector.cql.RefCountedCache.acquire(RefCountedCache.scala:57)
    at com.datastax.spark.connector.cql.CassandraConnector.openSession(CassandraConnector.scala:81)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:103)
    at com.datastax.spark.connector.datasource.CassandraCatalog$.com$datastax$spark$connector$datasource$CassandraCatalog$$getMetadata(CassandraCatalog.scala:455)
    at com.datastax.spark.connector.datasource.CassandraCatalog$.getTableMetaData(CassandraCatalog.scala:421)
    at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:68)
    at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:296)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:266)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
    at MyCsvToCassandrsJob$.main(csv-to-cassanra-job:63)
    at MyCsvToCassandrsJob.main(csv-to-cassanra-job-job)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke(ProcessLauncher.scala:47)
    at com.amazonaws.services.glue.SparkProcessLauncherPlugin.invoke$(ProcessLauncher.scala:47)
    at com.amazonaws.services.glue.ProcessLauncher$$anon$1.invoke(ProcessLauncher.scala:75)
    at com.amazonaws.services.glue.ProcessLauncher.launch(ProcessLauncher.scala:123)
    at com.amazonaws.services.glue.ProcessLauncher$.main(ProcessLauncher.scala:29)
    at com.amazonaws.services.glue.ProcessLauncher.main(ProcessLauncher.scala)
Caused by: java.lang.IllegalArgumentException: Error instantiating class com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory (specified by advanced.ssl-engine-factory.class): Cannot initialize SSL Context
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:253)
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:108)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslEngineFactory(DefaultDriverContext.java:414)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.lambda$new$4(DefaultDriverContext.java:279)
    at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslEngineFactory(DefaultDriverContext.java:733)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.buildSslHandlerFactory(DefaultDriverContext.java:470)
    at com.datastax.oss.driver.internal.core.util.concurrent.LazyReference.get(LazyReference.java:55)
    at com.datastax.oss.driver.internal.core.context.DefaultDriverContext.getSslHandlerFactory(DefaultDriverContext.java:799)
    at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.init(DefaultSession.java:348)
    at com.datastax.oss.driver.internal.core.session.DefaultSession$SingleThreaded.access$1100(DefaultSession.java:300)
    at com.datastax.oss.driver.internal.core.session.DefaultSession.lambda$init$0(DefaultSession.java:146)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.PromiseTask.run(PromiseTask.java:106)
    at com.datastax.oss.driver.shaded.netty.channel.DefaultEventLoop.run(DefaultEventLoop.java:54)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at com.datastax.oss.driver.shaded.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at com.datastax.oss.driver.shaded.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Cannot initialize SSL Context
    at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:74)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at com.datastax.oss.driver.internal.core.util.Reflection.buildFromConfig(Reflection.java:246)
    ... 18 more
Caused by: java.nio.file.NoSuchFileException: s3:/dev-code/certs/trust.jks
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
    at java.nio.file.Files.newInputStream(Files.java:152)
    at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.buildContext(DefaultSslEngineFactory.java:119)
    at com.datastax.oss.driver.internal.core.ssl.DefaultSslEngineFactory.<init>(DefaultSslEngineFactory.java:72)
    ... 23 more

Any way to resolve this would be a great help.

【Question discussion】:

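  • The problem is that the Java driver, which makes the actual connection, knows nothing about S3 URLs and expects a local file path. In theory you could ship the file with --files.
  • Thanks for the reply. I tried adding a new job parameter --extra-files with the value s3://dev-code/certs/trust.jks, but I still get the same kind of error: Caused by: java.nio.file.NoSuchFileException: /tmp/trust.jks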
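As a quick sanity check before building the session, a small helper like the one below can reject trust store values the driver will never be able to open. It is a minimal sketch that assumes, as the comment above says, that the driver only accepts a local file path. The object and method names are hypothetical, and the scheme test is a simple heuristic rather than full URI parsing.

```scala
import java.nio.file.{Files, Path, Paths}

object TrustStorePathCheck {
  // Heuristic: matches a URI scheme prefix such as "s3:", "s3a:", "gs:" or "hdfs:".
  // At least two characters are required, so a Windows drive letter like "C:" does not match.
  private val SchemePrefix = "^[A-Za-z][A-Za-z0-9+.-]+:".r

  /** Returns the absolute local path if it looks usable by the driver, or a reason otherwise. */
  def checkLocalTrustStore(raw: String): Either[String, Path] =
    if (SchemePrefix.findFirstIn(raw).isDefined)
      Left(s"'$raw' looks like a URI, but the driver opens the trust store as a local file")
    else {
      // Relative values are resolved against the JVM working directory.
      val p = Paths.get(raw).toAbsolutePath
      if (!Files.isRegularFile(p)) Left(s"no such local file: $p")
      else if (!Files.isReadable(p)) Left(s"file is not readable: $p")
      else Right(p)
    }
}
```

Calling TrustStorePathCheck.checkLocalTrustStore with the value from the question ("s3:/dev-code/certs/trust.jks") returns a Left, which makes the misconfiguration visible before Spark ever tries to connect.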

Tags: scala apache-spark cassandra spark-cassandra-connector


【Solution 1】:

At the bottom of your error message, I see:

NoSuchFileException: s3:/dev-code/certs/trust.jks

Alex is right: you need to give the Spark connector a path to that file that it can actually access. As far as I can tell, an S3 location won't work here.
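The stack trace shows why the s3:/ prefix produces NoSuchFileException instead of a clearer error. The driver hands the string to java.nio (Files.newInputStream), and on Linux, which Glue appears to use given the sun.nio.fs.UnixException frames, "s3:" is just an ordinary directory name. The sketch below assumes the string is parsed with Paths.get before it is opened. The trace suggests this but does not show it directly, so treat the sketch as an illustration of java.nio path parsing, not as the driver's code.

```scala
import java.nio.file.{Path, Paths}

object S3PathDemo {
  /** Parses a string the way java.nio's default file system does before a file is opened. */
  def parse(raw: String): Path = Paths.get(raw)

  def main(args: Array[String]): Unit = {
    val p = parse("s3:/dev-code/certs/trust.jks")
    println(p.isAbsolute) // false: no leading "/", so this is a relative path
    println(p.getName(0)) // s3:   (an ordinary directory name, not a URI scheme)
    // Relative paths are resolved against the JVM working directory when opened,
    // so the lookup is for <working dir>/s3:/dev-code/certs/trust.jks.
    println(p.toAbsolutePath)
    // Redundant slashes are collapsed on Unix, so "s3://..." parses to the same path.
    println(parse("s3://dev-code/certs/trust.jks") == p) // true
  }
}
```

On Windows the same call would throw InvalidPathException instead, because ':' is not allowed in a path element there. Either way, no S3 access is involved.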

【Discussion】:

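  • Hi Aaron, I tried adding it, but no luck. I can only find PySpark references. Is there any Scala documentation on accessing --extra-files values?
  • I listed the files in /tmp and can see the .jks file there, but I still get the same error.
  • @Anbinson A similar question about PySpark may help. That one refers to Google Cloud Storage, so perhaps it can be done. stackoverflow.com/questions/34939520/…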
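When the file is visibly present but the job still fails, it can help to open the trust store from inside the job using exactly the path, password, and type you pass to Spark. The sketch below roughly mirrors what the stack trace shows DefaultSslEngineFactory.buildContext doing: Files.newInputStream, then loading a key store. The TrustStoreProbe name is hypothetical, and this is a diagnostic, not the driver's actual implementation. If you suspect the file exists on the driver but not on the executors, run the probe in both places.

```scala
import java.io.InputStream
import java.nio.file.{Files, Paths}
import java.security.KeyStore

object TrustStoreProbe {
  /**
   * Opens and loads a trust store from a local path.
   * Throws NoSuchFileException if the path does not exist, and IOException
   * if the password is wrong or the file is not a valid store of `storeType`.
   */
  def load(path: String, password: String, storeType: String = "JKS"): KeyStore = {
    val ks = KeyStore.getInstance(storeType)
    val in: InputStream = Files.newInputStream(Paths.get(path))
    try ks.load(in, password.toCharArray)
    finally in.close()
    ks
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical usage: pass the same path and password as spark.cassandra.connection.ssl.trustStore.*
    val ks = load(args(0), args(1))
    println(s"Loaded ${ks.size()} entries from ${Paths.get(args(0)).toAbsolutePath}")
  }
}
```

A NoSuchFileException here points to a path problem. An IOException about the password or format points to the trust store itself. If both succeed, look elsewhere in the SSL settings.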
【Solution 2】:

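Add the .jks file in S3 to the Glue job's "Referenced files path", then refer to it by file name only, because the file is placed automatically under the /tmp folder. That alone still did not solve the problem, though.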
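java.nio resolves a bare name such as trust.jks against the JVM's current working directory, not against /tmp. If you are unsure where Glue put a referenced file, a hypothetical helper like the one below can check a few candidate directories and return an absolute path to use as spark.cassandra.connection.ssl.trustStore.path. The candidate list (working directory first, then /tmp) is an assumption based on this answer, so adjust it to what you observe in your job. Also keep in mind that the connector opens connections from executors too, so the path must be valid on every node, not just the driver.

```scala
import java.nio.file.{Files, Path, Paths}

object ReferencedFileLocator {
  // Hypothetical candidate locations: the JVM working directory, then /tmp.
  val DefaultDirs: Seq[Path] = Seq(Paths.get("").toAbsolutePath, Paths.get("/tmp"))

  /** Returns the absolute path of the first regular file named `fileName` found in `dirs`. */
  def locate(fileName: String, dirs: Seq[Path] = DefaultDirs): Option[Path] =
    dirs.iterator
      .map(_.resolve(fileName).toAbsolutePath.normalize)
      .find(p => Files.isRegularFile(p))
}
```

For example, ReferencedFileLocator.locate("trust.jks").map(_.toString) yields either an absolute path to hand to the connector or None, which is a clearer failure than a NoSuchFileException deep inside the driver.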

From this website, I learned that we also need to provide all the default values explicitly:

Here is my final code:

val spark: SparkSession = SparkSession.builder()
    .config("spark.cassandra.connection.host","server.abc")
    .config("spark.cassandra.connection.port","9142")
    .config("spark.cassandra.connection.ssl.enabled",true)
    .config("spark.cassandra.connection.ssl.enabledAlgorithms", "TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA")
    .config("spark.cassandra.connection.ssl.trustStore.path","trust.jks")
    .config("spark.cassandra.connection.ssl.trustStore.password","mypass")
    .config("spark.cassandra.connection.ssl.trustStore.type","JKS")
    .config("spark.cassandra.connection.ssl.protocol","TLS")
    .config("spark.cassandra.auth.username","myuser")
    .config("spark.cassandra.auth.password","userpass")
    .appName("CassandraIntegration").getOrCreate()
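The same settings can also be kept in one place as a plain key/value map, which makes them easy to check or to log with the secrets masked. The option keys and values below are the ones from the final code above. The object, its parameters, and the redaction rule are hypothetical, so treat this as a sketch rather than a required pattern.

```scala
object CassandraSslOptions {
  /** Builds the spark.cassandra.* options used in the final code above. */
  def apply(
      host: String,
      port: Int,
      trustStorePath: String,
      trustStorePassword: String,
      user: String,
      password: String): Map[String, String] = Map(
    "spark.cassandra.connection.host" -> host,
    "spark.cassandra.connection.port" -> port.toString,
    "spark.cassandra.connection.ssl.enabled" -> "true",
    "spark.cassandra.connection.ssl.enabledAlgorithms" ->
      "TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA",
    "spark.cassandra.connection.ssl.trustStore.path" -> trustStorePath,
    "spark.cassandra.connection.ssl.trustStore.password" -> trustStorePassword,
    "spark.cassandra.connection.ssl.trustStore.type" -> "JKS",
    "spark.cassandra.connection.ssl.protocol" -> "TLS",
    "spark.cassandra.auth.username" -> user,
    "spark.cassandra.auth.password" -> password
  )

  /** Same map with any *password* value masked, so it is safe to print in job logs. */
  def redacted(opts: Map[String, String]): Map[String, String] =
    opts.map { case (k, v) => k -> (if (k.toLowerCase.contains("password")) "****" else v) }

  // Usage inside the Glue job (requires Spark on the classpath):
  //   val spark = CassandraSslOptions("server.abc", 9142, "trust.jks", "mypass", "myuser", "userpass")
  //     .foldLeft(SparkSession.builder().appName("CassandraIntegration")) {
  //       case (b, (k, v)) => b.config(k, v)
  //     }.getOrCreate()
}
```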

【Discussion】:
