[Posted at]: 2019-11-08 14:00:50
[Question]:
I am new to AWS and the Glue service. I am experimenting with PyCharm, and I have a Python class that reads data from an S3 location; it works fine when it connects to a Dev Endpoint and runs its code there. I would like to do the same kind of development locally on my laptop, without connecting to the Dev Endpoint (similar to how we use winutils for Spark/Hadoop applications). When I run the application in PyCharm, I get the following error:
C:\Users\****\PycharmProjects\AWSEndpoint\venv\Scripts\python.exe C:/Users/*****/PycharmProjects/AWSEndpoint/src/StackOver.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "C:/Users/*****/PycharmProjects/AWSEndpoint/src/StackOver.py", line 9, in <module>
glueContext = GlueContext(sc)
File "C:\Python\lib\awsglue\context.py", line 45, in __init__
self._glue_scala_context = self._get_glue_scala_context(**options)
File "C:\Python\lib\awsglue\context.py", line 66, in _get_glue_scala_context
return self._jvm.GlueContext(self._jsc.sc())
TypeError: 'JavaPackage' object is not callable
Process finished with exit code 1
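For anyone reasoning about the error itself: py4j returns a JavaPackage placeholder when the JVM cannot resolve a name to an actual class (here, because the Glue Scala jars are not on the Spark classpath), and calling that placeholder is what raises the TypeError. The stand-in class below is hypothetical, not the real py4j implementation; it only reproduces the failure mode:

```python
class JavaPackage:
    """Stand-in for the placeholder py4j returns when it cannot
    resolve a dotted name (e.g. GlueContext) to a real Java class."""
    def __init__(self, name):
        self.name = name

# Mirrors self._jvm.GlueContext(...) when the Glue jars are missing:
# the attribute lookup succeeds, but the result is not a constructor.
jvm_entry = JavaPackage("com.amazonaws.services.glue.GlueContext")
try:
    jvm_entry()
except TypeError as exc:
    print(exc)  # 'JavaPackage' object is not callable
```

So the exception is not a bug in the Python code per se; it indicates the JVM side of the Glue library was never loaded.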
Here is the code I am using:
from py4j.java_gateway import java_import
from pyspark.sql.types import StructField, StructType, StringType, Row
from src.readConfig import read_config
# spark = SparkSession.builder.appName('abc').getOrCreate()
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
newconf = sc._conf.set("spark.sql.catalogImplementation", "in-memory")
sc = sc.getOrCreate(newconf)
glueContext = GlueContext(SparkContext.getOrCreate())

def main():
    # glueContext = GlueContext(SparkContext.getOrCreate())
    logger = glueContext.get_logger()
    logger.info("Job Started")
    inputDf = glueContext.sparkSession.read.csv(input_dir)  # input_dir is defined elsewhere
    print(inputDf.take(3))

if __name__ == "__main__":
    main()
Any suggestions would be helpful; I have gone through all the AWS documentation I could find.
[Comments]:
- GlueContext is a pain :(
- I ran into the same problem. I opened an issue on the awslabs/aws-glue-libs project ~ github.com/awslabs/aws-glue-libs/issues/58
[Tags]: apache-spark pyspark pycharm aws-glue