【Posted】: 2019-10-31 12:19:48
【Question】:
I'm basically trying to update/add rows from one DataFrame to another. Here is my code:
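# S3
import boto3
# Glue / Spark imports needed by the code below
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext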
# SOURCE
source_table = "someDynamoDbtable"
source_s3 = "s3://mybucket/folder/"
# DESTINATION
destination_bucket = "s3://destination-bucket"
#Select which attributes to update/add
params = ['attributeD', 'attributeF', 'AttributeG']
#spark wrapper
glueContext = GlueContext(SparkContext.getOrCreate())
newData = glueContext.create_dynamic_frame.from_options(connection_type = "dynamodb", connection_options = {"tableName": source_table})
newValues = newData.select_fields(params)
newDF = newValues.toDF()
oldData = glueContext.create_dynamic_frame.from_options(connection_type="s3", connection_options={"paths": [source_s3]}, format="orc", format_options={}, transformation_ctx="dynamic_frame")
oldDataValues = oldData.drop_fields(params)
oldDF = oldDataValues.toDF()
#makes a union of the dataframes
rebuildData = oldDF.union(newDF)
#error happens here
readyData = DynamicFrame.fromDF(rebuildData, glueContext, "readyData")
#writes new data to s3 destination, into orc files, while partitioning
#(Glue takes the partition columns as "partitionKeys" inside connection_options)
glueContext.write_dynamic_frame.from_options(frame = readyData, connection_type = "s3", connection_options = {"path": destination_bucket, "partitionKeys": ['partition_year', 'partition_month', 'partition_day']}, format = "orc")
The error I get is:
SyntaxError: invalid syntax on the readyData = ... line
So far I have no idea what is going wrong.
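A SyntaxError reported on a line that looks fine is often caused by an unclosed bracket on the line before it. The code as posted does not show one, so this is only a possibility worth ruling out, not a diagnosis. The sketch below uses a hypothetical variant of the two lines around the error, with the closing parenthesis of the union call removed, and a made-up helper first_syntax_error to show where Python reports the problem. Older interpreters typically blame the following line; newer ones may point at the unclosed bracket itself.

```python
import ast
from typing import Optional, Tuple


def first_syntax_error(source: str) -> Optional[Tuple[int, str]]:
    """Return (line number, message) of the first SyntaxError, or None if the source parses."""
    try:
        # ast.parse only checks syntax; the names in the source need not exist.
        ast.parse(source)
    except SyntaxError as err:
        return err.lineno, err.msg
    return None


# Hypothetical variant of the question's code: the closing ")" on line 1 is missing.
broken = (
    "rebuildData = oldDF.union(newDF\n"
    "readyData = DynamicFrame.fromDF(rebuildData, glueContext, 'readyData')\n"
)
# Same two lines with the parenthesis restored.
fixed = broken.replace("union(newDF\n", "union(newDF)\n")

print(first_syntax_error(broken))  # reported line and message depend on the Python version
print(first_syntax_error(fixed))   # None
```

Running the same helper over the real script text, or simply checking the line directly above the reported one for a missing bracket or quote, is usually enough to locate this kind of error.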
【Comments】:
- Are you sure rebuildData = oldDF.union(newData) works?
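The comment's doubt is reasonable: a PySpark DataFrame.union matches columns by position, not by name, and expects both frames to have the same number of columns. In the question, one frame has the params columns dropped and the other has only those columns, so the two schemas do not line up. A minimal pre-flight check, sketched here on plain lists of column names (in a real job these would come from df.columns), might look like the following. The helper name union_problem and the non-params column names are made up for illustration.

```python
from typing import List, Optional


def union_problem(left_cols: List[str], right_cols: List[str]) -> Optional[str]:
    """Return a description of why a positional union looks unsafe, or None if it looks fine."""
    if len(left_cols) != len(right_cols):
        return f"column count differs: {len(left_cols)} vs {len(right_cols)}"
    if left_cols != right_cols:
        return "same column count, but names or order differ"
    return None


# Column lists shaped like the question's frames (non-params names are hypothetical).
params = ["attributeD", "attributeF", "AttributeG"]
old_cols = ["id", "attributeA", "partition_year"]  # what remains after drop_fields(params)
new_cols = list(params)                            # what select_fields(params) keeps

print(union_problem(old_cols, new_cols))
```

If the check reports a mismatch, a union would stack unrelated columns on top of each other; combining the two frames on a shared key is likely closer to what "update/add" is meant to achieve here.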
Tags: python amazon-web-services dataframe pyspark aws-glue