[Posted]: 2020-11-02 02:39:32
[Question]:
I have the code below. My UDF returns an integer type, but the system changes it to a string.
How can I fix this?
# Define a UDF to determine the number of pixels per image
def dogPixelCount(doglist):
    totalpixels = 0
    for dog in doglist:
        totalpixels += (dog[3] - dog[1]) * (dog[4] - dog[2])
    return totalpixels

# Define a UDF for the pixel count
udfDogPixelCount = F.udf(dogPixelCount, IntegerType())
joined_df = joined_df.withColumn('dog_pixels', udfDogPixelCount('dogs'))

# Create a column representing the percentage of pixels
joined_df = joined_df.withColumn('dog_percent', ('dog_pixels' / sum('dog_pixels') ) * 100 )

# Show the first 10 annotations with more than 60% dog
joined_df.filter(dog_percent > 60).show(10)
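One likely cause, offered as a guess rather than a confirmed diagnosis: the UDF itself may be fine, and the string comes from the expression `'dog_pixels' / sum('dog_pixels')`. There `'dog_pixels'` is a plain Python string, so Spark treats it as a string literal rather than a reference to the column. Likewise, `dog_percent` inside `filter` is an undefined Python name. The sketch below refers to columns through `F.col` and computes the total once with an aggregate. The helper name `add_dog_percent`, the snake_case renames, and the `(label, x_min, y_min, x_max, y_max)` layout of each dog entry are assumptions for illustration; the question only shows the indices.

```python
def dog_pixel_count(doglist):
    """Sum the bounding-box areas of all dogs in one image.

    Assumes each entry is indexable as (label, x_min, y_min, x_max, y_max);
    the question only shows the indices, so this layout is a guess.
    """
    total_pixels = 0
    for dog in doglist:
        total_pixels += (dog[3] - dog[1]) * (dog[4] - dog[2])
    # Cast so the value matches the IntegerType declared for the UDF.
    return int(total_pixels)


def add_dog_percent(joined_df):
    """Return joined_df with 'dog_pixels' and 'dog_percent' columns added."""
    # Imported here so dog_pixel_count can be tried without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import IntegerType

    udf_dog_pixel_count = F.udf(dog_pixel_count, IntegerType())
    df = joined_df.withColumn("dog_pixels", udf_dog_pixel_count(F.col("dogs")))

    # Total over the whole DataFrame, collected to the driver as a plain number.
    total = df.agg(F.sum("dog_pixels")).first()[0]

    # F.col refers to the column; a bare 'dog_pixels' would be a string literal.
    return df.withColumn("dog_percent", F.col("dog_pixels") / F.lit(total) * 100)
```

With this in place, the final step could be written as `add_dog_percent(joined_df).filter("dog_percent > 60").show(10)`, where the filter condition is passed as a SQL expression string. Note that this keeps the question's definition of the percentage (each row's share of the total dog pixels across all rows); if the goal is the share of each image covered by dogs, the denominator would need to be the image's own pixel count instead.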
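[Comments]:
-
Please don't shout when posting here. Text in all capitals is harder to read and understand, and it won't get you an answer any faster. It is also rather impolite to shout at us when you are asking for free help. Thank you.
Tags: dataframe pyspark user-defined-functions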