How to get a cell value by index in PySpark?

【Question title】: How to get a cell value by index in PySpark?
【Posted】: 2021-03-21 12:42:28
【Question】:

I want to read cell values and pass them into the WHERE clause of a SQL query. Below is my DataFrame ab. It contains only distinct values.

+----------+--------+
|Months    |    YEAR|
+----------+--------+
|         3|    2018|
|         2|    2014|
+----------+--------+

Based on these rows, I need to pass the values into a SQL query:

for i in range(0,ab.count()):
      query = "select * from customer where YEAR= "+ab['YEAR'][i]+" and Months="+ab['Months'][i]
      df = sqlContext.read.format("jdbc").options(url="jdbc:mysql://localhost:3306/ohcdemo",driver="com.mysql.jdbc.Driver",query=query,user="root",password="root").load()

The result of each query should be appended to a DataFrame, but I am unable to get the values into the SQL query.

【Comments on the question】:

Tags: mysql sql python-3.x apache-spark pyspark

【Solution 1】:

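Spark DataFrames are not ordered, so addressing a row by index is not meaningful. You are also overwriting the DataFrame df on every iteration of the for loop, so only the result of the last query would survive.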

To do what you want, I suggest using a join. Note that I changed the query parameter in the JDBC reader so that it loads the whole customer table.

df = sqlContext.read.format("jdbc").options(
    url="jdbc:mysql://localhost:3306/ohcdemo",
    driver="com.mysql.jdbc.Driver",
    query="select * from customer",
    user="root",
    password="root"
).load()

joined_df = ab.join(df, ['Months', 'YEAR'])
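
If the join is not suitable (for example, because loading the whole customer table is too expensive), one possible alternative is to bring the few distinct rows of ab to the driver and build a single query from them. The following is only a minimal sketch under assumptions: the helper name build_customer_query is hypothetical, and the literal pairs list stands in for `[(r["Months"], r["YEAR"]) for r in ab.collect()]` so that the snippet runs without a Spark session.

```python
def build_customer_query(pairs):
    """Build one SELECT that matches any of the given (Months, YEAR) pairs.

    `pairs` is an iterable of (months, year) values, e.g. taken from the
    rows returned by ab.collect(). The function name is hypothetical.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no (Months, YEAR) pairs given")
    # int() ensures only numbers end up in the SQL text.
    conditions = [
        "(Months = {} and YEAR = {})".format(int(months), int(year))
        for months, year in pairs
    ]
    return "select * from customer where " + " or ".join(conditions)


# Stand-in for the rows collected from the DataFrame ab in the question.
pairs = [(3, 2018), (2, 2014)]
query = build_customer_query(pairs)
print(query)
```

The resulting string could then be passed as the query option of the JDBC reader above in place of "select * from customer", which gives one DataFrame for all pairs instead of one per loop iteration.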

【Comments】:

Is there any alternative to this join? It is not working as expected.
@SS What do you mean by "not working as expected"? What error did you get?