【Question title】: What is the equivalent of pandas.DataFrame.tail in DataBricks [closed]
【Posted】: 2019-01-14 15:29:08
【Question description】:

What is the equivalent of pandas.DataFrame.tail in DataBricks? I searched the documentation but could not find any matching function.

【Question discussion】:

    Tags: python pandas databricks


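    【Solution 1】:

    DataBricks apparently uses pyspark.sql DataFrames rather than pandas DataFrames, so there is no positional index to take a tail from. One workaround is to add an index column and sort on it: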

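    # Index the df if you haven't already.
    # Note: the generated IDs are unique and increasing but not consecutive, and the
    # implementation assumes fewer than ~1 billion partitions and ~8 billion rows per partition.
    from pyspark.sql.functions import monotonically_increasing_id
    df = df.withColumn("index", monotonically_increasing_id())

    # Register the DataFrame as a temporary view so the SQL below can refer to it as "df"
    df.createOrReplaceTempView("df")

    # Query with the index: the 5 rows with the largest index, returned last row first
    tail = sqlContext.sql("""SELECT * FROM df ORDER BY index DESC LIMIT 5""")
    tail.show()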
    

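    Note that this is expensive, because it sorts the entire DataFrame just to return a few rows, and it does not play to Spark's strengths.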
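
    As a possible alternative to the sort, here is a minimal sketch that assumes either Spark 3.0 or later, where `DataFrame.tail(n)` is available, or an older version, where rows can be streamed to the driver with `toLocalIterator()`. The helper name `last_rows` is hypothetical and not part of any library. Both paths move data to the driver, and without an explicit sort the "last" rows are simply the last ones in partition order.

```python
from collections import deque


def last_rows(df, n=5):
    """Return the last n rows of a Spark DataFrame as a list (hypothetical helper)."""
    if n <= 0:
        return []
    # Spark 3.0+: DataFrame.tail(n) collects the last n rows to the driver.
    if hasattr(df, "tail"):
        return df.tail(n)
    # Older Spark (assumption: no tail method): stream every row to the driver,
    # but keep only the most recent n in memory.
    return list(deque(df.toLocalIterator(), maxlen=n))
```

    In a notebook this could be called as `last_rows(df, 5)`, which returns a list of rows rather than a DataFrame, so it is only suitable for small n.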

    See also:

    https://medium.com/@chris_bour/6-differences-between-pandas-and-spark-dataframes-1380cec394d2

    pyspark,spark: how to select last row and also how to access pyspark dataframe by index

    【Discussion】:
