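【Posted】: 2020-09-08 07:04:59
【Question】:
I currently have a single DataFrame. I want to split it into several independent DataFrames and then process them one by one.
The Spark DataFrame looks like this: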
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
| id|data_identifier_method| start_time| end_time|time_interval| time| value|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:10|351232.92|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:10|351232.92|
| fd784213423f| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:00|342342.12|
| fd784213423f| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:05|342421.88|
| fd784213423f| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:10|351232.92|
| fd784213423f| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:00|342342.12|
| fd784213423f| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:05|342421.88|
| fd784213423f| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:10|351232.92|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
I want to split it into four DataFrames, one per combination of id and data_identifier_method:
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
| id|data_identifier_method| start_time| end_time|time_interval| time| value|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:10|351232.92|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
| id|data_identifier_method| start_time| end_time|time_interval| time| value|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:10|351232.92|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
| id|data_identifier_method| start_time| end_time|time_interval| time| value|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
| fd784213423f| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:00|342342.12|
| fd784213423f| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:05|342421.88|
| fd784213423f| algid1_set1_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:10|351232.92|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
| id|data_identifier_method| start_time| end_time|time_interval| time| value|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
| fd784213423f| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:00|342342.12|
| fd784213423f| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:05|342421.88|
| fd784213423f| algid2_set2_total...|20200903 00:00:00|20200903 00:00:10| 5|20200903 00:00:10|351232.92|
+--------------+----------------------+-----------------+-----------------+-------------+-----------------+---------+
How should I do this?
Alternatively, if the original DataFrame is not split, how can I operate on each of these four groups within it?
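【Comments】:
- How about using filter?
- How would I do that?
- I suggest reading one of the many existing tutorials on basic DataFrame operations, for example this one: docs.databricks.com/spark/latest/dataframes-datasets/…
- If you still run into problems, come back here and show what you tried and what did not work as expected.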
Tags: dataframe apache-spark apache-spark-sql