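【Posted】: 2019-10-17 16:35:26
【Problem description】:
I am trying to read a single large Parquet file (size > gpu_size) with dask_cudf/dask, but it is currently being read into a single partition. I assume this is the expected behavior, judging from the docstring: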
dask.dataframe.read_parquet(path, columns=None, filters=None, categories=None, index=None, storage_options=None, engine='auto', gather_statistics=None, **kwargs):
Read a Parquet file into a Dask DataFrame
This reads a directory of Parquet data into a Dask.dataframe, one file per partition.
It selects the index among the sorted columns if any exist.
Is there a workaround to read it into multiple partitions?
【Discussion】: