How to count number of rows of csv file within zip file (answers)

【Question Title】: How to count number of rows of csv file within zip file
【Posted】: 2023-09-13 21:36:01
【Question Description】:

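I'm trying to save disk space by loading csv files directly from a zip into R with fread(). Is there a way to get something like nrow() or dim() for the csv (inside the zip) before loading it, so I can get a sense of the object's size and avoid running out of available RAM? Any suggestions? If there is a better way to tell whether a csv will be too large once it is uncompressed and loaded into R, that would be good to know too. Thanks (ps: using Windows 10).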
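One base-R way to get the row count without extracting the file or parsing it into a data frame is to stream it through an `unz()` connection and count lines in chunks. This is a minimal sketch, assuming the file has one header line and that no quoted field contains an embedded newline (such fields would be counted as extra rows); `count_lines()` and `count_csv_rows_in_zip()` are hypothetical helper names. Memory use stays bounded by `chunk_size` lines, but the whole member is still decompressed and read once.

```r
# Count lines on an open connection in fixed-size chunks, so only
# `chunk_size` lines are held in memory at any time.
count_lines <- function(con, chunk_size = 100000L) {
  n <- 0  # numeric, to avoid integer overflow on very large files
  repeat {
    chunk <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(chunk) == 0L) break
    n <- n + length(chunk)
  }
  n
}

# Stream a CSV member of a zip archive through unz(): nothing is
# extracted to disk and the data are never parsed into a data frame.
count_csv_rows_in_zip <- function(zipfile, member, header = TRUE,
                                  chunk_size = 100000L) {
  con <- unz(zipfile, member, open = "r")
  on.exit(close(con))
  n <- count_lines(con, chunk_size)
  if (header) n - 1 else n  # the header line is not a data row
}

# Usage (paths are placeholders):
# zf <- "data.zip"
# member <- unzip(zf, list = TRUE)$Name[1]
# count_csv_rows_in_zip(zf, member)
```

The member name to pass can be looked up with `unzip(zipfile, list = TRUE)$Name`, which lists the archive without extracting anything.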

【Question Discussion】:

r-bloggers.com/…
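You can also run unzip -l <path> in CMD; it lists the contained files along with the total uncompressed size.
Basically shell(sprintf("unzip -l %s", shQuote(file.choose())))
Possible duplicate of Extract bz2 file in R
This isn't a duplicate of that question: macsmith is asking how to get the size/row count efficiently, whereas that question only shows how to read the data directly and work with it.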
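The `unzip -l` listing suggested in the comments is also available from base R without an external program: `utils::unzip(zipfile, list = TRUE)` returns each member's name, uncompressed size in bytes (`Length`) and date. Combined with a small sample of lines, that size gives a rough row estimate without reading the whole file. The helpers below are hypothetical, and the estimate assumes the sampled lines are representative of the rest of the file and that lines end in LF (pass `eol_bytes = 2` for CRLF files, which are common for CSVs written on Windows).

```r
# Rough row-count estimate: total uncompressed bytes divided by the
# average byte length of a sample of lines. readLines() strips line
# terminators, so add them back: 1 byte for LF, 2 for CRLF.
estimate_rows <- function(uncompressed_bytes, sample_lines, eol_bytes = 1) {
  avg_line_bytes <- mean(nchar(sample_lines, type = "bytes") + eol_bytes)
  round(uncompressed_bytes / avg_line_bytes)
}

estimate_csv_rows_in_zip <- function(zipfile, member, n_sample = 1000L,
                                     eol_bytes = 1) {
  # One row per member: Name, Length (uncompressed bytes), Date
  info <- utils::unzip(zipfile, list = TRUE)
  size <- info$Length[info$Name == member]
  # Read only the first n_sample lines of the member
  con <- unz(zipfile, member, open = "r")
  on.exit(close(con))
  sample_lines <- readLines(con, n = n_sample, warn = FALSE)
  estimate_rows(size, sample_lines, eol_bytes) - 1  # minus the header line
}
```

The `Length` column alone is already a useful upper bound check: a data frame in R usually needs at least as much RAM as the uncompressed text, often more, depending on column types.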

【Tags】: r zip

【Solution 1】:

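vroom is a very good option, especially for reading compressed files quickly:

https://vroom.r-lib.org: "...it simply indexes where each record is located so it can be read later." So it should be safe to load very large data sets without the risk of locking up your session.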

library(vroom)

vroom("./data.csv.gz")
# Rows: 200
# Columns: 6
# Delimiter: ","
# chr [6]: Column1, Date, Column2, Subtable_Column1, Subtable_Column2, Subtable_Column3
# 
#
# Use `spec()` to retrieve the guessed column specification
# Pass a specification to the `col_types` argument to quiet this message
# A tibble: 200 x 6
... <data> ...
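If only the row count is needed, a variation on the call above can read a single column so the resulting tibble stays small. This is a sketch assuming the vroom package is installed; `col_select`, `col_types`, `progress` and `show_col_types` are vroom arguments, vroom's documentation states that files ending in .gz, .bz2, .xz or .zip are uncompressed automatically, and `count_rows_vroom()` is a hypothetical helper name. vroom still has to decompress and scan the whole file to find every record, so this mainly saves memory rather than time.

```r
# Requires the vroom package: install.packages("vroom")
count_rows_vroom <- function(path) {
  df <- vroom::vroom(
    path,
    col_select = 1,                          # keep only the first column
    col_types = vroom::cols(.default = "c"), # skip type guessing
    progress = FALSE,
    show_col_types = FALSE
  )
  nrow(df)
}

# Usage (path is a placeholder):
# count_rows_vroom("./data.zip")
```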

【Discussion】: