Need help extracting Google Cartoon Dataset on Google Colab (answers)

【Question title】: Need help extracting Google Cartoon Dataset on Google Colab
【Posted】: 2021-03-17 06:39:04
【Question description】:

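I want to get the Google Cartoon dataset available at https://google.github.io/cartoonset/download.html. I will be using Google Colab for a classification task, but that comes later. The challenge right now has two parts. The first is how to get the data directly into Colab or Google Drive. I tried: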

! wget --no-check-certificate \
    https://storage.cloud.google.com/cartoonset_public_files/cartoonset100k.tgz

This only gives me a small 60 KB file in Colab. The second part is how to extract the subfolders directly into Colab's temporary storage. I tried:

import shutil
import tarfile

# Attempt 1: unpack with shutil
shutil.unpack_archive("cartoonset10k.tgz", "/tmp/")

# Attempt 2: unpack with tarfile
with tarfile.open('cartoonset10k.tgz', 'r:gz') as tar:
    tar.extractall()

Error - ReadError: not a gzip file

!tar -xzf cartoonset10k.tgz -C ~/tmp/

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

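I could download the data to my own machine and extract it there, but then I would have to upload it to Colab again, which takes a very long time on my internet connection.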

【Question discussion】:

Tags: google-colaboratory

【Solution 1】:

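Google requires you to log in before downloading over HTTP. The 60 KB file is actually an HTML page asking you to log in, not the data itself. That is why every extraction attempt reports that the file is not in gzip format.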
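To confirm this diagnosis, one option is to look at the first bytes of the downloaded file. The following is a minimal sketch, not part of the original answer: the helper name `describe_download` is hypothetical, and it only assumes that gzip streams begin with the two magic bytes `1f 8b` and that a login page begins with an HTML doctype or `<html>` tag.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def describe_download(path):
    """Roughly classify a downloaded file as 'gzip', 'html', or 'unknown'.

    Hypothetical helper for illustration; it reads only the first 512 bytes.
    """
    with open(path, "rb") as f:
        head = f.read(512)
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

In Colab, calling `describe_download("cartoonset100k.tgz")` on the file fetched by wget would be expected to report an HTML page rather than a gzip archive, if the explanation above is right.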

Logging in with wget, curl, or selenium is difficult.

Fortunately, you can use gsutil to download the file directly, without authenticating.

!gsutil cp gs://cartoonset_public_files/cartoonset100k.tgz .
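Once gsutil has copied the real archive into the working directory, the second part of the question (extracting into temporary storage) can be handled with the standard `tarfile` module. This is a sketch under assumptions: the function name `extract_tgz` and the destination `/tmp/cartoonset` are hypothetical choices for illustration, not something the original answer specifies.

```python
import tarfile
from pathlib import Path


def extract_tgz(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into dest_dir.

    Creates dest_dir if needed and returns the sorted names of the
    top-level entries found there after extraction.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


# Example call in Colab (hypothetical destination path):
# extract_tgz("cartoonset100k.tgz", "/tmp/cartoonset")
```

Extracting under `/tmp` keeps the images on the Colab VM's local disk, so nothing has to be uploaded from your own machine.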

【Discussion】:

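Thank you. I tried your command but got the error: CommandException: Wrong number of arguments for "cp" command.
You forgot the dot at the end. It is the destination argument (the current directory).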