【Question title】: How to prevent Kaggle from re-downloading model files each time a session is ended and restarted?
【Posted】: 2023-02-06 00:10:19
【Question】:

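I want to keep the downloaded model data in a Kaggle notebook.

Here is my Kaggle notebook as an example: https://www.kaggle.com/furkangozukara/tglobal-xl-booksum-wip3r3

Whenever the session ends and is restarted, it re-downloads all the model data from Hugging Face.

For example, the model data is downloaded from the following imported repository: https://huggingface.co/pszemraj/long-t5-tglobal-large-pubmed-3k-booksum-16384-WIP/tree/main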

【Comments】:

    Tags: kaggle


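    【Answer 1】:

    You can use the /kaggle/working directory, which is a persistent storage location in the Kaggle environment. Save your model files there and they will persist across sessions.

    Saving: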

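    import os, shutil
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # First run only: download from Hugging Face ("YOUR_HF_REPO_ID" is a placeholder)
    model = AutoModelForSeq2SeqLM.from_pretrained("YOUR_HF_REPO_ID")
    tokenizer = AutoTokenizer.from_pretrained("YOUR_HF_REPO_ID")
    ...
    model_path = os.path.join('/kaggle/working', "YOUR_MODEL_NAME")
    # Start from a clean directory so stale files are not mixed in
    if os.path.exists(model_path):
        shutil.rmtree(model_path)
    os.mkdir(model_path)
    model.save_pretrained(model_path)
    tokenizer.save_pretrained(model_path)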
    

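    Usage:

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    # Load from the saved directory instead of downloading again
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)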
    

    【Comments】: