【Title】: Error with DVC on Google Colab - dvc.scm.CloneError: Failed to clone repo
【Posted】: 2022-12-27 22:15:12
【Description】:

I'm having a problem trying to run "dvc pull" on Google Colab. I have two repositories (let's call them A and B) where repository A is for my machine learning codes and repository B is for my dataset.

I've successfully pushed my dataset to repository B with DVC (using gdrive as my remote storage) and I also managed to successfully run "dvc import" (as well as "dvc pull/update") on my local project of repository A.

The problem comes when I use colab to run my project. So what I did was the following:

  1. Created a new notebook on colab
  2. Successfully git-cloned my machine learning project (repository A)
  3. Ran "!pip install dvc"
  4. Ran "!dvc pull -v" (This is what causes the error)

    At step 4, I got the error below. This is the full stack trace; note that I changed the repo URL in it for confidentiality reasons.

    2022-03-08 08:53:31,863 DEBUG: Adding '/content/<my_project_A>/.dvc/config.local' to gitignore file.
    2022-03-08 08:53:31,866 DEBUG: Adding '/content/<my_project_A>/.dvc/tmp' to gitignore file.
    2022-03-08 08:53:31,866 DEBUG: Adding '/content/<my_project_A>/.dvc/cache' to gitignore file.
    2022-03-08 08:53:31,916 DEBUG: Creating external repo https://gitlab.com/<my-dataset-repo-B>.git@3a3f2019efabff8ec71429da39b86688d1c98e75
    2022-03-08 08:53:31,916 DEBUG: erepo: git clone 'https://gitlab.com/<my-dataset-repo-B>.git' to a temporary dir
    Everything is up to date.
    2022-03-08 08:53:32,154 ERROR: failed to pull data from the cloud - Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
    ------------------------------------------------------------
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/backend/gitpython.py", line 185, in clone
        tmp_repo = clone_from()
      File "/usr/local/lib/python3.7/dist-packages/git/repo/base.py", line 1148, in clone_from
        return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/git/repo/base.py", line 1079, in _clone
        finalize_process, decode_streams=False)
      File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 176, in handle_process_output
        return finalizer(process)
      File "/usr/local/lib/python3.7/dist-packages/git/util.py", line 386, in finalize_process
        proc.wait(**kwargs)
      File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 502, in wait
        raise GitCommandError(remove_password_if_present(self.args), status, errstr)
    git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
      cmdline: git clone -v --no-single-branch --progress https://gitlab.com/<my-dataset-repo-B>.git /tmp/tmp2x6y9z0edvc-clone
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/dvc/scm.py", line 104, in clone
        return Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/__init__.py", line 121, in clone
        backend.clone(url, to_path, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/backend/gitpython.py", line 190, in clone
        raise CloneError(url, to_path) from exc
    scmrepo.exceptions.CloneError: Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/dvc/command/data_sync.py", line 41, in run
        glob=self.args.glob,
      File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 49, in wrapper
        return f(repo, *args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/dvc/repo/pull.py", line 38, in pull
        run_cache=run_cache,
      File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 49, in wrapper
        return f(repo, *args, **kwargs)
      File "/usr/local/lib/python3.7/dist-packages/dvc/repo/fetch.py", line 50, in fetch
        revs=revs,
      File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 437, in used_objs
        with_deps=with_deps,
      File "/usr/local/lib/python3.7/dist-packages/dvc/repo/index.py", line 190, in used_objs
        filter_info=filter_info,
      File "/usr/local/lib/python3.7/dist-packages/dvc/stage/__init__.py", line 660, in get_used_objs
        for odb, objs in out.get_used_objs(*args, **kwargs).items():
      File "/usr/local/lib/python3.7/dist-packages/dvc/output.py", line 918, in get_used_objs
        return self.get_used_external(**kwargs)
      File "/usr/local/lib/python3.7/dist-packages/dvc/output.py", line 973, in get_used_external
        return dep.get_used_objs(**kwargs)
      File "/usr/local/lib/python3.7/dist-packages/dvc/dependency/repo.py", line 94, in get_used_objs
        used, _ = self._get_used_and_obj(**kwargs)
      File "/usr/local/lib/python3.7/dist-packages/dvc/dependency/repo.py", line 108, in _get_used_and_obj
        locked=locked, cache_dir=local_odb.cache_dir
      File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
        return next(self.gen)
      File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 35, in external_repo
        path = _cached_clone(url, rev, for_write=for_write)
      File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 155, in _cached_clone
        clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
      File "/usr/local/lib/python3.7/dist-packages/funcy/decorators.py", line 45, in wrapper
        return deco(call, *dargs, **dkwargs)
      File "/usr/local/lib/python3.7/dist-packages/funcy/flow.py", line 274, in wrap_with
        return call()
      File "/usr/local/lib/python3.7/dist-packages/funcy/decorators.py", line 66, in __call__
        return self._func(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 220, in _clone_default_branch
        git = clone(url, clone_path)
      File "/usr/local/lib/python3.7/dist-packages/dvc/scm.py", line 106, in clone
        raise CloneError(str(exc))
    dvc.scm.CloneError: Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
    ------------------------------------------------------------
    2022-03-08 08:53:32,161 DEBUG: Analytics is enabled.
    2022-03-08 08:53:32,192 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp4x5js0dk']'
    2022-03-08 08:53:32,193 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp6x11s0dk']'
    

    And btw this is how I cloned my git repository (repo A)

    !git config --global user.name "Zharfan"
    !git config --global user.email "zharfan@myemail.com"
    !git clone https://<MyTokenName>:<MyToken>@link-to-my-repo-A.git
    

    Does anyone know why? Any help would be greatly appreciated. Thank you in advance!

【Comments】:

  • What version of DVC are you using? What system do you use?
  • I use DVC 2.9.2 on my local PC and it runs on Windows. However, on Google Colab (the environment that I faced the error on), I use DVC 2.9.5 @don_pablito
  • okay, to clarify - link-to-my-repo.git in the ERROR message - is it repo A or repo B, could you share the full dvc pull -v stack trace please, is it just a generic CloneError?
  • I think the problem here is that DVC doesn't have access to the private GitLab repo. When you ran dvc import, what kind of URL did you specify - https? git?
  • I see. Hmm that makes sense since I didn't store my Gitlab token anywhere (I only passed it along in the URL when cloning repo A) which means DVC wouldn't have the access to my token. My PC on the other hand does store my Gitlab access token. You've given me a very helpful clue. I might have an idea on how to solve it. Thanks! And btw I use https. The URL looks something like this: gitlab.com/u/my-repo-b.git @Shcheklein

Tags: git dataset google-colaboratory dvc


【Solution 1】:

To summarize the discussion in the comments thread:

Most likely this happens because DVC can't access the private repo on GitLab. (The error message is obscure and should be fixed.)

In the same way, you would not be able to run this in Colab:

    !git clone https://gitlab.com/org/<private-repo>

It also returns a pretty obscure error:

    Cloning into '<private-repo>'...
    fatal: could not read Username for 'https://gitlab.com': No such device or address

(I think it's something related to how tty is setup in Colab?)
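A quick way to confirm this diagnosis from a notebook cell is to ask git whether it can read the repository without prompting for credentials. The following is only a minimal sketch: the function name `can_access_repo` is made up for illustration, and it assumes `git` is on the PATH. It sets `GIT_TERMINAL_PROMPT=0` so that git fails right away instead of trying to ask for a username.

```python
import os
import subprocess


def can_access_repo(url: str, timeout: float = 30.0) -> bool:
    """Return True if `git ls-remote` can read `url` without prompting.

    Hypothetical helper for illustration; not part of DVC.
    """
    # GIT_TERMINAL_PROMPT=0 tells git to fail instead of asking for credentials.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", url],
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time.
        return False
    return result.returncode == 0
```

If this returns False for the dataset repo URL recorded in the .dvc file, `dvc pull` will most likely fail with the same CloneError, because DVC clones that URL in the same non-interactive way.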

The best way to solve this is to use SSH, as described here for example.

【Comments】:

  • Using SSH did solve the issue, though I had to re-run "dvc import" again since the original .dvc file would pull using the https URL. Anyway, thanks a lot! :)
  • Update: Just found out I could just change the URL in the .dvc file so I wouldn't actually have to re-import the dataset repo haha I can't believe I missed out on that
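The URL change mentioned in the last comment can also be scripted. The sketch below is an illustration under assumptions, not DVC functionality: it assumes the imported .dvc file stores the source repo on a `url:` line (under `deps` → `repo`), and that the SSH form of the repo is `git@<host>:<path>`, as GitLab uses. The helper names and the sample file name are hypothetical.

```python
import re


def https_to_ssh(url: str) -> str:
    """Convert 'https://host/path' to 'git@host:path'; leave other URLs as-is."""
    # An optional 'user:token@' prefix is dropped from the result.
    match = re.match(r"^https://(?:[^@/]+@)?([^/]+)/(.+)$", url)
    if match is None:
        return url
    host, path = match.groups()
    return f"git@{host}:{path}"


def rewrite_dvc_text(text: str) -> str:
    """Rewrite every 'url: https://...' line in .dvc file text to SSH form."""

    def replace(m: "re.Match[str]") -> str:
        return m.group(1) + https_to_ssh(m.group(2))

    # Only spaces/tabs are matched around the value, so line breaks are kept.
    return re.sub(
        r"^([ \t]*url:[ \t]*)(\S+)[ \t]*$", replace, text, flags=re.MULTILINE
    )


# Example usage (the file name "data.dvc" is hypothetical):
#   import pathlib
#   path = pathlib.Path("data.dvc")
#   path.write_text(rewrite_dvc_text(path.read_text()))
```

Plain text substitution is used here instead of a YAML parser so that the rest of the file, including the locked revision, stays byte-for-byte unchanged.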