【Title】: How to determine which forks on GitHub are ahead?
【Posted】: 2019-07-19 00:30:47
【Question】:

Sometimes the original GitHub repository of a piece of software I use, such as linkchecker, sees little or no development, while a lot of forks have been created (in this case: 142, at the time of writing).

For all the forks, I want to know:

  • which forks are ahead of the original master branch

and, for each such fork:

  • how many commits it is ahead of the original
  • how many commits it is behind

GitHub has a web interface for comparing forks, but I don't want to do that manually for every fork; I just want a single CSV file with the results for all forks. How can I script this? The GitHub API can list the forks, but I don't see how to compare a fork against the original with it. Cloning each fork in turn and comparing locally seems rather crude.
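For reference, the API does expose a compare endpoint (`GET /repos/{owner}/{repo}/compare/{base}...{head}`) whose response includes `ahead_by` and `behind_by` fields, which is exactly what the CSV needs. A minimal sketch of such a script (the repository name, the CSV layout, and the helper names here are illustrative, and unauthenticated requests are rate-limited to roughly 60 per hour):

```python
import csv
import requests

API = "https://api.github.com"

def head_ref(fork):
    # "owner:branch" spec for the right-hand side of a compare
    return fork["owner"]["login"] + ":" + fork["default_branch"]

def list_forks(owner, repo):
    # forks are paginated; keep fetching pages until one comes back empty
    page = 1
    while True:
        r = requests.get(API + "/repos/{}/{}/forks".format(owner, repo),
                         params={"per_page": 100, "page": page})
        r.raise_for_status()
        batch = r.json()
        if not batch:
            return
        yield from batch
        page += 1

def write_fork_csv(owner, repo, path="forks.csv"):
    parent = requests.get(API + "/repos/{}/{}".format(owner, repo)).json()
    base = parent["default_branch"]
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["fork", "ahead_by", "behind_by"])
        for fork in list_forks(owner, repo):
            cmp_url = API + "/repos/{}/{}/compare/{}...{}".format(
                owner, repo, base, head_ref(fork))
            cmp = requests.get(cmp_url).json()  # has 'ahead_by'/'behind_by'
            w.writerow([fork["html_url"], cmp["ahead_by"], cmp["behind_by"]])

if __name__ == "__main__":
    write_fork_csv("linkchecker", "linkchecker")
```

With 142 forks this makes one request per fork plus a few for listing, so an access token (5000 requests/hour) is advisable for larger repositories.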

【Comments】:

  • ++, but note that this approach has at least one problem… a fork can diverge wildly from the original repository, in ways that may be good and/or bad, so knowing which fork has more commits doesn't necessarily tell you which fork is "ahead" of the original.
  • I'm looking for a quick way to select the forks that are worth a closer look. If you have a better idea, I'm all ears!
  • Related, possibly even a duplicate: Github, forked repositories ahead of master: active users.
  • Oh! I didn't know about that feature. I don't think this question is a duplicate (I still want what I'm asking for), but it definitely helps, thanks!

Tags: github git-fork


【Solution 1】:

After clicking "Insights" at the top and then "Forks" on the left, the following bookmarklet prints the information directly onto the web page.

The code, to add as a bookmarklet (or paste into the console):

javascript:(async () => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const aTags = [...document.querySelectorAll('div.repo a:last-of-type')].slice(1);

  for (const aTag of aTags) {
    /* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it directly onto the web page */
    await fetch(aTag.href)
      .then(x => x.text())
      .then(html => aTag.outerHTML += `${html.match(/This branch is.*/).pop().replace('This branch is', '').replace(/([0-9]+ commits? ahead)/, '<font color="#0c0">$1</font>').replace(/([0-9]+ commits? behind)/, '<font color="red">$1</font>')}`)
      .catch(console.error);
  }
})();

You can also paste the code into the address bar, but note that some browsers strip the leading javascript: on paste, so you have to type javascript: yourself.

Adapted from this answer.


Bonus

The following bookmarklet additionally prints links to the ZIP files:

The code, to add as a bookmarklet (or paste into the console):

javascript:(async () => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const aTags = [...document.querySelectorAll('div.repo a:last-of-type')].slice(1);

  for (const aTag of aTags) {
    /* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it directly onto the web page */
    await fetch(aTag.href)
      .then(x => x.text())
      .then(html => aTag.outerHTML += `${html.match(/This branch is.*/).pop().replace('This branch is', '').replace(/([0-9]+ commits? ahead)/, '<font color="#0c0">$1</font>').replace(/([0-9]+ commits? behind)/, '<font color="red">$1</font>')}` + " <a " + `${html.match(/href="[^"]*\.zip">/).pop() + "Download ZIP</a>"}`)
      .catch(console.error);
  }
})();

【Discussion】:

  • Tested with Firefox; works for me, and it looks nice too (you can watch it make progress).
  • I must say: the results on the first page I tried it on suggest that GitHub should make toy forks and stale forks easier to spot.
  • Love it - I'm stealing this :)
【Solution 2】:

Had exactly the same itch, and wrote a scraper that prints the information rendered in the HTML for the forks: https://github.com/hbbio/forkizard

Definitely not perfect, just a temporary solution.

【Discussion】:

  • As far as I can tell, GitHub still doesn't show this information, right? E.g. the repo github.com/alormil/ipa-rest-api/network/members. Or is there some other way to do this these days? Otherwise your initiative sounds great and I would use it!
【Solution 3】:

Late to the party - I think this is the second time I've come across this SO post, so I'll share my js-based solution (I ended up making a bookmarklet that fetches and searches the HTML pages). You can create a bookmarklet from it, or simply paste the whole thing into the console. Works on Chromium-based browsers and Firefox:

Edit: if there are more than 10 or so forks on the page, you may get locked out for scraping too quickly (too many requests, 429 in the network tab). Use async / await instead:

javascript:(async () => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);

  for (const fork of forks) {
    /* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it to console */
    await fetch(fork)
      .then(x => x.text())
      .then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
      .catch(console.error);
  }
})();

Or you can process them in batches, but it is still easy to get locked out:

javascript:(async () => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);

  const getfork = (fork) => {
    return fetch(fork)
      .then(x => x.text())
      .then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
      .catch(console.error);
  }

  while (forks.length) {
    await Promise.all(forks.splice(0, 2).map(getfork));
  }
})();

Original (this fires all the requests at once, which may lock you out if there are more of them than GitHub allows):

javascript:(() => {
  /* while on the forks page, collect all the hrefs and pop off the first one (original repo) */
  const forks = [...document.querySelectorAll('div.repo a:last-of-type')].map(x => x.href).slice(1);

  for (const fork of forks) {
    /* fetch the forked repo as html, search for the "This branch is [n commits ahead,] [m commits behind]", print it to console */
    fetch(fork)
      .then(x => x.text())
      .then(html => console.log(`${fork}: ${html.match(/This branch is.*/).pop().replace('This branch is ', '')}`))
      .catch(console.error);
  }
})();

This will print something like:

https://github.com/user1/repo: 289 commits behind original:master.
https://github.com/user2/repo: 489 commits behind original:master.
https://github.com/user2/repo: 1 commit ahead, 501 commits behind original:master.
...

to the console.

Edit: replaced the comments with block comments for pasteability.

【Discussion】:

  • How is this supposed to work? In Firefox and Chrome I can only paste it as a single line, and when I click the resulting bookmarklet on the forks page for rclone it doesn't show any results. However, when I reloaded that page in Chrome, the page said "Access has been restricted - You have triggered an abuse detection mechanism." So something must have happened.
  • Ah yes, that one has a lot of forks. I assume you just got rate limited. I'll update the description with a throttled version - the one I tested on only had about 20 forks.
  • @reinierpost Also note - as a bookmarklet it's just convenient to click - you still need to open the console to see the fetch results. What happened is that you got rate limited, and since reloading just issues another request, that got blocked too. I've updated the description to scrape them 1 by 1 instead.
  • Hmm, console.log, I should have checked that. It's working now! My Firefox console now has a long list of lines like https://github.com/adragomir/rclone: 1 commit ahead, 4251 commits behind rclone:master. followed by a chunk of the bookmarklet source and a location like :1:442
  • Ah, I see. On Firefox, when using the bookmarklet, the console prints line numbers along with the script. With a dark theme, the grey text on the left is the console output and the blue text on the right is the line number the console.log call sits on. If you paste it into the console instead of using it as a bookmarklet, it shows "debugger eval code" - it's working, it just looks funny.
【Solution 4】:

active-forks doesn't do exactly what I want, but it comes close and is very easy to use.

【Discussion】:

【Solution 5】:

Here is a Python script that lists and clones the forks that are ahead. The script uses the API in part, so it runs into the rate limit (you can raise the rate limit (though not remove it) by adding GitHub API authentication to the script; please edit or post it).

Originally I tried to do everything via the API, but that triggered the rate limit too quickly, so now I use is_fork_ahead_HTML instead of is_fork_ahead_API. This may need adjusting if the GitHub website design changes.

Because of the rate limit, I prefer the other answer I posted here.
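For reference, authentication can be bolted onto the script's JSON helper by sending an Authorization header with every request; GitHub then applies the authenticated limit (5000 requests/hour as documented at the time of writing, vs 60 unauthenticated). A hedged sketch - `auth_headers` is an illustrative name, and the token value is a placeholder you must supply yourself:

```python
import requests

def auth_headers(token):
    # GitHub accepts an "Authorization: token <PAT>" header;
    # with no token we fall back to unauthenticated requests
    return {"Authorization": "token " + token} if token else {}

def obj_from_json_from_url(url, token=None):
    # same shape as the helper in the script below, but optionally authenticated
    response = requests.get(url, headers=auth_headers(token))
    response.raise_for_status()
    return response.json(), response.text
```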

import requests, json, os, re

def obj_from_json_from_url(url):
    # TODO handle the internet being down and similar failures
    text = requests.get(url).text
    obj = json.loads(text)
    return obj, text

def is_fork_ahead_API(fork, default_branch_of_parent):
    """ Use the GitHub API to check whether `fork` is ahead.
        This triggers the rate limit, so prefer the non-API version below instead.
    """
    # Compare the default branch of the original repo with the default branch of the fork.
    comparison, comparison_json = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo+'/compare/'+default_branch_of_parent+'...'+fork['owner']['login']+':'+fork['default_branch'])
    if comparison['ahead_by'] > 0:
        return comparison_json
    else:
        return False

def is_fork_ahead_HTML(fork):
    """ Use the GitHub website to check whether `fork` is ahead.
    """
    htm = requests.get(fork['html_url']).text
    match = re.search('<div class="d-flex flex-auto">[^<]*?([0-9]+ commits? ahead(, [0-9]+ commits? behind)?)', htm)
    # TODO if the website design changes, fall back to checking whether 'ahead'/'behind'/'even with' appears only once on the entire page - in that case it is not part of a username etc.
    if match:
        return match.group(1) # for example '1 commit ahead, 114 commits behind'
    else:
        return False

def clone_ahead_forks(user, repo):
    obj, _ = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo)
    default_branch_of_parent = obj["default_branch"]

    page = 0
    while True:
        page += 1
        forks, _ = obj_from_json_from_url('https://api.github.com/repos/'+user+'/'+repo+'/forks?per_page=100&page='+str(page))
        if not forks: # empty page: no more forks
            break

        for fork in forks:
            aheadness = is_fork_ahead_HTML(fork)
            if aheadness:
                #dir = fork['owner']['login']+' ('+str(comparison['ahead_by'])+' commits ahead, '+str(comparison['behind_by'])+' commits behind)'
                dir = fork['owner']['login']+' ('+aheadness+')'
                print(dir)
                os.mkdir(dir)
                os.chdir(dir)
                os.system('git clone '+fork['clone_url'])
                print()

                # recurse into forks of forks
                if fork['forks_count'] > 0:
                    clone_ahead_forks(fork['owner']['login'], fork['name'])

                os.chdir('..')

user = 'cifkao'
repo = 'tonnetz-viz'

clone_ahead_forks(user, repo)

【Discussion】:

【Solution 6】:

Here is a Python script that lists and clones all forks that are ahead.

It does not use the API, so it does not run into the rate limit and needs no authentication, but it may need adjusting if the GitHub website design changes.

Unlike the bookmarklets in the other answers, which show links to ZIP files, this script also saves information about the commits, because it uses git clone and creates a commits.htm file with an overview.

import requests, re, os, sys, time

def content_from_url(url):
    # TODO handle the internet being down and similar failures
    return requests.get(url).text

def clone_ahead_forks(forklist_url):
    forklist_htm = content_from_url(forklist_url)
    with open("forklist.htm", "w") as text_file:
        text_file.write(forklist_htm)

    is_root = True
    # not working if there are no forks: '<a class="(Link--secondary)?" href="(/([^/"]*)/[^/"]*)">'
    for match in re.finditer('<a (class=""|data-pjax="#js-repo-pjax-container") href="(/([^/"]*)/[^/"]*)">', forklist_htm):
        fork_url = 'https://github.com'+match.group(2)
        fork_owner_login = match.group(3)
        fork_htm = content_from_url(fork_url)

        match2 = re.search('<div class="d-flex flex-auto">[^<]*?([0-9]+ commits? ahead(, [0-9]+ commits? behind)?)', fork_htm)
        # TODO if the website design changes, fall back to checking whether 'ahead'/'behind'/'even with' appears only once on the entire page - in that case it is not part of a username etc.

        sys.stdout.write('.')
        if match2 or is_root:
            if match2:
                aheadness = match2.group(1) # for example '1 commit ahead, 2 commits behind'
            else:
                aheadness = 'root repo'
                is_root = False # for subsequent iterations

            dir = fork_owner_login+' ('+aheadness+')'
            print(dir)

            os.mkdir(dir)
            os.chdir(dir)

            # save commits.htm
            commits_htm = content_from_url(fork_url+'/commits')
            with open("commits.htm", "w") as text_file:
                text_file.write(commits_htm)

            # git clone
            os.system('git clone '+fork_url+'.git')
            print()

            # no need to recurse into forks of forks because they are all listed on the initial page and traversed already

            os.chdir('..')


base_path = os.getcwd()
match_disk_letter = re.search(r'^([a-zA-Z]:\\)', base_path)

with open('repo_urls.txt') as url_file:
    for url in url_file:
        url = url.strip()
        match = re.search('github.com/([^/]*)/([^/]*)$', url)
        if match:
            user_name = match.group(1)
            repo_name = match.group(2)
            print(repo_name)
            dirname_for_forks = repo_name+' ('+user_name+')'
            if not os.path.exists(dirname_for_forks):
                url += "/network/members" # the page that lists the forks

                TMP_DIR = 'tmp_'+time.strftime("%Y%m%d-%H%M%S")
                if match_disk_letter: # on Windows, i.e. if the path starts with A:\ or similar, run git in A:\tmp_... instead of .\tmp_..., to prevent "filename too long" errors
                    TMP_DIR = match_disk_letter.group(1)+TMP_DIR
                print(TMP_DIR)

                os.mkdir(TMP_DIR)
                os.chdir(TMP_DIR)
                clone_ahead_forks(url)
                print()
                os.chdir(base_path)
                os.rename(TMP_DIR, dirname_for_forks)
            else:
                print(dirname_for_forks+' already exists, skipping.')

If you create the file repo_urls.txt with contents like the following (you can put several URLs in it, one URL per line):

https://github.com/cifkao/tonnetz-viz

then you will get the following directories, each containing the corresponding cloned repository:

tonnetz-viz (cifkao)
  bakaiadam (2 commits ahead)
  chumo (2 commits ahead, 4 commits behind)
  cifkao (root repo)
  codedot (76 commits ahead, 27 commits behind)
  k-hatano (41 commits ahead)
  shimafuri (11 commits ahead, 8 commits behind)

If it doesn't work, try earlier versions.

【Discussion】:

  • I think we should add the --mirror flag to git clone as described here, right?
【Solution 7】:

Here is a Python script that uses the GitHub API. I wanted to include the date and the last commit message. You need to include a personal access token (PAT) if you need the 5k requests/hour limit.

Usage: python3 list-forks.py https://github.com/itinance/react-native-fs

import requests, sys, datetime
from urllib.parse import urlparse

GITHUB_PAT = 'ghp_...' # placeholder - put your own personal access token here

def json_from_url(url):
    response = requests.get(url, headers={ 'Authorization': 'token {}'.format(GITHUB_PAT) })
    return response.json()

def iso8601_date_to_text(date):
    return datetime.datetime.strptime(date, '%Y-%m-%dT%H:%M:%SZ').strftime('%Y-%m-%d')

def process_repo(repo_author, repo_name, fork_of_fork):
    page = 1

    while True:
        forks_url = 'https://api.github.com/repos/{}/{}/forks?per_page=100&page={}'.format(repo_author, repo_name, page)
        forks_json = json_from_url(forks_url)

        if not forks_json:
            break

        for fork_info in forks_json:
            fork_author = fork_info['owner']['login']
            fork_name = fork_info['name']
            forks_count = fork_info['forks_count']
            fork_url = 'https://github.com/{}/{}'.format(fork_author, fork_name)

            # compare against the base repo (repo_author/repo_name), not the fork's name
            compare_url = 'https://api.github.com/repos/{}/{}/compare/master...{}:master'.format(repo_author, repo_name, fork_author)
            compare_json = json_from_url(compare_url)

            if 'status' in compare_json:
                items = []

                status = compare_json['status']
                ahead_by = compare_json['ahead_by']
                behind_by = compare_json['behind_by']
                total_commits = compare_json['total_commits']
                commits = compare_json['commits']

                if fork_of_fork:
                    items.append('   ')

                items.append(fork_url)
                items.append(status)

                if ahead_by != 0:
                    items.append('+{}'.format(ahead_by))

                if behind_by != 0:
                    items.append('-{}'.format(behind_by))

                if total_commits > 0:
                    last_commit = commits[total_commits-1]
                    commit = last_commit['commit']
                    author = commit['author']
                    items.append(iso8601_date_to_text(author['date']))
                    items.append('"{}"'.format(commit['message'].replace('\n', ' ')))

                if ahead_by > 0:
                    print(' '.join(items))

            if forks_count > 0:
                process_repo(fork_author, fork_name, True)

        page += 1

url_parsed = urlparse(sys.argv[1].strip())
path_array = url_parsed.path.split('/')
root_author = path_array[1]
root_name = path_array[2]

root_url = 'https://github.com/{}/{}'.format(root_author, root_name)
commits_url = 'https://api.github.com/repos/{}/{}/commits/master'.format(root_author, root_name)
commits_json = json_from_url(commits_url)
commit = commits_json['commit']
author = commit['author']
print('{} root {} "{}"'.format(root_url, iso8601_date_to_text(author['date']), commit['message'].replace('\n', ' ')))

process_repo(root_author, root_name, False)

【Discussion】:
