How to read all text files in a GitHub repository?

【Question Title】: How to read all text files in a GitHub repository?
【Posted】: 2019-10-05 04:25:54
【Question】:

I want to read all the text files in a GitHub repository (Trump Speeches), but each text file's page address is different from its raw-text address.

For example, look at this link: speech_00.txt in normal view

In raw mode, speech_00.txt has a different address: speech_00.txt in raw view
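The two addresses differ in a fixed way: the normal page lives under `github.com/<owner>/<repo>/blob/<branch>/<path>`, while the raw file lives under `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>`. A minimal sketch that does this rewrite once in a helper, so no address has to be edited by hand. `blob_to_raw` is a hypothetical name, and the URL layout is assumed from the two addresses in this question:

```python
from urllib.parse import urlsplit


def blob_to_raw(blob_url: str) -> str:
    """Turn a github.com '/blob/' file URL into its raw.githubusercontent.com URL.

    Hypothetical helper. Assumes the layout
    https://github.com/<owner>/<repo>/blob/<branch>/<path>
    """
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    # e.g. ["PedramNavid", "trump_speeches", "blob", "master", "data", "speech_00.txt"]
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a /blob/ file URL: {blob_url}")
    owner, repo, _blob, *rest = segments
    # Same owner/repo/branch/path, minus the "blob" segment, on the raw host
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])
```

Only the `/blob/` segment and the host change, so the branch name and file path are carried over untouched.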

How can I handle this without editing the addresses (for example, adding githubusercontent or removing blob)?

Also, I read one sample text file with the following code:

import urllib.request

# Download one speech from its raw.githubusercontent.com address
response = urllib.request.urlopen("https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/speech_72.txt")
Text = response.read()
Text = Text.decode("utf-8")
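The same download can be wrapped in a small reusable function. In this sketch, the name `fetch_text` and the 30-second timeout are illustrative choices, not from the question. It uses `urllib.request.urlopen` as a context manager so the connection is closed after reading, and it imports `urllib.request` explicitly, since a bare `import urllib` does not guarantee that the `request` submodule is loaded:

```python
import urllib.request


def fetch_text(url: str, encoding: str = "utf-8", timeout: float = 30.0) -> str:
    """Download a URL and return its body decoded as text (hypothetical helper)."""
    # The with-block closes the response even if reading or decoding fails
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode(encoding)
```

Calling `fetch_text("https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/speech_72.txt")` would return that speech as a `str`.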

【Comments】:

Tags: python github repository

【Answer 1】:

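One hacky way to do this (one that depends on how this particular directory is structured) is to loop over the speech numbers and build each file path by appending to the base URL string: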

import urllib.request

# Base URL of the raw files in the data/ directory
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"
# Iterate through all speeches in the directory, from 00 to 73
cur_speech = 0
end_speech = 73
while cur_speech <= end_speech:
    # Build the full URL; file names are zero-padded to two digits (speech_00.txt, ...)
    speech_nm = speech_dir + 'speech_' + str(cur_speech).zfill(2) + '.txt'
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    # Move on to the next speech
    cur_speech += 1

This way, you will go through every speech in that particular directory.
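The file names in this directory are zero-padded two-digit numbers (speech_00.txt through speech_73.txt), so `str(n)` alone would produce `speech_0.txt` for n = 0. A sketch that builds the whole list of raw URLs up front, using the `:02d` format spec for padding; `RAW_BASE` and `speech_urls` are hypothetical names, and the 0 to 73 range is taken from the answer above:

```python
RAW_BASE = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"


def speech_urls(first: int = 0, last: int = 73, base: str = RAW_BASE) -> list[str]:
    """Return the raw URL of every speech numbered first..last (inclusive)."""
    # :02d zero-pads to two digits: 0 -> "00", 7 -> "07", 72 -> "72"
    return [f"{base}speech_{n:02d}.txt" for n in range(first, last + 1)]
```

Generating the list separately from the download loop makes it easy to print or check the URLs before fetching anything.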

【Comments】:

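I tested your solution in Colab, but it produces some errors. If I wanted to use this approach, I could write it by building each file name as a string, and that runs without errors. But your solution still relies on modifying strings.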

【Answer 2】:

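I used your code (@N.Yasarturk) and edited it to fetch all the files. But my question remains: is there another way (without editing the addresses) to read these files from a GitHub repository?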

import urllib.request

# Base URL of the raw files on the master branch
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"
# Iterate through all speeches in directory, from 00 to 73
cur_speech = 0
end_speech = 73
while cur_speech <= end_speech:
    # Zero-pad single-digit numbers so names match speech_00.txt ... speech_09.txt
    if cur_speech < 10:
        temp = "0" + str(cur_speech)
    else:
        temp = str(cur_speech)
    speech_nm = speech_dir + 'speech_' + temp + '.txt'
    print(speech_nm)
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    print(Text)
    # Update to the new speech
    cur_speech += 1
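For reading the files without editing any addresses, one option is GitHub's REST API: a GET request to `https://api.github.com/repos/<owner>/<repo>/contents/<path>` returns a JSON list of the directory's entries, and each file entry carries a `download_url` field holding its raw address. A sketch under these assumptions; the function names are hypothetical, and the `.txt` filter and `master` ref match this repository's layout as shown in the question:

```python
import json
import urllib.request

# GitHub REST API "repository contents" endpoint for a directory
API = "https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={ref}"


def raw_urls_from_listing(entries: list[dict], suffix: str = ".txt") -> list[str]:
    """Pick the download_url of every file entry whose name ends with suffix."""
    return sorted(
        entry["download_url"]
        for entry in entries
        if entry.get("type") == "file" and entry.get("name", "").endswith(suffix)
    )


def list_raw_urls(owner: str, repo: str, path: str,
                  ref: str = "master", suffix: str = ".txt") -> list[str]:
    """Ask the contents API for a directory listing and return the raw file URLs."""
    request = urllib.request.Request(
        API.format(owner=owner, repo=repo, path=path, ref=ref),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        entries = json.load(response)  # a JSON array of entry objects
    return raw_urls_from_listing(entries, suffix)
```

`list_raw_urls("PedramNavid", "trump_speeches", "data")` would return the raw address of every `.txt` file in `data/`, and each one can be opened with `urllib.request.urlopen` exactly as in the question's code. Unauthenticated API calls are rate-limited by GitHub, and the contents endpoint lists at most 1,000 files per directory, so a very large directory would need another approach such as the Git Trees API.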

【Comments】: