Scraping EDGAR with Python code (Program 2) not working

[Question title]: Scraping EDGAR with Python codes (Program 2) not working
[Posted]: 2026-02-20 08:10:01
[Question description]:

I tried the Python code from Rasha Ashraf's article "Scraping EDGAR with Python". Yesterday I got help from you great developers, with special thanks to Jack Fleeting. The questions related to this one are:

Text Scraping (from EDGAR 10K Amazon) code not working

word count from web text document result in 0

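This is the second Python program from the same article, and it still does not run, I suppose because of Python version differences.

My problem is that I hit an initial error, "TypeError: a bytes-like object is required, not 'str'". I searched for it and tried one fix after another, but as soon as one error message disappeared, another appeared. After many improvised code changes, print(element4) printed None, which is not the result the author intended.

My feeble attempts to correct the original code did not work. So here I am posting the original code and the first error message. Once you help me resolve the first error, I will move on to the second, the third, and so on.

I usually use Python to work with numeric and categorical variables in CSV files. In that sense, this web-scraping program (especially handling and collecting URLs) is currently beyond my ability. Please help me get a result for element4 other than None, so that I can obtain the correct path to Amazon's 2013 10-K filing.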

import time
import csv
import sys

CIK = '1018724'
Year= '2013'
FILE= '10-K'

# Get the Master Index File for the given Year
url='https://www.sec.gov/Archives/edgar/full-index/%s/QTR1/master.idx'%(Year)
from urllib.request import urlopen
response= urlopen(url)
string_match1= 'edgar/data/'
element2 = None
element3 = None
element4 = None

# Go through each line of the master index file and find given CIK # and File (10-K)
# and extract the text file path
for line in response:
    if CIK in line and FILE in line:
        for element in line.split(' '):
            if string_match1 in element:
                element2 = element.split('|')
                for element3 in element2:
                    if string_match1 in element3:
                        element4 = element3

print(element4)

### The path of the 10-K filing
url3 = 'https://www.sec.gov/Archives/'+element4

--- Error message ---

TypeError                                 Traceback (most recent call last)
<ipython-input-25-8b7ded22bf96> in <module>
     25
     26 for line in response:
---> 27     if CIK in line and FILE in line:
     28         for element in line.split(' '):
     29             if string_match1 in element:

TypeError: a bytes-like object is required, not 'str'
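
Note on the error: in Python 3, iterating over the object returned by urlopen() yields bytes lines, while CIK and FILE are str, so the `in` test on line 27 raises this TypeError. The sketch below reproduces the error without network access and shows one possible fix, decoding each line before testing it. The `fake_response` object, the sample row (apart from the path, which is copied from the answer's output) and the 'latin-1' encoding are assumptions made for illustration.

```python
import io

CIK = '1018724'
FILE = '10-K'

# Reproduce the error: a str cannot be searched for inside a bytes object.
error_text = None
try:
    CIK in b'1018724|AMAZON COM INC|10-K'
except TypeError as exc:
    error_text = str(exc)
print(error_text)

# Stand-in for the urlopen() response: iterating it yields bytes lines.
# The row below is illustrative sample data, not a download from EDGAR.
fake_response = io.BytesIO(
    b"CIK|Company Name|Form Type|Date Filed|Filename\n"
    b"1018724|AMAZON COM INC|10-K|2013-01-30|"
    b"edgar/data/1018724/0001193125-13-028520.txt\n"
)

element4 = None
for raw_line in fake_response:
    # Decode bytes to str (encoding is an assumption) and drop the newline.
    line = raw_line.decode('latin-1').strip()
    # Split on the '|' delimiter and compare whole fields.
    fields = line.split('|')
    if CIK in fields and FILE in fields:
        element4 = fields[-1]  # the last field is the file path

print(element4)
```

With the real response, the same decode step would go at the top of the `for line in response:` loop.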

[Question discussion]:

Tags: python-3.x url scrape edgar sec

[Solution 1]:

I believe this is what you are looking for:

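import requests
import csv

CIK = '1018724'
Year= '2013'
FILE= '10-K'
url='https://www.sec.gov/Archives/edgar/full-index/%s/QTR1/master.idx'%(Year)

# req.text is already a str, so no bytes/str mismatch occurs
req = requests.get(url)
# Each row of the index is split into fields on the '|' delimiter
targets = csv.reader(req.text.splitlines(), delimiter='|')
for line in targets:
    # line is a list of fields; match the CIK and form type as whole fields
    if CIK in line and FILE in line:
        # the last field holds the path of the filing's text file
        print("https://www.sec.gov/Archives/"+line[-1])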

Output:

https://www.sec.gov/Archives/edgar/data/1018724/0001193125-13-028520.txt
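
If you want to reuse this for other companies or quarters, here is a rough sketch that separates the parsing from the download. The function names `find_filing_urls` and `fetch_master_index`, the `user_agent` parameter, and the sample index text are my own assumptions for illustration. As far as I know, SEC.gov asks automated clients to identify themselves with a User-Agent header, so the fetch function sends one; treat that part as untested.

```python
ARCHIVES = 'https://www.sec.gov/Archives/'


def find_filing_urls(index_text, cik, form_type):
    """Return URLs of rows in a master.idx text matching cik and form_type."""
    urls = []
    for line in index_text.splitlines():
        fields = line.split('|')
        # Data rows have five fields:
        # CIK|Company Name|Form Type|Date Filed|Filename
        if len(fields) == 5 and fields[0] == cik and fields[2] == form_type:
            urls.append(ARCHIVES + fields[4])
    return urls


def fetch_master_index(year, quarter, user_agent):
    """Download one quarterly master index (hypothetical helper, not run here)."""
    import requests  # imported here so the parser works without requests
    url = ARCHIVES + 'edgar/full-index/%s/QTR%d/master.idx' % (year, quarter)
    resp = requests.get(url, headers={'User-Agent': user_agent})
    resp.raise_for_status()
    return resp.text


# Illustrative sample; only the 10-K path is taken from the output above.
sample = (
    "CIK|Company Name|Form Type|Date Filed|Filename\n"
    "--------------------------------------------\n"
    "1018724|AMAZON COM INC|10-K|2013-01-30|"
    "edgar/data/1018724/0001193125-13-028520.txt\n"
    "1018724|AMAZON COM INC|8-K|2013-01-29|"
    "edgar/data/1018724/hypothetical-8k.txt\n"
)
found = find_filing_urls(sample, '1018724', '10-K')
print(found)
```

Matching on whole fields means '10-K' will not pick up amended filings such as '10-K/A'. A filing may also sit in a quarter other than QTR1, in which case you would loop `fetch_master_index` over quarters 1 to 4.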

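[Comments]:

Thanks, Jack. I was busy with some other urgent things and have now come back to this problem. Because of my limited Python skills, I will have to try every approach, including yours, over this Thanksgiving holiday. Since the original code produced so many errors one after another, I have to test every possible step. I will report my results here, hopefully soon.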