【Posted】: 2021-08-26 05:39:56
【Question】:
I am trying to scrape some text from web pages and save it to text files using the code below (the links are read from a text file named links.txt):
import requests
import csv
import random
import string
import re
from bs4 import BeautifulSoup

#Create random string of specific length
def randStr(chars = string.ascii_uppercase + string.digits, N=10):
    return ''.join(random.choice(chars) for _ in range(N))

with open("links.txt", "r") as a_file:
    for line in a_file:
        stripped_line = line.strip()
        endpoint = stripped_line
        response = requests.get(endpoint)
        data = response.text
        soup = BeautifulSoup(data, "html.parser")
        for pictags in soup.find_all('col-md-2'):
            lastfilename = randStr()
            file = open(lastfilename + ".txt", "w")
            file.write(pictags.txt)
            file.close()
            print(stripped_line)
The web page contains the following element:
<div class="col-md-2">
The problem is that nothing happens after I run the code, and I do not get any errors.
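One likely cause, offered as a sketch rather than a confirmed diagnosis: `soup.find_all('col-md-2')` searches for a tag named `col-md-2`, not for elements with that CSS class, so it returns an empty list and the loop body never runs. Separately, `pictags.txt` is not the text accessor; Beautiful Soup treats it as a lookup for a child tag named `txt`. The example below assumes the goal is the text inside each `<div class="col-md-2">`. The helper name `extract_col_texts` and the `sample_html` string are hypothetical and used only for illustration; an inline string stands in for the downloaded page.

```python
from bs4 import BeautifulSoup


def extract_col_texts(html):
    """Return the text of every <div class="col-md-2"> in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ filters by CSS class; a bare string argument is matched
    # against tag names instead.
    divs = soup.find_all("div", class_="col-md-2")
    # get_text() returns the element's text content.
    return [div.get_text(strip=True) for div in divs]


# Hypothetical markup standing in for response.text.
sample_html = (
    '<div class="col-md-2">first</div>'
    "<p>skip</p>"
    '<div class="col-md-2"> second </div>'
)

texts = extract_col_texts(sample_html)
print(texts)
```

In the original loop, the same change would amount to iterating over `soup.find_all("div", class_="col-md-2")` and writing `pictags.get_text()` to the file.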
【Comments】:
- What do you want to scrape from that page? Can you explain?
Tags: python web-scraping beautifulsoup python-requests