[Posted]: 2019-09-24 16:23:01
[Question]:
I wrote a script that searches a website for certain keywords.
When I use print(url, count, the_word), it prints the results, but I can't turn them into a dataset I can extract.
I tried pandas, but it only outputs the last search result.
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
import requests

def getLinks(url):
    html_page = urlopen(url)
    soup = bs(html_page, 'lxml')
    links = []
    for link in soup.find_all('a', href=True):
        links.append(link.get('href'))
    # drop duplicates while preserving order
    newlist = [ii for n, ii in enumerate(links) if ii not in links[:n]]
    newlist.insert(0, url)
    return newlist[0:10]
the_words = ['20gb', '10gb']
total_words = []
for the_word in the_words:
    for url in getLinks('https://www.bt.com/'):
        r = requests.get(url, allow_redirects=False)
        soup = bs(r.content.lower(), 'lxml')
        words = soup.find_all(text=lambda text: text and the_word.lower() in text)
        count = len(words)
        words_list = [ele.strip() for ele in words]
        for word in words:
            total_words.append(word.strip())
        #print('\nUrl: {}\ncontains {} of word: {}'.format(url, count, the_word))
        print(url, count, the_word)
        results = url, count, the_word
#df = pd.DataFrame(results, columns=[the_word])
#df.to_csv(r'C:\Users\nn1\Downloads\Python\trial.csv')
#print(total_words)
I would like to export the print(url, count, the_word) output, exactly as printed, to a CSV file.
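The last-result-only problem comes from `results = url, count, the_word` being overwritten on every pass of the loop. One way around it, sketched below with the standard-library csv module, is to append one row per URL to a list and write the file once after the loops finish. The URLs, counts, and the output filename here are illustrative stand-ins, not real scrape output:

```python
import csv

# Collect one row per (url, count, the_word) instead of overwriting a
# single `results` tuple; these values are made up for illustration.
rows = []
for url, count, the_word in [
    ('https://www.bt.com/', 3, '20gb'),
    ('https://www.bt.com/help', 0, '10gb'),
]:
    # In the real script this append would sit inside the inner loop,
    # right where print(url, count, the_word) is called.
    rows.append((url, count, the_word))

# Write everything out once, after the loops finish
with open('trial.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['url', 'count', 'word'])  # header row
    writer.writerows(rows)
```

The same `rows` list also drops straight into `pd.DataFrame(rows, columns=['url', 'count', 'word'])` followed by `df.to_csv(...)` if you prefer pandas.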
[Comments]:
-
Please update your code block so it is more readable
-
This code does not run as posted. Please update it with correct indentation
Tags: python loops web-scraping beautifulsoup