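[Posted]: 2019-10-23 21:43:52
[Question]:
I am trying to write scraped data to a CSV file using a pandas DataFrame, but the CSV is empty even after the program finishes. The headers are written first, but they are overwritten as soon as the DataFrame is written. The code is as follows: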
from bs4 import BeautifulSoup
import requests
import re as resju
import csv
import pandas as pd
re = requests.get('https://www.farfeshplus.com/Video.asp?ZoneID=297')
soup = BeautifulSoup(re.content, 'html.parser')
links = soup.findAll('a', {'class': 'opacityit'})
links_with_text = [a['href'] for a in links]
headers = ['Name', 'LINK']
# This is the output file; change the path as desired. The default is the working directory.
file = open('data123.csv', 'w', encoding="utf-8")
writer = csv.writer(file)
writer.writerow(headers)
for i in links_with_text:
    new_re = requests.get(i)
    new_soup = BeautifulSoup(new_re.content, 'html.parser')
    m = new_soup.select_one('h1 div')
    Name = m.text
    print(Name)
    n = new_soup.select_one('iframe')
    ni = n['src']
    iframe = requests.get(ni)
    i_soup = BeautifulSoup(iframe.content, 'html.parser')
    d_script = i_soup.select_one('body > script')
    d_link = d_script.text
    mp4 = resju.compile(r"(?<=mp4:\s\[\')(.*)\'\]")
    final_link = mp4.findall(d_link)[0]
    print(final_link)
    df = pd.DataFrame(zip(Name, final_link))
    df.to_csv(file, header=None, index=False)
file.close()
df.head() returns:
0 1
0 ل h
1 ي t
2 ل t
3 ى p
4 s
0 1
0 ل h
1 ي t
2 ل t
3 ى p
4 s
Any suggestions?
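The character-by-character output above is consistent with how `zip` treats strings: each string is iterated one character at a time, so `zip(Name, final_link)` pairs individual characters instead of producing one (name, link) row. A small sketch with made-up values (the `name` and `link` strings are placeholders, not data from the site):

```python
# Placeholder values standing in for one scraped name and link (assumptions).
name = "abc"
link = "http://example.com/v.mp4"

# zip() walks both strings character by character and stops at the shorter one.
char_pairs = list(zip(name, link))

# Wrapping the two values in a single tuple keeps them together as one row.
one_row = [(name, link)]

print(char_pairs)
print(one_row)
```

Passing something shaped like `one_row` to `pd.DataFrame` would give a single row with two columns, which appears to be what the question is after.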
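[Comments]:
-
Can you run print(df.head()) before writing to the CSV file? I don't think the problem is in writing the CSV.
-
It looks like you are overwriting the CSV inside the for loop. Try appending the values from each loop iteration to a global variable, then write it out once after the loop.
-
@Ram, edited. Please check again.
-
@Datanovice, could you give an example? I can't figure it out.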
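A minimal sketch of the approach suggested in the comments: collect one (name, link) pair per page in a list inside the loop, then build the DataFrame and write the CSV once after the loop. The `scraped` list is hypothetical stand-in data that replaces the network requests, and `rows_to_csv` is an illustrative helper name, not part of the original code.

```python
import io

import pandas as pd


def rows_to_csv(rows, target):
    """Build one DataFrame from all collected rows and write it in a single call."""
    df = pd.DataFrame(rows, columns=["Name", "LINK"])
    # Let pandas write the header itself, so no separate csv.writer is needed.
    df.to_csv(target, header=True, index=False)
    return df


# Hypothetical stand-in for the name and link extracted from each page.
scraped = [
    ("Video A", "http://example.com/a.mp4"),
    ("Video B", "http://example.com/b.mp4"),
]

rows = []
for name, final_link in scraped:
    # Accumulate a whole row per page instead of writing inside the loop.
    rows.append((name, final_link))

buffer = io.StringIO()  # in-memory target; a file path can be passed instead
df = rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file, passing the path directly (for example `df.to_csv('data123.csv', index=False, encoding='utf-8')`) avoids sharing one file handle between `csv.writer` and `to_csv`.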
Tags: python pandas beautifulsoup