【Posted】:2017-12-06 09:33:33
【Question】:
I have scraped a web page with the following code:
from io import StringIO
import pandas as pd

# soup and item_text come from the earlier scraping step (not shown)
Number = soup.find('th', text="Number of samples").find_next_sibling("td").text
for x in range(1, int(Number) + 1):  # loop over the samples to parse each one into the format I want
    item = item_text.split('tooltip')[x].split("class")[0].replace('"', '').replace(',', '').replace(':', '').replace("<br>", " ").replace("/", "").replace("\\", "")
    # print(item)
    TESTDATA = StringIO(item)
    df = pd.read_csv(TESTDATA, sep=" ", header=None)  # one-row DataFrame, replaced on every iteration
    print(df)
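Each pass through the loop hands a single cleaned line to `pd.read_csv`, so each `df` is a one-row frame that is replaced on the next pass. A minimal sketch, assuming a cleaned `item` string reconstructed from the printed output below (the doubled spaces are an assumption about where the `NaN` columns come from):

```python
from io import StringIO

import pandas as pd

# Hypothetical cleaned string, reconstructed from the printed output;
# the doubled spaces are assumed to be the source of the NaN columns.
item = "TCGA-KK-A7B3-01A Male  Stage not reported  Alive FPKM 5.5 Living days 899 (2.5 years)"

# sep=" " splits on every single space, so "  " yields an empty field (NaN).
df = pd.read_csv(StringIO(item), sep=" ", header=None)

print(df.shape)  # (1, 15)
```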
The output currently looks like this:
0 1 2 3 4 5 6 7 8 9 \
0 TCGA-KK-A7B3-01A Male NaN Stage not reported NaN Alive FPKM 5.5
10 11 12 13 14
0 Living days 899 (2.5 years)
0 1 2 3 4 5 6 7 8 9 \
0 TCGA-G9-6347-01A Male NaN Stage not reported NaN Alive FPKM 14.2
10 11 12 13 14
0 Living days 2089 (5.7 years)
...
My question is: how can I combine these separate DataFrames into a single DataFrame, so that I can easily save everything to one CSV file?
Thanks
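One common pattern is to keep each one-row frame in a list instead of overwriting `df`, then stack them once with `pd.concat` and write the result with `to_csv`. A minimal sketch, assuming two hypothetical cleaned strings in place of the scraped `item` values; `samples.csv` is a placeholder file name:

```python
from io import StringIO

import pandas as pd

# Hypothetical cleaned strings standing in for the scraped `item` values.
items = [
    "TCGA-KK-A7B3-01A Male  Stage not reported  Alive FPKM 5.5 Living days 899 (2.5 years)",
    "TCGA-G9-6347-01A Male  Stage not reported  Alive FPKM 14.2 Living days 2089 (5.7 years)",
]

frames = []
for item in items:
    # Parse each line as in the question, but keep the result
    # instead of overwriting df on the next iteration.
    frames.append(pd.read_csv(StringIO(item), sep=" ", header=None))

# Stack the one-row frames vertically; ignore_index renumbers rows 0..n-1.
combined = pd.concat(frames, ignore_index=True)

# Write everything to a single CSV file (placeholder name).
combined.to_csv("samples.csv", index=False)
print(combined.shape)  # (2, 15)
```

Because the columns are labeled `0..14` by `header=None`, `pd.concat` lines them up by label; if some sample produced fewer fields, the missing columns would be filled with `NaN` rather than raising an error.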
【Discussion】:
Tags: python csv dataframe web-scraping