[Posted]: 2020-11-19 17:49:10
[Question]:
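I scraped a PS4 games website. I stored each game's title in the variable `product_name` and its shipping cost in the variable `shipping`, and saved them to the file "dataframe.csv". How can I print the game with the longest name from dataframe.csv? I'm not looking for the shipping cost, only the name of the game.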
```python
from urllib.request import urlopen as uReq  # uReq was used below but never imported
from bs4 import BeautifulSoup as soup
import pandas as pd
import numpy as np
from collections import defaultdict
import re

url = 'https://www.newegg.com/PS4-Video-Games/SubCategory/ID-3141'
with uReq(url) as uClient:
    page = uClient.read()

# parsing
page_soup = soup(page, "html.parser")
# grabs products
containers = page_soup.find_all("div", {"class": "item-container"})

# creating two empty dictionaries of lists (one list per column)
d = defaultdict(list)
d1 = defaultdict(list)

# for loop fills dicts
for container in containers:
    # scrape the brand
    brand = container.div.div.a.img["title"]
    # scrape the product name
    title = container.find_all("a", {"class": "item-title"})
    product_name = title[0].text
    # scrape the shipping
    shipping_container = container.find_all("li", {"class": "price-ship"})
    shipping = shipping_container[0].text.strip()
    # scrape price
    pricec = container.find("li", {"class": "price-current"})
    # remove leading/trailing whitespace from price
    # (strip('price-current') strips those *characters*, not whitespace)
    price = pricec.text.strip()
    d['Product'].append(product_name)
    d['shipping'].append(shipping)
    d1['Product'].append(product_name)
    d1['Brand'].append(brand)
    d1['price'].append(price)

# create dataframes using pandas
df = pd.DataFrame(d)    # product and shipping
df1 = pd.DataFrame(d1)  # product, brand and price

# clean shipping column
df['shipping'] = df['shipping'].apply(lambda x: 0 if x == 'Free Shipping' else x)
df['shipping'] = df['shipping'].apply(lambda x: 0 if x == 'Special Shipping' else x)  # probably should be handled in a special way
# keep digits and the decimal point so "$4.99 Shipping" becomes 4.99, not 499
df['shipping'] = df['shipping'].apply(lambda x: x if x == 0 else re.sub(r"[^0-9.]", "", x))
# string converted to float
df['shipping'] = df['shipping'].astype(float)

# cleaning price column: pull out the first number, e.g. "$59.99" -> 59.99
df1['price'] = df1['price'].str.extract(r'(\d+\.?\d*)', expand=False).astype(float)

# save dataframes to csv files
df.to_csv('dataframe.csv', index=False)
df1.to_csv('dataframe1.csv', index=False)
```
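A minimal pandas sketch of answering the question, assuming `dataframe.csv` was written by the code above (so it has a `Product` column): read the file back, measure each name with `Series.str.len()`, and keep every row whose length equals the maximum. The helper names `longest_product_names` and `print_longest_from_csv` are hypothetical, not part of the original code.

```python
from typing import List

import pandas as pd


def longest_product_names(df: pd.DataFrame, column: str = "Product") -> List[str]:
    """Return every value in `column` whose length equals the maximum length."""
    # Drop missing names and make sure everything is a string before measuring
    names = df[column].dropna().astype(str)
    lengths = names.str.len()
    # Keep all ties, not just the first match that idxmax() would give
    return names[lengths == lengths.max()].tolist()


def print_longest_from_csv(path: str = "dataframe.csv") -> None:
    """Read the CSV written by the scraper (assumed layout) and print the longest name(s)."""
    df = pd.read_csv(path)
    for name in longest_product_names(df):
        print(name)
```

After the scraper has run, `print_longest_from_csv("dataframe.csv")` prints the longest title(s). If you only need one name and don't care about ties, `df.loc[df['Product'].str.len().idxmax(), 'Product']` returns the first longest one.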
[Discussion]:
- This is what I have so far, but it only gives me the name of the file. What argument should I pass? `def longest_word(filename): with open(filename, 'r') as infile: words = infile.read().split() max_len = len(max(words, key=len)) return [word for word in words if len(word) == max_len] print(longest_word('dataframe2.csv'))`
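The attempt in the comment reads the whole file and splits it on whitespace, so it compares individual words from every column (header, names and shipping values alike) instead of whole game names; a title such as `Call of Duty, Black Ops` is never treated as one string. A minimal standard-library sketch that reads only the `Product` column with `csv.DictReader` is below. The function names are hypothetical, and the `Product` column name assumes the header written by `df.to_csv('dataframe.csv', index=False)` in the question.

```python
import csv
from typing import Iterable, List


def longest_names_from_lines(lines: Iterable[str], column: str = "Product") -> List[str]:
    """Return the longest value(s) in `column` from CSV lines that start with a header row."""
    # DictReader uses the header row, so each row maps "Product" -> the full game name,
    # and quoted fields containing commas stay in one piece
    names = [row[column] for row in csv.DictReader(lines) if row.get(column)]
    if not names:
        return []
    max_len = max(len(name) for name in names)
    # Keep ties, mirroring the list comprehension in the original attempt
    return [name for name in names if len(name) == max_len]


def longest_names(filename: str, column: str = "Product") -> List[str]:
    """Open a CSV file (assumed UTF-8, as pandas writes by default) and find the longest names."""
    with open(filename, newline="", encoding="utf-8") as infile:
        return longest_names_from_lines(infile, column)
```

`print(longest_names('dataframe.csv'))` then prints a list of the longest title(s) rather than the longest single word in the file.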