【Question title】: How can I find the max-length word in the file?
【Posted】: 2020-11-19 17:49:10
【Question description】:

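I scraped a PS4 games website, storing each game's name in the variable product_name and the shipping price in the variable price, and saved the results to the file "dataframe.csv". How can I print the game with the longest name from dataframe.csv? I'm not looking for the shipping price, only the name of the game.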

from urllib.request import urlopen as uReq  # uReq is used below but was never imported
from bs4 import BeautifulSoup as soup
import pandas as pd
import numpy as np
from collections import defaultdict
import re

url='https://www.newegg.com/PS4-Video-Games/SubCategory/ID-3141'

with uReq(url) as uClient:
    page = uClient.read()

# parsing
page_soup = soup(page, "html.parser")

# grabs products
containers= page_soup.findAll("div",{"class":"item-container"})

# save to file
filename = "products.csv"


#creating two empty dictionaries
d = defaultdict(list)
d1 = defaultdict(list)

# for loop fills dicts
for container in containers:
    # scrape the brand
    brand = container.div.div.a.img["title"]

    # scrape the product name
    title = container.findAll("a", {"class": "item-title"})
    product_name = title[0].text

    # scrape the shipping
    shipping_container = container.findAll("li", {"class": "price-ship"})
    shipping = shipping_container[0].text.strip()

    # scrape price
    pricec = container.find("li", {"class": "price-current"})
    # remove surrounding whitespace from price
    price = pricec.text.strip()
    
    
    d['Product'].append(product_name)
    d['shipping'].append(shipping)
    d1['Product'].append(product_name)
    d1['Brand'].append(brand)
    d1['price'].append(price)
    
    
# create dataframe using pandas feature
df = pd.DataFrame(d) #product and shipping
df1 =pd.DataFrame(d1) #product, brand and price


# clean shipping column
df['shipping'] = df['shipping'].apply(lambda x: 0 if x == 'Free Shipping' else x)

#cleaning price column
df1['price'] = df1['price'].str.extract(r'(\d+\.?\d+)').astype(float)

#string converted to float
df['shipping'] = df['shipping'].apply(lambda x: 0 if x == 'Special Shipping' else x) # probably should be handled in a special way
df['shipping'] = df['shipping'].apply(lambda x: x if x == 0 else re.sub("[^0-9]", "", x))
df['shipping'] = df['shipping'].astype(float)

# save dataframe to csv files
df.to_csv('dataframe.csv', index=False)
df1.to_csv('dataframe1.csv', index=False)

【Question discussion】:

  • This is what I have so far, but it only gives me the name of the file. What argument should I pass? def long_word(filename): with open(filename, 'r') as infile: words = infile.read().split() max_len = len(max(words, key=len)) return [word in words if len(word) == max_len] print(longest_word('dataframe2.csv'))
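The attempt in the comment above has a few problems: the function is defined as long_word but called as longest_word, the list comprehension is missing its "for word" part, and splitting on whitespace breaks multi-word game names apart. A minimal sketch of one possible fix follows. It assumes the CSV has a header row with a 'Product' column, as in the dataframe.csv written by the question's code; the function name longest_names and its column parameter are hypothetical.

```python
import csv

def longest_names(filename, column="Product"):
    """Return every value in `column` that has the maximum length."""
    # csv.DictReader keeps each cell intact, so multi-word names are not split
    with open(filename, newline="", encoding="utf-8") as infile:
        names = [row[column] for row in csv.DictReader(infile)]
    if not names:
        return []
    max_len = max(len(name) for name in names)
    # Return a list because several names may tie for the longest
    return [name for name in names if len(name) == max_len]
```

Under those assumptions it could be called as print(longest_names('dataframe.csv')).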

Tags: python dataframe


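【Solution 1】:

It's hard to answer without seeing your dataframe (see how to provide a great pandas example for tips). But hopefully the following will give you some ideas, using a dummy df:

import pandas as pd
df = pd.DataFrame({'name': ['a','bb','ccccc','dd']})
df

This is the dataframe we want to sort by the length of 'name':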

    name
0   a
1   bb
2   ccccc
3   dd

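We can do the following (apply the len function to each element of 'name' and sort by the result):

df.sort_values('name', key=lambda c: c.apply(len))  # the key argument requires pandas 1.1 or newer

which prints: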

    name
0   a
1   bb
3   dd
2   ccccc
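If only the single longest name is needed, sorting the whole frame is optional. The sketch below is one alternative, not part of the original answer: it measures each name with Series.str.len() and picks the row with the largest length via idxmax(). The 'Product' column name comes from the question's code; the small frame stands in for pd.read_csv('dataframe.csv'), and its rows are made up for illustration.

```python
import pandas as pd

# Stand-in for pd.read_csv('dataframe.csv'); these rows are illustrative only
df = pd.DataFrame({
    'Product': ['FIFA 21', 'The Last of Us Part II - PlayStation 4', 'Doom'],
    'shipping': [4.99, 0.0, 0.0],
})

# Length of each product name, aligned with the frame's index
lengths = df['Product'].str.len()

# idxmax() returns the index label of the first row with the greatest length
longest = df.loc[lengths.idxmax(), 'Product']
print(longest)
```

When several names tie for the longest, idxmax() returns only the first; filtering with df[lengths == lengths.max()] would keep all of them.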

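【Discussion】:

  • This is what my dataframe looks like, but these are only the first two rows, and it keeps going with different values: Product,shipping,Brand,price The Last of Us Part II - PlayStation 4,0.0, PlayStation,59.99
  • @XavierValdez So if you replace 'name' with 'Product', the above should work. Does that work for you?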