How can I find the longest word in a file? Answers

【Question title】: How can I find the max length word in the file?
【Posted】: 2020-11-19 17:49:10
【Question description】:

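I scraped a PS4 games website, collecting the game names into the variable product_name and the shipping prices into the variable shipping, and saved the result in the file "dataframe.csv". How can I print the game with the longest name from dataframe.csv? I am not looking for the shipping cost, only for the name of the game.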

from urllib.request import urlopen as uReq  # uReq is used below but was never imported
from bs4 import BeautifulSoup as soup
import pandas as pd
import numpy as np
from collections import defaultdict
import re

url='https://www.newegg.com/PS4-Video-Games/SubCategory/ID-3141'

with uReq(url) as uClient:
    page = uClient.read()

# parsing
page_soup = soup(page, "html.parser")

# grabs products
containers= page_soup.findAll("div",{"class":"item-container"})

# save to file
filename = "products.csv"


#creating two empty dictionaries
d = defaultdict(list)
d1 = defaultdict(list)

# for loop fills dicts
for container in containers:
    # scrape the brand
    brand = container.div.div.a.img["title"]

    # scrape the product name
    title = container.findAll("a", {"class": "item-title"})
    product_name = title[0].text

    # scrape the shipping
    shipping_container = container.findAll("li", {"class": "price-ship"})
    shipping = shipping_container[0].text.strip()
    
    # scrape price
    pricec = container.find("li", {"class": "price-current"})
    # remove leading and trailing whitespace from price
    # (strip('price-current') would strip those characters, not whitespace)
    price = pricec.text.strip()
    
    
    d['Product'].append(product_name)
    d['shipping'].append(shipping)
    d1['Product'].append(product_name)
    d1['Brand'].append(brand)
    d1['price'].append(price)
    
    
# create dataframe using pandas feature
df = pd.DataFrame(d) #product and shipping
df1 =pd.DataFrame(d1) #product, brand and price


# clean shipping column
df['shipping'] = df['shipping'].apply(lambda x: 0 if x == 'Free Shipping' else x)

#cleaning price column
df1['price'] = df1['price'].str.extract(r'(\d+\.?\d+)', expand=False).astype(float)

#string converted to float
df['shipping'] = df['shipping'].apply(lambda x: 0 if x == 'Special Shipping' else x) # probably should be handled in a special way
# keep the decimal point; "[^0-9]" would also delete it and turn 4.99 into 499
df['shipping'] = df['shipping'].apply(lambda x: x if x == 0 else re.sub(r"[^0-9.]", "", x))
df['shipping'] = df['shipping'].astype(float)

# save dataframe to csv files
df.to_csv('dataframe.csv', index=False)
df1.to_csv('dataframe1.csv', index=False)

【Question comments】:

This is what I have so far, but it only gives me the name of the file. What argument should I pass?

def long_word(filename):
    with open(filename, 'r') as infile:
        words = infile.read().split()
    max_len = len(max(words, key=len))
    return [word in words if len(word) == max_len]

print(longest_word('dataframe2.csv'))
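The attempt above has a few problems: the function is defined as long_word but called as longest_word, the list comprehension is missing its `for` clause, and splitting the whole file on whitespace compares single words rather than whole game names. A minimal corrected sketch using only the standard library follows. It assumes the file has a header row with a `Product` column, as in the code in the question; the helper name `longest_names` and the "Short Game" row are made up for illustration.

```python
import csv
import io

def longest_names(csv_file, column="Product"):
    """Return every value in `column` that has the maximum length."""
    # DictReader uses the header row as keys and handles quoted commas
    names = [row[column] for row in csv.DictReader(csv_file)]
    if not names:
        return []
    max_len = max(len(name) for name in names)
    return [name for name in names if len(name) == max_len]

# Made-up sample standing in for the contents of dataframe.csv
sample = io.StringIO(
    "Product,shipping\n"
    "The Last of Us Part II - PlayStation 4,0.0\n"
    "Short Game,4.99\n"
)
print(longest_names(sample))
```

With a real file, the same function could be called as `with open('dataframe.csv', newline='') as f: print(longest_names(f))`.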

Tags: python dataframe

【Solution 1】:

It is hard to answer without seeing your dataframe (see "how to provide a great pandas example" for tips). But hopefully the following will give you some ideas, using a dummy df:

df = pd.DataFrame({'name': ['a','bb','ccccc','dd']})
df

This is the dataframe we want to sort by the length of 'name':

    name
0   a
1   bb
2   ccccc
3   dd

We can do the following (apply the len function to each element of 'name' and use the result as the sort key):

df.sort_values('name',key = lambda c : c.apply(len))

This prints:

    name
0   a
1   bb
3   dd
2   ccccc
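If only the single longest name is needed rather than the whole sorted frame, one option is to compute the lengths with `Series.str.len()` and look up the row with the largest value. A small sketch on the same dummy df:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'bb', 'ccccc', 'dd']})

# Length of every name, as a Series aligned with df's index
lengths = df['name'].str.len()

# idxmax returns the index label of the first maximum length
longest = df.loc[lengths.idxmax(), 'name']
print(longest)  # ccccc
```

Note that `idxmax` returns only the first match, so tied names would need a filter such as `df[lengths == lengths.max()]` instead.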

【Comments】:

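This is what my dataframe looks like, though these are only the first two rows and it continues with different values: Product, shipping, Brand, price The Last of Us Part II - PlayStation 4,0.0,PlayStation,59.99
@XavierValdez So if you replace 'name' with 'Product', the above should work. Does that work for you?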
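Applied to the saved file, that suggestion might look like the sketch below. It assumes dataframe.csv has the `Product` column written by the code in the question. The in-memory CSV text and its "Short Game" row are made up so the snippet runs on its own, and the `key=` argument of `sort_values` needs pandas 1.1 or later.

```python
import io
import pandas as pd

# Stand-in for 'dataframe.csv'; the second data row is a made-up example
csv_text = (
    "Product,shipping\n"
    "The Last of Us Part II - PlayStation 4,0.0\n"
    "Short Game,4.99\n"
)
# With a real file this would be pd.read_csv('dataframe.csv')
df = pd.read_csv(io.StringIO(csv_text))

# Sort by name length, longest first, as in the answer but on 'Product'
by_length = df.sort_values('Product', key=lambda c: c.str.len(), ascending=False)

# The first row after sorting holds the longest name
longest_product = by_length['Product'].iloc[0]
print(longest_product)
```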