【Question title】: Reading array of text elements without quotes
【Posted】: 2021-07-11 04:27:22
【Question description】:

I have a pandas DataFrame whose records look like this:

 0 [/computers_&_electronics,/computers_&_electronics/electronics_&_electrical,/computers_&_electronics/electronics_&_electrical/data_sheets_&_electronics_reference,/shopping,/shopping/consumer_resources,/shopping/consumer_resources/coupons_&_discount_offers]
 1 [/sports,/sports/college_sports,/sports/sporting_goods,/sports/sporting_goods/basketball_equipment,/sports/team_sports,/sports/team_sports/basketball]
 2 [/business_&_industrial,/business_&_industrial/advertising_&_marketing,/business_&_industrial/advertising_&_marketing/sales,/law_&_government,/law_&_government/legal,/law_&_government/legal/product_liability,/shopping,/shopping/consumer_resources]

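I want to read each hierarchy (e.g. /sport/college) as an array element and then perform operations on it. But because the hierarchies are not quoted (ideally they would be '/sport/college', ...), each record is read as one large string.

I tried literal_eval, but it did not work. Any other pointers? There are about 7 million records that need this array conversion, so I need a fast and scalable approach.

【Question discussion】:

    Tags: python arrays python-3.x pandas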


    【Solution 1】:

    You can strip the [] and then split on the commas:

    df['col'] = df['col'].str.strip('[]').str.split(',')
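
As a minimal sketch of the one-liner above on data shaped like the question's records: the sample rows are shortened for illustration, and the column name `col` is taken from the answer rather than from the asker's real frame. It also shows why `literal_eval` rejects this input (the unquoted paths are not valid Python literals). The approach assumes no category name contains a comma and that no spaces follow the separators.

```python
import ast

import pandas as pd

# Hypothetical sample data, shortened from the records in the question.
df = pd.DataFrame({
    "col": [
        "[/sports,/sports/college_sports,/sports/sporting_goods]",
        "[/shopping,/shopping/consumer_resources]",
    ]
})

# literal_eval fails because the elements are not quoted strings.
try:
    ast.literal_eval(df["col"].iloc[0])
    literal_eval_ok = True
except (SyntaxError, ValueError):
    literal_eval_ok = False

# Strip the surrounding brackets, then split each string on commas.
# Both steps are vectorized string methods, so no Python-level loop is written.
df["col"] = df["col"].str.strip("[]").str.split(",")

first = df["col"].iloc[0]
print(first)
```

Each cell then holds a regular Python list, so something like `df['col'].explode()` could be used afterwards if one row per hierarchy is wanted.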
    

    【Discussion】:

    • @raul - What does print(df['col'].head().to_list()) show?
    • Never mind, it does work. The only warning I get when running it is: /home/ubuntu/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
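
The SettingWithCopyWarning in the comment usually appears when the frame being assigned to was itself produced by slicing another DataFrame. The asker's surrounding code is not shown, so the following is only a sketch under that assumption; the names `raw`, `subset` and the `id` column are hypothetical. Taking an explicit `.copy()` of the slice before assigning is one common way to avoid the warning.

```python
import pandas as pd

# Hypothetical parent frame; only the 'col' layout mirrors the question.
raw = pd.DataFrame({
    "id": [1, 2, 3],
    "col": [
        "[/sports,/sports/team_sports]",
        "[/shopping]",
        "[/law_&_government,/law_&_government/legal]",
    ],
})

# An explicit copy makes the subset independent of `raw`, so the
# assignment below no longer targets a possible view of the parent.
subset = raw[raw["id"] > 1].copy()
subset["col"] = subset["col"].str.strip("[]").str.split(",")

print(subset["col"].tolist())
```

The parent frame `raw` is left unchanged here; if the parent itself should be updated, assigning to the whole column on `raw` directly would be the alternative.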