【Question title】:Python pandas - value_counts not working properly
【Posted】:2015-12-06 01:18:45
【Question】:

Based on this post on Stack Overflow, I tried the value_counts function like this:

df2 = df1.join(df1.genres.str.split(",").apply(pd.value_counts).fillna(0))

It works fine, except that my data has 22 unique genres and after splitting I get 42 columns, which of course are not all unique. Data sample:

     Action  Adventure   Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG     Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing Accounting  Action  Adventure   Animation & Modeling    Audio Production    Casual  Design & Illustration   Early Access    Education   Free to Play    Indie   Massively Multiplayer   Photo Editing   RPG Racing  Simulation  Software Training   Sports  Strategy    Utilities   Video Production    Web Publishing  nan
0   nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan 1.0 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan

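(I only pasted the header and the first row)

I have a feeling the problem is caused by my raw data. My column (genres) is a column of lists, and the values include the brackets.

Example: [Action,Indie]. So when Python reads it, it treats "[Action", "Action]" and "Action" as different values, and the output is 303 distinct values. So what I did was: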

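new = []  # assumed to have been initialised before the loop in the original
for i in df1['genres'].tolist():
    if str(i) != 'nan':
        # drop the leading '[' and the trailing ']'
        i = i[1:-1]
        new.append(i)
    else:
        new.append('nan')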
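
To make the symptom concrete, here is a minimal sketch in plain Python. The sample strings are invented to mimic the [Action,Indie] format described above; they are not taken from the real data. It only illustrates why a naive split on "," yields more distinct labels than there are genres.

```python
# Hypothetical raw cell values (assumption: not the real data).
raw_cells = ["[Action, Indie]", "[Indie,Action]", "[Action]"]

# Naive split on ",": brackets and spaces stay attached to the tokens,
# so the same genre shows up under several different spellings.
naive_tokens = set()
for cell in raw_cells:
    naive_tokens.update(cell.split(","))

# Strip the brackets first, then the whitespace around each token.
clean_tokens = set()
for cell in raw_cells:
    for token in cell.strip("[]").split(","):
        clean_tokens.add(token.strip())

print(sorted(naive_tokens))
print(sorted(clean_tokens))
```

With this sample the naive split produces five distinct labels for only two real genres.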

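【Comments】:

  • You could try: if str(i).notnull():
  • Can you show me 5 or 6 rows of your input data df1?
  • But I think you can use: print df['genres'].str.get_dummies(sep=',')
  • OK, I found the problem, but I don't know how to solve it. My header data, i.e. the genres, has a whitespace problem: Action shows up as [space]Action, Action, and Action[space].
  • You can remove that whitespace with the strip() function.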
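
The get_dummies suggestion from the comments can be combined with the whitespace fix roughly as follows. This is a sketch under assumptions: the three-row df1 below is hypothetical, since the real input was never posted in the thread.

```python
import pandas as pd

# Hypothetical sample (assumption); the real df1 is not shown in the thread.
df1 = pd.DataFrame({"genres": ["[Action, Indie]", "[Indie,RPG]", None]})

cleaned = (
    df1["genres"]
    .str.strip("[]")                            # drop the surrounding brackets
    .str.replace(r"\s*,\s*", ",", regex=True)   # remove spaces around the commas only
)

# One 0/1 indicator column per genre; a missing value gives an all-zero row.
dummies = cleaned.str.get_dummies(sep=",")
print(dummies)
```

Because only the spaces next to the separators are removed, genre names that contain spaces would stay intact.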

Tags: python pandas


【Solution 1】:

You have to remove the leading [ and trailing ] from the genres column with str.strip, and then replace the spaces with an empty string using str.replace:

import pandas as pd

df = pd.read_csv('test/Copy of AppCrawler.csv', sep="\t")


df['genres'] = df['genres'].str.strip('[]')
df['genres'] = df['genres'].str.replace(' ', '')

df = df.join(df.genres.str.split(",").apply(pd.value_counts).fillna(0))

# temporarily display up to 30 rows and 60 columns
with pd.option_context('display.max_rows', 30, 'display.max_columns', 60):
    print(df)
    # (output omitted for clarity)
print(df.columns)
Index([u'Unnamed: 0', u'appid', u'currency', u'final_price', u'genres',
       u'initial_price', u'is_free', u'metacritic', u'release_date',
       u'Accounting', u'Action', u'Adventure', u'Animation&Modeling',
       u'AudioProduction', u'Casual', u'Design&Illustration', u'EarlyAccess',
       u'Education', u'FreetoPlay', u'Indie', u'MassivelyMultiplayer',
       u'PhotoEditing', u'RPG', u'Racing', u'Simulation', u'SoftwareTraining',
       u'Sports', u'Strategy', u'Utilities', u'VideoProduction',
       u'WebPublishing'],
      dtype='object')
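
Note that replacing every space also changes the genre names themselves, as the column list shows (FreetoPlay, MassivelyMultiplayer). If the original names should be preserved, one possible variant is to trim whitespace per token after splitting. The sketch below assumes a pandas version that provides Series.explode (0.25 or later) and uses an invented three-row frame rather than the real CSV.

```python
import pandas as pd

# Hypothetical stand-in for the real data (assumption).
df = pd.DataFrame({
    "appid": [10, 20, 30],
    "genres": ["[Action, Free to Play]", "[Indie,Action]", "[Free to Play]"],
})

tokens = (
    df["genres"]
    .str.strip("[]")   # drop the surrounding brackets
    .str.split(",")    # list of genres per row
    .explode()         # one row per (original index, genre)
    .str.strip()       # trim only leading/trailing spaces of each genre
)

# Count genres per original row and pivot them into columns.
counts = tokens.groupby(level=0).value_counts().unstack(fill_value=0)

result = df.join(counts)
print(result.columns.tolist())
```

The join aligns on the original row index, so each row gets its own per-genre counts, and "Free to Play" keeps its spaces.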

【Comments】:

  • Exactly what I needed! I don't understand what you're doing with the "with" statement. Couldn't you just print df?
  • Maybe In 19 explains it better.
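
On the "with" question: pd.option_context changes display options only inside the block and restores the previous values on exit, so a wide frame can be printed in full without altering the global settings. A small sketch that checks this behaviour (the variable names are illustrative only):

```python
import pandas as pd

before = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 30, "display.max_columns", 60):
    # The overrides are active only inside this block.
    inside = pd.get_option("display.max_rows")

# On leaving the block the previous value is restored.
after = pd.get_option("display.max_rows")
print(before, inside, after)
```

A plain print of df would also work, but pandas may then truncate the output according to the default display limits.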