[Question title]: Pandas keeps converting strings to int
[Posted]: 2018-06-27 18:42:53
[Question description]:

I have the following code, taken from the question "Df groupby set comparison":

import pandas as pd

wordlist = pd.read_csv('data/example.txt', sep='\r', header=None, index_col=None, names=['word'])
wordlist = wordlist.drop_duplicates(keep='first')
# wordlist['word'] = wordlist['word'].astype(str)
wordlist['split'] = ''
wordlist['anagrams'] = ''

for index, row in wordlist.iterrows() :
    row['split'] = list(row['word'])

    anaglist = wordlist['anagrams'] = wordlist['word'].apply(lambda x: ''.join(sorted(list(x))))
    wordlist['anagrams'] = anaglist

wordlist = wordlist.drop(['split'], axis=1)

wordlist = wordlist['anagrams'].drop_duplicates(keep='first')

print(wordlist)
print(wordlist.dtypes)

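Some of the input in my example.txt file seems to be read as integers, especially when the strings have different character lengths. I can't seem to force pandas to treat the data as strings using .astype(str).

What is going on?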

[Comments]:

    Tags: python string pandas


    [Solution 1]:

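    First, you can pass dtype=str to read_csv to force a column to be read as strings, but that is only needed when numeric-looking columns have to be converted explicitly. In the sample below the column holds string values, so all of its values are already read as str implicitly.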
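To make the effect of dtype=str concrete, here is a minimal sketch. The sample data in `raw` is hypothetical (it is not taken from example.txt) and is chosen so that every entry looks numeric, which is the case where pandas infers an integer column.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: every entry looks like a number.
raw = "123\n321\n007\n"

# Default type inference: the column is parsed as integers,
# so "007" loses its leading zeros.
inferred = pd.read_csv(StringIO(raw), header=None, names=['word'])

# dtype=str keeps every entry as text, exactly as written in the file.
as_text = pd.read_csv(StringIO(raw), header=None, names=['word'], dtype=str)

print(inferred['word'].tolist())  # [123, 321, 7]
print(as_text['word'].tolist())   # ['123', '321', '007']
```

With the text version, `sorted(list(x))` works on every row, because each value is a str rather than an int.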

    I tried to rework your code a little:

    Setup

    from io import StringIO

    import pandas as pd
    import numpy as np
    
    temp=u'''"acb"
    "acb"
    "bca"
    "foo"
    "oof"
    "spaniel"'''
    # pd.compat.StringIO was removed in newer pandas releases; io.StringIO is used instead
    # after testing, replace 'StringIO(temp)' with 'example.txt'
    wordlist = pd.read_csv(StringIO(temp), sep="\r", index_col=None, names=['word'])
    print (wordlist)
          word
    0      acb
    1      acb
    2      bca
    3      foo
    4      oof
    5  spaniel
    

    #first remove duplicates
    wordlist = wordlist.drop_duplicates()
    #create lists and join them
    wordlist['anagrams'] = wordlist['word'].apply(lambda x: ''.join(sorted(list(x))))
    
    print (wordlist)
          word anagrams
    0      acb      abc
    2      bca      abc
    3      foo      foo
    4      oof      foo
    5  spaniel  aeilnps
    
    #sort DataFrame by column anagrams
    wordlist = wordlist.sort_values('anagrams')
    

    #get duplicated rows, skipping the first occurrence of each anagram key
    wordlist1 = wordlist[wordlist['anagrams'].duplicated()]
    print (wordlist1)
      word anagrams
    2  bca      abc
    4  oof      foo
    
    #get all duplicated rows
    wordlist2 = wordlist[wordlist['anagrams'].duplicated(keep=False)]
    print (wordlist2)
      word anagrams
    0  acb      abc
    2  bca      abc
    3  foo      foo
    4  oof      foo
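As an alternative view of the same result, the words can be collected per anagram key with groupby. This is a sketch that is not part of the original answer; the names `words`, `groups` and `multi` are introduced here for illustration only.

```python
import pandas as pd

# Same sample words as in the setup above, already de-duplicated.
words = pd.DataFrame({'word': ['acb', 'bca', 'foo', 'oof', 'spaniel']})

# The anagram key is the word's letters in sorted order.
words['anagrams'] = words['word'].apply(lambda x: ''.join(sorted(x)))

# Collect all words that share a key into one list per key.
groups = words.groupby('anagrams')['word'].apply(list)

# Keep only the keys that have more than one word.
multi = groups[groups.apply(len) > 1]

print(multi.to_dict())  # {'abc': ['acb', 'bca'], 'foo': ['foo', 'oof']}
```

This gives one entry per anagram group instead of one row per word, which may be easier to compare than the filtered DataFrame.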
    

    [Comments]:
