Question title: Can't select a row from a pandas dataframe, read from a csv
Posted: 2016-07-24 16:16:27
Question:

I have a csv containing some data that I'm reading into pandas:

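import sys
import pandas as pd

filename = sys.argv[1]
# header=None: the first line of the file is read as data, not as column names
data = pd.read_csv(filename, sep=';', header=None)
xy = data
print(xy)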

Result:

       0                                 1
0  label                              data
1      x                      6,8,10,14,18
2      y                    7,9,13,17.5,18
3      z                         0,0,1,1,1
4      r  2,13,31,33,34,4324,32413,431,666
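Because the file is read with header=None, pandas does not treat the first line as column names: the column labels become the integers 0 and 1, and "label"/"data" end up as an ordinary data row (row 0 in the output above). That is why data['label'] raises a KeyError. A minimal Python 3 sketch, assuming the file contents match the printed output (the inline text is reconstructed from it, and the names raw and df are illustrative):

```python
import io
import pandas as pd

# Sample input reconstructed from the printed output above (assumption:
# the real file uses ';' between the label and its values).
text = """label;data
x;6,8,10,14,18
y;7,9,13,17.5,18
z;0,0,1,1,1
r;2,13,31,33,34,4324,32413,431,666"""

# With header=None the first line is treated as data and the
# columns are labelled with the integers 0 and 1.
raw = pd.read_csv(io.StringIO(text), sep=';', header=None)
print(list(raw.columns))  # [0, 1]
print(raw[1][2])          # 7,9,13,17.5,18  (row 0 holds 'label'/'data')

# Letting pandas use the first line as the header (the default)
# makes 'label' and 'data' real column names.
df = pd.read_csv(io.StringIO(text), sep=';')
print(list(df.columns))   # ['label', 'data']
print(df['data'][1])      # 7,9,13,17.5,18
```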

However, when I try to select a row:

xy = data['2']
xy = data['y']
xy = data['label']

each attempt just gives me the same error:

Traceback (most recent call last):
  File "Regress[AA]--[01].py", line 10, in <module>
    xy = data['label']
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4084)
KeyError: 'label'

How should I format my selection request?

Edit: thanks to @Merlin's help, I got it working:

import sys
import pandas as pd

filename = sys.argv[1]
df = pd.read_csv(filename, sep=';')

for i in range(len(df.label)):
    a = str(df['label'][i])
    b = str(df['data'][i])
    print("Row: {} - Data: {}".format(a, b))

Which gives me:

Row: x - Data: 6,8,10,14,18
Row: y - Data: 7,9,13,17.5,18
Row: z - Data: 0,0,1,1,1
Row: r - Data: 2,13,31,33,34,4324,32413,431,666

Question discussion:

  • Don't change the default header='infer'. Try pd.read_csv(filename, sep=';')
  • It has to be this way: x, y and z each have 5 values, but r has 9. header has to be None, otherwise it gives me an error: ValueError: Some errors were detected ! Line #3 (got 10 columns instead of 6)
  • Wait, that's wrong: it should be line 4.
  • The separator is ";"; the commas are used for arrays: x = [6,8,10,14,18]
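Since ";" separates each label from its values, every line parses into exactly two fields no matter how many commas the value list contains; the commas stay inside one string per row. If numeric arrays are needed, a minimal Python 3 sketch (same reconstructed sample data; the name arrays is illustrative) can split each string after reading:

```python
import io
import pandas as pd

text = """label;data
x;6,8,10,14,18
y;7,9,13,17.5,18
z;0,0,1,1,1
r;2,13,31,33,34,4324,32413,431,666"""

# ';' is the field separator, so every line yields exactly two fields;
# the commas remain inside the 'data' string.
df = pd.read_csv(io.StringIO(text), sep=';')

# Split each comma-separated string into a list of floats, keyed by label.
# The lists may differ in length (x has 5 values, r has 9).
arrays = {
    label: [float(v) for v in values.split(',')]
    for label, values in zip(df['label'], df['data'])
}
print(arrays['y'])       # [7.0, 9.0, 13.0, 17.5, 18.0]
print(len(arrays['r']))  # 9
```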

Tags: python csv pandas dataframe row


Solution 1:

Try this:

import sys
import pandas as pd

filename = sys.argv[1]
df = pd.read_csv(filename, sep=';')
xy = df

Don't name your dataframe "data"; one of your column headers is named data! Then:

for i, row in df.iterrows():
    a = str(df['label'][i])
    b = str(df['data'][i])
    print("Row: {} - Data: {}".format(a, b))

print(df.head())
df.info()  # info() prints its summary itself and returns None
print(df["data"].head())

I'm not sure what you were expecting:

from io import StringIO  # works on Python 2 and 3 with unicode text
import pandas as pd

text = u"""label;data
x;6,8,10,14,18
y;7,9,13,17.5,18
z;0,0,1,1,1
r;2,13,31,33,34,4324,32413,431,666"""

df = pd.read_csv(StringIO(text),sep=';')

df

  label                              data
0     x                      6,8,10,14,18
1     y                    7,9,13,17.5,18
2     z                         0,0,1,1,1
3     r  2,13,31,33,34,4324,32413,431,666

df.head()

  label                              data
0     x                      6,8,10,14,18
1     y                    7,9,13,17.5,18
2     z                         0,0,1,1,1
3     r  2,13,31,33,34,4324,32413,431,666

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
label    4 non-null object
data     4 non-null object
dtypes: object(2)
memory usage: 136.0+ bytes

df["data"][1]
'7,9,13,17.5,18'

df["label"]
0    x
1    y
2    z
3    r
Name: label, dtype: object
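To pull out a whole row by its label rather than by position, one option is a boolean mask on the label column. A minimal Python 3 sketch with the same sample data (row_y is an illustrative name):

```python
import io
import pandas as pd

text = """label;data
x;6,8,10,14,18
y;7,9,13,17.5,18
z;0,0,1,1,1
r;2,13,31,33,34,4324,32413,431,666"""

df = pd.read_csv(io.StringIO(text), sep=';')

# df['label'] == 'y' is a boolean Series; indexing df with it keeps matching rows.
row_y = df[df['label'] == 'y']
print(row_y)

# The result is still a DataFrame (several rows could match),
# so take the first matching value explicitly.
print(row_y['data'].iloc[0])  # 7,9,13,17.5,18
```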

Another edit:

for i, row in df.iterrows():
    a = str(df['label'][i])
    b = str(df['data'][i])
    print ("Row: {} - Data: {}".format(a,b))
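Inside the iterrows() loop, the row variable already holds that row's values as a Series, so indexing back into df with i is not needed. A minimal Python 3 variant (the strings are collected in a list named lines purely for illustration):

```python
import io
import pandas as pd

text = """label;data
x;6,8,10,14,18
y;7,9,13,17.5,18
z;0,0,1,1,1
r;2,13,31,33,34,4324,32413,431,666"""

df = pd.read_csv(io.StringIO(text), sep=';')

lines = []
for _, row in df.iterrows():
    # row is a Series indexed by the column names 'label' and 'data'.
    lines.append("Row: {} - Data: {}".format(row['label'], row['data']))

print("\n".join(lines))
```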

Discussion:

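  • I had to remove , header=None to get it working. How do I get individual rows by label?
  • If the answer is correct, please mark it as correct; you can also upvote it. Here is a tutorial: people.duke.edu/~ccc14/sta-663/UsingPandas.html
  • It isn't. I still can't get the rows; it gives me a KeyError.
  • Let me try reading your tutorial; I'll get back to you soon.
  • The tutorial still didn't help. For reference, here is my data: pastebin.com/vmrk80d4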
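For the follow-up question about getting individual rows by label, another approach is to make label the index and select with .loc. A sketch under the same assumptions (Python 3, reconstructed sample data; by_label is a hypothetical name). If a label occurred more than once, .loc would return several rows as a DataFrame instead of a single Series.

```python
import io
import pandas as pd

text = """label;data
x;6,8,10,14,18
y;7,9,13,17.5,18
z;0,0,1,1,1
r;2,13,31,33,34,4324,32413,431,666"""

df = pd.read_csv(io.StringIO(text), sep=';')

# Use the 'label' column as the row index.
by_label = df.set_index('label')

# .loc[row_label] returns that row as a Series (one entry per remaining column).
print(by_label.loc['x'])

# .loc[row_label, column] returns a single cell.
print(by_label.loc['r', 'data'])  # 2,13,31,33,34,4324,32413,431,666
```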