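【Posted】: 2018-04-05 17:03:04
【Question】:
I'm new to Python and pandas, so I ran into problems pretty quickly. I'm currently using Spyder.
I'm trying to find a phrase (not the complete string) in a column and pull out every row that contains it. Here is my code so far: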
import pandas as pd
df2 = pd.read_csv(r"C:\...\Desktop\publiccomments.csv")  # raw string so backslashes are not treated as escapes
print(df2["Document_Title"].str.contains("King"))
When I do this, I get a list of booleans:
0 True
1 False
2 False
3 False
4 False
etc.
When I try to use it as a mask, I get a whole bunch of errors.
print(df2[df2["Document_Title"].str.contains("King")])
returns
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:...Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/.../untitled1.py", line 15, in <module>
print(df2[df2["Document_Title"].str.contains("King")])
File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2053, in __getitem__
return self._getitem_array(key)
File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2080, in _getitem_array
if com.is_bool_indexer(key):
File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\common.py", line 201, in is_bool_indexer
raise ValueError('cannot index with vector containing '
ValueError: cannot index with vector containing NA / NaN values
I tried adding
df2 = df1.dropna(subset=df1.columns[[1]], how='any')
to get rid of the "cannot index with vector containing NA / NaN values" error, but no dice.
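For reference, here is a minimal sketch of two common ways around this error. It assumes the NaN values in the mask come from blank entries in Document_Title, which is only a guess, since the sample shown here has none. The small DataFrame is a hypothetical stand-in for the CSV, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for publiccomments.csv; the None row mimics a blank title.
df2 = pd.DataFrame({
    "Document_Title": [
        "Comment submitted by J. King",
        "Comment submitted by N. Ghani",
        None,
    ],
    "Document Type": ["PUBLIC SUBMISSIONS"] * 3,
})

# Option 1: na=False treats a missing title as "no match",
# so the mask holds only True/False and is safe to index with.
mask = df2["Document_Title"].str.contains("King", na=False)
kings = df2[mask]

# Option 2: drop rows with a missing title, naming the column explicitly.
# In this sketch, df2.columns[[1]] would select the second column
# ("Document Type"), because column positions are zero-based.
cleaned = df2.dropna(subset=["Document_Title"])
kings_too = cleaned[cleaned["Document_Title"].str.contains("King")]

print(kings)
```

Both options keep only the "J. King" row in this toy frame. Option 1 leaves the original rows in place, while option 2 discards the rows with blank titles first.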
Any help would be greatly appreciated! Here is a sample of my data:
Document_Title Document Type \
0 Comment submitted by J. King PUBLIC SUBMISSIONS
1 Comment submitted by N. Ghani PUBLIC SUBMISSIONS
2 Comment submitted by M. Srobode PUBLIC SUBMISSIONS
3 Comment submitted by D. Hovey PUBLIC SUBMISSIONS
4 Comment submitted by B. Sweigert PUBLIC SUBMISSIONS
5 Comment submitted by M. Lundgen PUBLIC SUBMISSIONS
6 Comment submitted by Craig (no surname provided) PUBLIC SUBMISSIONS
7 Comment submitted by R. Marshall PUBLIC SUBMISSIONS
8 Comment submitted by A. Greig PUBLIC SUBMISSIONS
9 Comment submitted by J. B. Anderson PUBLIC SUBMISSIONS
Posted Date Received Date Comment Start Date Comment Due Date \
0 10/16/2014 9/8/2014 6/18/2014 12/1/2014
1 8/6/2014 6/7/2014 6/18/2014 10/16/2014
2 10/16/2014 9/15/2014 6/18/2014 12/1/2014
3 8/6/2014 6/7/2014 6/18/2014 10/16/2014
4 12/18/2014 11/8/2014 6/18/2014 12/1/2014
5 10/16/2014 9/15/2014 6/18/2014 12/1/2014
6 8/6/2014 6/7/2014 6/18/2014 10/16/2014
7 8/15/2014 6/7/2014 6/18/2014 10/16/2014
8 12/18/2014 11/8/2014 6/18/2014 12/1/2014
9 10/16/2014 9/15/2014 6/18/2014 12/1/2014
Document Detail
0 [hyperlink]
1 [hyperlink]
2 [hyperlink]
3 [hyperlink]
4 [hyperlink]
6 [hyperlink]
7 [hyperlink]
8 [hyperlink]
9 [hyperlink]
【Comments】:
-
It's your data, and you haven't shown it to us. Do that and you'll get more help.
-
What's the best way to show the data?
-
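Depends on how big it is. But try print(df1.head(40)), then edit your post and copy/paste the output. Or, more conveniently, run df1.head(40).to_clipboard() and paste the result into your post.
-
Just added some, thanks!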
-
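Based on that data, both the suggested solution and your attempt work, so this sample isn't enough to reveal the problem. I suspect there is a null value in your index. Try
df1[df1.index.notna()].pipe(lambda d: d[d["Document_Title"].str.contains("King")])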
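A rough sketch of how that suggestion could be checked, assuming a frame that has one missing index label and one missing title. The data and the variable names missing_in_index, missing_in_titles and result are hypothetical and for illustration only. The na=False argument is an addition to the snippet above, so that blank titles cannot put NaN into the mask.

```python
import pandas as pd

# Hypothetical frame: one missing index label and one missing title.
df1 = pd.DataFrame(
    {
        "Document_Title": [
            "Comment submitted by J. King",
            "Comment submitted by N. Ghani",
            None,
            "Comment submitted by A. Greig",
        ]
    },
    index=[0, None, 2, 3],
)

# Count missing values in both places to see where the NaN comes from.
missing_in_index = int(df1.index.isna().sum())
missing_in_titles = int(df1["Document_Title"].isna().sum())
print(missing_in_index, missing_in_titles)

# Drop rows with a null index label, then filter on the phrase.
result = (
    df1[df1.index.notna()]
    .pipe(lambda d: d[d["Document_Title"].str.contains("King", na=False)])
)
print(result)
```

If missing_in_titles is nonzero on the real data, the blank titles are a more likely source of the error than the index.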
Tags: string python-3.x pandas