Printing results in pandas after a "contains" string search? Answers

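【Question title】: Printing results in pandas after a "contains" string search?
【Posted】: 2018-04-05 17:03:04
【Question】:

I'm new to Python and pandas, so I ran into problems quickly. I'm currently using Spyder.

I'm trying to find a phrase (not the full string) in one column and extract all rows that contain that phrase. Here is my code so far: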

import pandas as pd
df2 = pd.read_csv(r"C:\...\Desktop\publiccomments.csv")
print(df2["Document_Title"].str.contains("King"))

When I do this, I get a list of booleans:

    0         True
    1        False
    2        False
    3        False
    4        False

etc.

When I try to use it as a mask, I get a whole pile of errors:

print(df2[df2["Document_Title"].str.contains("King")])

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:...Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
        execfile(filename, namespace)
      File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
        exec(compile(scripttext, filename, 'exec'), glob, loc)
      File "C:/Users/.../untitled1.py", line 15, in <module>
        print(df2[df2["Document_Title"].str.contains("King")])
      File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2053, in __getitem__
        return self._getitem_array(key)
      File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2080, in _getitem_array
        if com.is_bool_indexer(key):
      File "C:\Users\...\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\common.py", line 201, in is_bool_indexer
        raise ValueError('cannot index with vector containing '
    ValueError: cannot index with vector containing NA / NaN values

I tried adding

df2 = df1.dropna(subset=df1.columns[[1]], how='any')

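to get rid of the "cannot index with vector containing NA / NaN values" error, but no dice.

Any help would be greatly appreciated! Here is a sample of my data: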

                                    Document_Title       Document Type  \
0                      Comment submitted by J. King  PUBLIC SUBMISSIONS   
1                     Comment submitted by N. Ghani  PUBLIC SUBMISSIONS   
2                   Comment submitted by M. Srobode  PUBLIC SUBMISSIONS   
3                     Comment submitted by D. Hovey  PUBLIC SUBMISSIONS   
4                  Comment submitted by B. Sweigert  PUBLIC SUBMISSIONS   
5                   Comment submitted by M. Lundgen  PUBLIC SUBMISSIONS   
6  Comment submitted by Craig (no surname provided)  PUBLIC SUBMISSIONS   
7                  Comment submitted by R. Marshall  PUBLIC SUBMISSIONS   
8                     Comment submitted by A. Greig  PUBLIC SUBMISSIONS   
9               Comment submitted by J. B. Anderson  PUBLIC SUBMISSIONS   

  Posted Date Received Date Comment Start Date Comment Due Date  \
0  10/16/2014      9/8/2014          6/18/2014        12/1/2014   
1    8/6/2014      6/7/2014          6/18/2014       10/16/2014   
2  10/16/2014     9/15/2014          6/18/2014        12/1/2014   
3    8/6/2014      6/7/2014          6/18/2014       10/16/2014   
4  12/18/2014     11/8/2014          6/18/2014        12/1/2014   
5  10/16/2014     9/15/2014          6/18/2014        12/1/2014   
6    8/6/2014      6/7/2014          6/18/2014       10/16/2014   
7   8/15/2014      6/7/2014          6/18/2014       10/16/2014   
8  12/18/2014     11/8/2014          6/18/2014        12/1/2014   
9  10/16/2014     9/15/2014          6/18/2014        12/1/2014   

                                 Document Detail  
0  [hyperlink] 
1  [hyperlink]  
2  [hyperlink]  
3  [hyperlink]   
4  [hyperlink]  
6  [hyperlink]  
7  [hyperlink] 
8  [hyperlink]  
9  [hyperlink]

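【Question comments】:

It's your data, and you haven't shown it to us. Do that and you'll get more help.
What's the best way to show the data?
Depends on how big it is. But try print(df1.head(40)), then edit your post and copy/paste the output. Or, more conveniently, run df1.head(40).to_clipboard() and paste into your post.
Just added some, thanks!
Based on that data, both the suggested solution and your own attempt work, so this sample isn't enough to find the problem. I suspect you have a null value in your index. Try df1[df1.index.notna()].pipe(lambda d: d[d["Document_Title"].str.contains("King")])

Tags: string python-3.x pandas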

【Solution 1】:

You're looking for something like this:

df2 = df[df["Column"].str.contains("King")]
print(df2)

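Basically, what your code does is build a boolean Series from a condition. If you use that Series to filter your DataFrame (i.e., pass it as the row selector, as in the code above), it does what you want.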
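Given the ValueError in the question, one plausible cause is a missing value somewhere in the searched column, which puts NaN into the mask. A minimal sketch, assuming that is the case: `str.contains` accepts an `na` argument, and `na=False` treats missing entries as non-matches. The sample frame here is hypothetical and only imitates the question's columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN stands in for a blank Document_Title cell.
df = pd.DataFrame({
    "Document_Title": [
        "Comment submitted by J. King",
        "Comment submitted by N. Ghani",
        np.nan,
    ],
    "Document Type": ["PUBLIC SUBMISSIONS"] * 3,
})

# na=False makes rows with a missing title count as "no match",
# so the mask holds only True/False and can be used for indexing.
mask = df["Document_Title"].str.contains("King", na=False)
matches = df[mask]
print(matches)
```

Whether this applies depends on what the real CSV contains; the ten sample rows in the question show no missing titles.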

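【Comments】:

Unfortunately, this doesn't work - I get an error message that looks very similar to the one I posted.
Hey, it works for me even with missing values, so that doesn't seem to be the problem. Unless you provide more details, I'm afraid I can't help... :)

【Solution 2】:

I think your df2 is a bit messed up. Here is an example df: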

title     tractsOfLand
King      100
Duke      50
Dutchess  4
Baron     5
Princess  5000
Rey       90
Roi       23

Make a boolean mask and index df with it:

m = df2["title"].str.contains("King")
df2[m]

which gives:

title  tractsOfLand
King           100

Take a look at boolean indexing in the cookbook.
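If the mask still fails on the real data, it may help to check the searched column for missing values before indexing. The sketch below is an assumption-laden illustration, not a diagnosis: the frame is made up (the example above plus one NaN title), and it drops rows by column name. The question's `dropna` call selected its subset by position (`df1.columns[[1]]`), which in the posted sample appears to be Document Type rather than Document_Title, so it may not have removed the rows that matter.

```python
import numpy as np
import pandas as pd

# Hypothetical frame modelled on the example above, with one missing title added.
df2 = pd.DataFrame({
    "title": ["King", "Duke", np.nan, "Baron"],
    "tractsOfLand": [100, 50, 4, 5],
})

# Count missing values in the searched column; anything above zero
# means str.contains will return NaN for those rows.
n_missing = df2["title"].isna().sum()
print(n_missing)

# Drop rows by column name rather than by position, then build the mask.
clean = df2.dropna(subset=["title"])
m = clean["title"].str.contains("King")
result = clean[m]
print(result)
```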

【Comments】:

Didn't work, unfortunately! I got another error message that looks very similar to the first one.