【Question title】: Python Pandas: Drop rows from data frame if list of string value == [none]
【Posted】: 2021-03-17 16:43:32
【Question description】:

I have a column in my data frame whose values are lists of strings.

 Tags
 [marvel, comics, comic, books, nerdy]
 [new, snapchat, version, snap, inc]
 [none]
 [new, york, times, ny, times, nyt, times]
 [today, show, today, show, today]
 [none]
 [mark, wahlberg, marky, mark]

I can't figure out how to drop these [none] lists from the data frame. I tried:

 us_videos = us_videos.drop(us_videos.index[us_videos.tags == 'none'])

but this only works if I convert the column to strings. How can I do this?
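A likely reason the attempt above matches nothing: each cell in `tags` holds a Python list, and a list never compares equal to the string `'none'`. A minimal sketch on a small made-up frame (the sample rows are assumptions for illustration) compares each cell with the one-element list `['none']` instead:

```python
import pandas as pd

# Hypothetical sample data: each cell is a list of tag strings
us_videos = pd.DataFrame({"tags": [["marvel", "comics"], ["none"], ["mark", "wahlberg"]]})

# Comparing a list cell with the string 'none' is always False, so nothing matches
by_string = us_videos.tags == "none"

# Comparing each cell with the list ['none'] flags the placeholder rows
by_list = us_videos.tags.map(lambda v: v == ["none"])

# Keep every row that is not exactly ['none']
result = us_videos[~by_list]
```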

【Question comments】:

  • Try: us_videos[us_videos.tags.map(['none'].__ne__)]

Tags: python pandas data-preprocessing


【Solution 1】:

New answer

The OP wants to remove 'none' from the sublists and drop the rows that contain only 'none'.

us_videos.tags.explode().pipe(lambda s: s[s != 'none']).groupby(level=0).agg(list)

0        [marvel, comics, comic, books, nerdy]
1          [new, snapchat, version, snap, inc]
3    [new, york, times, ny, times, nyt, times]
4            [today, show, today, show, today]
6                [mark, wahlberg, marky, mark]
Name: tags, dtype: object
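A self-contained sketch of the explode/filter/regroup pipeline above with each step commented; the sample frame is made up and includes a mixed list so the filtering is visible:

```python
import pandas as pd

# Hypothetical sample data, including one list that mixes real tags with 'none'
us_videos = pd.DataFrame({"tags": [
    ["marvel", "comics", "comic", "books", "nerdy"],
    ["new", "snapchat", "version", "snap", "inc"],
    ["none"],
    ["today", "show", "today", "show", "today", "none"],
]})

result = (
    us_videos.tags
    .explode()                       # one row per tag; index labels repeat
    .pipe(lambda s: s[s != "none"])  # discard the 'none' placeholders
    .groupby(level=0)                # regroup by the original row label
    .agg(list)                       # rebuild one list per surviving row
)
```

Row 2 disappears because every one of its tags was filtered out, so no group is left to rebuild it.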

A more Pythonic way

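dat = {}
# Series.iteritems() was removed in pandas 2.0; items() does the same thing
for k, v in us_videos.tags.items():
    for x in v:
        if x != 'none':
            # Only rows with at least one real tag ever get a key
            dat.setdefault(k, []).append(x)

pd.Series(dat, name='tags')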

0        [marvel, comics, comic, books, nerdy]
1          [new, snapchat, version, snap, inc]
3    [new, york, times, ny, times, nyt, times]
4            [today, show, today, show, today]
6                [mark, wahlberg, marky, mark]
Name: tags, dtype: object
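The loop above returns only the cleaned `tags` Series. If the frame has other columns, one way to keep them is to select the surviving index labels and overwrite `tags`. A sketch, where the `title` column and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical frame: 'title' stands in for any other columns you want to keep
us_videos = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "tags": [["marvel", "comics"], ["none"], ["today", "show", "none"], ["mark", "wahlberg"]],
})

# Same loop as in the answer: collect the non-'none' tags per index label
dat = {}
for k, v in us_videos.tags.items():
    for x in v:
        if x != "none":
            dat.setdefault(k, []).append(x)
cleaned = pd.Series(dat, name="tags")

# Keep only rows that still have tags, then replace their lists with the cleaned ones
result = us_videos.loc[cleaned.index].assign(tags=cleaned)
```

`assign` aligns `cleaned` on the index, so each row receives its own filtered list.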

Using an assignment expression in a comprehension

pd.Series({
    k: X for k, v in us_videos.tags.items()
    if (X:=[*filter('none'.__ne__, v)])
}, name='tags')

0        [marvel, comics, comic, books, nerdy]
1          [new, snapchat, version, snap, inc]
3    [new, york, times, ny, times, nyt, times]
4            [today, show, today, show, today]
6                [mark, wahlberg, marky, mark]
Name: tags, dtype: object
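The `:=` operator needs Python 3.8 or later. On older interpreters, a sketch of the same idea builds each filtered list in a generator first and then keeps only the non-empty ones (the sample rows are made up):

```python
import pandas as pd

# Hypothetical sample data
us_videos = pd.DataFrame({"tags": [
    ["marvel", "comics"], ["none"], ["today", "show", "none"], ["mark", "wahlberg"],
]})

# Build (index label, filtered list) pairs first, so no assignment expression is needed
pairs = ((k, [t for t in v if t != "none"]) for k, v in us_videos.tags.items())

# Keep only the pairs whose filtered list is non-empty
result = pd.Series({k: tags for k, tags in pairs if tags}, name="tags")
```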

Old answer

explode

us_videos[us_videos.tags.explode().ne('none').groupby(level=0).any()]

                                        tags
0      [marvel, comics, comic, books, nerdy]
1        [new, snapchat, version, snap, inc]
3  [new, york, times, ny, times, nyt, times]
4          [today, show, today, show, today]
6              [mark, wahlberg, marky, mark]
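`Series.any(level=...)` was deprecated in pandas 1.3 and removed in 2.0; grouping by the index level gives the same per-row result. A sketch on made-up rows showing what this mask does: a row survives if any of its tags is not 'none', so a mixed list keeps its 'none' entry, which is the behaviour raised in the discussion below.

```python
import pandas as pd

# Hypothetical sample data
us_videos = pd.DataFrame({"tags": [
    ["marvel", "comics"], ["none"], ["today", "show", "none"],
]})

# True for every exploded tag that is not 'none'
not_none = us_videos.tags.explode().ne("none")

# A row survives if at least one of its tags is not 'none'
mask = not_none.groupby(level=0).any()
result = us_videos[mask]
```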

list.__ne__

us_videos[us_videos.tags.map(['none'].__ne__)]

                                        tags
0      [marvel, comics, comic, books, nerdy]
1        [new, snapchat, version, snap, inc]
3  [new, york, times, ny, times, nyt, times]
4          [today, show, today, show, today]
6              [mark, wahlberg, marky, mark]
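`['none'].__ne__` is the bound method behind `['none'] != v`, so this mask keeps every row whose list is not exactly `['none']` and leaves mixed lists untouched. A sketch of the same mask written as a plain comparison, on made-up rows: if a cell is not a list (for example a missing value), `list.__ne__` returns `NotImplemented` instead of a boolean, whereas `!=` falls back to a proper `True`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a missing value in the last row
us_videos = pd.DataFrame({"tags": [
    ["marvel", "comics"], ["none"], ["today", "show", "none"], np.nan,
]})

# True for every cell that is not exactly the list ['none']
mask = us_videos.tags.map(lambda v: v != ["none"])
result = us_videos[mask]
```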

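【Discussion】:

  • This works great! It only removes the [none] rows from the column, not a none inside a list such as [today, show, today, show, today, none], right?
  • That's correct... now it is. I changed all to any.
  • This doesn't remove 'none' from the lists. It just returns the same lists, still containing their 'none' values. It only works for lists where 'none' is the only element. Am I wrong?
  • You're not wrong. I may have misunderstood what you needed. If you want to remove 'none' from the lists AND drop the rows that contain only 'none'... one sec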
【Solution 2】:

First, let's write a function that strips 'none' from a list:

print(df)

    tags
0   [marvel, comics, comic, books, nerdy]
1   [new, snapchat, version, snap, inc]
2   [none]
3   [new, york, times, ny, times, nyt, times]
4   [today, show, today, show, today, none]


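import numpy as np

def delete_none(element):
    # Collect every tag that is not the placeholder 'none'
    new = []
    for val in element:
        if val != 'none':
            new.append(val)
    # A list that held only 'none' becomes NaN so the row can be dropped later
    if len(new) == 0:
        return np.nan
    else:
        return new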
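A more compact sketch of the same function, using a list comprehension and a conditional expression; its behaviour should match the loop version:

```python
import numpy as np

def delete_none(element):
    # Keep every tag that is not the placeholder string 'none'
    new = [val for val in element if val != "none"]
    # An empty result means the list held only 'none' values: mark it missing
    return new if new else np.nan
```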

Now apply this function to the tags column:

df.tags.apply(delete_none)

Output:

0         [marvel, comics, comic, books, nerdy]
1           [new, snapchat, version, snap, inc]
2                                           NaN
3     [new, york, times, ny, times, nyt, times]
4             [today, show, today, show, today]
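The rows whose tags became NaN can then be dropped, as one of the comments notes. A minimal end-to-end sketch, rebuilding the sample frame printed above and using `dropna(subset=...)` to remove those rows:

```python
import numpy as np
import pandas as pd

def delete_none(element):
    # Strip 'none'; return NaN when nothing else was in the list
    new = [val for val in element if val != "none"]
    return new if new else np.nan

# Same sample rows as the df printed in this answer
df = pd.DataFrame({"tags": [
    ["marvel", "comics", "comic", "books", "nerdy"],
    ["new", "snapchat", "version", "snap", "inc"],
    ["none"],
    ["new", "york", "times", "ny", "times", "nyt", "times"],
    ["today", "show", "today", "show", "today", "none"],
]})

# Overwrite tags with the cleaned lists, then drop rows whose tags became NaN
df = df.assign(tags=df.tags.apply(delete_none)).dropna(subset=["tags"])
```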

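【Discussion】:

  • It didn't work: 'DataFrame' object has no attribute 'str'.
  • It's working, thanks! This removes every none in a list, even when it sits among other elements, e.g. [today, show, today, show, today, none], right? What if I only want to remove the rows that are just [none]?
  • Oh! That's different. I thought you wanted to get rid of that row entirely. I'll update my answer.
  • With this we can drop the NaN values from the data frame. Thanks a lot!
  • I created a new data frame with none among the list elements and wrote a function that returns np.nan if a list contains only none, and otherwise removes none from the list. Let me know how it works.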