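【Posted】: 2020-10-22 11:01:57
【Question】:
I'm sure there is an elegant solution, but I can't find it. In a pandas DataFrame, how do I drop all duplicate values in a column while ignoring one particular value?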
   repost_of_post_id  title
0  7139471603         Man with an RV needs a place to park for a week
1  6688293563         Land for lease
2  None               2B/1.5B, Dishwasher, In Lancaster
3  None               Looking For Convenience? Check Out Cordova Par...
4  None               2/bd 2/ba, Three Sparkling Swimming Pools, Sit...
5  None               1 bedroom w/Closet is bathrooms in Select Unit...
6  None               Controlled Access/Gated, Availability 24 Hours...
7  None               Beautiful 3 Bdrm 2 & 1/2 Bth Home For Rent
8  7143099582         Need Help Getting Approved?
9  None               *MOVE IN READY APT* REQUEST TOUR TODAY!
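What I want is to keep all the None values in repost_of_post_id but drop any duplicated numeric values, for example if 7139471603 appears more than once in the DataFrame.
[Update] I got the desired result with this script, but I would like to do it in a single line if possible.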
import pandas as pd

# remove duplicate repost id if present (i.e. don't remove rows where repost_of_post_id value is "None")
# ca_housing is the original dataframe that needs to be cleaned
ca_housing_repost_none = ca_housing.loc[ca_housing['repost_of_post_id'] == "None"]
ca_housing_repost_not_none = ca_housing.loc[ca_housing['repost_of_post_id'] != "None"]
ca_housing_repost_not_none_unique = ca_housing_repost_not_none.drop_duplicates(subset="repost_of_post_id")
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
ca_housing_unique = pd.concat([ca_housing_repost_none, ca_housing_repost_not_none_unique])
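The split-and-recombine steps above can probably be collapsed into a single boolean mask. This is a minimal sketch, assuming the placeholder entries are the literal string "None" (as the comparisons in the script suggest). The toy frame below is made up for illustration and only stands in for ca_housing.

```python
import pandas as pd

# Toy stand-in for ca_housing; the rows are illustrative, not the real data.
ca_housing = pd.DataFrame({
    "repost_of_post_id": ["7139471603", "6688293563", "None", "None", "7139471603", "None"],
    "title": ["a", "b", "c", "d", "e", "f"],
})

ids = ca_housing["repost_of_post_id"]

# Keep a row if its id is the "None" placeholder,
# or if it is the first occurrence of that id.
ca_housing_unique = ca_housing[ids.eq("None") | ~ids.duplicated()]

print(ca_housing_unique)
```

Unlike the multi-step script, this keeps the rows in their original order, because nothing is split off and re-attached.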
【Comments】:
-
Which version of pandas are you using?
-
@RafaelBarros pandas==1.0.4
-
Could you try something like this and let me know whether it works?
repost_of_post_id = repost_of_post_id[(~repost_of_post_id.duplicated()) | repost_of_post_id.isna()]
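Note that the snippet in this comment filters a Series, and isna() only matches real missing values (None/NaN), not the string "None". A hedged sketch of the same idea applied to a whole DataFrame follows. It assumes the column may mix both kinds of placeholder, and the frame `df` and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical frame mixing real missing values and the string "None".
df = pd.DataFrame({
    "repost_of_post_id": ["7139471603", None, "None", "7143099582", "7139471603", None],
    "title": ["a", "b", "c", "d", "e", "f"],
})

# Turn the "None" string into a real missing value so isna() can see it.
ids = df["repost_of_post_id"].where(df["repost_of_post_id"].ne("None"))

# Keep every row with a missing id, plus the first occurrence of each real id.
result = df[ids.isna() | ~ids.duplicated()]

print(result)
```

The mask is built from the normalized `ids` Series but applied to the untouched `df`, so the original column values are left as they were.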
Tags: python python-3.x pandas numpy dataframe