【Question Title】:Delete part of string if string contains sub-string in single column instance
【Posted】:2015-12-04 01:24:32
【Question】:

I have the following in a pandas DF, in the columns Message_A and Message_B:

Message_A
"(Live Storage: 20.00   included in Plan for $15.00 - Exceess of 10.0   @ $6.0)" 
"(Live Storage: 5.00   included in Plan for $5.00 - Exceess of 11.0   @ $40.0)" 
"(Live Storage: 10.0   out of 150.00   included in Plan for $10.00)" 
"(Live Storage: 146.0   out of 200.00   included in Plan for $150.00)" 
"(Live Storage: 150.0   - Tier 1501 to 2000   @ $350)" 
"(PY Solution -Flat Fee- of $30.00 applied)" 
"(Live Storage: 17.0   out of 40.00   included in Plan for $20.00)" 
"(Live Storage: 67.0   @ $5.00)" 
"(Live Storage: 5.00   included in Plan for $55.00 - Exceess of 13.0   @ $6.0)" 
"(Live Storage: 741.0   @ $3.00)" 
"(Live Storage: 30.00   included in Plan for $150.00 - Exceess of 39.0   @ $6.0)" 
"(Live Storage: 65.0   - Tier 51 to 75   @ $250)" 
"(Live Storage: 567.0   - Tier 501 to 750   @ $1750)" 

Message_B
"(! Price for Live Storage not found in Pricing Plan !)" 
"(! Price for Live Storage not found in Pricing Plan !) ( ABC Storage: 141.0   @ $2.00) (Discount of 10.0% applied to storage amount)" 
"(! Price for Live Storage not found in Pricing Plan !)" 
"(! Price for Live Storage not found in Pricing Plan !) ( ABC Storage: 1.0   @ $3.00)" 
"( ABC Storage: 137.0   - Tier 1251 to 150   @ $100) (!  ABC Storage Limit of 00   Exceeded !) (Local Allocated Storage: 20.00   @ $0.40) (Live Storage: 16.0   @ $??)" 
"(Discount of 10.0% applied to storage amount) (! Price for Live Storage not found in Pricing Plan !)"
"(! Live Storage not found in Pricing Plan !) (Discount of 10.0% applied to storage amount)" 
"(! Price for Live Storage not found in Pricing Plan !) (Local Allocated Storage: 100.00   @ $0.50)" 
"(! Price for Storage not found in Pricing Plan !) (Live Storage: 18.0   @ $??)" 
"(! Price for Storage not found in Pricing Plan !)(Live Storage: 69.0   @ $??)  ( ABC Storage: 401.0   @ $1.50)" 
"(Live Storage: 6.0   @ $??) (! Price for Storage not found in Pricing Plan !)" 
"(! Price for Live Storage not found in Pricing Plan !) (Discount of 10.0% applied to storage amount)" 
"(! Price for Live Storage not found in Pricing Plan !) ( ABC Storage: 270.0   - Tier 201 to 300   @ $400)" 

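I want to remove the error messages from Message_B. The text of these messages varies, but every error message contains either '!' or '$??'. I then want to join the result to Message_A to get a single message column. For clarity, the intermediate step would look like this: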

Message_B
Nan
"( ABC Storage: 141.0   @ $2.00) (Discount of 10.0% applied to storage amount)" 
Nan
"( ABC Storage: 1.0   @ $3.00)" 
"( ABC Storage: 137.0   - Tier 1251 to 150   @ $100)(Local Allocated Storage: 20.00   @ $0.40)" 
"(Discount of 10.0% applied to storage amount)" 
"(Discount of 10.0% applied to storage amount)" 
 "(Local Allocated Storage: 100.00   @ $0.50)" 
Nan
"( ABC Storage: 401.0   @ $1.50)" 
Nan
"(Discount of 10.0% applied to storage amount)" 
"( ABC Storage: 270.0   - Tier 201 to 300   @ $400)" 

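The end result is just a single column of strings (with the NaN values dropped).

I have been able to split Message_B by removing '(' and using .replace to turn ')' into '|', which gives me a delimiter to split on. I have split Message_B out into a (new) separate dataframe, but how do I iterate over the full DF and delete the unwanted messages? (I do not want to drop the entire row.)

I have tried df[df['Message_B'].str.contains("(Live Storage: 18.0 @ $??)")==False], but I would need to do that for every type of message, and the numbers in the messages change. Also, I now realise that I cannot use .str.contains on the full DF.

Any help would be greatly appreciated. Apologies for the way I have laid out the DF in this post; I found it the easiest to read. Thanks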

EDIT: I have been able to strip out the standard error message with the following:

error_msg1 = "(! Price for Live Storage not found in Pricing Plan !)"  # str.replace is case-sensitive, so 'Live' must match the data
replace_with = ''
bumi_output['Message_B'] = [i.replace(error_msg1, replace_with) for i in bumi_output['Message_B']]

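Is there a way to use this approach to strip out error messages where part of the message can change? For example: (Live Storage: 18.0 @ $??) (Live Storage: 69.0 @ $??)

Thanks.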

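【Comments】:

  • You don't need to include snippets; you can turn content into a code block by indenting each line (4 spaces), or, while writing, by selecting the block and clicking the {} symbol in the editor.
  • Thanks @DilithiumMatrix, I did something similar using {}, but I didn't have the rows on separate lines. I had ',' separators, which made it look long and hard to read. I'm not seeing much interest, so I may be out of luck with this one.

Tags: python regex string pandas dataframe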


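【Solution 1】:

The following rather ugly list comprehension does the job by finding every parenthesised group, excluding those that contain '!' or '$??', and then joining the rest back together: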

import re  # needed for re.findall
# Keep each parenthesised group unless it contains '!' or '$??'
new_B = [' '.join([subs for subs in re.findall(r'\(.+?\)', val) if '!' not in subs and '$??' not in subs])
         for val in df['Message_B']]
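
For anyone who wants to try the filtering step in isolation, here is a minimal, self-contained sketch of the same idea. It assumes three rows copied from the question's Message_B column; the helper name `strip_errors` and the variables `cleaned_b` and `intermediate_b` are made up for this illustration and are not part of the original answer.

```python
import re

import pandas as pd

# Three Message_B rows copied from the question's sample data.
message_b = pd.Series([
    "(! Price for Live Storage not found in Pricing Plan !)",
    "(! Price for Live Storage not found in Pricing Plan !) ( ABC Storage: 141.0   @ $2.00) (Discount of 10.0% applied to storage amount)",
    "(! Price for Storage not found in Pricing Plan !) (Live Storage: 18.0   @ $??)",
])

# One parenthesised message; non-greedy so each group is matched separately.
GROUP = re.compile(r"\(.+?\)")


def strip_errors(text):
    """Keep only the groups that contain neither '!' nor '$??' (hypothetical helper)."""
    kept = [g for g in GROUP.findall(text) if "!" not in g and "$??" not in g]
    return " ".join(kept)


# Same result as the list comprehension, but as a Series aligned with the frame.
cleaned_b = message_b.map(strip_errors)

# Intermediate view from the question: rows with nothing left become NaN.
intermediate_b = cleaned_b.where(cleaned_b != "")

print(intermediate_b)
```

Rows that held only error messages come back as empty strings in `cleaned_b`, which is convenient for the concatenation step; the NaN version is only there to mirror the intermediate table in the question.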

Then add it to A:

df['Message_A'] = df['Message_A'] + new_B

Check that it worked:

In [26]: df['Message_A'][1]
Out[26]: '(Live Storage: 5.00   included in Plan for $5.00 - Exceess of 11.0   @ $40.0)( ABC Storage: 141.0   @ $2.00) (Discount of 10.0% applied to storage amount)'
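
On the question's edit (using a replace-style approach when part of the message changes), one possible alternative is a regex passed to `Series.str.replace`. This is a different technique from the list comprehension above, sketched under the assumption that the groups are never nested and that '!' and '$??' only ever appear inside error groups. The names `ERROR_GROUP` and `without_errors` are invented for this example.

```python
import pandas as pd

# Three Message_B rows copied from the question's sample data.
message_b = pd.Series([
    "(! Price for Live Storage not found in Pricing Plan !)",
    "(! Price for Storage not found in Pricing Plan !)(Live Storage: 69.0   @ $??)  ( ABC Storage: 401.0   @ $1.50)",
    "(Discount of 10.0% applied to storage amount) (! Price for Live Storage not found in Pricing Plan !)",
])

# A parenthesised group (plus any whitespace before it) whose contents
# include '!' or the literal '$??'. [^()]* keeps the match inside one group.
ERROR_GROUP = r"\s*\([^()]*(?:!|\$\?\?)[^()]*\)"

# Delete every error group, then trim whitespace left at the ends.
without_errors = message_b.str.replace(ERROR_GROUP, "", regex=True).str.strip()

print(without_errors.tolist())
```

Because the pattern matches on the markers rather than on the full text, the changing numbers in messages such as (Live Storage: 18.0 @ $??) do not matter.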

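【Comments】:

  • Hi @Woody Pride, I have been trying to get your answer to work, but the problem may be on my side. If I follow the logic, you split on '(' and ')' and then return anything that has no '!' or '$??' in it? For clarity, I am reading the df with pandas read_csv (including header = 0, names = [x,y,z]). I have also set 'Message 3' to a string with '.astype(str)'. When I run your answer it does run (no errors), but I don't see any difference. I also don't get a New_B column. I have tried to work out what is wrong, as I am trying to learn as much as possible, but no joy :(
  • Hi @Woody, I have accepted your answer. I realised I had made a basic mistake. Many thanks.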