I would use a regex with str.replace here:
df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
Output:
   Id               Phone      Phone2
0   1    (+1)123-456-7890  1234567890
1   2  (123)-(456)-(7890)  1234567890
2   3        123-456-7890  1234567890
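For a reproducible version, here is a minimal sketch. The sample DataFrame is an assumption reconstructed from the output above, not the asker's actual data.

```python
import pandas as pd

# Sample data reconstructed from the output shown above (an assumption).
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Phone": ["(+1)123-456-7890", "(123)-(456)-(7890)", "123-456-7890"],
})

# Drop a leading "(+<digits>)" group, then every remaining non-digit character.
df["Phone2"] = df["Phone"].str.replace(r"^(?:\(\+\d+\))|\D", "", regex=True)
print(df)
```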
Regex:
^(?:\(\+\d+\)) # match a (+0) leading identifier
| # OR
\D # match a non-digit
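To try the pattern outside pandas, the same breakdown can be written with the standard `re` module and the `re.VERBOSE` flag, which lets the comments live inside the regex. The name `PHONE_CLEANUP` and the sample strings are only illustrative.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE (whitespace and
# '#' comments inside the pattern are ignored by the regex engine).
PHONE_CLEANUP = re.compile(r"""
    ^(?:\(\+\d+\))   # a leading "(+<digits>)" country-code group
    |                # OR
    \D               # any single non-digit character
""", re.VERBOSE)

samples = ["(+1)123-456-7890", "(123)-(456)-(7890)", "123-456-7890"]
cleaned = [PHONE_CLEANUP.sub("", s) for s in samples]
print(cleaned)
```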
A note on the international prefix:
The prefix may be significant, so you might want to keep it.
Keeping the prefix:
df['Phone2'] = df['Phone'].str.replace(r'[^+\d]', '', regex=True)
Output:
   Id               Phone          Phone2
0   1    (+1)123-456-7890    +11234567890
1   2  (123)-(456)-(7890)      1234567890
2   3        123-456-7890      1234567890
3   4  (+380)123-456-7890  +3801234567890
Removing only a specific prefix (here +1):
df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+1\))|[^+\d]', '', regex=True)
# or, more flexible
df['Phone2'] = df['Phone'].str.replace(r'(?:\+1\D)|[^+\d]', '', regex=True)
Output:
   Id               Phone          Phone2
0   1    (+1)123-456-7890      1234567890
1   2  (123)-(456)-(7890)      1234567890
2   3        123-456-7890      1234567890
3   4  (+380)123-456-7890  +3801234567890
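If the prefix to drop varies, the second pattern can be built from a parameter. This is only a sketch: `normalize_phone` and its `drop_prefix` argument are hypothetical names, and it uses the standard `re` module rather than pandas.

```python
import re

def normalize_phone(phone, drop_prefix="+1"):
    """Hypothetical helper: keep digits and '+', dropping one given prefix."""
    # re.escape makes the '+' literal. The trailing \D requires a non-digit
    # after the prefix, so "+1" does not eat the start of a longer code
    # such as "+12". A prefix followed directly by a digit is left alone.
    pattern = rf"(?:{re.escape(drop_prefix)}\D)|[^+\d]"
    return re.sub(pattern, "", phone)

print(normalize_phone("(+1)123-456-7890"))
print(normalize_phone("(+380)123-456-7890"))
print(normalize_phone("(+380)123-456-7890", drop_prefix="+380"))
```

It should be possible to apply this per row with something like `df['Phone'].map(normalize_phone)`, though for large frames the vectorized `str.replace` calls above are likely the better fit.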