【Posted】: 2021-04-10 14:41:51
【Question】:
I want to remove the hrefs from my dataset, but I get this error: "unbalanced parenthesis"!
To remove the hrefs, I use the following Python code:
data = data.apply(lambda x: re.sub(re.findall(r'\<a(.*?)\>', x)[0], '', x) if (len(re.findall(r'\<a (.*?)\>', x))>0) and ('href' in re.findall(r'\<a (.*?)\>', x)[0]) else x)
After running this, I get the following error:
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
4211 else:
4212 values = self.astype(object)._values
-> 4213 mapped = lib.map_infer(values, f, convert=convert_dtype)
4214
4215 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-25-55819437c264> in <lambda>(x)
----> 1 data = data.apply(lambda x: re.sub(re.findall(r'\<a(.*?)\>', x)[0], '', x) if (len(re.findall(r'\<a (.*?)\>', x))>0) and ('href' in re.findall(r'\<a (.*?)\>', x)[0]) else x)
2 if verbose: print('#'*10 ,'Step - Remove hrefs:'); check_vocab(data, local_vocab)
/usr/lib/python3.6/re.py in sub(pattern, repl, string, count, flags)
189 a callable, it's passed the match object and must return
190 a replacement string to be used."""
--> 191 return _compile(pattern, flags).sub(repl, string, count)
192
193 def subn(pattern, repl, string, count=0, flags=0):
/usr/lib/python3.6/re.py in _compile(pattern, flags)
299 if not sre_compile.isstring(pattern):
300 raise TypeError("first argument must be string or compiled pattern")
--> 301 p = sre_compile.compile(pattern, flags)
302 if not (flags & DEBUG):
303 if len(_cache) >= _MAXCACHE:
/usr/lib/python3.6/sre_compile.py in compile(p, flags)
560 if isstring(p):
561 pattern = p
--> 562 p = sre_parse.parse(p, flags)
563 else:
564 pattern = None
/usr/lib/python3.6/sre_parse.py in parse(str, flags, pattern)
867 if source.next is not None:
868 assert source.next == ")"
--> 869 raise source.error("unbalanced parenthesis")
870
871 if flags & SRE_FLAG_DEBUG:
error: unbalanced parenthesis at position 36
After several hours of trying, I have not been able to solve this problem.
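【Comments】:
- Independently of the regex, try data.str.replace() instead of the .apply() pattern.
- That is far too much for a lambda. Extract the logic into a proper function so we can look at it.
- FYI, you do not need to escape < and > in a regular expression.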
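
Putting the three comments together, here is a minimal sketch of what the extracted function might look like. The traceback is consistent with the text captured by re.findall being compiled as a regex pattern by re.sub, so a stray ")" in a URL would be enough to trigger "unbalanced parenthesis"; that is an inference from the traceback, not something confirmed in the post. The names A_HREF_TAG, remove_hrefs, remove_first_href_literal and the sample string are hypothetical, chosen only for illustration.

```python
import re

# Matches an opening <a ...> tag whose attribute text contains "href".
# "<" and ">" are not regex metacharacters, so they need no backslash.
A_HREF_TAG = re.compile(r"<a\s[^>]*href[^>]*>")


def remove_hrefs(text):
    """Replace every <a ... href ...> opening tag with a bare <a>."""
    return A_HREF_TAG.sub("<a>", text)


def remove_first_href_literal(text):
    """Closer to the original lambda: delete the first tag's attribute text.

    re.escape makes the captured text match literally instead of being
    parsed as a regex, which avoids errors from characters such as ")".
    """
    found = re.findall(r"<a (.*?)>", text)
    if found and "href" in found[0]:
        return re.sub(re.escape(" " + found[0]), "", text)
    return text


# Hypothetical sample with an unbalanced ")" inside the URL.
sample = 'see <a href="http://example.com/page)">this</a> now'
print(remove_hrefs(sample))
print(remove_first_href_literal(sample))
```

Assuming data is a pandas Series of strings, the function could be used as data.apply(remove_hrefs), or the pattern could be passed directly as data.str.replace(A_HREF_TAG.pattern, "<a>", regex=True). The first variant uses one fixed pattern and never builds a regex from the data, so it sidesteps the escaping problem entirely.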