【Posted】: 2022-01-03 17:25:03
【Question】:
I have the following DataFrame:
     a     b   x      y language
0  id1  id_2   3  text1
1  id2  id_4   6  text2
2  id3  id_6   9  text3
3  id4  id_8  12  text4
I am trying to use langdetect to detect the language of the text in column y.
This is the code I am using for that:
for i, row in df.iterrows():
    df.loc[i].at["language"] = detect(df.loc[i].at["y"])
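Unfortunately, this column also contains non-text elements (whitespace, symbols, numbers, and combinations of these), so I get the following traceback: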
LangDetectException Traceback (most recent call last)
<ipython-input-40-3b2637554e5f> in <module>
1 df["language"]=""
2 for i,row in df.iterrows():
----> 3 df.loc[i].at["language"] = detect(df.loc[i].at["y"])
4 df.head()
C:\Anaconda\lib\site-packages\langdetect\detector_factory.py in detect(text)
128 detector = _factory.create()
129 detector.append(text)
--> 130 return detector.detect()
131
132
C:\Anaconda\lib\site-packages\langdetect\detector.py in detect(self)
134 which has the highest probability.
135 '''
--> 136 probabilities = self.get_probabilities()
137 if probabilities:
138 return probabilities[0].lang
C:\Anaconda\lib\site-packages\langdetect\detector.py in get_probabilities(self)
141 def get_probabilities(self):
142 if self.langprob is None:
--> 143 self._detect_block()
144 return self._sort_probability(self.langprob)
145
C:\Anaconda\lib\site-packages\langdetect\detector.py in _detect_block(self)
148 ngrams = self._extract_ngrams()
149 if not ngrams:
--> 150 raise LangDetectException(ErrorCode.CantDetectError, 'No features in text.')
151
152 self.langprob = [0.0] * len(self.langlist)
LangDetectException: No features in text.
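Is there a way to use exception handling so that the detect function from the langdetect library is applied only to the text elements it can actually handle?
【Discussion】: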
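One possible approach, sketched here under a few assumptions: wrap the call in a try/except that catches LangDetectException (the exception shown in the traceback) and returns a placeholder for cells that cannot be classified. The helper names safe_detect and add_language_column, and the choice of None as the placeholder, are hypothetical and not part of langdetect or pandas. The sketch also assumes the DataFrame is called df and the text lives in column y, as in the question.

```python
def safe_detect(text, detect_fn, error_type, default=None):
    """Return detect_fn(text), or `default` when detection is not possible."""
    # Non-string cells (NaN, numbers) and blank strings are skipped up front.
    if not isinstance(text, str) or not text.strip():
        return default
    try:
        return detect_fn(text)
    except error_type:
        # With langdetect this covers "No features in text." for
        # symbol-only or digit-only strings.
        return default


def add_language_column(df, column="y"):
    """Fill df["language"] from df[column], leaving undetectable cells as None."""
    # Imported lazily so that safe_detect itself has no third-party dependency.
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException

    df["language"] = df[column].apply(
        lambda text: safe_detect(text, detect, LangDetectException)
    )
    return df
```

Calling add_language_column(df) would then replace the loop from the question. Assigning the whole column at once also sidesteps a second likely issue: df.loc[i] can return a temporary copy of the row, so writing to it through .at may never reach df.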