【Posted】: 2018-07-17 17:38:09
【Question】:
I have some data that contains spelling errors. For example:
# Define the correct spellings:
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
# Define the data that contains spelling errors:
B = {'one': pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"], index=['a', 'b', 'c', 'd', 'e']),
     'two': pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"], index=['a', 'b', 'c', 'd', 'e'])}
df_B = pd.DataFrame(B)
I am trying to correct them with the following code:
import pandas as pd
import difflib
# Define the function that corrects the spelling:
def Spelling(ask):
    difflib.get_close_matches(ask, Li_A, n=1, cutoff=0.5)
# Apply the function that corrects the spelling:
for index, row in df_B.iterrows():
    df_B.loc[index, 'Correct one'] = Spelling(df_B['one'])
for index, row in df_B.iterrows():
    df_B.loc[index, 'Correct two'] = Spelling(df_B['two'])
df_B
But all I get is:
      one     two  Correct one  Correct two
a  potat0  po1ato          NaN          NaN
b  toma3o  2omato          NaN          NaN
c  s5uash  squ0sh          NaN          NaN
d   ap8le   2pple          NaN          NaN
e    pea7    p3ar          NaN          NaN
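How can I add the correct spellings as new columns in my DataFrame, where it currently shows "NaN"?
It does work when I run it on one word at a time: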
import difflib
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
B = 'potat0'
C = difflib.get_close_matches(B, Li_A, n=1, cutoff=0.5)
C
Out: ['potato']
【Discussion】:
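Two things in the loop version look likely to produce the NaN columns: `Spelling` has no `return` statement, so it always returns `None`, and it is called with the whole column (`df_B['one']`) instead of a single cell. Here is a minimal sketch of one possible fix, assuming the same `Li_A` and `df_B` as in the question. The helper name `correct_spelling` and its `choices` and `cutoff` parameters are made up for illustration. It returns the best match, or `None` when nothing clears the cutoff, and is applied to each cell with `Series.map`:

```python
import difflib

import pandas as pd

# Correct spellings and misspelled data, as defined in the question.
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
df_B = pd.DataFrame({
    "one": pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"],
                     index=["a", "b", "c", "d", "e"]),
    "two": pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"],
                     index=["a", "b", "c", "d", "e"]),
})


def correct_spelling(word, choices=Li_A, cutoff=0.5):
    """Hypothetical helper: return the closest entry in `choices`, or None."""
    matches = difflib.get_close_matches(word, choices, n=1, cutoff=cutoff)
    # get_close_matches returns a list; take its only element if present.
    return matches[0] if matches else None


# Apply the helper to each cell of a column, one word at a time.
for col in ["one", "two"]:
    df_B["Correct " + col] = df_B[col].map(correct_spelling)

print(df_B)
```

With this data, both new columns should contain potato, tomato, squash, apple, pear in rows a to e. `map` calls the helper once per value, which is the same as the single-word call that already works in the question.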