【Question title】: Apply Spellchecking Function to Dataframe
【Posted】: 2018-07-17 17:38:09
【Question】:

I have some data that contains spelling errors. For example:

# Define the correct spellings:
Li_A = ["potato", "tomato", "squash", "apple", "pear"]

# Define the data that contains spelling errors:
B    = {'one' : pd.Series(["potat0", "toma3o", "s5uash", "ap8le", "pea7"], index=['a', 'b', 'c', 'd', 'e']),
        'two' : pd.Series(["po1ato", "2omato", "squ0sh", "2pple", "p3ar"], index=['a', 'b', 'c', 'd', 'e'])}

df_B = pd.DataFrame(B)

I am trying to correct them with the following code:

import pandas as pd
import difflib

# Define the function that corrects the spelling:

def Spelling(ask):
    difflib.get_close_matches(ask, Li_A, n=1, cutoff=0.5)

# Apply the function that corrects the spelling:

for index,row in df_B.iterrows():
    df_B.loc[index,'Correct one'] = Spelling(df_B['one'])

for index,row in df_B.iterrows():
    df_B.loc[index,'Correct two'] = Spelling(df_B['two'])

df_B

But all I get is:

      one     two  Correct one  Correct two
a  potat0  po1ato          NaN          NaN
b  toma3o  2omato          NaN          NaN
c  s5uash  squ0sh          NaN          NaN
d   ap8le   2pple          NaN          NaN
e    pea7    p3ar          NaN          NaN

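How can I add the correct spellings as new columns in my dataframe, where "NaN" currently appears?

It does work when I run it on one word at a time: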

import difflib
Li_A = ["potato", "tomato", "squash", "apple", "pear"]
B    = 'potat0'
C    = difflib.get_close_matches(B, Li_A, n=1, cutoff=0.5)
C

Out: ['potato']

【Comments】:

    Tags: python dataframe spelling


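    【Solution 1】:

    You forgot the return statement in the function, and inside the iterrows loop you need to use row to select the value for each iteration (the original passed the whole column df_B['one'] instead). A single iterrows loop is also enough for both columns: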

    def Spelling(ask):
        return difflib.get_close_matches(ask, Li_A, n=1, cutoff=0.5)
    
    # Apply the function that corrects the spelling:
    
    for index,row in df_B.iterrows():
        df_B.loc[index,'Correct one'] = Spelling(row['one'])
        df_B.loc[index,'Correct two'] = Spelling(row['two'])
    
    print (df_B)
          one     two Correct one Correct two
    a  potat0  po1ato    [potato]    [potato]
    b  toma3o  2omato    [tomato]    [tomato]
    c  s5uash  squ0sh    [squash]    [squash]
    d   ap8le   2pple     [apple]     [apple]
    e    pea7    p3ar      [pear]      [pear]
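
To make the first point concrete, here is a minimal sketch of why the original function could only ever yield missing values: a Python function with no return statement returns None, regardless of what it computes internally. The names spelling_without_return and spelling_with_return are hypothetical and used only for this illustration.

```python
import difflib

Li_A = ["potato", "tomato", "squash", "apple", "pear"]


def spelling_without_return(ask):
    # The match is computed and then discarded; Python returns None implicitly.
    difflib.get_close_matches(ask, Li_A, n=1, cutoff=0.5)


def spelling_with_return(ask):
    # Same call, but the resulting list is handed back to the caller.
    return difflib.get_close_matches(ask, Li_A, n=1, cutoff=0.5)


result_missing = spelling_without_return("potat0")
result_fixed = spelling_with_return("potat0")

print(result_missing)  # None
print(result_fixed)    # ['potato']
```

Assigning None into a new dataframe column is what shows up as NaN in the question's output.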
    

    However, it is simpler to use applymap:

    df_B[['Correct one','Correct two']] = df_B[['one','two']].applymap(Spelling)
    print (df_B)
          one     two Correct one Correct two
    a  potat0  po1ato    [potato]    [potato]
    b  toma3o  2omato    [tomato]    [tomato]
    c  s5uash  squ0sh    [squash]    [squash]
    d   ap8le   2pple     [apple]     [apple]
    e    pea7    p3ar      [pear]      [pear]
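
The bracketed values in the output appear because difflib.get_close_matches returns a list, even with n=1. If plain strings are wanted, one option is to take the first element of that list. The sketch below assumes that falling back to the original word is acceptable when nothing clears the cutoff; the helper name best_match is hypothetical. It also uses Series.map per column rather than applymap, since DataFrame.applymap is deprecated from pandas 2.1 onward in favor of DataFrame.map, while Series.map is available in both older and newer versions.

```python
import difflib

import pandas as pd

Li_A = ["potato", "tomato", "squash", "apple", "pear"]


def best_match(ask, choices=Li_A, cutoff=0.5):
    """Return the closest correct spelling as a plain string.

    Assumption for this sketch: if no candidate reaches the cutoff,
    the input word is returned unchanged.
    """
    matches = difflib.get_close_matches(ask, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else ask


df_B = pd.DataFrame(
    {
        "one": ["potat0", "toma3o", "s5uash", "ap8le", "pea7"],
        "two": ["po1ato", "2omato", "squ0sh", "2pple", "p3ar"],
    },
    index=["a", "b", "c", "d", "e"],
)

# Series.map applies the function to each element of a single column.
for col in ["one", "two"]:
    df_B["Correct " + col] = df_B[col].map(best_match)

print(df_B)
```

Whether to keep the original word, return None, or raise when no match is found depends on how the uncorrected values should be handled downstream.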
    

    【Comments】:
