【Question title】: Customized method error in record linkage
【Posted】: 2020-05-05 22:39:48
【Question】:

I want to apply a custom comparison method in Record Linkage, following the official documentation. My code:

import gensim
import recordlinkage as rl
from recordlinkage.base import BaseCompareFeature


def compute_wmd(s1, s2):

    word2vec_file = "C:\\Users\\users\\Desktop\\GoogleNews-vectors-negative300.bin"
    word2vec = gensim.models.KeyedVectors.load_word2vec_format(word2vec_file, binary=True)
    word2vec.init_sims(replace=True)  # Normalizes the vectors in the word2vec class
    score = word2vec.wmdistance(s1, s2)

    return score


comparer = rl.Compare()

comparer.add(compute_wmd('Description US', 'Description US', label='Description'))
comparer.compute(pairs, csv)

I get this error:

TypeError: compute_wmd() got an unexpected keyword argument 'label'

If I remove the label, I get this error instead:

AttributeError: 'float' object has no attribute 'labels_left'

【Question discussion】:

    Tags: python function record-linkage


    【Answer 1】:

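    You have to create a custom class that inherits from "BaseCompareFeature" and give it a method named _compute_vectorized(self, s1, s2). Only then will Record Linkage accept it as a valid custom comparison. What you pass to comparer.add() is an instance of that class, constructed with the column names (and optionally label=...). In the question, compute_wmd(...) is called immediately: with label= it raises the TypeError because the plain function has no such parameter, and without it add() receives the float the function returns, which has no labels_left attribute.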

    Example code:

    import recordlinkage as rl
    from recordlinkage.base import BaseCompareFeature

    class CompareZipCodes(BaseCompareFeature):

        def __init__(self, left_on, right_on, partial_sim_value, *args, **kwargs):
            super(CompareZipCodes, self).__init__(left_on, right_on, *args, **kwargs)

            self.partial_sim_value = partial_sim_value

        def _compute_vectorized(self, s1, s2):
            """Compare zipcodes.

            If the zipcodes in both records are identical, the similarity
            is 1. If the codes differ but their first two characters agree,
            the similarity is partial_sim_value. Otherwise, the similarity is 0.
            """

            # check if the zipcodes are identical (1.0 if so, 0.0 otherwise)
            sim = (s1 == s2).astype(float)

            # among the non-identical pairs, give partial credit when the first 2 characters match
            sim[(sim == 0) & (s1.str[0:2] == s2.str[0:2])] = self.partial_sim_value

            return sim
    

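    The partial-match rule in CompareZipCodes can also be checked without pandas or recordlinkage. The plain-Python sketch below mirrors that logic on two lists; zip_similarity is a hypothetical helper, and it assumes every code is a non-missing string.

```python
def zip_similarity(zips_left, zips_right, partial_sim_value=0.5):
    """Plain-Python version of the CompareZipCodes rule (hypothetical helper).

    1.0 if the codes are identical, partial_sim_value if they differ but
    share their first two characters, otherwise 0.0.
    """
    scores = []
    for a, b in zip(zips_left, zips_right):
        if a == b:
            scores.append(1.0)
        elif a[:2] == b[:2]:
            scores.append(partial_sim_value)
        else:
            scores.append(0.0)
    return scores


print(zip_similarity(["1234AB", "1234AB", "1234AB"],
                     ["1234AB", "1299XY", "5678CD"], 0.2))
# → [1.0, 0.2, 0.0]
```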
    【Discussion】:
