How to add weights as a vector from a lexicon to the tweets? Answers

【Question title】: How to add weights as a vector from a lexicon to the tweets?
【Posted】: 2021-05-24 22:46:13
【Question】:

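I have a dataset (tweets) and a lexicon of words, where each word has 100 columns of weights.

I want to check whether a word in a tweet appears in the lexicon. If it does, I want to take that word's weights (100 columns) and add them to the dataset (tweets) as 100 columns.

Note: if other words in the same tweet are also found in the lexicon, the weights of all matching words should be summed.

First, I initialize the 100 columns and add them to the dataset next to the tweets: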

import pandas as pd

train = pd.read_csv(r"Dataset.csv")
train.shape
#(5000,1)
train.head(3)
# Tweet
# joy, fear
# anger, joy
# sadness  

lexicon = pd.read_csv(r"lexicon with PFA.csv")
lexicon.shape
#(10000,101)
lexicon.head(2)
#word  w1  w2  w3 .... w100
#joy   0.5 0.1 0  .... 0.2
#fear  0.2 0   0.3 ... 0.1

# Assign columns - all values initially 0  # how can we initialize all of them automatically?
train["w1"] = 0
train["w2"] = 0
train["w3"] = 0
train["w4"] = 0
.
.
.
train["w100"] = 0

train.shape
#(5000,101)
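
The in-code question above (how to initialize all 100 columns automatically) could be handled without writing each assignment by hand. The following is a minimal sketch, assuming the lowercase column names w1 ... w100 used in the lexicon header; the three-row `train` frame is a hypothetical stand-in for Dataset.csv, and `weight_cols` and `zeros` are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for Dataset.csv (the real file has 5000 rows).
train = pd.DataFrame({"Tweet": ["joy, fear", "anger, joy", "sadness"]})

# Column names w1 ... w100, matching the lexicon header.
weight_cols = [f"w{i}" for i in range(1, 101)]

# Build one zero-filled block and attach it in a single step,
# instead of assigning 100 columns one at a time.
zeros = pd.DataFrame(0.0, index=train.index, columns=weight_cols)
train = pd.concat([train, zeros], axis=1)

print(train.shape)
```

Attaching the block with a single `pd.concat` avoids inserting columns into the frame one at a time.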

def calcExtraFeatureW1(query):
    lexicon_score_w1 = 0

    # For each word in the tweet
    for i in query.split(" "):
        try:
            # Look up the weight values (w1..w100); if the word is found, add its weight to the score
            sc1 = lexicon[lexicon["word"] == i]["w1"].values[0]  # this works for one column; I want it for all
            lexicon_score_w1 += sc1
        except IndexError:
            # The word may not be in the lexicon; just skip it
            pass

    return lexicon_score_w1



Desired output:

#Tweet      w1    w2    w3   ... w100
#joy,fear  0.7   0.1    0.3  ..  0.3

# Note: in this case, the weights of joy and fear are summed

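As written, the function takes the value of only one column and adds it to the dataset, but I want the same procedure applied to all columns.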
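
One way to extend the single-column lookup to every weight column is to index the lexicon by word and sum the matching rows per tweet. This is only a sketch under stated assumptions: the data is a hypothetical miniature with 3 weights instead of 100, words in the lexicon are assumed to be unique, and `tweet_weights`, `weights`, and `weight_cols` are names introduced here, not part of the original post. It also splits on commas as well as whitespace, since splitting "joy, fear" on a single space would yield "joy," (with the comma) and miss the lexicon entry.

```python
import re
import pandas as pd

# Hypothetical miniature data: 3 weight columns instead of 100.
train = pd.DataFrame({"Tweet": ["joy, fear", "anger, joy", "sadness"]})
lexicon = pd.DataFrame(
    {"word": ["joy", "fear"], "w1": [0.5, 0.2], "w2": [0.1, 0.0], "w3": [0.0, 0.3]}
)

# Every column except "word" is treated as a weight column.
weight_cols = [c for c in lexicon.columns if c != "word"]

# Index the weights by word (assumes each word appears once in the lexicon).
weights = lexicon.set_index("word")[weight_cols]

def tweet_weights(tweet):
    """Return the summed weight vector of all lexicon words in one tweet."""
    # Split on commas and whitespace so "joy, fear" gives ["joy", "fear"].
    tokens = [t for t in re.split(r"[,\s]+", tweet) if t]
    # reindex yields NaN rows for words missing from the lexicon;
    # sum() skips NaN, so unknown words contribute nothing.
    return weights.reindex(tokens).sum()

# One row of summed weights per tweet, attached next to the tweets.
features = train["Tweet"].apply(tweet_weights)
result = pd.concat([train, features], axis=1)
print(result)
```

With the sample rows from the question, the first tweet ("joy, fear") gets w1 = 0.7, w2 = 0.1, w3 = 0.3, which agrees with the desired output for those three columns. A tweet with no lexicon words gets zeros.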

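【Comments】:

Please provide a small subset of your data as a copyable code snippet that can be used for testing, along with the output you expect for the data provided. See MRE - Minimal, Reproducible, Example and How to make good reproducible pandas examples.
Why do train and lexicon have different lengths (5000 and 10000)?
I suggest avoiding the term "dictionary" to describe the lexicon dataframe - it is slightly misleading to readers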

Tags: python pandas dataframe

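【Solution 1】:

I want to check whether a word in the tweet appears in the dictionary

You can use the in keyword to check whether an item is in a dictionary,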

lexicon["word"] in train.keys()
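
Since the lexicon here is a pandas DataFrame rather than a dict, the membership test needs a slightly different form. The sketch below uses hypothetical sample data, and `known_words`, `tokens`, `found`, and `mask` are illustrative names. It shows two options: a plain Python set built from the word column, and the vectorised `Series.isin`.

```python
import pandas as pd

# Hypothetical two-word lexicon with a single weight column.
lexicon = pd.DataFrame({"word": ["joy", "fear"], "w1": [0.5, 0.2]})

# Caution: `in` applied to a Series tests the index labels (0, 1 here),
# not the stored values, so this does not find the word.
label_check = "joy" in lexicon["word"]

# Option 1: build a set of known words once, then use `in` per token.
known_words = set(lexicon["word"])
tokens = ["joy", "anger"]
found = [t in known_words for t in tokens]

# Option 2: vectorised membership test over a Series of tokens.
mask = pd.Series(tokens).isin(lexicon["word"])

print(label_check, found, mask.tolist())
```

The set is convenient when tweets are processed one token at a time; `isin` fits better when the tokens are already in a Series or column.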

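【Discussion】:

The OP's question is somewhat confusing, because train is a pandas.DataFrame rather than a dict
@Jared, I have already tried sc1 = lexicon[lexicon["word"] == i]["w1"].values[0], but I need to retrieve all the weights (100)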