【Question title】: Convert rows of CSV file to a list of tuples?
【Posted】: 2017-03-27 09:53:23
【Question description】:

I have a .CSV file with two columns, one for tweets and one for their sentiment values, formatted like this (but for thousands of tweets):

I like stackoverflow,Positive
Thanks for your answers,Positive
I hate sugar,Negative
I do not like that movie,Negative
stackoverflow is a question and answer site,Neutral
Python is oop high-level programming language,Neutral

I would like to get output like this:

negfeats = [('I do not like that movie','Negative'),('I hate sugar','Negative')]
posfeats = [('I like stackoverflow','Positive'),('Thanks for your answers','Positive')]
neufeats = [('stackoverflow is a question and answer site','Neutral'),('Python is oop high-level programming language','Neutral')]

I tried to do this with the code below, but I am losing some characters in the tuples. Also, how can I keep x, y and z as integers rather than floats?

import csv
neg = ['Negative']
pos = ['Positive']
neu = ['Neutral']
neg_counter=0
pos_counter=0
neu_counter=0
negfeats = []
posfeats = []
neufeats = []
with open('ff_tweets.csv', 'Ur') as f:
    for k in f:
        if any(word in k for word in neg):
            negfeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            neg_counter+=1
        elif any(word in k for word in pos):
            posfeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            pos_counter+=1
        else:
            neufeats = list(tuple(rec) for rec in csv.reader(f, delimiter=','))
            neu_counter+=1
x = neg_counter * 3/4
y = pos_counter * 3/4
z = neu_counter * 3/4
print negfeats 
print posfeats 
print neufeats 
print x
print y
print z
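
One plausible explanation for the missing data is that the loop and `csv.reader` share the same file handle: the `for` loop takes the first line, and the reader then drains everything that is left. The sketch below is only an illustration in Python 3 syntax (the question uses Python 2.7, where the behaviour is assumed to be analogous); the in-memory `data` object is a hypothetical stand-in for `ff_tweets.csv`.

```python
import csv
import io

# Hypothetical in-memory stand-in for ff_tweets.csv (three sample rows).
data = io.StringIO(
    "I like stackoverflow,Positive\n"
    "I hate sugar,Negative\n"
    "Python is oop high-level programming language,Neutral\n"
)

seen_by_loop = []
seen_by_reader = []
for line in data:
    # The for loop consumes one line from the handle...
    seen_by_loop.append(line.strip())
    # ...and csv.reader then reads every remaining line from the same handle,
    # so the loop has nothing left to iterate over afterwards.
    seen_by_reader = [tuple(rec) for rec in csv.reader(data)]

print(seen_by_loop)
print(seen_by_reader)
```

In this sketch the first row never reaches `csv.reader`, and all remaining rows land in a single list regardless of their label.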

【Question comments】:

    Tags: python python-2.7 list csv tuples


    【Solution 1】:

    This should work:

    import csv
    
    neg = 'Negative'
    pos = 'Positive'
    neu = 'Neutral'
    negfeats = []
    posfeats = []
    neufeats = []
    
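    # csv.reader splits each line into fields: r[0] is the tweet, r[1] is the sentiment label.
    with open('ff_tweets.csv', 'Ur') as f:
        for r in csv.reader(f):
            if r[1] == neg:
                negfeats.append((r[0], r[1]))
            elif r[1] == pos:
                posfeats.append((r[0], r[1]))
            elif r[1] == neu:
                neufeats.append((r[0], r[1]))
    
    # float(3)/4 forces true division in Python 2, so x, y and z come out as floats.
    x = len(negfeats) * float(3)/4
    y = len(posfeats) * float(3)/4
    z = len(neufeats) * float(3)/4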
    
    print negfeats 
    print posfeats 
    print neufeats 
    print x
    print y
    print z
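
The question also asks how to keep x, y and z as integers. A minimal sketch of one way to do that, assuming floor division (`//`) is acceptable for the three-quarters split, is shown here in Python 3 syntax. The helper name `group_by_sentiment` and the in-memory `sample` data are hypothetical and stand in for reading `ff_tweets.csv`.

```python
import csv
import io
from collections import defaultdict


def group_by_sentiment(lines):
    """Group (tweet, label) tuples by their label (hypothetical helper)."""
    groups = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank lines and rows that do not have exactly two fields
        tweet, label = row
        groups[label].append((tweet, label))
    return groups


# Hypothetical in-memory stand-in for ff_tweets.csv, using the rows from the question.
sample = io.StringIO(
    "I like stackoverflow,Positive\n"
    "Thanks for your answers,Positive\n"
    "I hate sugar,Negative\n"
    "I do not like that movie,Negative\n"
    "stackoverflow is a question and answer site,Neutral\n"
    "Python is oop high-level programming language,Neutral\n"
)

groups = group_by_sentiment(sample)
negfeats = groups["Negative"]
posfeats = groups["Positive"]
neufeats = groups["Neutral"]

# Floor division keeps the results as int in both Python 2 and Python 3.
x = len(negfeats) * 3 // 4
y = len(posfeats) * 3 // 4
z = len(neufeats) * 3 // 4

print(negfeats)
print(x, y, z)
```

Multiplying before dividing matters here: `3 // 4 * len(negfeats)` would always be 0, which is the pitfall the first comment below points at.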
    

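    【Comments】:

    • You should be careful: in Python 2.7 (as in the question) 3/4 returns 0
    • @will True, let's fix that.
    • @ÉbeIsaac This solution should work in theory, but when I run it I get empty lists for negfeats, posfeats and neufeats
    • Don't use is to compare strings. r[1] is neg should be r[1] == neg. The checks for pos and neu should also be changed to use ==
    • Because is tests identity, while == tests equality. You should (usually) only use is with singleton objects (None is the best example). stackoverflow.com/questions/1504717 has lots of good explanations and further links.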
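
    The identity versus equality point in the comments above can be sketched as follows, in Python 3 syntax. The one-row in-memory CSV is a hypothetical stand-in for the real file, and the identity result is a CPython implementation detail rather than a language guarantee.

```python
import csv
import io

neg = 'Negative'

# Hypothetical one-row stand-in for ff_tweets.csv.
row = next(csv.reader(io.StringIO("I hate sugar,Negative\n")))
label = row[1]

# Equality compares the characters, so this is what the filter needs.
print(label == neg)

# Identity asks whether both names refer to the very same object. A string
# built by csv.reader at runtime is normally a different object from the
# literal, which would explain the empty lists reported above.
print(label is neg)
```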
    【Solution 2】:

    Try this using Pandas. 'Sentiment' is a column in the csv file:

    import pandas as pd
    
    df = pd.read_csv('ff_tweets.csv')
    
    pos = tuple(df.loc[df['Sentiment'] == 'Positive'].apply(tuple, axis = 1))
    neu = tuple(df.loc[df['Sentiment'] == 'Neutral'].apply(tuple, axis = 1))
    neg = tuple(df.loc[df['Sentiment'] == 'Negative'].apply(tuple, axis = 1))
    
    print pos, neg, neu
    

    Output:

    (('I like stackoverflow', 'Positive'), ('Thanks for your answers', 'Positive')) (('I hate sugar', 'Negative'), ('I do not like that movie', 'Negative')) (('stackoverflow is a question and answer site', 'Neutral'), ('Python is oop high-level programming language', 'Neutral'))
    

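    【Comments】:

    • 'pos' = positive, 'neg' = negative, 'neu' = neutral
    • Each one is a separate tuple
    • Compared with your example, 'posfeats' = 'pos'; in this case I just used different names :)
    • I meant how the code technically works, not what pos means
    • 1) Read the csv into a dataframe called 'df' 2) Filter 'df' by each of the 3 keywords and convert the result to tuples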
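
    The two steps described in the last comment can be sketched with the intermediate values spelled out. This is an illustration in Python 3 syntax, not the answer's exact code: it assumes the file starts with a header row, since `read_csv` treats the first line as column names by default. The 'Tweet' column name, the `rows_as_tuples` helper and the in-memory sample are hypothetical; only 'Sentiment' comes from the answer.

```python
import io

import pandas as pd

# Hypothetical stand-in for ff_tweets.csv, including an assumed header row.
csv_text = (
    "Tweet,Sentiment\n"
    "I like stackoverflow,Positive\n"
    "Thanks for your answers,Positive\n"
    "I hate sugar,Negative\n"
    "stackoverflow is a question and answer site,Neutral\n"
)

# Step 1: read the csv into a dataframe.
df = pd.read_csv(io.StringIO(csv_text))


def rows_as_tuples(frame, label):
    """Return the rows whose 'Sentiment' equals label as plain tuples (hypothetical helper)."""
    mask = frame['Sentiment'] == label   # boolean Series, one entry per row
    selected = frame.loc[mask]           # step 2a: keep only the matching rows
    # Step 2b: convert each remaining row to a tuple of its column values.
    return list(selected.itertuples(index=False, name=None))


pos = rows_as_tuples(df, 'Positive')
neg = rows_as_tuples(df, 'Negative')
neu = rows_as_tuples(df, 'Neutral')

print(pos)
print(neg)
print(neu)
```

    `itertuples(index=False, name=None)` is used here instead of `apply(tuple, axis=1)` so that the result is a plain list of tuples rather than a Series that has to be converted again.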