【Question title】: Getting the max value from each column of a CSV file
【Posted】: 2013-10-25 11:54:53
【Question】:

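Can someone help me with the following problem? I tried it myself and have attached my solution. I used a 2-D list, but I'd like a different solution that avoids the 2-D list and is more Pythonic.

Please suggest another approach if you have one.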

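Q) Consider a CSV file that gives the stock prices of N companies for every month since 1990. The file format is as follows, with the first row as the header: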

year,month,Company A,Company B,Company C,.......Company N
1990,Jan,10,15,20,......,50
1990,Feb,10,15,20,......,50
.
.
.
.
2013,Sep,50,10,15............500

The solution should be in this format: a) a list of the year and month in which each company's stock price was highest.

Here is my answer using a 2-D list:

def generate_list(file_path):
    '''
        return a list of lists containing the file data.'''

    data_list=None   #local variable    
    try:
        file_obj = open(file_path,'r')
        try:
            gen = (line.split(',') for line in file_obj)  #generator, to generate one line each time until EOF (End of File)
            for j,line in enumerate(gen):
                line[-1] = line[-1].rstrip('\r\n')  #strip the line ending from the last field of every line
                if not data_list:
                    #if data_list is None, create a list of n empty lists, where n is the number of columns.
                    data_list = [[] for i in range(len(line))]

                #loop to convert numbers from string to float, and leave others as strings only
                for i,l in enumerate(line):
                    if i >=2 and j >= 1:
                        data_list[i].append(float(l))
                    else:            
                        data_list[i].append(l)
        except IOError, io_except:
            print io_except
        finally:
            file_obj.close()
    except IOError, io_exception:
        print io_exception

    return data_list

def generate_result(file_path):
    '''
        return list of tuples containing (max price, year, month,
company name).
    '''
    data_list = generate_list(file_path)
    re=[]   #list to store results as tuples in the following format: [(max_price, year, month, company_name), ....]
    if data_list:
        for i,d in enumerate(data_list):
            if i >= 2:
                m = max(data_list[i][1:])      #max_price for the company
                idx = data_list[i].index(m)    #getting index of max_price in the list
                yr = data_list[0][idx]          #getting year by using index of max_price in list
                mon = data_list[1][idx]        #getting month by using index of max_price in list
                com = data_list[i][0]          #getting company_name
                re.append((m,yr,mon,com))
        return re


if __name__ == '__main__':
    file_path = 'C:/Document and Settings/RajeshT/Desktop/nothing/imp/New Folder/tst.csv'
    re = generate_result(file_path)
    print 'result ', re
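generate_list splits each line on ',' by hand and strips the newline itself. The standard csv module does both, and it also handles quoted fields that contain commas, which a plain split(',') breaks apart. A quick comparison (the quoted company name is made up for illustration; io.StringIO stands in for a file):

```python
import csv
import io

line = 'year,month,"Company, Inc."\n'

# Manual splitting breaks the quoted name and keeps the trailing newline
print(line.split(','))

# csv.reader respects the quotes and drops the line ending
print(next(csv.reader(io.StringIO(line))))
```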

I have also tried to solve it with a generator, but in that case it gave the result for only one company, i.e. only one column.

p = 'filepath.csv'

f = open(p,'r')
head = f.readline()
gen = ((float(line.split(',')[n]), line.split(',',2)[0:2], head.split(',')[n]) for n in range(2,len(head.split(','))) for i,line in enumerate(f))
x = max((i for i in gen),key=lambda x:x[0])
print x
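The attempt above returns only one tuple for two reasons. First, max() reduces the whole generator to a single item. Second, f is a file iterator that is exhausted after the pass for the first company column (n = 2), so enumerate(f) yields nothing for later values of n. Here is a minimal sketch of a fix, assuming the whole file fits in memory: read the rows once, then run one max() per column over its own generator expression. The function name max_per_column is hypothetical.

```python
import csv
import io
from operator import itemgetter

def max_per_column(f):
    """Return [(max_price, year, month, company), ...], one tuple per company column."""
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)  # materialize once: a file iterator is exhausted after one pass
    results = []
    for n in range(2, len(header)):
        # a fresh generator for each column, so every max() sees all the rows
        column = ((float(row[n]), row[0], row[1]) for row in rows)
        price, year, month = max(column, key=itemgetter(0))  # first maximum wins on ties
        results.append((price, year, month, header[n]))
    return results

sample = io.StringIO(
    "year,month,company 1,company 2\n"
    "1990,jan,201,245\n"
    "1991,march,246,81\n"
)
print(max_per_column(sample))
```

Because the key compares only the price, ties keep the earliest row, just as data_list[i].index(m) does in the 2-D list version.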

You can use the following input data in CSV format:

year,month,company 1,company 2,company 3,company 4,company 5
1990,jan,201,245,243,179,133
1990,feb,228,123,124,121,180
1990,march,63,13,158,88,79
1990,april,234,68,187,67,135
1990,may,109,128,46,185,236
1990,june,53,36,202,73,210
1990,july,194,38,48,207,72
1990,august,147,116,149,93,114
1990,september,51,215,15,38,46
1990,october,16,200,115,205,118
1990,november,241,86,58,183,100
1990,december,175,97,143,77,84
1991,jan,190,68,236,202,19
1991,feb,39,209,133,221,161
1991,march,246,81,38,100,122
1991,april,37,137,106,138,26
1991,may,147,48,182,235,47
1991,june,57,20,156,38,245
1991,july,165,153,145,70,157
1991,august,154,16,162,32,21
1991,september,64,160,55,220,138
1991,october,162,72,162,222,179
1991,november,215,207,37,176,30
1991,december,106,153,31,247,69

The expected output is as follows:

[(246.0, '1991', 'march', 'company 1'),
 (245.0, '1990', 'jan', 'company 2'),
 (243.0, '1990', 'jan', 'company 3'),
 (247.0, '1991', 'december', 'company 4'),
 (245.0, '1991', 'june', 'company 5')]

Thanks in advance...

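【Question discussion】:

  • Is numpy or pandas an option?
  • Anything you think is more Pythonic, using standard library functions as much as possible... no third-party libraries, please...
  • OK, pandas and numpy are libraries you have to import, so I guess you'd call them third-party, but they are very well suited to this kind of application. You can also do it with the standard library, though...
  • That's because they are not part of the standard library. That's why.. If you have more than one way to solve this problem, you're welcome to share it... :)
  • Can you post some actual sample data and the expected output?

Tags: python python-2.7 csv python-3.x generator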


【Solution 1】:

Using collections.OrderedDict and collections.namedtuple:

import csv
from collections import OrderedDict, namedtuple

with open('abc1') as f:
    reader = csv.reader(f)
    tup = namedtuple('tup', ['price', 'year', 'month'])
    d = OrderedDict()
    names = next(reader)[2:]
    for name in names:
        #initialize the dict
        d[name] = tup(0, 'year', 'month')
    for row in reader:
        year, month = row[:2]         # Use year, month, *prices = row in py3.x
        for name, price in zip(names, map(int, row[2:])): # map(int, prices) py3.x
            if d[name].price < price:
                d[name] = tup(price, year, month)
print(d)

Output:

OrderedDict([
('company 1', tup(price=246, year='1991', month='march')),
('company 2', tup(price=245, year='1990', month='jan')),
('company 3', tup(price=243, year='1990', month='jan')),
('company 4', tup(price=247, year='1991', month='december')),
('company 5', tup(price=245, year='1991', month='june'))])
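The inline comments in this answer point to Python 3 alternatives. Below is a sketch of that variant. It assumes Python 3.7+, where a plain dict preserves insertion order, so OrderedDict is not required. The names best_per_company and Best are illustrative, not part of the answer. The sketch also starts from an empty dict instead of a 0 placeholder, so it still works when every price in a column is zero or negative.

```python
import csv
import io
from collections import namedtuple

Best = namedtuple('Best', ['price', 'year', 'month'])

def best_per_company(f):
    reader = csv.reader(f)
    names = next(reader)[2:]
    best = {}  # plain dicts keep insertion order in Python 3.7+
    for year, month, *prices in reader:  # Python 3 extended unpacking
        for name, price in zip(names, map(int, prices)):
            # strict '>' keeps the earliest month on ties, like the original '<' test
            if name not in best or price > best[name].price:
                best[name] = Best(price, year, month)
    return best

sample = io.StringIO(
    "year,month,company 1,company 2\n"
    "1990,jan,201,245\n"
    "1991,march,246,81\n"
)
print(best_per_company(sample))
```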

【Discussion】:

    【Solution 2】:

    I'm not entirely sure how you want the output, so for now it just prints the output to the screen.

    import os
    import csv
    import codecs
    
    
    ## Import data  !!!!!!!!!!!! CHANGE TO APPROPRIATE PATH !!!!!!!!!!!!!!!!!
    filename= os.path.expanduser("~/Documents/PYTHON/StackTest/tailor_raj/Workbook1.csv")
    
    ## Get useable data
    data = [row for row in csv.reader(codecs.open(filename, 'rb', encoding="utf_8"))]
    
    ## Index of the last data row (row 0 is the header)
    row_count = len(data) - 1
    
    ## Index of the last column (the header row has one entry per column)
    column_count = len(data[0]) - 1
    
    ## Set which company we are checking (start with the first company listed. Since it starts at 0 the first company is at 2 not 3)
    companyIndex = 2
    
    #This will keep all the company bests as single rows of text. I was not sure how you wanted to output them.
    companyBest=[]
    
    ## Set loop to go through each company
    while companyIndex <= (column_count):
    
        ## For each new company reset the rowIndex and highestShare
        rowIndex=1
        highestShare=rowIndex
        
        ## Set loop to go through each row
        while rowIndex <=row_count:
            ## Test if data point is above or equal to current max
            ## Currently set to use the most recent high point
            if int(data[highestShare][companyIndex]) <= int(data[rowIndex][companyIndex]):
                highestShare=rowIndex
                
            ## Move on to next row
            rowIndex+=1
            
        ## Company best = Company Name + year + month + value
        companyBest.append(str(data[0][companyIndex])+": "+str(data[highestShare][0]) +", "+str(data[highestShare][1])+", "+str(data[highestShare][companyIndex]))
    
        ## Move on to next company
        companyIndex +=1
    
    for item in companyBest:
        print(item)
    

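    Be sure to change the filename path to one that is appropriate for your system.

    The output currently looks like this:

    Company A: 1990, Nov, 1985

    Company B: 1990, May, 52873

    Company C: 1990, May, 3658

    Company D: 1990, Nov, 156498

    Company E: 1990, Jul, 987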

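    【Discussion】:

    • Thanks for trying.. I had already done it the longer way... but I'd like to use only generators (if possible) and as few lines of code as possible.. i.e. much more Pythonic. :)
    • Ah, my bad. I just saw that you tried a generator, but didn't realize you wanted a generator as the answer.
    【Solution 3】:

    Sadly no generators, but the amount of code is small, especially in Python 3: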

    from operator import itemgetter
    from csv import reader
    
    with open('test.csv') as f:
        year, month, *data = zip(*reader(f))
    
    for pricelist in data:
        name = pricelist[0]
        prices = map(int, pricelist[1:])
        i, price = max(enumerate(prices), key=itemgetter(1))
        print(name, price, year[i+1], month[i+1])
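max(enumerate(prices), key=itemgetter(1)) returns the row index together with the price. When several months share the top price, max() returns the first maximal item it encounters, so this answer reports the earliest such month. Solution 2's <= test reports the latest one instead. A quick check with made-up prices:

```python
from operator import itemgetter

prices = [201, 245, 243, 245]  # made-up prices with a tie at 245
i, price = max(enumerate(prices), key=itemgetter(1))
print(i, price)  # index 1: the first of the two 245s wins
```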
    

    In Python 2.X you can do the same thing (with a different print statement) using the following, though it is slightly clumsier:

    with open('test.csv') as f:
        columns = zip(*reader(f))
        year, month = columns[:2]
        data = columns[2:]
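In Python 3, zip() returns an iterator, which cannot be sliced, so the Python 2 snippet above fails there. Wrapping the call in list() gives the same sliceable list of columns on both versions. A small sketch, with io.StringIO standing in for the file:

```python
import csv
import io

f = io.StringIO("year,month,company 1,company 2\n1990,jan,201,245\n1991,march,246,81\n")
columns = list(zip(*csv.reader(f)))  # list() makes the columns sliceable on Python 3 too
year, month = columns[:2]
data = columns[2:]
print(year)     # header followed by the year values
print(data[0])  # company name followed by its prices, still as strings
```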
    

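    OK, I came up with some horrible generators! It also uses lexicographic tuple comparison and reduce to compare successive rows:

    from functools import reduce  # required in Python 3; reduce is also a builtin in Python 2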
    import csv
    
    def group(year, month, *prices):
        return ((int(p), year, month) for p in prices)
    
    def compare(a, b):
        return map(max, zip(a, group(*b)))
    
    def run(fname):
        with open(fname) as f:
            r = csv.reader(f)
            names = next(r)[2:]
            return zip(names, reduce(compare, r, group(*next(r))))
    
    list(run('test.csv'))
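The reduce version relies on max() over (price, year, month) tuples. Tuples compare element by element, so a tie on price is broken first by the year string and then by the month name, compared alphabetically rather than by calendar order. A short illustration:

```python
# Tuples compare element by element: price first, then year, then month.
a = (246, '1991', 'march')
b = (246, '1990', 'jan')
print(max(a, b))  # equal prices, so the larger year string decides

c = (10, '1990', 'may')
d = (10, '1990', 'march')
print(max(c, d))  # 'may' > 'march' alphabetically, even though March comes first
```

If ties matter, compare only the price (for example with a key function) instead of relying on the full tuple.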
    

    【Discussion】:

    • Can someone write test cases for this problem?
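Below is a minimal sketch of test cases using the standard unittest module. It assumes the solution is wrapped in a function max_per_company(f) (a hypothetical name) that takes a file object and returns [(price, year, month, company), ...] in the question's expected format. A small reference implementation is included so the block runs on its own; swap in whichever solution you want to test. The sample data is a subset of the question's rows.

```python
import csv
import io
import unittest

def max_per_company(f):
    """Hypothetical function under test: [(price, year, month, company), ...]."""
    reader = csv.reader(f)
    names = next(reader)[2:]
    rows = list(reader)
    result = []
    for n, name in enumerate(names, start=2):
        best = max(rows, key=lambda row: float(row[n]))  # first maximal row wins
        result.append((float(best[n]), best[0], best[1], name))
    return result

SAMPLE = (
    "year,month,company 1,company 2,company 3\n"
    "1990,jan,201,245,243\n"
    "1990,feb,228,123,124\n"
    "1991,march,246,81,38\n"
)

class MaxPerCompanyTest(unittest.TestCase):
    def test_sample_data(self):
        self.assertEqual(max_per_company(io.StringIO(SAMPLE)), [
            (246.0, '1991', 'march', 'company 1'),
            (245.0, '1990', 'jan', 'company 2'),
            (243.0, '1990', 'jan', 'company 3'),
        ])

    def test_tie_keeps_earliest_month(self):
        data = "year,month,a\n1990,jan,5\n1990,feb,5\n"
        self.assertEqual(max_per_company(io.StringIO(data)), [(5.0, '1990', 'jan', 'a')])

    def test_header_only_raises(self):
        # max() of an empty sequence raises ValueError
        with self.assertRaises(ValueError):
            max_per_company(io.StringIO("year,month,a\n"))
```

Saved to a file, these can be run with python -m unittest. The tie and empty-file cases pin down behavior that the question leaves unspecified, so adjust them if you want different rules.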