Python: Import from text file to list and sort/average based on multiple columns

[Question title]: Python: Import from text file to list and sort/average based on multiple columns
[Posted]: 2016-01-02 19:52:42
[Question]:

I have a text file that looks like this:

Mike 5 7 9
Terry 3 7 4
Ste 8 2 3

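I wrote the program below to:

retrieve the data from the text file
split the text into columns separated by spaces
sort the scores after each name in order (lowest first, highest last)
load each person's name and highest score into a list (scoreslist)
sort the list and output the results in alphabetical order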

def alphabetical():
    scoreslist = []
    with open ("classa.txt") as f:
        content = f.read().splitlines()
        for line in content:
            splitline = line.split(" ")
            name = splitline[0]
            score = splitline[1:]
            highscore = sorted(score)[-1]
            scoreslist.append("{} {}".format(name,highscore))

    scoreslist.sort(key=lambda x: x[0])
    print(scoreslist)

The final output looks like this:

Mike 9
Ste 8
Terry 7

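I'm happy with how this works at the moment, but I feel it could be more concise. Is there a simpler way?

More importantly, I want to take the original file and use the same approach to compute the average of the numbers in it, then output the result in the same format. I assumed there would be a simple average function I could use, but that clearly didn't work: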

score = splitline.avg[-1:-3]

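[Question comments]:

It does work, but I want to modify it so that the same kind of setup produces an average instead of picking each person's highest score.

Tags: python list sorting average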

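[Solution 1]:

You can use statistics.mean to compute the average and the csv library to parse the file into rows. You never need to call read unless you really want the whole file's contents as a single string; you can iterate over the file object and split each line.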

from statistics import mean
import csv

def sort_mean(fle):
    with open(fle) as f:
        for name, *scores in csv.reader(f, delimiter=" "):
            srt = sorted(map(int, scores))
            print("Highest score for {} is  {}".format(name, srt[-1]))
            print("Average score for {} is {}".format(name, mean(srt)))

For your input file, it will output:

Highest score for Mike is  9
Average score for Mike is 7.0
Highest score for Terry is  7
Average score for Terry is 4.666666666666667
Highest score for Ste is  8
Average score for Ste is 4.333333333333333

Now, if you want to store all the data and output it in order:

from statistics import mean
import csv
from operator import itemgetter


def sort_mean(fle):
    avgs, high = [], []
    with open(fle) as f:
        for name, *scores in csv.reader(f, delimiter=" "):
            srt = list(map(int, scores))
            avgs.append((name, mean(srt)))
            high.append((name, max(srt)))
    avgs.sort(key=itemgetter(1), reverse=1)
    high.sort(key=itemgetter(1), reverse=1)
    return avgs, high

This gives you two lists, each sorted from highest to lowest:

In [10]: avgs, high = sort_mean("in.txt")

In [11]: avgs
Out[11]: [('Mike', 7.0), ('Terry', 4.666666666666667), ('Ste', 4.333333333333333)]

In [12]: high
Out[12]: [('Mike', 9), ('Ste', 8), ('Terry', 7)]

For Python 2, you need to compute the mean yourself, and the loop logic is slightly different:

def sort_mean(fle):
    avgs, high = [], []
    with open(fle) as f:
        for row in csv.reader(f, delimiter=" "):
            name, scores = row[0], row[1:]
            srt = map(int, scores)
            avgs.append((name, sum(srt,0.0) / len(srt)))
            high.append((name, max(srt)))
    avgs.sort(key=itemgetter(1), reverse=1)
    high.sort(key=itemgetter(1), reverse=1)
    return avgs, high

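Instead of two lists, you could store each user's highest score and mean in a dictionary and sort the items stored in it.

As for your own function, you could rewrite it as follows: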
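A minimal sketch of that dictionary idea, assuming the same space-separated layout as the sample file. The name `load_stats` and the `(highest, mean)` tuple layout are illustrative choices, not part of the original answer. It reads from any iterable of lines, so it can be tried without a file:

```python
from statistics import mean


def load_stats(lines):
    """Map each name to a (highest, mean) tuple of its scores."""
    stats = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name, scores = parts[0], [int(s) for s in parts[1:]]
        stats[name] = (max(scores), mean(scores))
    return stats


# Hypothetical in-memory stand-in for the sample file in the question.
sample = ["Mike 5 7 9", "Terry 3 7 4", "Ste 8 2 3"]
stats = load_stats(sample)

# Sort the dictionary's items by highest score, then by mean, largest first.
by_high = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
by_mean = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
```

Passing an open file object instead of `sample` should work the same way, since a file is also an iterable of lines.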

def alphabetical():
    scores_list = []
    with open("classa.txt") as f:
        # just iterate over the file object
        # line by line
        for line in f:
            # don't need to pass a delimiter
            split_line = line.split()
            name = split_line[0]
            score = split_line[1:]
            # use max to get the highscore and use int as the key
            # or "123" < "2"
            high_score =  max(score,key=int)
            scores_list.append("{} {}".format(name,high_score))
    # don't need lambda to sort alphabetically
    scores_list.sort()
    print(scores_list)

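[Comments]:

This is really interesting and helpful. Thank you very much. Before you edited your answer to remove the loop, I noticed you mentioned something about my use of splitlines in the loop. Could you elaborate?
@mjolnir, you can iterate over the file line by line; you only call read, readlines and so on if you really need all the data at once. You also have to be careful when sorting numbers: strings are compared character by character, and the first higher character basically wins, so you would get incorrect output. I added a version of your own code that includes all of this, and I also changed your variable names to use underscores to make the code more readable.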
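The string-comparison pitfall mentioned in the comment above can be seen in a short sketch. The score values here are made up for illustration, chosen so that a two-digit number is present:

```python
scores = ["9", "10", "2"]  # illustrative values, read from a file as text

# Compared as strings, "9" beats "10" because '9' > '1' at the first character.
highest_as_text = max(scores)

# Compared by numeric value, 10 is the largest.
highest_as_number = max(scores, key=int)

# sorted() has the same pitfall: text order is not numeric order.
text_order = sorted(scores)
numeric_order = sorted(scores, key=int)
```

With single-digit scores, as in the sample file, both orderings happen to agree, which is why the original code appeared to work.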

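[Solution 2]:

For your averaging question, either compute it by hand with sum(x) / len(x), or use the mean function from the statistics module, as suggested in the other answer.

More generally, for problems like yours, consider the pandas module for data analysis. Note that it is an external package and must be installed before it can be imported. A tutorial is available in the pandas documentation.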
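As a quick sketch of the two options, using Terry's scores from the sample file (the variable names are illustrative):

```python
from statistics import mean

scores = [3, 7, 4]  # Terry's scores from the sample file

# Manual average: in Python 3, / is true division, so no float cast is needed.
manual_average = sum(scores) / len(scores)

# Library average from the standard-library statistics module.
library_average = mean(scores)
```

In Python 2, `/` on two integers truncates, which is why the Python 2 version in the other answer starts the sum at 0.0.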

import pandas as pd

df = pd.read_table("classa.txt", sep=" ", header=None,
                   names = ["name", "score1", "score2", "score3"])

df["max_score"] = df[["score1", "score2", "score3"]].max(axis = 1)

df_sorted = df[["name", "max_score"]].sort_values(by = "max_score",
                                                  ascending = False)


>>> df_sorted 
    name  max_score
0   Mike          9
2    Ste          8
1  Terry          7

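See the .mean() method of the pandas DataFrame object to get averages. To write out the resulting DataFrame, see the .to_csv method.

[Comments]:

[Solution 3]:

OK, I thought about it for a while, and this seems to work fine. Like all my code, it isn't pretty.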

scoreslist = []
with open (classchoice) as f:
    content = f.read().splitlines()
    for line in content:
        splitline = line.split(" ") #splits each line by Space
        name = splitline[0]
        total = int(splitline[-1]) + int(splitline[-2]) + int(splitline[-3]) #I created a total by adding the last three values in the text file
        average = (total/3) #then divided them by 3
        scoreslist.append("{} {}".format(name,average)) #changed the output to feature average instead of high score
scoreslist.sort(key=lambda x: x[0])
print(scoreslist)

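It seems to work, but I thought there would be a function such as min, max, mean or average that could be plugged in.

I'm a beginner at this, and I must admit pandas isn't something I've used (or seen) before, but thanks to paljenczy for the help.

[Comments]:

total = sum(map(int, splitline[-3:])) would do the same thing. You don't need to call read or splitlines, as I showed in my answer, which also shows functions such as min, max and mean. statistics.mean is only available in Python 3 (3.4 and later), but min, max and the like are built in.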
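Putting the comment's suggestions together, the averaging version might look like the sketch below. The function name `averages_alphabetical` is hypothetical, and io.StringIO stands in for the real file so the example runs on its own:

```python
import io


def averages_alphabetical(lines):
    """Return "name average" strings sorted alphabetically by name."""
    results = []
    for line in lines:  # iterating a file object yields lines the same way
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        total = sum(map(int, parts[-3:]))  # last three columns, as in the comment
        results.append("{} {}".format(name, total / 3))
    results.sort()  # plain string sort, so entries are ordered by name
    return results


# io.StringIO stands in for open("classa.txt") so the sketch needs no file.
sample_file = io.StringIO("Mike 5 7 9\nTerry 3 7 4\nSte 8 2 3\n")
result = averages_alphabetical(sample_file)
```

Dividing by a fixed 3 mirrors the code in this answer; dividing by the number of score columns instead would be safer if rows can vary in length.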