File parsing using commas in Python: answers

【Question title】: File parsing using commas Python
【Posted】: 2016-02-10 23:40:40
【Question description】:

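I am trying to write a program in Python in which we have to add up numbers from different categories and subcategories. The program is about a farmer's yearly sales of produce from his farm. The text file we have to read from has 4 categories. The first is the type of product, for example Vegetables, Fruits, Condiments. The second tells us the specific product, for example Potatoes, Apples, Hot Sauce. The third gives the 2014 sales and the fourth gives the 2015 sales. In this program we only need to compute totals from the 2015 numbers. The 2014 numbers are present in the text file but are not relevant.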

This is what the text file looks like:

PRODUCT,CATEGORY,2014 Sales,2015 Sales
Vegetables,Potatoes,4455,5644
Vegetables,Tomatoes,5544,6547
Vegetables,Peas,987,1236
Vegetables,Carrots,7877,8766
Vegetables,Broccoli,5564,3498
Fruits,Apples,398,4233
Fruits,Grapes,1099,1234
Fruits,Pear,2342,3219
Fruits,Bananas,998,1235
Fruits,Peaches,1678,1875
Condiments,Peanut Butter,3500,3902
Condiments,Hot Sauce,1234,1560
Condiments,Jelly,346,544
Condiments,Spread,2334,5644
Condiments,Ketchup,3321,3655
Condiments,Olive Oil,3211,2344

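What we have to do is add up the 2015 sales per product type, and then compute the total 2015 sales across all products.

The output written to the text file should look like this:

Total sales for Vegetables in 2015: {Insert total number here}

Total sales for Fruits in 2015: {Insert total number here}

Total sales for Condiments in 2015: {Insert total number here}

Total sales for the farmer in 2015: {Insert total for all the products sold in 2015}

In addition, it should print the grand total on the Python run screen in the IDE as well as in the text file:

Total sales for the farmer in 2015: {Insert total for all the products sold in 2015}

Here is my code. It works, but it prints a strange first line in the output. Also, I would rather not use lists. Is there another way to do this? Please do not use CSV, as we have been instructed to treat the data as a text file.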

readFile = open("Products.txt", "r")
reportfile = open("report.txt", "w")
line = readFile.readline()

totalSum = 0
container = []
product = ()
sum=0
for line in readFile:
    line=line.strip()
    line=line.split(",")

    if line[0] not in container:
        print(product,sum, file=reportfile)

        product = line[0]
        totalSum += int(line[3])
        sum = 0
        sum += int(line[3])
        container.append(product)
    elif product == line[0]:
        totalSum += int(line[3])
        sum += int(line[3])

print(totalSum, file=reportfile)

【Question discussion】:

Tags: python python-3.x split comma

【Solution 1】:

These kinds of tasks are a great fit for Pandas:

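import pandas

df = pandas.read_csv('Products.txt')
# Sum only the numeric sales columns for each product type
df = df.groupby('PRODUCT')[['2014 Sales', '2015 Sales']].sum()
# Append a grand-total row labelled 'Total'
df.loc['Total'] = df.sum()
print(df)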

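【Discussion】:

Please read the question: we are not allowed to use CSV or convert the file to CSV. It should be treated as a text file, without using CSV.
Sorry, we can't use pandas either.

【Solution 2】:

Change the output file to stdout to speed up development. If you are allowed to write subroutines, you should merge the two copies of the same code into one subroutine.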

import sys

readFile = open("Products.txt", "r")
#reportfile = open("report.txt", "w")
reportfile = sys.stdout

Categories_seen = []
Current_category = None
Category_sum = 0
Total_sum = 0

line = readFile.readline()

for line in readFile:
    category,product,sales2014,sales2015 = line.strip().split(',')
    sales2015 = int(sales2015)

    if category not in Categories_seen:
        if Current_category is not None: # Subroutine, if possible
            print("Total sales for", Current_category, "in 2015:",
                Category_sum, file=reportfile)
        Current_category = category
        Category_sum = sales2015
        Total_sum += sales2015
        Categories_seen.append(Current_category)
    elif Current_category == category:
        Category_sum += sales2015
        Total_sum += sales2015

if Current_category is not None:  # Subroutine, if possible
    print("Total sales for", Current_category, "in 2015:",
        Category_sum, file=reportfile)

print("----------", file=reportfile)
print("Total sales for the farmer in 2015:", Total_sum, file=reportfile)
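
The subroutine suggestion above could look roughly like the following. This is a minimal sketch, not the answerer's own code: the names write_category_total and summarize are hypothetical, and it assumes that rows belonging to the same category are adjacent in the file, which lets it compare each row with the previous category instead of keeping a list of categories already seen.

```python
import io


def write_category_total(category, total, out):
    """Write one per-category line (hypothetical helper, not from the answer)."""
    print("Total sales for", category, "in 2015:", total, file=out)


def summarize(lines, out):
    """Sum the 2015 column per category; assumes rows are grouped by category."""
    current_category = None
    category_sum = 0
    total_sum = 0
    rows = iter(lines)
    next(rows, None)  # skip the header line
    for line in rows:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        category, product, sales2014, sales2015 = line.split(",")
        sales2015 = int(sales2015)
        if category != current_category:
            # A new category starts: flush the previous one, if there was one
            if current_category is not None:
                write_category_total(current_category, category_sum, out)
            current_category = category
            category_sum = 0
        category_sum += sales2015
        total_sum += sales2015
    if current_category is not None:
        write_category_total(current_category, category_sum, out)
    print("Total sales for the farmer in 2015:", total_sum, file=out)
    return total_sum


# Small made-up excerpt of the data, used only to exercise the sketch
sample = """PRODUCT,CATEGORY,2014 Sales,2015 Sales
Vegetables,Potatoes,4455,5644
Vegetables,Tomatoes,5544,6547
Fruits,Apples,398,4233
"""
buffer = io.StringIO()
grand_total = summarize(sample.splitlines(), buffer)
print(buffer.getvalue(), end="")
```

With the real data, an open file object can be passed in place of the sample lines, and a file opened for writing (or sys.stdout) in place of the buffer.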

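【Discussion】:

How does stdout work? Sorry, I'm a beginner and have never used it. Can it be replaced with something else?
stdout is "standard output". That is, if you write a Python program and say print("Hello, world!") without specifying the file= argument, the default output destination is stdout. Usually this is your console or "program output window". Anyway, uncomment the line that opens the file and comment out or delete the stdout line, and you should be all set.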
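
To illustrate the explanation of stdout, here is a small sketch showing that the same print call can target the console or any file-like object. The function name report and the value 100 are made up for the demonstration, and io.StringIO stands in for a real file such as report.txt.

```python
import io
import sys


def report(total, out=sys.stdout):
    """Print the grand-total line to `out` (hypothetical helper)."""
    print("Total sales for the farmer in 2015:", total, file=out)


# With no `out` argument the line goes to standard output (the console)
report(100)

# Any object with a write() method works; StringIO stands in for a file here
fake_file = io.StringIO()
report(100, out=fake_file)
captured = fake_file.getvalue()
```

Swapping fake_file for a file opened with open("report.txt", "w") would send the same line to the report file instead.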

【Solution 3】:

OK, challenge accepted...

counter = dict()

with open('Products.txt') as f:
    next(f)  # skip the header line
    for line in f:
        line = line.strip().split(',')
        product, sales = line[0], int(line[3])
        if product in counter:
            counter[product] += sales
        else:
            counter[product] = sales

    counter['Total'] = sum(counter.values())


results = """
Total sales for Vegetables in 2015 : {Vegetables:d}
Total sales for Fruits in 2015 : {Fruits:d}
Total sales for Condiments in 2015 : {Condiments:d}
Total sales for the farmer in 2015: {Total:d}
""".format(**counter)


with open('report.txt', 'w') as f:
    f.write(results)

print("Total sales for the farmer in 2015: {Total:d}".format(**counter))

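【Discussion】:

Is there another way to do this besides using collections? Sorry, they haven't been taught to us yet, so I don't understand them well.
Thanks! Also, is there a way to format the results without creating the counter and the indexes? I'm using Python 3.
What's next, "Can you do it blindfolded while playing with a chainsaw?"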
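
For readers with the same constraints as the asker, one possible way to avoid a dictionary is to keep one plain integer per category and format the report with f-strings (Python 3.6+). This is only a sketch under stated assumptions: the function name summarize_known_categories is hypothetical, the three category names are hard-coded from the sample data, and str.split still returns a list internally.

```python
def summarize_known_categories(lines):
    """Total the 2015 column using one plain variable per known category."""
    vegetables = 0
    fruits = 0
    condiments = 0
    rows = iter(lines)
    next(rows, None)  # skip the header line
    for line in rows:
        fields = line.strip().split(",")
        if len(fields) != 4:
            continue  # ignore blank or malformed lines
        category = fields[0]
        sales2015 = int(fields[3])
        if category == "Vegetables":
            vegetables += sales2015
        elif category == "Fruits":
            fruits += sales2015
        elif category == "Condiments":
            condiments += sales2015
    total = vegetables + fruits + condiments
    report = (
        f"Total sales for Vegetables in 2015: {vegetables}\n"
        f"Total sales for Fruits in 2015: {fruits}\n"
        f"Total sales for Condiments in 2015: {condiments}\n"
        f"Total sales for the farmer in 2015: {total}\n"
    )
    return report, total


# Three rows taken from the question's data, used only to exercise the sketch
sample = [
    "PRODUCT,CATEGORY,2014 Sales,2015 Sales",
    "Vegetables,Peas,987,1236",
    "Fruits,Pear,2342,3219",
    "Condiments,Jelly,346,544",
]
report, total = summarize_known_categories(sample)
print(report, end="")
```

The trade-off is that a new category in the file would be silently ignored, whereas the dictionary version above picks it up automatically.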

【Solution 4】:

import re
import string

# Read the whole file as one string and turn the commas into spaces
with open('Products.txt', 'r') as source:
    text = source.read().replace(',', ' ')
# Drop the header line and anything after the final newline
text = text[text.find('\n') + 1:text.rfind('\n')]
s = 0
sentences = text.splitlines()
# Matches the first or last word of a line (not applied by the loop below)
pattern = re.compile(r"(^\s?\w+\b|(\b\w+)[\.?!\s]*$)")
with open('report.txt', 'a') as f:
    for line in sentences:
        words = line.split()
        first = words[0].lower()
        # Strip any punctuation from the last word, which is the 2015 figure
        last = words[-1].translate(str.maketrans('', '', string.punctuation)).lower()
        print("Total sales for " + first + " in 2015: " + last, file=f)
        s += int(last)
    print("_______________________________________________\nTotal sales for the farmer in 2015: " + str(s), file=f)
print("Total sales for the farmer in 2015: " + str(s))

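Open the file and store it as a string, select the first and last word of each line (the code does this with split(); the compiled regular expression is never actually applied), concatenate them with custom text, and redirect the output to the file. Print the last line again.

【Discussion】: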