[Posted]: 2020-09-11 16:13:20
[Question]:
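I am planning to move to a different position within my company, and I asked for a typical task to train myself on. I got part of the way and genuinely understood some things, but now I am stuck. I have been searching, but nothing I found has worked for me, partly because I do not fully understand it. I suppose it should be some kind of loop, but every time I use a loop I get an error.
Basically, I have a huge Excel file with columns and a lot of data. It is about food sales and the fees charged to customers. There was an error in the fees that was later corrected, and I have to find out exactly where the balance comes from, down to 0.00.
Here you can see the data before I started working on it.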
Original Data, before I used the pivot table
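Later, I used a pivot table to spread the rows of one column into separate columns, and I found various data and problems that they may run into in the future. So this is my Excel file after processing.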
Data after working on it, with additional columns
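As a rough illustration of the reshaping step described above: the three rows below are made up for the sketch, and only the column names `Created`, `Payment`, and `Money` come from the code in this post. Calling `pivot_table` without `values` yields two-level columns such as `('Money', 'fee')`, which is why the later code indexes with tuples.

```python
import pandas as pd

# Hypothetical long-format rows: one record per payment type and timestamp.
raw = pd.DataFrame({
    "Created": pd.to_datetime(
        ["2019-12-27 12:32", "2019-12-27 12:32", "2019-12-30 04:01"]
    ),
    "Payment": ["food", "fee", "payout"],
    "Money": [20.0, -6.0, -34.29],
})

# Each distinct Payment value becomes its own column under the 'Money' level.
# Missing combinations are filled with 0.
wide = raw.pivot_table(index=["Created"], columns=["Payment"]).fillna(0)

print(wide.columns.tolist())
# [('Money', 'fee'), ('Money', 'food'), ('Money', 'payout')]
print(wide[("Money", "food")].iloc[0])  # 20.0
```

Note that `pivot_table` aggregates with the mean by default, so two records that share the same timestamp and payment type would be averaged into a single cell rather than kept separately.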
Here is my current code:
import numpy as np
import pandas as pd
import xlrd
from pandas import Series, DataFrame
df = pd.read_excel('C:/Data.xlsx', sheet_name='Sheet1',
                   usecols=['Payment', 'Money', 'Created'])
df['Created'] = pd.to_datetime(df['Created'])
df['Created'] = df['Created'].dt.round('min')
df = df.pivot_table(index=['Created'],
                    columns=['Payment']).fillna(0)
df['Money','fee'] = df['Money', 'fee'].round(2)
df['Fixed Fee'] = (-df['Money', 'food'] * 25) / 100
df['Fixed Fee'] = df['Fixed Fee'].round(2)
df['OverCharge'] = np.where(df['Money', 'fee'] != df['Fixed Fee'], df['Money', 'fee'] - df['Fixed Fee'], 0)
df['OverCharge'] = df['OverCharge'].round(1)
df['Percentage'] = df['Money','fee'] / df['Money','food'] * 100
df['Percentage'] = df['Percentage'].abs()
df['Percentage'] = df['Percentage'].round(2)
df['Charges'] = np.where(df['Percentage'].notna(), np.where(df['Percentage'] > 26, 'Overcharge - 30%', 'Fixed - 25%'), 'Null')
df['Correct'] = -df['Money', 'food'] - df['Fixed Fee']
df['Incorrect'] = -df['Money', 'food'] - df['Money', 'fee']
df['Balance'] = df['Correct'] - df['Incorrect']
df['Balance'] = np.where(df['Money', 'payout'] != 0, df['Correct'].cumsum() - df['Money', 'payout'], df['Balance'])
#df.to_excel("CarIndustry.xlsx")
print(df)
Output (first 20 rows):
Money Fixed Fee OverCharge Percentage Charges Correct Incorrect Balance
Payment fee food payout payoutReject
Created
2019-12-27 12:32:00 -6.00 20.0 0.00 0.0 -5.00 -1.0 30.00 Overcharge - 30% -15.00 -14.00 -1.00
2019-12-27 12:58:00 -5.26 17.5 0.00 0.0 -4.38 -0.9 30.06 Overcharge - 30% -13.12 -12.24 -0.88
2019-12-27 13:17:00 -3.46 11.5 0.00 0.0 -2.88 -0.6 30.09 Overcharge - 30% -8.62 -8.04 -0.58
2019-12-30 04:01:00 0.00 0.0 -34.29 0.0 -0.00 0.0 NaN Null 0.00 -0.00 -2.45
2019-12-30 13:24:00 -1.94 6.5 0.00 0.0 -1.62 -0.3 29.85 Overcharge - 30% -4.88 -4.56 -0.32
2020-01-01 12:53:00 -6.00 20.0 0.00 0.0 -5.00 -1.0 30.00 Overcharge - 30% -15.00 -14.00 -1.00
2020-01-01 13:06:00 -3.90 13.0 0.00 0.0 -3.25 -0.6 30.00 Overcharge - 30% -9.75 -9.10 -0.65
2020-01-01 13:27:00 -3.46 11.5 0.00 0.0 -2.88 -0.6 30.09 Overcharge - 30% -8.62 -8.04 -0.58
2020-01-01 13:38:00 -7.20 24.0 0.00 0.0 -6.00 -1.2 30.00 Overcharge - 30% -18.00 -16.80 -1.20
2020-01-01 15:10:00 -2.10 7.0 0.00 0.0 -1.75 -0.4 30.00 Overcharge - 30% -5.25 -4.90 -0.35
2020-01-01 16:31:00 -7.94 26.5 0.00 0.0 -6.62 -1.3 29.96 Overcharge - 30% -19.88 -18.56 -1.32
2020-01-01 16:51:00 -2.40 8.0 0.00 0.0 -2.00 -0.4 30.00 Overcharge - 30% -6.00 -5.60 -0.40
2020-01-01 17:00:00 -2.26 7.5 0.00 0.0 -1.88 -0.4 30.13 Overcharge - 30% -5.62 -5.24 -0.38
2020-01-01 18:21:00 -8.26 27.5 0.00 0.0 -6.88 -1.4 30.04 Overcharge - 30% -20.62 -19.24 -1.38
2020-01-03 13:24:00 -1.66 5.5 0.00 0.0 -1.38 -0.3 30.18 Overcharge - 30% -4.12 -3.84 -0.28
2020-01-03 15:53:00 -3.30 11.0 0.00 0.0 -2.75 -0.5 30.00 Overcharge - 30% -8.25 -7.70 -0.55
2020-01-03 17:39:00 -1.94 6.5 0.00 0.0 -1.62 -0.3 29.85 Overcharge - 30% -4.88 -4.56 -0.32
2020-01-03 20:22:00 -3.14 10.5 0.00 0.0 -2.62 -0.5 29.90 Overcharge - 30% -7.88 -7.36 -0.52
2020-01-03 21:18:00 -2.26 7.5 0.00 0.0 -1.88 -0.4 30.13 Overcharge - 30% -5.62 -5.24 -0.38
2020-01-06 04:01:00 0.00 0.0 -134.75 0.0 -0.00 0.0 NaN Null 0.00 -0.00 -46.36
My result should look like this:
Money Fixed Fee OverCharge Percentage Charges Correct Incorrect Balance
Payment fee food payout payoutReject
Created
2019-12-27 12:32:00 -6.00 20.0 0.00 0.0 -5.00 -1.0 30.00 Overcharge - 30% -15.00 -14.00 -1.00
2019-12-27 12:58:00 -5.26 17.5 0.00 0.0 -4.38 -0.9 30.06 Overcharge - 30% -13.12 -12.24 -0.88
2019-12-27 13:17:00 -3.46 11.5 0.00 0.0 -2.88 -0.6 30.09 Overcharge - 30% -8.62 -8.04 -0.58
2019-12-30 04:01:00 0.00 0.0 -34.29 0.0 -0.00 0.0 NaN Null 0.00 -0.00 -2.45
2019-12-30 13:24:00 -1.94 6.5 0.00 0.0 -1.62 -0.3 29.85 Overcharge - 30% -4.88 -4.56 -0.32
2020-01-01 12:53:00 -6.00 20.0 0.00 0.0 -5.00 -1.0 30.00 Overcharge - 30% -15.00 -14.00 -1.00
2020-01-01 13:06:00 -3.90 13.0 0.00 0.0 -3.25 -0.6 30.00 Overcharge - 30% -9.75 -9.10 -0.65
2020-01-01 13:27:00 -3.46 11.5 0.00 0.0 -2.88 -0.6 30.09 Overcharge - 30% -8.62 -8.04 -0.58
2020-01-01 13:38:00 -7.20 24.0 0.00 0.0 -6.00 -1.2 30.00 Overcharge - 30% -18.00 -16.80 -1.20
2020-01-01 15:10:00 -2.10 7.0 0.00 0.0 -1.75 -0.4 30.00 Overcharge - 30% -5.25 -4.90 -0.35
2020-01-01 16:31:00 -7.94 26.5 0.00 0.0 -6.62 -1.3 29.96 Overcharge - 30% -19.88 -18.56 -1.32
2020-01-01 16:51:00 -2.40 8.0 0.00 0.0 -2.00 -0.4 30.00 Overcharge - 30% -6.00 -5.60 -0.40
2020-01-01 17:00:00 -2.26 7.5 0.00 0.0 -1.88 -0.4 30.13 Overcharge - 30% -5.62 -5.24 -0.38
2020-01-01 18:21:00 -8.26 27.5 0.00 0.0 -6.88 -1.4 30.04 Overcharge - 30% -20.62 -19.24 -1.38
2020-01-03 13:24:00 -1.66 5.5 0.00 0.0 -1.38 -0.3 30.18 Overcharge - 30% -4.12 -3.84 -0.28
2020-01-03 15:53:00 -3.30 11.0 0.00 0.0 -2.75 -0.5 30.00 Overcharge - 30% -8.25 -7.70 -0.55
2020-01-03 17:39:00 -1.94 6.5 0.00 0.0 -1.62 -0.3 29.85 Overcharge - 30% -4.88 -4.56 -0.32
2020-01-03 20:22:00 -3.14 10.5 0.00 0.0 -2.62 -0.5 29.90 Overcharge - 30% -7.88 -7.36 -0.52
2020-01-03 21:18:00 -2.26 7.5 0.00 0.0 -1.88 -0.4 30.13 Overcharge - 30% -5.62 -5.24 -0.38
2020-01-06 04:01:00 0.00 0.0 -134.75 0.0 -0.00 0.0 NaN Null 0.00 -0.00 -12.07
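To understand it better, just check the last rows. Basically, I want the cumsum to be computed from the previous payout up to the next payout, with that number placed on the payout row, and to continue this way, since this is only a small part of the data.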
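Going by the expected output above, the -12.07 on the last row equals the running total of `Correct` (-181.11) minus the running total of all payouts so far (-169.04). In other words, the -2.45 left over after the first payout appears to be carried forward. The following is a minimal sketch under that assumption, not a confirmed solution. It uses the 20 rows printed above, and the flat column names `correct`, `incorrect`, `payout`, and `balance` are illustrative stand-ins for the MultiIndex columns of the real frame.

```python
import numpy as np
import pandas as pd

# Values copied from the 20 rows printed above (flat, illustrative column names).
payout = [0.0] * 20
payout[3] = -34.29    # 2019-12-30 04:01
payout[19] = -134.75  # 2020-01-06 04:01

df = pd.DataFrame({
    "correct": [-15.00, -13.12, -8.62, 0.00, -4.88, -15.00, -9.75, -8.62,
                -18.00, -5.25, -19.88, -6.00, -5.62, -20.62, -4.12, -8.25,
                -4.88, -7.88, -5.62, 0.00],
    "incorrect": [-14.00, -12.24, -8.04, 0.00, -4.56, -14.00, -9.10, -8.04,
                  -16.80, -4.90, -18.56, -5.60, -5.24, -19.24, -3.84, -7.70,
                  -4.56, -7.36, -5.24, 0.00],
    "payout": payout,
})

# Per-row difference, kept on ordinary (non-payout) rows.
row_balance = df["correct"] - df["incorrect"]

# Assumption: the balance at a payout is everything owed so far minus
# everything paid out so far, so both columns are accumulated.
running = df["correct"].cumsum() - df["payout"].cumsum()

is_payout = df["payout"] != 0
df["balance"] = np.where(is_payout, running, row_balance).round(2)

payout_balances = df.loc[is_payout, "balance"].tolist()
print(payout_balances)  # [-2.45, -12.07]
```

The only change from the line in the question is `df["payout"].cumsum()` in place of the single payout value. If the balance should instead restart from zero after every payout, a segment label such as `is_payout.shift(fill_value=False).cumsum()` could be passed to `groupby` to accumulate within each segment, but that variant would give -9.62 rather than -12.07 for the last row.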
[Comments]:
-
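Hello and welcome to SO, you have an interesting question. It would be very helpful if you could read How to Ask and minimal reproducible example. Basically, to get this question off the ground you need to 1) drop the pictures and provide sample input and expected output as text (like your code), and 2) try to explain logically what you are trying to achieve.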
-
Hey, thanks for your comment. I fixed it, but anyway, this question is dead now because it has -2 votes.
Tags: python pandas dataframe rows cumsum