[Posted]: 2022-01-18 22:56:13
[Question]:
I have a CSV file like this:
2021-01-05 10:57:12.762000, REDDE EHZ AM 00, trigger
2021-01-05 10:58:26.622000, REDDE EHZ AM 00, trigger
2021-01-05 11:02:16.772000, REDDE EHZ AM 00, trigger
2021-01-05 11:02:34.042000, REDDE EHZ AM 00, trigger
2021-01-05 17:12:07.221999, REDDE EHZ AM 00, trigger
2021-01-06 01:42:45.501999, REDDE EHZ AM 00, trigger
2021-01-06 01:44:24.481999, REDDE EHZ AM 00, trigger
2021-01-06 01:44:58.051999, REDDE EHZ AM 00, trigger
2021-01-06 01:45:14.871999, REDDE EHZ AM 00, trigger
2021-01-06 01:47:10.901999, REDDE EHZ AM 00, trigger
2021-01-06 07:57:33.221999, REDDE EHZ AM 00, trigger
2021-01-06 07:57:48.821999, REDDE EHZ AM 00, trigger
2021-01-06 07:58:51.031999, REDDE EHZ AM 00, trigger
2021-01-06 07:59:27.001999, REDDE EHZ AM 00, trigger
2021-01-06 08:00:56.871999, REDDE EHZ AM 00, trigger
2021-01-06 11:28:17.191999, REDDE EHZ AM 00, trigger
2021-01-06 11:28:46.201999, REDDE EHZ AM 00, trigger
2021-01-06 11:29:19.111999, REDDE EHZ AM 00, trigger
2021-01-06 11:29:41.891999, REDDE EHZ AM 00, trigger
2021-01-06 11:30:51.901999, REDDE EHZ AM 00, trigger
2021-01-06 11:31:21.921999, REDDE EHZ AM 00, trigger
2021-01-06 11:32:23.001999, REDDE EHZ AM 00, trigger
2021-01-06 11:32:58.271999, REDDE EHZ AM 00, trigger
2021-01-07 11:33:46.891999, REDDE EHZ AM 00, trigger
2021-01-07 12:38:50.021999, REDDE EHZ AM 00, trigger
2021-01-07 12:39:53.881999, REDDE EHZ AM 00, trigger
2021-01-08 12:42:07.371999, REDDE EHZ AM 00, trigger
2021-01-08 12:42:46.441999, REDDE EHZ AM 00, trigger
2021-01-09 12:44:14.291999, REDDE EHZ AM 00, trigger
I added headers:
import pandas as pd
df = pd.read_csv(r'D:\Inves\SM\CC_Cbba\REDPy\OSCREDDE_3_\redde_3_trigs.dat',
                 sep=',', header=None, usecols=[0, 1, 2])
headers = ["TrigDT", "Sta", "Type"]
df.columns = headers  # apply the column names to the DataFrame
The output is:
TrigDT Sta Type
0 2021-01-05 10:57:12.762000 REDDE EHZ AM 00 trigger
1 2021-01-05 10:58:26.622000 REDDE EHZ AM 00 trigger
2 2021-01-05 11:02:16.772000 REDDE EHZ AM 00 trigger
3 2021-01-05 11:02:34.042000 REDDE EHZ AM 00 trigger
4 2021-01-05 17:12:07.221999 REDDE EHZ AM 00 trigger
...
I created a date column in order to try to group the information by day:
df['TrigDT'] = pd.to_datetime(df['TrigDT'])
df['Date'] = df['TrigDT'].dt.date
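I tried to compute a cumulative sum using the index, because I don't have a column with an event counter, and then I tried to group by day, but it failed: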
df = df.groupby('Date').index.sum()
df = df.groupby(df.index.day).cumsum().reset_index()
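The idea is to create a cumulative plot from the DataFrame (dates on the X axis and the cumulative count on the Y axis). I am trying to get a plot like https://stackoverflow.com/questions/53895480/python-plot-timedelta-and-cumulative-values
Could you give me some hints to reach this goal? The expected output could be something like this; in my case there is only one station, named REDDE: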
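One possible way to get the numbers for such a plot, sketched under assumptions: each row is treated as one event, the rows are counted per day, and the daily counts are summed cumulatively. The `SAMPLE` text is a handful of rows copied from the file above and stands in for the real path; the function names `load_triggers`, `cumulative_by_day`, and `plot_cumulative` are hypothetical helpers, not part of pandas.

import io
import pandas as pd

# A few rows copied from the question, used as an in-memory stand-in for the file.
SAMPLE = """\
2021-01-05 10:57:12.762000, REDDE EHZ AM 00, trigger
2021-01-05 10:58:26.622000, REDDE EHZ AM 00, trigger
2021-01-06 01:42:45.501999, REDDE EHZ AM 00, trigger
2021-01-06 01:44:24.481999, REDDE EHZ AM 00, trigger
2021-01-06 01:44:58.051999, REDDE EHZ AM 00, trigger
2021-01-07 11:33:46.891999, REDDE EHZ AM 00, trigger
"""

def load_triggers(source):
    """Read the trigger list and add the TrigDT/Date columns used in the question."""
    df = pd.read_csv(source, header=None, names=["TrigDT", "Sta", "Type"],
                     skipinitialspace=True)
    df["TrigDT"] = pd.to_datetime(df["TrigDT"])
    df["Date"] = df["TrigDT"].dt.date
    return df

def cumulative_by_day(df):
    """Count rows per day (one row = one event), then accumulate the counts."""
    per_day = df.groupby("Date").size()
    return per_day.cumsum()

def plot_cumulative(cumulative):
    """Step plot: dates on the X axis, cumulative number of triggers on the Y axis."""
    import matplotlib.pyplot as plt  # imported here so the counting code runs without it
    fig, ax = plt.subplots()
    ax.step(pd.to_datetime(cumulative.index), cumulative.values, where="post")
    ax.set_xlabel("Date")
    ax.set_ylabel("Cumulative triggers")
    fig.autofmt_xdate()
    plt.show()

df = load_triggers(io.StringIO(SAMPLE))
cumulative = cumulative_by_day(df)
print(cumulative)

Calling plot_cumulative(cumulative) should then draw the curve. Days without any trigger do not appear in the result; if they should show up as flat segments, the daily counts could be reindexed over a full date range before the cumulative sum.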
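[Discussion]:
-
What should be shown on the Y axis? A numeric column seems to be missing... Could you add your expected output to the question?
-
Hi @Tranbi, I edited the question to add an image of the expected output. You're right, a numeric column is missing. How can I generate it? Could my index column be what needs to be plotted cumulatively?
-
IIUC you want to display the normalized number of rows? Something like
df['cumul_norm'] = df.index / len(df) * 100 could help you. What are a, b and c? Do you want one plot per Sta value? In that case you may want to group your df first... In any case, your question would benefit from more details.
-
Hi @Tranbi, yes, the normalized number of rows. a, b and c are station names. Isn't it possible to just group by day?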
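The normalized row count discussed in these comments could be sketched as follows. This is only an illustration under assumptions: the rows are sorted by time, the counter starts at 1 so that the last event reaches 100 %, and the second station `STA_B` with its timestamps is invented here purely to show the per-station grouping (the file in the question contains only REDDE).

import pandas as pd

df = pd.DataFrame({
    "TrigDT": pd.to_datetime([
        "2021-01-05 10:57:12.762000",
        "2021-01-05 10:58:26.622000",
        "2021-01-05 12:00:00",          # hypothetical STA_B event
        "2021-01-06 01:42:45.501999",
        "2021-01-06 12:00:00",          # hypothetical STA_B event
        "2021-01-07 11:33:46.891999",
    ]),
    "Sta": ["REDDE EHZ AM 00", "REDDE EHZ AM 00", "STA_B",
            "REDDE EHZ AM 00", "STA_B", "REDDE EHZ AM 00"],
})

# Sort by time and reset the index so that the index is a running event number.
df = df.sort_values("TrigDT").reset_index(drop=True)

# All stations together: event number as a percentage of all rows.
df["cumul_norm"] = (df.index + 1) / len(df) * 100

# Per station: running count within each station, divided by that station's total.
rank = df.groupby("Sta").cumcount() + 1
total = df.groupby("Sta")["Sta"].transform("size")
df["cumul_norm_sta"] = rank / total * 100

print(df)

Plotting cumul_norm_sta against TrigDT, one line per Sta value, would then give one normalized curve per station. With a single station the two columns are identical.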
Tags: python pandas plot cumulative-sum