Pandas: How to plot multiple lines against date using plotly as backend?

【Question title】: Pandas: How to plot multiple lines against date using plotly as backend?
【Posted】: 2022-01-15 16:46:44
【Question description】:

I have the following dataframe:

RangeIndex: 1642 entries, 0 to 1641
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Date              1642 non-null   datetime64[ns]
 1   Volgnr            1642 non-null   int64         
 2   account           1642 non-null   object        
 3   Rentedatum        1642 non-null   datetime64[ns]
 4   Bedrag            1642 non-null   float64       
 5   Balance           1642 non-null   float64       
 6   tegenrekening     906 non-null    object        
 7   Code              1642 non-null   object        
 8   Naam tegenpartij  1642 non-null   object        
 9   description       1642 non-null   object        
 10  category          1642 non-null   object        
 11  Grootboek         1578 non-null   object        
 12  Kleinboek         1578 non-null   object        
dtypes: datetime64[ns](2), float64(2), int64(1), object(8)
memory usage: 166.9+ KB

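'account' holds 5 different account numbers that look like this: NL00ABCD0123456789

I want two different graphs, but I am already stuck on the first one, i.e. I want to see the balance of the 5 accounts.

Based on other questions on this forum, I tried: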

pd.options.plotting.backend="plotly"
df.set_index('Date', inplace=True)
df.groupby('account')['balance'].plot(legend=True)

But I get the following error:

TypeError: line() got an unexpected keyword argument 'legend'

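What is going wrong here?

Later: once this is solved, I would like the X axis to be in weeks or months rather than absolute dates, so some aggregation will be needed.

【Discussion】:

I'm not familiar with the plotly library, but from the error I would guess that the plot method has no legend argument. Check the documentation for the correct argument name for creating a legend.
Can you provide a few lines of sample data? It takes effort to make sense of the column names and generate something meaningful for an answer. Also, using the plotly express API directly is usually simpler than going through the pandas wrapper layer.

Tags: python pandas group-by plotly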

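【Solution 1】:

Short answer:

You see this error because, once pd.options.plotting.backend="plotly" has been set, running df.plot() triggers px.line(), and px.line() has no legend argument. But you don't need one. All you need is: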

px.line(df, x = 'Date', y = 'Balance', color = 'Account')

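This gives you one line per account, each with its own legend entry.

Details:

Setting pd.options.plotting.backend="plotly" overrides pandas' default plotting backend, matplotlib. Even so, running help(df.plot) afterwards still appears to show help text about matplotlib, which does in fact have a legend argument.

However, px.line() is what df.plot() triggers once pd.options.plotting.backend="plotly" has been set. That is what causes your error, because px.line() has no legend argument. But don't worry: this actually makes things very simple for you, because px.line() generates a grouped legend on its own. You don't even need to group the data, as long as df.plot() is applied correctly.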

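But before going further, we have to look at the dataset you provided. Given the wording of your question and what your "data" looks like, my understanding is that account holds several non-unique account numbers, each associated with different values of balance, spread over several non-unique dates. Something like this: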

          Date             Account  Balance
0   01.01.2022  NL00ABCD0123456789        1
1   01.01.2022  NL00ABCD0123456790        2
2   01.01.2022  NL00ABCD0123456791        2
3   01.01.2022  NL00ABCD0123456792        3
4   01.01.2022  NL00ABCD0123456793        4
5   02.01.2022  NL00ABCD0123456789        2
6   02.01.2022  NL00ABCD0123456790        3
7   02.01.2022  NL00ABCD0123456791        3
8   02.01.2022  NL00ABCD0123456792        4
9   02.01.2022  NL00ABCD0123456793        5

If that is the case, then all you need to do is run:

px.line(df, x = 'Date', y = 'Balance', color = 'Account')

Complete code:

import pandas as pd
import plotly.express as px

pd.options.plotting.backend="plotly"
df = pd.DataFrame({'Date': {0: '01.01.2022',
              1: '01.01.2022',
              2: '01.01.2022',
              3: '01.01.2022',
              4: '01.01.2022',
              5: '02.01.2022',
              6: '02.01.2022',
              7: '02.01.2022',
              8: '02.01.2022',
              9: '02.01.2022',
              10: '03.01.2022',
              11: '03.01.2022',
              12: '03.01.2022',
              13: '03.01.2022',
              14: '03.01.2022',
              15: '04.01.2022',
              16: '04.01.2022',
              17: '04.01.2022',
              18: '04.01.2022',
              19: '04.01.2022'},
             'Account': {0: 'NL00ABCD0123456789',
              1: 'NL00ABCD0123456790',
              2: 'NL00ABCD0123456791',
              3: 'NL00ABCD0123456792',
              4: 'NL00ABCD0123456793',
              5: 'NL00ABCD0123456789',
              6: 'NL00ABCD0123456790',
              7: 'NL00ABCD0123456791',
              8: 'NL00ABCD0123456792',
              9: 'NL00ABCD0123456793',
              10: 'NL00ABCD0123456789',
              11: 'NL00ABCD0123456790',
              12: 'NL00ABCD0123456791',
              13: 'NL00ABCD0123456792',
              14: 'NL00ABCD0123456793',
              15: 'NL00ABCD0123456789',
              16: 'NL00ABCD0123456790',
              17: 'NL00ABCD0123456791',
              18: 'NL00ABCD0123456792',
              19: 'NL00ABCD0123456793'},
             'Balance': {0: 1,
              1: 2,
              2: 2,
              3: 3,
              4: 4,
              5: 2,
              6: 3,
              7: 3,
              8: 4,
              9: 5,
              10: 3,
              11: 4,
              12: 4,
              13: 5,
              14: 6,
              15: 4,
              16: 5,
              17: 5,
              18: 6,
              19: 7}})

fig = px.line(df, x = 'Date', y = 'Balance', color = 'Account')
fig.show()
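Note that the sample above stores the dates as plain strings. If your real column is not already datetime64, converting it first should give a proper date axis. A small sketch, assuming the strings are day-first (DD.MM.YYYY), which the sample alone cannot confirm:

```python
import pandas as pd

# Made-up sample with date strings in the same style as above.
df = pd.DataFrame({
    "Date": ["01.01.2022", "02.01.2022", "03.01.2022"],
    "Balance": [1, 2, 3],
})

# Assumption: day comes first, so "02.01.2022" means 2 January 2022.
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")

print(df.dtypes)
```

In the dataframe from the question, Date is already datetime64[ns], so this step would not be needed there.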

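【Discussion】:

I was away for a while and several answers came in meanwhile. Thanks to everyone for taking the time to look at my question. @vestland Thank you very much. You interpreted my missing data exactly right. (I don't know how you got there so quickly on Stack Overflow.) I tried it and it worked. Great!
@matje59 Glad it worked for you! And thanks for the feedback! These days there are many helpful, knowledgeable, talented and very active users on the plotly tag, so if you want to answer a question you have to be quick =)
@matje59 If you'd like to know how to easily enhance your future questions with a data sample, I'd suggest following the steps here
This was most helpful, thanks!

【Solution 2】:

Since you didn't provide sample data, here is a solution using arbitrary time-series data.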

{'Date': ['10/03/2004',
  '10/03/2004',
  '10/03/2004',
  '10/03/2004',
  '10/03/2004'],
 'Time': ['18.00.00', '19.00.00', '20.00.00', '21.00.00', '22.00.00'],
 'CO(GT)': ['2,6', '2', '2,2', '2,2', '1,6'],
 'PT08.S1(CO)': [1360.0, 1292.0, 1402.0, 1376.0, 1272.0],
 'NMHC(GT)': [150.0, 112.0, 88.0, 80.0, 51.0],
 'C6H6(GT)': ['11,9', '9,4', '9,0', '9,2', '6,5'],
 'PT08.S2(NMHC)': [1046.0, 955.0, 939.0, 948.0, 836.0],
 'NOx(GT)': [166.0, 103.0, 131.0, 172.0, 131.0],
 'PT08.S3(NOx)': [1056.0, 1174.0, 1140.0, 1092.0, 1205.0],
 'NO2(GT)': [113.0, 92.0, 114.0, 122.0, 116.0],
 'PT08.S4(NO2)': [1692.0, 1559.0, 1555.0, 1584.0, 1490.0],
 'PT08.S5(O3)': [1268.0, 972.0, 1074.0, 1203.0, 1110.0],
 'T': ['13,6', '13,3', '11,9', '11,0', '11,2'],
 'RH': ['48,9', '47,7', '54,0', '60,0', '59,6'],
 'AH': ['0,7578', '0,7255', '0,7502', '0,7867', '0,7888']
}
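For reference, a minimal sketch of loading such a dictionary into a DataFrame. It uses only a subset of the columns above, and it additionally converts the comma-decimal strings to floats, which is an assumption about how you would want those columns treated:

```python
import pandas as pd

# Subset of the dictionary shown above.
data = {
    "Date": ["10/03/2004"] * 5,
    "Time": ["18.00.00", "19.00.00", "20.00.00", "21.00.00", "22.00.00"],
    "CO(GT)": ["2,6", "2", "2,2", "2,2", "1,6"],
    "T": ["13,6", "13,3", "11,9", "11,0", "11,2"],
}
df = pd.DataFrame(data)

# Values such as "2,6" use a decimal comma; swap it for a dot and cast.
for col in ["CO(GT)", "T"]:
    df[col] = df[col].str.replace(",", ".", regex=False).astype(float)

print(df.dtypes)
```

Without this conversion those columns stay as strings and cannot be averaged.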

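We need to convert the dates to datetime objects.

df['Date'] = pd.to_datetime(df['Date'] + " " + df['Time'], format="%d/%m/%Y %H.%M.%S")

# To plot with monthly aggregation you can use resample.
# numeric_only=True skips the string columns; pandas 2.2+ prefers 'ME' over 'M'.
df.set_index('Date').resample('1M').mean(numeric_only=True).plot()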

【Discussion】: