plotly express conditional coloring not working properly: answer

[Question title]: plotly express conditional coloring not working properly
[Posted]: 2022-01-21 05:45:58
[Question]:

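So I'm trying to apply conditional coloring to my line plot so that the data points are colored either blue or red. The coloring reflects generated power compared to power consumption: in my dataframe this is a column EE>100%, which holds "True" or "False" for every hour of the year, and I want to use it to color the plot. With a scatter plot it works fine, but with a line plot it becomes a mess: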

As you can see, the line doesn't transition properly and doesn't know what to do between two "False" points.

Here is the code for my line plot:

import plotly.express as px
from dash import dcc, html
import dash_bootstrap_components as dbc

def drawEE_absolute():
    return html.Div([
        dbc.Card(
            dbc.CardBody([
                dcc.Graph(
                    figure=px.line(df, x='Datum', y='Erzeugung_Gesamt', color='EE>100%', template='plotly_dark'),
                    config={
                        'displayModeBar': True,
                        'toImageButtonOptions': {
                            'filename': 'custom_image',
                            'height': None,
                            'width': None,
                        }
                    }
                )
            ])
        ),
    ])

[Question discussion]:

Tags: python plot plotly web-frontend plotly-express

[Solution 1]:

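I'm not sure there is a clean solution within plotly.express. With color='EE>100%', px.line creates one trace per value, and each trace simply connects its own points in row order, so the "False" trace draws a straight segment across every "True" stretch (and vice versa). Since plotly.express builds plotly.graph_objects traces under the hood anyway, both only break a line when the y-values being plotted contain '' or NaN (according to this forum post).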

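That means we need to copy the y-values into two new columns: in one, replace the values where EE>100% is "False" with NaN; in the other, replace the values where it is "True" with NaN. Then we can use go.Scatter to plot each new column against 'Datum'.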

Sample df:

import pandas as pd

df = pd.DataFrame({
    'Datum': pd.date_range('2021-01-01 00:00:01', '2021-01-01 00:00:20', freq="s"),
    'Erzeugung_Gesamt': list(range(1, 21)),
    # the flags are stored as the strings 'True'/'False' in this sample
    'EE>100%': ['True']*4 + ['False']*4 + ['True']*4 + ['False']*4 + ['True']*4
})

If I understand your question correctly, this should resemble your df:

>>> df
                 Datum  Erzeugung_Gesamt  EE>100%
0  2021-01-01 00:00:01                 1     True
1  2021-01-01 00:00:02                 2     True
2  2021-01-01 00:00:03                 3     True
3  2021-01-01 00:00:04                 4     True
4  2021-01-01 00:00:05                 5    False
5  2021-01-01 00:00:06                 6    False
6  2021-01-01 00:00:07                 7    False
7  2021-01-01 00:00:08                 8    False
8  2021-01-01 00:00:09                 9     True
9  2021-01-01 00:00:10                10     True
10 2021-01-01 00:00:11                11     True
11 2021-01-01 00:00:12                12     True
12 2021-01-01 00:00:13                13    False
13 2021-01-01 00:00:14                14    False
14 2021-01-01 00:00:15                15    False
15 2021-01-01 00:00:16                16    False
16 2021-01-01 00:00:17                17     True
17 2021-01-01 00:00:18                18     True
18 2021-01-01 00:00:19                19     True
19 2021-01-01 00:00:20                20     True

Add two new Erzeugung_Gesamt columns to df, one for each value of EE>100% ('True' or 'False'):

# float copies of the y-values, so that NaN can be assigned into them
df['Erzeugung_Gesamt_true_with_gaps'] = df['Erzeugung_Gesamt'].astype(float)
df['Erzeugung_Gesamt_false_with_gaps'] = df['Erzeugung_Gesamt'].astype(float)

## in Erzeugung_Gesamt_true_with_gaps we replace the 'False' rows with NaN
## in Erzeugung_Gesamt_false_with_gaps we replace the 'True' rows with NaN
df.loc[df['EE>100%'] == 'False', 'Erzeugung_Gesamt_true_with_gaps'] = float("nan")
df.loc[df['EE>100%'] == 'True', 'Erzeugung_Gesamt_false_with_gaps'] = float("nan")

Updated df:

>>> df
                 Datum  Erzeugung_Gesamt EE>100%  Erzeugung_Gesamt_true_with_gaps  Erzeugung_Gesamt_false_with_gaps
0  2021-01-01 00:00:01                 1    True                              1.0                               NaN
1  2021-01-01 00:00:02                 2    True                              2.0                               NaN
2  2021-01-01 00:00:03                 3    True                              3.0                               NaN
3  2021-01-01 00:00:04                 4    True                              4.0                               NaN
4  2021-01-01 00:00:05                 5   False                              NaN                               5.0
5  2021-01-01 00:00:06                 6   False                              NaN                               6.0
6  2021-01-01 00:00:07                 7   False                              NaN                               7.0
7  2021-01-01 00:00:08                 8   False                              NaN                               8.0
8  2021-01-01 00:00:09                 9    True                              9.0                               NaN
9  2021-01-01 00:00:10                10    True                             10.0                               NaN
10 2021-01-01 00:00:11                11    True                             11.0                               NaN
11 2021-01-01 00:00:12                12    True                             12.0                               NaN
12 2021-01-01 00:00:13                13   False                              NaN                              13.0
13 2021-01-01 00:00:14                14   False                              NaN                              14.0
14 2021-01-01 00:00:15                15   False                              NaN                              15.0
15 2021-01-01 00:00:16                16   False                              NaN                              16.0
16 2021-01-01 00:00:17                17    True                             17.0                               NaN
17 2021-01-01 00:00:18                18    True                             18.0                               NaN
18 2021-01-01 00:00:19                19    True                             19.0                               NaN
19 2021-01-01 00:00:20                20    True                             20.0                               NaN
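The copy-then-mask steps can also be collapsed with pandas' Series.where, which keeps a value where the condition holds and puts NaN everywhere else. A minimal sketch; split_with_gaps is a hypothetical helper name, and it assumes the flag column holds either the strings 'True'/'False' or real booleans (astype(str) maps both to the same strings).

```python
import pandas as pd

def split_with_gaps(df, value_col, flag_col):
    """Hypothetical helper: one gapped copy of value_col per flag value.

    In each copy, rows whose flag differs from that value become NaN,
    so a line drawn from the copy stops at every run boundary.
    """
    # astype(str) turns both True and 'True' into 'True'
    flags = df[flag_col].astype(str)
    return {key: df[value_col].where(flags == key) for key in ('True', 'False')}

df = pd.DataFrame({
    'Erzeugung_Gesamt': [1, 2, 3, 4, 5, 6],
    'EE>100%': [True, True, False, False, True, True],  # bool flags also work
})
parts = split_with_gaps(df, 'Erzeugung_Gesamt', 'EE>100%')
df['Erzeugung_Gesamt_true_with_gaps'] = parts['True']
df['Erzeugung_Gesamt_false_with_gaps'] = parts['False']
print(df)
```

Because where returns a new float Series, there is no need to copy the column first or to worry about assigning NaN into an integer column.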

Now we can create a go.Figure and call add_trace once for each of the two new columns:

import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Scatter(x=df['Datum'], y=df['Erzeugung_Gesamt_true_with_gaps'], mode='lines', name="True"))
fig.add_trace(go.Scatter(x=df['Datum'], y=df['Erzeugung_Gesamt_false_with_gaps'], mode='lines', name="False"))
fig.update_layout(legend_title='EE>100%')

The figure renders as follows:

To incorporate this into your figure-generating function:

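import plotly.graph_objects as go
from dash import dcc, html
import dash_bootstrap_components as dbc

# assumes df already contains the two *_with_gaps columns created above
def drawEE_absolute():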
    fig = go.Figure()

    fig.add_trace(go.Scatter(x=df['Datum'], y=df['Erzeugung_Gesamt_true_with_gaps'], mode='lines', name="True"))
    fig.add_trace(go.Scatter(x=df['Datum'], y=df['Erzeugung_Gesamt_false_with_gaps'], mode='lines', name="False"))
    fig.update_layout(legend_title='EE>100%', template='plotly_dark')

    return html.Div([
        dbc.Card(
            dbc.CardBody([
                dcc.Graph(
                    figure=fig,
                    config={
                        'displayModeBar': True,
                        'toImageButtonOptions': {
                            'filename': 'custom_image',
                            'height': None,
                            'width': None,
                        }
                    }
                )
            ])
        ),
    ])

[Discussion]:

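Thanks for the quick answer! The problem I ran into was that my "True" and "False" values were actually booleans, not strings, and I was stuck on that for a few minutes haha. Luckily I spotted the issue fairly quickly, and now it works like a charm! :)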
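The pitfall in this comment is easy to reproduce: when EE>100% has bool dtype, comparing it to the string 'False' matches no rows, so nothing gets masked and both gapped columns stay identical to Erzeugung_Gesamt. A minimal sketch of the fix on a made-up four-row frame, using the boolean column itself (and its negation ~) as the mask:

```python
import pandas as pd

df = pd.DataFrame({
    'Erzeugung_Gesamt': [1, 2, 3, 4],
    'EE>100%': [True, False, False, True],  # real booleans, not strings
})

# Comparing a bool column to a string matches nothing, so no row would be masked
print((df['EE>100%'] == 'False').any())  # False

# Float copies so that NaN can be assigned
df['Erzeugung_Gesamt_true_with_gaps'] = df['Erzeugung_Gesamt'].astype(float)
df['Erzeugung_Gesamt_false_with_gaps'] = df['Erzeugung_Gesamt'].astype(float)

# Use the boolean column directly as the mask instead of string comparisons
df.loc[~df['EE>100%'], 'Erzeugung_Gesamt_true_with_gaps'] = float('nan')
df.loc[df['EE>100%'], 'Erzeugung_Gesamt_false_with_gaps'] = float('nan')
print(df)
```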