【Question title】: Plotly: How to plot a regression line using plotly and plotly express?
【Posted】: 2019-11-05 09:28:13
【Question】:

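I have a dataframe, df, with the columns pm1 and pm25. I would like to show a graph (with Plotly) of how correlated these two signals are. So far I have managed to show the scatter plot, but I have not managed to draw the line of best fit between the signals. This is what I have tried: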

denominator=df.pm1**2-df.pm1.mean()*df.pm1.sum()
print('denominator',denominator)
m=(df.pm1.dot(df.pm25)-df.pm25.mean()*df.pm1.sum())/denominator
b=(df.pm25.mean()*df.pm1.dot(df.pm1)-df.pm1.mean()*df.pm1.dot(df.pm25))/denominator
y_pred=m*df.pm1+b


lineOfBestFit = go.Scattergl(
    x=df.pm1,
    y=y_pred,
    name='Line of best fit',
    line=dict(
        color='red',
    )
)

data = [dataPoints, lineOfBestFit]
figure = go.Figure(data=data)

figure.show()

Plot: (image not available)

How can I get lineOfBestFit to plot correctly?

【Question comments】:

    Tags: python dataframe plotly regression plotly-python


    【Answer 1】:

    Update 1:

    Now that plotly express handles data in both long and wide format (the latter in your case) with ease, the only thing you need to plot a regression line is:

    fig = px.scatter(df, x='X', y='Y', trendline="ols")
    

    A complete code snippet for wide data is at the end of this answer.

    If you would like the regression line to stand out, you can specify trendline_color_override in:

    fig = px.scatter([...], trendline_color_override='red')
    

    Or edit the line color after building your figure:

    fig.data[1].line.color = 'red'
    

    You can access regression parameters such as alpha (the intercept) and beta (the slope) through:

    model = px.get_trendline_results(fig)
    alpha = model.iloc[0]["px_fit_results"].params[0]
    beta = model.iloc[0]["px_fit_results"].params[1]
    

    You can even request a non-linear fit with:

    fig = px.scatter(df, x='X', y='Y', trendline="lowess")
    

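    What about long format? This is where plotly express shows some of its real power. Taking the built-in dataset px.data.gapminder as an example, you can trigger a separate trendline for each group of countries by specifying color="continent":

    Complete snippet for long format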

    import plotly.express as px
    
    df = px.data.gapminder().query("year == 2007")
    fig = px.scatter(df, x="gdpPercap", y="lifeExp", color="continent", trendline="lowess")
    fig.show()
    

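    If you would like more flexibility with regard to model choice and output, you can always fall back on my original answer further down. But first, here is a complete snippet for the examples at the start of this answer:

    Complete snippet for wide data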

    import plotly.express as px  # trendline="ols"/"lowess" requires statsmodels to be installed
    import pandas as pd
    import numpy as np
    
    # data
    np.random.seed(123)
    numdays=20
    X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
    Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
    df = pd.DataFrame({'X': X, 'Y':Y})
    
    # figure with regression
    # fig = px.scatter(df, x='X', y='Y', trendline="ols")
    fig = px.scatter(df, x='X', y='Y', trendline="lowess")
    
    # make the regression line stand out
    fig.data[1].line.color = 'red'
    
    # plotly figure layout
    fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')
    
    fig.show()
    

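    Original answer:

    For regression analysis I like to use statsmodels.api or sklearn.linear_model. I also like to organize both the data and the regression results in a pandas dataframe. Here is one way to do what you are looking for in a clean and organized way:

    Plot using sklearn or statsmodels: (image not available)

    Code using sklearn: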

    from sklearn.linear_model import LinearRegression
    import plotly.graph_objects as go
    import pandas as pd
    import numpy as np
    
    # data
    np.random.seed(123)
    numdays=20
    
    X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
    Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
    df = pd.DataFrame({'X': X, 'Y':Y})
    
    # regression
    reg = LinearRegression().fit(np.vstack(df['X']), Y)
    df['bestfit'] = reg.predict(np.vstack(df['X']))
    
    # plotly figure setup
    fig=go.Figure()
    fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
    fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))
    
    # plotly figure layout
    fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')
    
    fig.show()
    

    Code using statsmodels:

    import plotly.graph_objects as go
    import statsmodels.api as sm
    import pandas as pd
    import numpy as np
    
    # data
    np.random.seed(123)
    numdays=20
    
    X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
    Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
    
    df = pd.DataFrame({'X': X, 'Y':Y})
    
    # regression
    df['bestfit'] = sm.OLS(df['Y'],sm.add_constant(df['X'])).fit().fittedvalues
    
    # plotly figure setup
    fig=go.Figure()
    fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
    fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))
    
    
    # plotly figure layout
    fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')
    
    fig.show()
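
As a cross-check on the hand-written formula in the question, the same line can be computed in closed form. One likely issue in the question's code is that `df.pm1**2` is an element-wise Series rather than the scalar sum of squares, so the denominator is not a single number. The sketch below is a minimal example, assuming only numpy; the helper name `fit_line` and the sample data are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Scalar denominator: sum of squares minus mean times sum
    denominator = x.dot(x) - x.mean() * x.sum()
    m = (x.dot(y) - y.mean() * x.sum()) / denominator
    b = (y.mean() * x.dot(x) - x.mean() * x.dot(y)) / denominator
    return m, b

# Sample data lying exactly on y = 2x + 1
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
m, b = fit_line(x, y)
print(m, b)
```

Passing pandas columns such as `df.pm1` and `df.pm25` to the helper should work the same way, since they are converted to arrays first.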
    

    【Comments】:

      【Answer 2】:

      Plotly also comes with a native wrapper around statsmodels for plotting (non-)linear trendlines:

      Quoting from their documentation: https://plotly.com/python/linear-fits/

      
      import plotly.express as px
      
      df = px.data.tips()
      fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
      fig.show()
      

      【Comments】:

      • Wow, this is a very intuitive and quick way to do it, and exactly what the question asked for.