Question title: Aggregate time series with group by and create chart with multiple series
Posted: 2018-12-21 12:49:50
Question:

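I have time-series data and want to create a line chart of the monthly count of records (months on the x-axis), grouped by sentiment (one line per sentiment value).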

The data looks like this:

created_at                         id                   polarity  sentiment  
0  Fri Nov 02 11:22:47 +0000 2018  1058318498663870464  0.000000   neutral   
1  Fri Nov 02 11:20:54 +0000 2018  1058318026758598656  0.011905   neutral   
2  Fri Nov 02 09:41:37 +0000 2018  1058293038739607552  0.800000  positive   
3  Fri Nov 02 09:40:48 +0000 2018  1058292834699231233  0.800000  positive   
4  Thu Nov 01 18:23:17 +0000 2018  1058061933243518976  0.233333   neutral   
5  Thu Nov 01 17:50:39 +0000 2018  1058053723157618690  0.400000  positive   
6  Wed Oct 31 18:57:53 +0000 2018  1057708251758903296  0.566667  positive   
7  Sun Oct 28 17:21:24 +0000 2018  1056596810570100736  0.000000   neutral   
8  Sun Oct 21 13:00:53 +0000 2018  1053994531845296128  0.136364   neutral   
9  Sun Oct 21 12:55:12 +0000 2018  1053993101205868544  0.083333   neutral
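`process_twitter_json` is the asker's own loader and is not shown. To make the question reproducible, a minimal stand-in can build the same frame directly from the ten sample rows above. The explicit `format` string is an assumption based on the `created_at` layout shown (e.g. `Fri Nov 02 11:22:47 +0000 2018`); passing it avoids relying on pandas' format guessing.

```python
import pandas as pd

# The ten sample rows from the question (stand-in for process_twitter_json output)
rows = [
    ("Fri Nov 02 11:22:47 +0000 2018", 1058318498663870464, 0.000000, "neutral"),
    ("Fri Nov 02 11:20:54 +0000 2018", 1058318026758598656, 0.011905, "neutral"),
    ("Fri Nov 02 09:41:37 +0000 2018", 1058293038739607552, 0.800000, "positive"),
    ("Fri Nov 02 09:40:48 +0000 2018", 1058292834699231233, 0.800000, "positive"),
    ("Thu Nov 01 18:23:17 +0000 2018", 1058061933243518976, 0.233333, "neutral"),
    ("Thu Nov 01 17:50:39 +0000 2018", 1058053723157618690, 0.400000, "positive"),
    ("Wed Oct 31 18:57:53 +0000 2018", 1057708251758903296, 0.566667, "positive"),
    ("Sun Oct 28 17:21:24 +0000 2018", 1056596810570100736, 0.000000, "neutral"),
    ("Sun Oct 21 13:00:53 +0000 2018", 1053994531845296128, 0.136364, "neutral"),
    ("Sun Oct 21 12:55:12 +0000 2018", 1053993101205868544, 0.083333, "neutral"),
]
df = pd.DataFrame(rows, columns=["created_at", "id", "polarity", "sentiment"])

# Parse the created_at strings; the format is assumed from the sample above
# (%z reads the +0000 offset, so the result is timezone-aware UTC)
df["tweet_datetime"] = pd.to_datetime(df["created_at"], format="%a %b %d %H:%M:%S %z %Y")
print(df[["tweet_datetime", "sentiment"]])
```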

So far I have managed to aggregate to monthly totals with the following code:

import pandas as pd

tweets = process_twitter_json(file_name)  # the asker's own JSON loader
#print(tweets[:10])

df = pd.DataFrame.from_records(tweets)
print(df.head(10))

# make the string date into a datetime field
df['tweet_datetime'] = pd.to_datetime(df['created_at'])
df.index = df['tweet_datetime']

# monthly counts per sentiment ('M' = month end; pandas >= 2.2 spells it 'ME')
monthly_sentiment = df.groupby('sentiment')['tweet_datetime'].resample('M').count()

I'm struggling with how to plot the data.

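  • Can I pivot each discrete value in the sentiment field into a separate column?
  • I tried .unstack() to turn the sentiment values into rows. It's almost there, but the problem is that the dates become string column headers, which is no good for charting.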
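The behaviour in the second bullet comes from which index level `unstack()` pivots. The resample result is a Series indexed by (sentiment, month), and a bare `unstack()` pivots the last level, the month, into columns. Naming the level pivots sentiment into columns instead and leaves the dates as rows. A minimal sketch, assuming five of the sample rows with the `+0000` offset dropped for brevity, and using the `pd.offsets.MonthEnd()` object in place of the `'M'` string so it runs on both older and newer pandas:

```python
import pandas as pd

# Five of the sample rows (naive timestamps, offset dropped for brevity)
df = pd.DataFrame({
    "tweet_datetime": pd.to_datetime([
        "2018-11-02 11:22:47", "2018-11-02 09:41:37", "2018-11-01 18:23:17",
        "2018-10-31 18:57:53", "2018-10-21 13:00:53",
    ]),
    "id": [1, 2, 3, 4, 5],
    "sentiment": ["neutral", "positive", "neutral", "positive", "neutral"],
})

# Same aggregation as the question: tweet counts per sentiment per month end
monthly = (df.set_index("tweet_datetime")
             .groupby("sentiment")["id"]
             .resample(pd.offsets.MonthEnd())
             .count())

# Bare unstack() pivots the LAST level (the month) into the columns...
wide = monthly.unstack()

# ...whereas naming the level puts one column per sentiment, dates as rows
tall = monthly.unstack("sentiment", fill_value=0)
print(tall)
```

`unstack(0)` would do the same as `unstack("sentiment")` here, since sentiment is the first index level; the name is just harder to get wrong.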

Tags: python-3.x dataframe charts


    Answer 1:

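    OK, I changed the monthly aggregation to use pd.Grouper instead of resample, and unstacked the sentiment level explicitly with unstack('sentiment'). The resulting DataFrame is vertical (deep and narrow), with the dates as rows, rather than horizontal with the dates as column headers, so when I come to plot I no longer have the problem of the dates being stored as strings. (Naming the level is what does this: a bare unstack() pivots the last index level, here the date, into the columns.)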
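The same deep-and-narrow table (one row per month, one column per sentiment) can also be built in a single call with `pd.crosstab`, swapped in here in place of the groupby/unstack pair. A minimal sketch, assuming the same five sample rows with naive timestamps; note that this version labels each month by its first day rather than its last:

```python
import pandas as pd

# Five of the sample rows (naive timestamps, offset dropped for brevity)
df = pd.DataFrame({
    "tweet_datetime": pd.to_datetime([
        "2018-11-02 11:22:47", "2018-11-02 09:41:37", "2018-11-01 18:23:17",
        "2018-10-31 18:57:53", "2018-10-21 13:00:53",
    ]),
    "sentiment": ["neutral", "positive", "neutral", "positive", "neutral"],
})

# Collapse each timestamp to the first day of its month (a plain datetime,
# which plotting libraries handle better than a Period)
month = df["tweet_datetime"].dt.to_period("M").dt.to_timestamp().rename("month")

# Rows = months, columns = sentiment values, cells = tweet counts (0 if none)
counts = pd.crosstab(month, df["sentiment"])
print(counts)
```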

    Full code:

    import pandas as pd
    
    tweets = process_twitter_json(file_name)  # the asker's own JSON loader
    
    df = pd.DataFrame.from_records(tweets)
    
    # Parse the date strings; pd.Grouper(key=...) below reads this column
    # directly, so it does not also need to be the index
    df['tweet_datetime'] = pd.to_datetime(df['created_at'])
    
    # Count tweets per (sentiment, month); 'M' = month end ('ME' in pandas >= 2.2)
    grouper = df.groupby(['sentiment', pd.Grouper(key='tweet_datetime', freq='M')]).id.count()
    
    # Pivot the sentiment level into columns so the dates stay as the row index;
    # reindex so a sentiment with no tweets (e.g. no negatives) gets a zero column
    # instead of raising KeyError when it is selected below
    sentiments = ['positive', 'negative', 'neutral']
    result = grouper.unstack('sentiment').fillna(0).reindex(columns=sentiments, fill_value=0)
    
    ##=================================================
    ##PLOTLY - charts in Jupyter
    
    from plotly import __version__
    from plotly.offline import init_notebook_mode, iplot
    import plotly.graph_objs as go
    
    print(__version__)  # requires version >= 1.9.0
    
    init_notebook_mode(connected=True)
    
    # One line trace per sentiment column
    colors = {'positive': 'rgb(205, 12, 24)',
              'negative': 'rgb(22, 96, 167)',
              'neutral': 'rgb(12, 205, 24)'}
    
    data = [go.Scatter(x=result.index,
                       y=result[s],
                       name=s.capitalize(),
                       line=dict(color=colors[s], width=4))
            for s in sentiments]
    
    iplot(data)
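Selecting a sentiment column that the data never produced raises `KeyError`. With the ten sample rows, for example, no tweet is negative, so after unstacking there is no `negative` column to plot. A minimal sketch of guarding against that with `DataFrame.reindex`, using the monthly counts those ten sample rows give (hard-coded here so the example stands alone):

```python
import pandas as pd

# Monthly counts for the ten sample rows: no "negative" tweets, so only
# two sentiment columns exist after unstacking
counts = pd.DataFrame(
    {"neutral": [3, 3], "positive": [1, 3]},
    index=pd.to_datetime(["2018-10-31", "2018-11-30"]),
)
counts.columns.name = "sentiment"

try:
    counts["negative"]
except KeyError:
    print("selecting 'negative' fails before reindexing")

# Add any missing label as an all-zero column, in a fixed plotting order
labels = ["positive", "negative", "neutral"]
result = counts.reindex(columns=labels, fill_value=0)
print(result)
```

An alternative is to loop over `result.columns` when building traces, which plots only the sentiments actually present; reindexing instead keeps every chart showing the same three lines.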
    
