【Question Title】: How to reshape data in Python
【Posted】: 2020-12-23 07:03:07
【Question Description】:

I have a DataFrame that contains only one row but many columns:

I want to move every 5 columns into a new row. Here is the expected output:

The original data is in a list, which I converted to a DataFrame. I'm not sure whether it is easier to reshape starting from the list, but here is a sample list you can try (the real list is very long):

    ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ','review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, ']
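If you start from the one-row DataFrame itself, a minimal sketch is to take its values as a flat NumPy array, `reshape` it to 5 columns, and then split each `key: value` cell. This assumes the one-row frame was built directly from the list (the screenshot of the real frame is not available) and that every record lists the same 5 keys in the same order.

```python
import pandas as pd

lst = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ', 'neg: 0.0, ', 'neu: 0.708, ', 'pos: 0.292, ',
       'review: Plans for weekend stay canceled due to Coronavirus shutdown.', 'compound: 0.0, ', 'neg: 0.0, ', 'neu: 1.0, ', 'pos: 0.0, ']

# Assumption: the one-row DataFrame was built directly from the list,
# so it has 1 row and len(lst) columns (labelled 0, 1, 2, ...).
wide = pd.DataFrame([lst])

# Every 5 consecutive cells form one record: reshape the flat row to (n_records, 5).
cells = pd.DataFrame(wide.to_numpy().reshape(-1, 5))

# Assumption: every record has the same 5 keys in the same order,
# so the column names can be read from the first record.
names = [cell.split(':', 1)[0] for cell in cells.iloc[0]]

# Keep only the text after the first ':' and drop whitespace / trailing commas.
result = cells.apply(lambda col: col.str.split(':', n=1).str[1].str.strip().str.rstrip(','))
result.columns = names

print(result.to_string(index=False))
```

`reshape(-1, 5)` raises an error if the number of cells is not a multiple of 5, which is a cheap sanity check on the input.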

【Question Discussion】:

    Tags: python pandas dataframe stack reshape


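    【Solution 1】:

    It is easier to parse the data while it is still a list and then convert it to a DataFrame.

    • For each entry, split it on ':' and add the key/value pair to a dictionary
    • Convert the dictionary to a DataFrame

    Try this: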

    import pandas as pd
    
    lst = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ',
           'review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, ']
    
    dd = {}
    
    for x in lst:
        # split on the first ':' only, so a colon inside a review is kept
        key, value = x.split(':', 1)
        # strip whitespace and the trailing ',' of the scores (commas inside a review are kept)
        value = value.strip().rstrip(',')
        if key in dd:
            dd[key].append(value)
        else:
            dd[key] = [value]
    
    print(dd)
    print(pd.DataFrame(dd).to_string(index=False))
    

    Output

                                                           review compound  neg    neu    pos
              I stayed around 11 days and enjoyed stay very much.   0.5106  0.0  0.708  0.292
     Plans for weekend stay canceled due to Coronavirus shutdown.      0.0  0.0    1.0    0.0
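Note that every value collected this way is still a string, so the score columns have `object` dtype. If you need to compute with them, a small follow-up converts the score columns with `pd.to_numeric`. The dictionary literal below is a stand-in for the `dd` built above, and the list of score columns is assumed from the output shown.

```python
import pandas as pd

# Stand-in for the dict built above: every value is still a string.
dd = {
    'review': ['I stayed around 11 days and enjoyed stay very much.',
               'Plans for weekend stay canceled due to Coronavirus shutdown.'],
    'compound': ['0.5106', '0.0'],
    'neg': ['0.0', '0.0'],
    'neu': ['0.708', '1.0'],
    'pos': ['0.292', '0.0'],
}
df = pd.DataFrame(dd)

# Convert every score column (everything except 'review') from str to float.
score_cols = ['compound', 'neg', 'neu', 'pos']
df[score_cols] = df[score_cols].apply(pd.to_numeric)

print(df.dtypes)
```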
    

    【Discussion】:

      【Solution 2】:

      def main():
          data_new = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ','review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, ']

          # Walk the list 5 items at a time; each slice holds one record
          # (review, compound, neg, neu, pos).
          for start in range(0, len(data_new), 5):
              print(data_new[start:start + 5])


      main()
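Slicing in steps of 5 gives one list per record; to get the DataFrame the question asks for, each slice still has to be parsed. A minimal sketch, where `parse_chunk` is a hypothetical helper name introduced here and every item is assumed to look like `key: value`:

```python
import pandas as pd

data_new = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ', 'neg: 0.0, ', 'neu: 0.708, ', 'pos: 0.292, ',
            'review: Plans for weekend stay canceled due to Coronavirus shutdown.', 'compound: 0.0, ', 'neg: 0.0, ', 'neu: 1.0, ', 'pos: 0.0, ']


def parse_chunk(chunk):
    """Hypothetical helper: turn ['key: value, ', ...] into {'key': 'value', ...}."""
    record = {}
    for item in chunk:
        key, value = item.split(':', 1)
        record[key.strip()] = value.strip().rstrip(',')
    return record


# Same 5-item slicing as main(), but each slice becomes one DataFrame row.
rows = [parse_chunk(data_new[i:i + 5]) for i in range(0, len(data_new), 5)]
df = pd.DataFrame(rows)
print(df.to_string(index=False))
```

Because dicts keep insertion order, the columns come out in the order the keys first appear (review, compound, neg, neu, pos).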

      【Discussion】:

        【Solution 3】:

        You can try a dictionary (here a `defaultdict`):

        from collections import defaultdict
        import pandas as pd

        lst = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ',
               'review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, ']

        data_dict = defaultdict(list)
        for _ in lst:
            # split on the first ':' only, then drop whitespace and the trailing ','
            header, value = _.split(':', 1)
            data_dict[header].append(value.strip().rstrip(','))

        df = pd.DataFrame.from_dict(data_dict)
        print(df)
        

        The output is:

        【Discussion】:

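        • I really like your answer; it gets the job done in just a few steps. However, could you explain what the `_` in the for loop means?
        • It's just a conventional "throwaway" variable name. Take a look at this link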
        【Solution 4】:

        You can do this easily with NumPy:

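        import numpy as np
        import pandas as pd

        lis = np.array(['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ','neg: 0.0, ','neu: 0.708, ','pos: 0.292, ','review: Plans for weekend stay canceled due to Coronavirus shutdown.','compound: 0.0, ','neg: 0.0, ','neu: 1.0, ','pos: 0.0, '])

        columns = 5
        # split each entry on the first ':' only -> one [key, value] list per entry
        t = np.char.split(lis, ":", 1)
        cols, vals = zip(*t)
        # reshape the flat values into rows of 5 (one row per review)
        rows = np.array(vals).reshape(-1, columns)
        dff = pd.DataFrame(rows, columns=list(cols[:columns]))
        # remove surrounding whitespace and the trailing ',' of the scores
        dff = dff.apply(lambda s: s.str.strip().str.rstrip(","))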
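Since the question is tagged `stack`/`reshape`, a pandas-only alternative is to treat the list as long-format data (one `key`/`value` pair per entry) and pivot it to wide format. This is a sketch assuming every record has the same 5 keys in the same order; the helper column names `key`, `value` and `record` are made up for illustration.

```python
import pandas as pd

lst = ['review: I stayed around 11 days and enjoyed stay very much.', 'compound: 0.5106, ', 'neg: 0.0, ', 'neu: 0.708, ', 'pos: 0.292, ',
       'review: Plans for weekend stay canceled due to Coronavirus shutdown.', 'compound: 0.0, ', 'neg: 0.0, ', 'neu: 1.0, ', 'pos: 0.0, ']

# One row per entry, split on the first ':' into two columns.
parts = pd.Series(lst).str.split(':', n=1, expand=True)
parts.columns = ['key', 'value']
parts['value'] = parts['value'].str.strip().str.rstrip(',')

# Every 5 consecutive entries belong to the same record.
parts['record'] = parts.index // 5

# Long -> wide: one row per record, one column per key.
wide = parts.pivot(index='record', columns='key', values='value')
# pivot sorts the new columns alphabetically; restore the original key order.
wide = wide[parts['key'].iloc[:5].tolist()]
print(wide.to_string())
```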
        

        【讨论】:
