【问题标题】:Modify some rows in a csv file using python使用python修改csv文件中的一些行
【发布时间】:2021-04-18 03:31:36
【问题描述】:

我有一个csv文件如下:

name,row,column,length_of_field
AB000M,8,12,1
AB000M,9,12,1
AB000M,10,0,80
AB000M,10,12,1
AB000M,11,1,1
AB000M,21,0,80
AB000M,22,0,80

我想要做的是每当列字段为 0 时,将字段长度添加到列,行应减 1。此外,如果后续行的列再次为 0,则添加第一个字段长度在列中,第一次出现时将行减 1,然后将字段长度加在一起。然后删除旧行。

所以在上面的 csv - "AB000M,10,0,80" 将变成 "AB000M,9,80,80" 并且 "AB000M,21,0,80
AB000M,22,0,80" ---这两行应替换为"AB000M,20,80,160"。

我正在尝试使用这个 sn-p 来实现这一点,但它不起作用:

df = pd.read_csv("file.csv")
for ind in df.index:
    if ind >= len(df)-1:
        break
    if df['column'][ind] == 0 and df['column'][ind + 1] != 0:
        df['row'][ind] -=  1
        df['column'][ind] = 80
    elif df['column'][ind] == 0 and df['column'][ind + 1] == 0:
        df['row'][ind] -=  1
        df['column'][ind] = 80
        df['length_of_field'][ind] += df['length_of_field'][ind + 1]
        df.drop([df.index[ind + 1]], axis=0)

【问题讨论】:

    标签: python python-3.x pandas string csv


    【解决方案1】:

    这是一个可能对您有用的示例。

    import pandas as pd
    
    df = pd.read_csv('test.csv')
    newRows = []
    last_val_is_zero = False
    tempRow = None
    for row in df.iterrows():
        vals = row[1]
        if vals['column'] == 0:
            if not last_val_is_zero:
                vals['row'] = vals['row'] - 1
                vals['column'] = vals['length_of_field']
                tempRow = vals
                last_val_is_zero = True
            else:
                tempRow['length_of_field'] = tempRow['length_of_field'] + vals['length_of_field']
        else:
            if tempRow is not None:
                newRows.append(tempRow)
            newRows.append(vals)
            tempRow = None
            last_val_is_zero = False
    
    if tempRow is not None:
        newRows.append(tempRow)
        
    newData = [[val for val in row] for row in newRows]
    newDf = pd.DataFrame(newData, columns=[x for x in newRows[0].keys()])
    

    【讨论】:

      【解决方案2】:

      样本数据:

      df_str = '''
      name,row,column,length_of_field
      AB000M,8,12,1
      AB000M,9,12,1
      AB000M,10,0,80
      AB000M,10,12,1
      AB000M,11,1,1
      AB000M,21,0,80
      AB000M,22,0,80
      AB000M,23,11,1
      AB000M,24,11,1
      AB000M,25,0,80
      AB000M,26,0,80
      AB000M,27,0,80
      AB000M,28,11,1
      AB000M,29,0,80
      '''
      df = pd.read_csv(io.StringIO(df_str.strip()), sep=',', index_col=False)
      

      解决方案:

      # split the row which to update or left
      cond = df['column'] == 0
      df_to_update = df[cond].copy()
      df_left = df[~cond].copy()
      
      # modify the update rows
      df_to_update['column'] = df_to_update['length_of_field']
      df_to_update['row'] -= 1
      
      # create tag for which is diff 1 with the previous row
      cond = df_to_update['row'].diff() != 1
      df_to_update['tag'] = np.where(cond, 1, 0)
      
      # cumsum tag to creat group
      df_to_update['label'] = df_to_update['tag'].cumsum()
      
      print(df_to_update)
      
      #       name  row  column  length_of_field  tag  label
      # 2   AB000M    9      80               80    1      1
      # 5   AB000M   20      80               80    1      2
      # 6   AB000M   21      80               80    0      2
      # 9   AB000M   24      80               80    1      3
      # 10  AB000M   25      80               80    0      3
      # 11  AB000M   26      80               80    0      3
      # 13  AB000M   28      80               80    1      4
      
      
      # agg groupy left first row, and sum(length_of_field)
      obj_list = []
      for tag, group in df_to_update.groupby('label'):
          obj = group.iloc[0].copy()
          obj['length_of_field'] = group['length_of_field'].sum()
          obj_list.append(obj)
      dfn_to_update = pd.concat(obj_list,axis=1).T[df.columns]    
      
      # merge final result
      dfn = df_left.append(dfn_to_update).sort_index()
      

      结果:

      print(dfn)
      
            name row column length_of_field
      0   AB000M   8     12               1
      1   AB000M   9     12               1
      2   AB000M   9     80              80
      3   AB000M  10     12               1
      4   AB000M  11      1               1
      5   AB000M  20     80             160
      7   AB000M  23     11               1
      8   AB000M  24     11               1
      9   AB000M  24     80             240
      12  AB000M  28     11               1
      13  AB000M  28     80              80
      
      print(df)
      
            name  row  column  length_of_field
      0   AB000M    8      12                1
      1   AB000M    9      12                1
      2   AB000M   10       0               80
      3   AB000M   10      12                1
      4   AB000M   11       1                1
      5   AB000M   21       0               80
      6   AB000M   22       0               80
      7   AB000M   23      11                1
      8   AB000M   24      11                1
      9   AB000M   25       0               80
      10  AB000M   26       0               80
      11  AB000M   27       0               80
      12  AB000M   28      11                1
      13  AB000M   29       0               80
      

      【讨论】:

        猜你喜欢
        • 1970-01-01
        • 1970-01-01
        • 2014-11-12
        • 2016-05-12
        • 2019-10-09
        • 1970-01-01
        • 2020-03-30
        • 2017-04-27
        • 2015-05-31
        相关资源
        最近更新 更多