【Question title】: Creation of an extra column that increases its value after each blank row using pandas
【Posted】: 2020-08-07 20:54:31
【Question description】:

I have a CSV file like the one below:

word   tag
w1     t1
w2     t2
w3     t3

w4     t4
w5     t5
w6     t6
w7     t7

w8     t8
w9     t9

I want to add a column named sentence# and fill it with sentence numbers as shown below.

Desired output

sentence#    word   tag
sentence:1   w1     t1
             w2     t2
             w3     t3
  
sentence:2   w4     t4
             w5     t5
             w6     t6
             w7     t7
    
sentence:3   w8     t8
             w9     t9

Each time a blank row is reached, the sentence number should increase by one. How can I get the output shown above?

Code

import pandas as pd
from csv import reader

i = 0
with open('username.csv', 'rt', encoding='utf-8') as f:
    csv_reader = pd.read_csv(f, delimiter=';')
    csv_reader1 = reader(f)

for line in csv_reader1:
    if not line:
        i+=1 # empty lines
    else:
        csv_reader["sentence#"] = i
        print(line)

【Comments】:

Tags: python-3.x pandas csv


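【Solution 1】:

pandas solution

Since you use blank lines to separate sentences, note that pd.read_csv has a parameter skip_blank_lines that defaults to True. Set it to False so that the blank rows are kept and can be used.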
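To make the effect of skip_blank_lines concrete, here is a minimal sketch that parses the same text twice. The sample string and the variable names raw, skipped and kept are made up for illustration; only pd.read_csv and its sep and skip_blank_lines parameters come from pandas.

```python
import io

import pandas as pd

# Illustrative sample: two rows, one blank line, one more row.
raw = "word;tag\nw1;t1\nw2;t2\n\nw3;t3"

# Default behaviour (skip_blank_lines=True): the blank line is dropped.
skipped = pd.read_csv(io.StringIO(raw), sep=";")

# With skip_blank_lines=False the blank line becomes a row of NaN values.
kept = pd.read_csv(io.StringIO(raw), sep=";", skip_blank_lines=False)

print(len(skipped), len(kept))
print(kept["word"].isna().tolist())
```

The NaN row in kept is what makes it possible to detect sentence boundaries with isna().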

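Second, operating on whole columns or rows is usually a better idea than looping (it is faster and, in some cases, uses less memory). To do that, you need to find a pattern that repeats across whole rows: the blank rows mentioned above.

Sample data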

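import io
import pandas as pd

fo = io.StringIO('''word;tag
w1;t1
w2;t2
w3;t3

w4;t4
w5;t5
w6;t6
w7;t7

w8;t8
w9;t9''')

# sep=';' matches the sample data; skip_blank_lines=False keeps the blank rows as NaN
df = pd.read_csv(fo, sep=';', skip_blank_lines=False)
fo.close()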

Code

# Breakdown:
#   .isna() marks every empty row as True
#   .cumsum() turns those marks into an increasing integer id for each sentence
df.insert(0, column='sentence', value=df.word.isna().cumsum()+1)
df.dropna(subset=['word'], inplace=True)

# if you really need to include the prefix 'sentence:' on each row
df.sentence = 'sentence:' + df.sentence.astype(str)

      sentence word tag
0   sentence:1   w1  t1
1   sentence:1   w2  t2
2   sentence:1   w3  t3
4   sentence:2   w4  t4
5   sentence:2   w5  t5
6   sentence:2   w6  t6
7   sentence:2   w7  t7
9   sentence:3   w8  t8
10  sentence:3   w9  t9
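The output of this answer repeats the label on every row, while the question shows it only on the first row of each sentence. One possible way to get closer to that layout, sketched here under the assumption that an empty string is an acceptable "blank" cell, is to clear the repeated labels with Series.duplicated. The sample text and the names raw, sentence_id, repeated and labels are illustrative only.

```python
import io

import pandas as pd

# Illustrative sample with three sentences separated by blank lines.
raw = "word;tag\nw1;t1\nw2;t2\nw3;t3\n\nw4;t4\nw5;t5\n\nw6;t6"
df = pd.read_csv(io.StringIO(raw), sep=";", skip_blank_lines=False)

# Same numbering idea as in the answer: count the blank rows seen so far.
sentence_id = df["word"].isna().cumsum() + 1
df.insert(0, "sentence#", "sentence:" + sentence_id.astype(str))
df.dropna(subset=["word"], inplace=True)

# Keep the label only on the first row of each sentence.
repeated = df["sentence#"].duplicated()
df.loc[repeated, "sentence#"] = ""

labels = df["sentence#"].tolist()
print(df.to_string(index=False))
```

If the result needs to go back to disk, df.to_csv with sep=";" and index=False should write it in the same semicolon-separated form as the input.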

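【Comments】:

    【Solution 2】:

    Since you are only adding text at the start of each line, there is no need to process the file as CSV. Just read the file and insert the leading text where needed: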

    newblock = True
    sout = ""
    i = 0
    with open('words.txt', 'rt', encoding='utf-8') as f:
        for line in f:
            if i == 0:
                sout = "sentence#;" + line  # header
                i = 1
            elif line.strip() == "":  # blank line
                sout += '\n'
                newblock = True  # next line is new sentence
            elif newblock:  # new sentence
                sout += "sentence:" + str(i) + ";" + line  # include counter
                i += 1
                newblock = False  # wait for next blank line
            else:
                sout += ";" + line  # copy existing line
    print(sout)
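The same line-by-line idea can also be written as a generator, which avoids building one large string and makes the logic easy to check on in-memory data. This is only a sketch: number_sentences is a hypothetical helper name, and the semicolon separator and header-on-first-line layout are assumptions carried over from the sample data.

```python
import io


def number_sentences(lines, sep=";"):
    """Yield each input line with a leading sentence# cell (hypothetical helper)."""
    counter = 0
    new_block = True
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        if index == 0:
            yield "sentence#" + sep + text  # header row
        elif text.strip() == "":
            yield ""  # keep the blank separator line
            new_block = True  # the next non-blank line starts a sentence
        elif new_block:
            counter += 1
            yield "sentence:" + str(counter) + sep + text
            new_block = False
        else:
            yield sep + text  # empty sentence# cell


sample = io.StringIO("word;tag\nw1;t1\nw2;t2\n\nw3;t3\n")
result = list(number_sentences(sample))
print("\n".join(result))
```

Because the function accepts any iterable of lines, an open file object can be passed in the same way as the StringIO sample.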
    

    【Comments】:
