【Question title】: Iterating over a list, reusing certain values several times in CSV
【Posted】: 2019-06-04 04:43:57
【Question】:

I have some data that looks like this:

description "export"
source "factory1"
source "factory2"
source "factory3"
destination "customer1"
destination "customer2"
shipdate "asap"

description "export"
source "factory4"
source "factory5"
source "factory6"
destination "customer1"
shipdate "30"

I am now trying to create a CSV file that looks like this:

description,source,destination,shipdate
export,factory1,customer1,asap
export,factory2,customer1,asap
export,factory3,customer1,asap
export,factory1,customer2,asap
export,factory2,customer2,asap
export,factory3,customer2,asap
export,factory1,customer1,asap
export,factory2,customer1,asap
export,factory3,customer1,asap
export,factory4,customer1,30
export,factory5,customer1,30
export,factory6,customer1,30

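The blocks of data are handed to me as Python lists, so I am currently iterating over them and putting the items into other lists based on their first word. There is probably a simpler way to handle this, though.

So far my code looks like this, but as you can see, it does not solve my problem: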

sourcelist = []
destlist = []
for item in list:
  if "source" in item:
    sourcelist.append(item)
  if "destination" in item:
    destlist.append(item)

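Thanks for your help, even if it means I need to rewrite the code!

【Comments】:

  • Why are values such as export,factory1,customer1,asap repeated twice?
  • You are effectively asking for a parser that converts to a format you have not specified. Is your data always like this, or are there additional rules you need to follow? What are they?
  • @MarkMeyer The data appears twice because these are two different blocks, just to show you that there are not always two destinations.
  • @MisterMiyagi A hint about which direction to take would also be much appreciated :) But there really aren't many rules. There can be at most two destinations, never more. There can be any number of sources. The description is always present, and so is the ship date. Odd, I know, but that is what I have to work with.
  • @xeet What does the list look like: [description "export", source "factory1", ... ,shipdate "asap"]?

Tags: python python-3.x list csv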


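【Solution 1】:

Because no single line is sufficient on its own, you have to accumulate data. Based on your description and example, you can do this block by block: simply accumulate all the fields of one block. Fields that are unique will just hold a single item.

You can parse the blocks efficiently with a generator: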

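def parse_blocks(source: 'Iterable[str]'):
    """Group 'key "value"' lines into one dict of lists per block."""
    block = {}
    for line in source:
        if not line.strip():  # blank line: delimiter between blocks
            if block:  # ignore empty blocks caused by consecutive blank lines
                yield block
            block = {}
        else:
            key, value = line.split()
            block.setdefault(key, []).append(value.strip('"'))
    if block:  # emit the last block when the input does not end with a blank line
        yield block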

This gives you an iterable of blocks, for example:

{'description': ['export'], 'source': ['factory1', 'factory2', 'factory3'], 'destination': ['customer1', 'customer2'], 'shipdate': ['asap']}, ...
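
Note that the line.split() call in the parser assumes a value contains no whitespace: a line such as source "factory 1" splits into three parts, and unpacking it into key and value raises a ValueError. The following is a minimal sketch of a more tolerant line parser, assuming the key is always the first word and everything after it is the (optionally quoted) value. The name parse_line is hypothetical and not part of the answer above.

```python
from typing import Tuple


def parse_line(line: str) -> Tuple[str, str]:
    """Split 'key "value with spaces"' into (key, value).

    Assumes the caller has already filtered out blank lines.
    """
    # Split only on the first run of whitespace, so the value stays intact.
    parts = line.split(maxsplit=1)
    key = parts[0]
    # A line that holds only a key gets an empty value.
    value = parts[1] if len(parts) > 1 else ""
    # Drop surrounding whitespace first, then the surrounding double quotes.
    return key, value.strip().strip('"')


print(parse_line('source "factory 1"'))
print(parse_line('shipdate "30"'))
```

Inside the generator, the two lines in the else branch could then call this helper instead of line.split().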

For each block, you need all combinations across the fields. itertools.product does this out of the box.

import itertools

def merge_lines(blocks: 'Iterable[Dict[str, List[str]]]', *fields: 'str'):
    for block in blocks:
        yield from itertools.product(
            *(block[key] for key in fields)
        )

This yields the data for each row as an iterable of tuples:

('export', 'factory1', 'customer1', 'asap'), ('export', 'factory1', 'customer2', 'asap'), ...

You can feed this directly to csv, or process it however you see fit.

import csv
import sys

fields = 'description', 'source', 'destination', 'shipdate'

writer = csv.writer(sys.stdout)  # or write to a file, pipe, ...
writer.writerow(fields)
for data in merge_lines(parse_blocks(input_list), *fields):  # insert your input here
    writer.writerow(data)

This produces the desired CSV output (rows are grouped by source rather than by destination):

description,source,destination,shipdate
export,factory1,customer1,asap
export,factory1,customer2,asap
export,factory2,customer1,asap
export,factory2,customer2,asap
export,factory3,customer1,asap
export,factory3,customer2,asap
export,factory4,customer1,30
export,factory5,customer1,30
export,factory6,customer1,30
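
If the row order from the question matters (all sources for the first destination, then all sources for the second), the order of the iterables passed to itertools.product can be changed, because product varies its last argument fastest. The following is a minimal, self-contained sketch under that assumption. The function name rows_by_destination and the hard-coded blocks are illustrative only; the blocks mirror the parsed structure shown earlier.

```python
import csv
import itertools
import sys

# Already-parsed blocks, in the same shape the block parser yields.
blocks = [
    {"description": ["export"],
     "source": ["factory1", "factory2", "factory3"],
     "destination": ["customer1", "customer2"],
     "shipdate": ["asap"]},
    {"description": ["export"],
     "source": ["factory4", "factory5", "factory6"],
     "destination": ["customer1"],
     "shipdate": ["30"]},
]


def rows_by_destination(blocks):
    """Yield (description, source, destination, shipdate) rows,
    grouped by destination within each block."""
    for block in blocks:
        # product() varies the LAST iterable fastest, so passing
        # "source" last makes it the inner loop.
        for desc, dest, date, src in itertools.product(
            block["description"],
            block["destination"],
            block["shipdate"],
            block["source"],
        ):
            # Re-order the values into the CSV column order.
            yield desc, src, dest, date


rows = list(rows_by_destination(blocks))

writer = csv.writer(sys.stdout)
writer.writerow(("description", "source", "destination", "shipdate"))
writer.writerows(rows)
```

The columns stay in the same order as before; only the sequence in which the combinations are generated changes.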

【Comments】:

  • Really helpful! Thank you!
【Solution 2】:

Here you go:

import pandas as pd
import re
import itertools

# setup test data
raw_data_1 = ['description export', 'source factory 1', 'source factory 2', 'source factory 3', 'destination customer 1',
  'destination customer 2', 'shipdate asap']

raw_data_2 = ['description export', 'source factory 4', 'source factory 5', 'source factory 6', 'destination customer 1',
   'shipdate 30']

# create list of input data
data_list = [raw_data_1, raw_data_2]


# collect data from string
collected_data = []
for item in data_list:
    description = 0
    source_data = []
    destination_data = []
    ship_date = 0
    for data in item:
        if 'description' in data:
            description = re.sub('description ', '', data)
        elif 'source' in data:
            source = re.sub('source ', '', data)
            source_data.append(source)
        elif 'destination' in data:
            destination = re.sub('destination ', '', data)
            destination_data.append(destination)
        elif 'shipdate' in data:
            ship_date = re.sub('shipdate ', '', data)

    # create combinations
    combination_data = list(itertools.product(source_data, destination_data))

    # build one output row per (source, destination) combination;
    # "combo" avoids shadowing the outer loop variable "item"
    for combo in combination_data:
        out = [description] + list(combo) + [ship_date]
        collected_data.append(out)


# put data into a data frame
data = pd.DataFrame(collected_data, columns=['description', 'source', 'destination','shipdate'])

# save data to file
data.to_csv('data.csv', index=False)

Output:

  description     source destination shipdate
0      export  factory 1  customer 1     asap
1      export  factory 1  customer 2     asap
2      export  factory 2  customer 1     asap
3      export  factory 2  customer 2     asap
4      export  factory 3  customer 1     asap
5      export  factory 3  customer 2     asap
6      export  factory 4  customer 1       30
7      export  factory 5  customer 1       30
8      export  factory 6  customer 1       30

【Comments】:
