【Question title】: Parse a Generated File Python
【Posted】: 2020-09-08 17:41:55
【Question】:

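I am trying to parse a generated file into a list of objects.

Unfortunately, the generated files do not always have the same structure, but they all contain the same fields (along with a lot of other garbage).

For example: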

    function foo();              # Don't Care
    function maybeanotherfoo();  # Don't Care
    int maybemoregarbage;        # Don't Care

    
    product_serial = "CDE1102"; # I want this <---------------------
    unnecessary_info1 = 10;     # Don't Care
    unnecessary_info2 = "red"   # Don't Care
    product_id = 1134412;       # I want this <---------------------
    unnecessary_info3 = "88"    # Don't Care

    product_serial = "DD1232";  # I want this <---------------------
    product_id = 3345111;       # I want this <---------------------
    unnecessary_info1 = "22"    # Don't Care
    unnecessary_info2 = "panda" # Don't Care

    product_serial = "CDE1102"; # I want this <---------------------
    unnecessary_info1 = 10;     # Don't Care
    unnecessary_info2 = "red"   # Don't Care
    unnecessary_info3 = "bear"  # Don't Care
    unnecessary_info4 = 119     # Don't Care
    product_id = 1112331;       # I want this <---------------------
    unnecessary_info5 = "jj"    # Don't Care

I want a list of objects (each object has a serial number and an ID).

I tried the following:


import re

class Product:
    def __init__(self, id, serial):
        self.product_id = id
        self.product_serial = serial

linenum = 0
first_string = "product_serial"
second_string = "product_id"
with open('products.txt', "r") as products_file:
    for line in products_file:
        linenum += 1
        if line.find(first_string) != -1:
            product_serial = re.search('\"([^"]+)', line).group(1)
            #How do I proceed?                


Any advice would be appreciated! Thanks!

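【Comments】:

  • So what does your code do? Does it work? Are there errors? If so, what are they?
  • My code can find the first product_serial (CDE1102). But how do I find the product_id and then keep parsing from there?
  • Please review "on topic" and "how to ask" from the intro tour. "Show me how to solve this coding problem" is not a Stack Overflow question. You have to make an honest attempt, and then ask a specific question about your algorithm or technique. "Any advice" is too broad for Stack Overflow. There are many tutorials that show you how to read a file, how to handle string data, and so on. You should be able to identify the constant strings in the input and separate the input lines.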

Tags: python parsing


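【Solution 1】:

I'm using io.StringIO() here to inline the data, but you can replace data with your products_file.

The idea is to collect key/value pairs into current_object; once we know we have all the data needed for a single object (both keys), we push it onto the objects list and start a new current_object.

You could use something like if line.startswith('product_serial') instead of the admittedly complex regular expression.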

import io
import re

data = io.StringIO("""
    function foo();             
    function maybeanotherfoo(); 
    int maybemoregarbage;       

    
    product_serial = "CDE1102"; 
    unnecessary_info1 = 10;     
    unnecessary_info2 = "red"   
    product_id = 1134412;       
    unnecessary_info3 = "88"    

    product_serial = "DD1232";  
    product_id = 3345111;       
    unnecessary_info1 = "22"    
    unnecessary_info2 = "panda" 

    product_serial = "CDE1102"; 
    unnecessary_info1 = 10;     
    unnecessary_info2 = "red"   
    unnecessary_info3 = "bear"  
    unnecessary_info4 = 119     
    product_id = 1112331;       
    unnecessary_info5 = "jj"    
""")

objects = []

current_object = {}
for line in data:
    line = line.strip()  # Remove leading and trailing whitespace
    m = re.match(r'^(product_id|product_serial)\s*=\s*(\d+|"(?:.+?)");?$', line)

    if m:
        key, value = m.groups()
        current_object[key] = value.strip('"')
        if len(current_object) == 2:  # Got the two keys we want, ship the object
            objects.append(current_object)
            current_object = {}

print(objects)
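
The startswith alternative mentioned above could be sketched roughly as follows. This is not the answerer's code: the Product dataclass, the parse_products helper, and the WANTED_KEYS tuple are names made up for illustration. It also assumes that '#' never appears inside a value, so that the trailing "# I want this" comments from the question's sample can be cut off before matching (the regular expression above is anchored with $ and is run on data without those comments).

```python
import io
from dataclasses import dataclass


@dataclass
class Product:  # hypothetical stand-in for the asker's Product class
    product_id: int
    product_serial: str


WANTED_KEYS = ("product_id", "product_serial")


def parse_products(lines):
    """Collect Product objects from an iterable of text lines."""
    products = []
    current = {}
    for line in lines:
        # Assumption: '#' never occurs inside a value, so everything
        # after the first '#' is a trailing comment and can be dropped.
        line = line.split("#", 1)[0].strip()
        if not line.startswith(WANTED_KEYS):  # startswith accepts a tuple
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or key not in WANTED_KEYS:
            continue  # no '=' at all, or a longer name such as product_id_old
        # Drop the optional trailing ';' and the surrounding double quotes.
        current[key] = value.strip().rstrip(";").strip().strip('"')
        if len(current) == len(WANTED_KEYS):  # both keys seen, emit the object
            products.append(
                Product(int(current["product_id"]), current["product_serial"])
            )
            current = {}
    return products


sample = io.StringIO("""
    function foo();              # Don't Care
    product_serial = "CDE1102"; # I want this
    unnecessary_info1 = 10;     # Don't Care
    product_id = 1134412;       # I want this

    product_serial = "DD1232";  # I want this
    product_id = 3345111;       # I want this
    unnecessary_info2 = "panda" # Don't Care
""")

products = parse_products(sample)
for product in products:
    print(product)
```

Like the regex version, this sketch pairs keys purely by order of appearance, so a block that is missing one of the two fields would be combined with values from the next block. If that can happen in your files, you would need some other signal (such as the blank line between blocks) to decide when an object ends.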
