Pyparsing Write in External File: Answer

【Question Title】: Pyparsing Write in External File
【Posted】: 2021-11-18 16:28:05
【Question Description】:

Update (I've gotten further...)

So my goal is to write a parser for a strange script format that looks like XML but isn't XML:

<[file][][]
<[cultivation][][]
    <[string8][coordinate_system][lonlat]>
    <[list_vegetation_map_exclusion_zone][vegetation_map_exclusion_zone_list][]
    >
    <[string8][buildings_texture_folder][]>
    <[list_plant][plant_list][]
    >
    <[list_building][building_list][]
        <[building][element][0]
            <[vector3_float64][position][7.809637 46.182262 0]>
            <[float32][direction][-1.82264196872711]>
            <[float32][length][25.9434452056885]>
            <[float32][width][17.4678573608398]>
            <[int32][floors][3]>
            <[stringt8c][roof][gable]>
            <[stringt8c][usage][residential]>
        > ...

So far I've got this:

import pyparsing as pp
from pyparsing import (Combine, Literal, OneOrMore, Optional, SkipTo,
                       Suppress, Word, alphas, nums)


def toc_parser(file_path):
    # save the complete file in a variable
    with open(file_path, "r") as f:
        toc = f.read()
    parser = OneOrMore(Word(alphas))
    # exclude comments
    parser.ignore('//' + pp.restOfLine())
    # exclude <>
    klammern = Suppress("<")
    klammernzu = Suppress(">")
    eckig = Suppress("[")
    eckigzu = Suppress("]")
    element = Suppress("[element]")
    leer = Suppress("[]")

    # grammar:
    nameBuilding = "building"
    namePosition = "position"
    nameDirection = "direction"
    nameLength = "length"
    nameWidth = "width"
    nameFloors = "floors"
    nameRoof = "roof"
    nameUsage = "usage"

    buildingzahl = klammern + eckig + nameBuilding + eckigzu + element + eckig + Word(nums) + eckigzu
    pos = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + namePosition + eckigzu + eckig + Combine(Word(nums) + "." + Word(nums)) + Combine(Word(nums) + "." + Word(nums)) + Word(nums) + eckigzu + klammernzu
    direc = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameDirection + eckigzu + eckig + Combine(Optional("-") + Word(nums) + Optional("." + Word(nums))) + eckigzu + klammernzu
    leng = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameLength + eckigzu + eckig + Combine(Word(nums) + Optional("." + Word(nums))) + eckigzu + klammernzu
    widt = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameWidth + eckigzu + eckig + Combine(Word(nums) + Optional("." + Word(nums))) + eckigzu + klammernzu
    floors = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameFloors + eckigzu + eckig + Word(nums) + eckigzu + klammernzu
    roof = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameRoof + eckigzu + eckig + Word(alphas) + eckigzu + klammernzu
    usag = klammern + eckig + SkipTo(Literal("]")) + eckigzu + eckig + nameUsage + eckigzu + eckig + Word(alphas) + eckigzu + klammernzu

    building = buildingzahl + pos + direc + leng + widt + floors + roof + usag + klammernzu

    file = klammern + eckig + Literal("file") + eckigzu + leer + leer + klammern + eckig + Literal("cultivation") + eckigzu + leer + leer
    vegexcl = Literal("<[list_vegetation_map_exclusion_zone][vegetation_map_exclusion_zone_list][]") + klammernzu
    coordsis = Literal("<[string8][coordinate_system][lonlat]>")
    textures = Literal("<[string8][buildings_texture_folder][]>")
    listPlants = Literal("<[list_plant][plant_list][]") + klammernzu
    listBuildings = Literal("<[list_building][building_list][]") + OneOrMore(building) + klammernzu
    listLights = Literal("<[list_light][light_list][]") + klammernzu
    listAirportLights = Literal("<[list_airport_light][airport_light_list][]") + klammernzu
    listXref = Literal("<[list_xref][xref_list][]") + klammernzu

    fileganz = file + coordsis + vegexcl + textures + listPlants + listBuildings + listLights + listAirportLights + listXref + klammernzu + klammernzu
    print(fileganz.parseString(toc))

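Problem:

I need to be able to overwrite certain values in the external script, and I found (here) that this is how you do it, but it always goes into the "else" branch: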

from pyparsing import ParseException

# define the values to be updated
valuesToUpdate = {
    "building": "home",
    }

def updateSelectedDefinitions(tokens):
    if tokens.name in valuesToUpdate:
        newVal = valuesToUpdate[tokens.name]
        return tokens.name, newVal
    else:
        raise ParseException("no update defined")

Thanks a lot for your help :)

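【Question Discussion】:

XML parsers usually parse the generic <tag attr=val>some content</tag> format without hard-coding the actual tag values. The generic framework of your structure is <[type][name][value] contents...>, where the optional contents are recursive instances of the same <[type][name] etc.> format. This should be very simple to write in pyparsing, in just a few lines of code. Then you would walk the parsed structure to extract the "building" or "position" or whatever values. You could also consider having the parser convert to JSON or XML, and then use the stdlib to extract your values.
@PaulMcG Could you elaborate on how I would do that? With an example?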

Tags: python python-3.x list parsing pyparsing

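【Solution 1】:

Here is a quick walkthrough.

First, let's try to describe this format in words:

"Each entry is enclosed in '<>' characters and contains 3 values in '[]' characters, followed by zero or more nested entries. The 3 values in '[]' contain the data type, an optional name, and an optional value or values. The values can be numbers or strings, and may be parsed as scalar or list values, depending on the data type."

Converting this to a quasi-BNF, where '*' means "zero or more":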

entry ::= '<' subentry subentry subentry entry* '>'
subentry ::= '[' value* ']'
value ::= number | alphanumeric word

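We can see that this is a recursive grammar, since an entry can contain elements that are themselves entries. So when we convert to pyparsing, we'll use a pyparsing Forward to define entry as a placeholder, and then define its structure after all the other expressions have been defined.

Converting this short BNF to pyparsing: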

import pyparsing as pp

# define some basic punctuation - useful at parse time, but we will
# suppress them since we don't really need them after parsing is done
# (we'll use pyparsing Groups to capture the structure that these 
# characters represent)
LT, GT, LBRACK, RBRACK = map(pp.Suppress, "<>[]")

# define our placeholder for the nested entry
entry = pp.Forward()

# work bottom-up through the BNF
value = pp.pyparsing_common.number | pp.Word(pp.alphas, pp.alphanums+"_")
subentry = pp.Group(LBRACK - value[...] + RBRACK)
type_name_value = subentry*3
entry <<= pp.Group(LT
                   - type_name_value("type_name_value") 
                   + pp.Group(entry[...])("contents") + GT)

At this point, you can use entry to parse the sample text (after adding enough closing '>' characters to make it a valid nested expression):

result = entry.parseString(sample)
result.pprint()

Prints:

[[['file'],
  [],
  [],
  [[['cultivation'],
    [],
    [],
    [[['string8'], ['coordinate_system'], ['lonlat'], []],
     [['list_vegetation_map_exclusion_zone'],
      ['vegetation_map_exclusion_zone_list'],
      [],
      []],
     [['string8'], ['buildings_texture_folder'], [], []],
     [['list_plant'], ['plant_list'], [], []],
     [['list_building'],
      ['building_list'],
      [],
      [[['building'],
        ['element'],
        [0],
        [[['vector3_float64'], ['position'], [7.809637, 46.182262, 0], []],
         [['float32'], ['direction'], [-1.82264196872711], []],
         [['float32'], ['length'], [25.9434452056885], []],
         [['float32'], ['width'], [17.4678573608398], []],
         [['int32'], ['floors'], [3], []],
         [['stringt8c'], ['roof'], ['gable'], []],
         [['stringt8c'], ['usage'], ['residential'], []]]]]]]]]]]

So that's a start. We can see that the values get parsed, and that they are parsed as the correct types (ints, floats and strings).
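For reference, here is a sketch that puts the grammar above into one runnable script together with the question's sample. The sample in the question stops after the first building, so this version adds the three closing '>' characters for `building_list`, `cultivation` and `file`. It assumes the rest of the real file (lights, xrefs and so on) follows the same pattern. Passing `parseAll=True` makes a missing or extra '>' show up as a `ParseException` instead of silently stopping early.

```python
import pyparsing as pp

# Grammar from above: an entry is <[...][...][...] nested-entries... >
LT, GT, LBRACK, RBRACK = map(pp.Suppress, "<>[]")
entry = pp.Forward()
value = pp.pyparsing_common.number | pp.Word(pp.alphas, pp.alphanums + "_")
subentry = pp.Group(LBRACK - value[...] + RBRACK)
type_name_value = subentry * 3
entry <<= pp.Group(LT
                   - type_name_value("type_name_value")
                   + pp.Group(entry[...])("contents") + GT)

# The question's sample, truncated after the first building; the last three
# '>' close building_list, cultivation and file.
sample = """
<[file][][]
<[cultivation][][]
    <[string8][coordinate_system][lonlat]>
    <[list_vegetation_map_exclusion_zone][vegetation_map_exclusion_zone_list][]
    >
    <[string8][buildings_texture_folder][]>
    <[list_plant][plant_list][]
    >
    <[list_building][building_list][]
        <[building][element][0]
            <[vector3_float64][position][7.809637 46.182262 0]>
            <[float32][direction][-1.82264196872711]>
            <[float32][length][25.9434452056885]>
            <[float32][width][17.4678573608398]>
            <[int32][floors][3]>
            <[stringt8c][roof][gable]>
            <[stringt8c][usage][residential]>
        >
    >
>
>
"""

# parseAll=True raises a ParseException if anything is left unparsed
result = entry.parseString(sample, parseAll=True)
result.pprint()
```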

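To convert these pieces into a more coherent structure, we can attach a parse action to entry. This is a parse-time callback that runs each time an entry is parsed.

In this case, we'll write a parse action that processes the type/name/value triple and then captures the nested contents, if present. We'll try to infer from the data type string how to construct the value or contents.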

def convert_entry_to_dict(tokens):
    # entry is wrapped in a Group, so ungroup to get the parsed elements
    parsed = tokens[0]

    # unpack data type, optional name and optional value
    data_type, name, value = parsed.type_name_value
    data_type = data_type[0] if data_type else None
    name = name[0] if name else None

    # save type and name in dict to be returned from the parse action
    ret = {'type': data_type, 'name': name}

    # if there were contents present, save them as the value; otherwise,
    # get the value from the third element in the triple (use the
    # parsed data type as a hint as to whether the value should be a 
    # scalar, a list, or a str)
    if parsed.contents:
        ret["value"] = list(parsed.contents)
    else:
        if data_type.startswith(("vector", "list")):
            ret["value"] = [*value]
        else:
            ret["value"] = value[0] if value else None
            if ret["value"] is None and data_type.startswith("string"):
                ret["value"] = ""

    return ret

entry.addParseAction(convert_entry_to_dict)

Now when we parse the sample, we get this structure:

[{'name': None,
  'type': 'file',
  'value': [{'name': None,
             'type': 'cultivation',
             'value': [{'name': 'coordinate_system',
                        'type': 'string8',
                        'value': 'lonlat'},
                       {'name': 'vegetation_map_exclusion_zone_list',
                        'type': 'list_vegetation_map_exclusion_zone',
                        'value': []},
                       {'name': 'buildings_texture_folder',
                        'type': 'string8',
                        'value': ''},
                       {'name': 'plant_list',
                        'type': 'list_plant',
                        'value': []},
                       {'name': 'building_list',
                        'type': 'list_building',
                        'value': [{'name': 'element',
                                   'type': 'building',
                                   'value': [{'name': 'position',
                                              'type': 'vector3_float64',
                                              'value': [7.809637,
                                                        46.182262,
                                                        0]},
                                             {'name': 'direction',
                                              'type': 'float32',
                                              'value': -1.82264196872711},
                                             {'name': 'length',
                                              'type': 'float32',
                                              'value': 25.9434452056885},
                                             {'name': 'width',
                                              'type': 'float32',
                                              'value': 17.4678573608398},
                                             {'name': 'floors',
                                              'type': 'int32',
                                              'value': 3},
                                             {'name': 'roof',
                                              'type': 'stringt8c',
                                              'value': 'gable'},
                                             {'name': 'usage',
                                              'type': 'stringt8c',
                                              'value': 'residential'}]}]}]}]}]
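Because the parse action turns everything into plain dicts and lists, the "walk the parsed structure" step suggested in the comments is ordinary Python. The sketch below uses a hand-trimmed copy of the structure above, with one building and a few fields. `find_entries` is a hypothetical helper, not part of pyparsing. The same data can also be dumped to JSON with the standard `json` module, which was the other option mentioned in the comments.

```python
import json

# Hand-trimmed copy of the structure printed above (one building, a few fields).
parsed = [{"name": None, "type": "file", "value": [
    {"name": None, "type": "cultivation", "value": [
        {"name": "coordinate_system", "type": "string8", "value": "lonlat"},
        {"name": "building_list", "type": "list_building", "value": [
            {"name": "element", "type": "building", "value": [
                {"name": "position", "type": "vector3_float64",
                 "value": [7.809637, 46.182262, 0]},
                {"name": "floors", "type": "int32", "value": 3},
                {"name": "roof", "type": "stringt8c", "value": "gable"},
            ]},
        ]},
    ]},
]}]


def find_entries(node, entry_type):
    """Recursively yield every entry dict whose "type" is entry_type."""
    if isinstance(node, list):
        for item in node:
            yield from find_entries(item, entry_type)
    elif isinstance(node, dict):
        if node["type"] == entry_type:
            yield node
        # values that are lists may hold nested entries
        if isinstance(node["value"], list):
            yield from find_entries(node["value"], entry_type)


for building in find_entries(parsed, "building"):
    # flatten each building's fields into a name -> value dict
    fields = {child["name"]: child["value"] for child in building["value"]}
    print(fields["position"], fields["floors"], fields["roof"])

# only dicts, lists, numbers, strings and None, so json can dump it directly
print(json.dumps(parsed, indent=2))
```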

If you need to rename any of the field names, you can add that behavior in the parse action.
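One way to do that without touching `convert_entry_to_dict` is to chain a second parse action after it. Pyparsing runs parse actions in the order they are added, and each one receives the tokens returned by the previous one. The sketch below repeats the grammar and parse action from above so that it runs on its own. `FIELD_RENAMES` and the new names in it are hypothetical examples.

```python
import pyparsing as pp

# Grammar and parse action repeated from the answer above.
LT, GT, LBRACK, RBRACK = map(pp.Suppress, "<>[]")
entry = pp.Forward()
value = pp.pyparsing_common.number | pp.Word(pp.alphas, pp.alphanums + "_")
subentry = pp.Group(LBRACK - value[...] + RBRACK)
type_name_value = subentry * 3
entry <<= pp.Group(LT
                   - type_name_value("type_name_value")
                   + pp.Group(entry[...])("contents") + GT)


def convert_entry_to_dict(tokens):
    parsed = tokens[0]
    data_type, name, value = parsed.type_name_value
    data_type = data_type[0] if data_type else None
    name = name[0] if name else None
    ret = {"type": data_type, "name": name}
    if parsed.contents:
        ret["value"] = list(parsed.contents)
    elif data_type.startswith(("vector", "list")):
        ret["value"] = [*value]
    else:
        ret["value"] = value[0] if value else None
        if ret["value"] is None and data_type.startswith("string"):
            ret["value"] = ""
    return ret


# Hypothetical rename table: name found in the file -> name you want instead.
FIELD_RENAMES = {"direction": "heading", "usage": "building_usage"}


def rename_fields(tokens):
    # tokens[0] is the dict that convert_entry_to_dict returned
    entry_dict = tokens[0]
    entry_dict["name"] = FIELD_RENAMES.get(entry_dict["name"], entry_dict["name"])
    # returning nothing keeps the (now modified) tokens as they are


# parse actions run in the order they were added
entry.addParseAction(convert_entry_to_dict)
entry.addParseAction(rename_fields)

sample = """
<[building][element][0]
    <[float32][direction][-1.82264196872711]>
    <[int32][floors][3]>
    <[stringt8c][usage][residential]>
>
"""
building = entry.parseString(sample, parseAll=True)[0]
print([child["name"] for child in building["value"]])
```

Names that are not in the mapping, such as `floors` or the building's own `element`, pass through unchanged.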

This should give you a good start on processing your markup.
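The question itself is about writing changed values back to the external file, which the walkthrough above stops short of. One possible approach is to parse into the dicts, overwrite the values you want, serialize the dicts back into the `<[type][name][value] ...>` format, and write the text to a file. The sketch below makes several assumptions.

The parse action is the one from the answer with one change. When an entry has nested contents, its own inline values (such as the `0` in `[building][element][0]`) are kept under an extra `inline` key; otherwise they would be lost on the way back. The serializer writes list values inline only for `vector*` types or lists of plain values, and writes them as nested entries otherwise. It does not preserve comments, blank lines or the original indentation. Floats are written with Python's `str()`.

The helpers `set_field`, `format_scalar` and `to_markup`, the `values_to_update` contents and the output file name are all hypothetical.

```python
import tempfile
from pathlib import Path

import pyparsing as pp

# Grammar from the answer above.
LT, GT, LBRACK, RBRACK = map(pp.Suppress, "<>[]")
entry = pp.Forward()
value = pp.pyparsing_common.number | pp.Word(pp.alphas, pp.alphanums + "_")
subentry = pp.Group(LBRACK - value[...] + RBRACK)
type_name_value = subentry * 3
entry <<= pp.Group(LT
                   - type_name_value("type_name_value")
                   + pp.Group(entry[...])("contents") + GT)


def convert_entry_to_dict(tokens):
    # the answer's parse action, plus an "inline" key on entries with contents
    parsed = tokens[0]
    data_type, name, value = parsed.type_name_value
    data_type = data_type[0]
    ret = {"type": data_type, "name": name[0] if name else None}
    if parsed.contents:
        ret["value"] = list(parsed.contents)
        ret["inline"] = [*value]  # e.g. the 0 in <[building][element][0]
    elif data_type.startswith(("vector", "list")):
        ret["value"] = [*value]
    else:
        ret["value"] = value[0] if value else None
        if ret["value"] is None and data_type.startswith("string"):
            ret["value"] = ""
    return ret


entry.addParseAction(convert_entry_to_dict)


def format_scalar(v):
    # None and "" are both written as an empty [] field
    return "" if v is None or v == "" else str(v)


def to_markup(node, indent=0):
    # serialize one entry dict (and its children) back to <[type][name][value] ...>
    pad = "    " * indent
    head = f"{pad}<[{node['type']}][{node['name'] or ''}]"
    val = node["value"]
    if (isinstance(val, list) and not node["type"].startswith("vector")
            and all(isinstance(v, dict) for v in val)):
        # nested entries: open, write children one level deeper, close
        inline = " ".join(format_scalar(v) for v in node.get("inline", []))
        lines = [f"{head}[{inline}]"]
        lines += [to_markup(child, indent + 1) for child in val]
        lines.append(pad + ">")
        return "\n".join(lines)
    if isinstance(val, list):  # vector values are written inline
        return head + "[" + " ".join(format_scalar(v) for v in val) + "]>"
    return f"{head}[{format_scalar(val)}]>"


def set_field(node, field_name, new_value):
    # overwrite the value of every entry called field_name, at any depth
    if node["name"] == field_name:
        node["value"] = new_value
    elif isinstance(node["value"], list):
        for child in node["value"]:
            if isinstance(child, dict):
                set_field(child, field_name, new_value)


sample = """
<[file][][]
<[cultivation][][]
    <[string8][coordinate_system][lonlat]>
    <[string8][buildings_texture_folder][]>
    <[list_plant][plant_list][]
    >
    <[list_building][building_list][]
        <[building][element][0]
            <[vector3_float64][position][7.809637 46.182262 0]>
            <[int32][floors][3]>
            <[stringt8c][roof][gable]>
        >
    >
>
>
"""
tree = entry.parseString(sample, parseAll=True)[0]  # the top-level "file" dict

values_to_update = {"roof": "flat", "floors": 4}  # hypothetical new values
for field_name, new_value in values_to_update.items():
    set_field(tree, field_name, new_value)

out_path = Path(tempfile.gettempdir()) / "cultivation_updated.txt"  # hypothetical name
out_path.write_text(to_markup(tree) + "\n")
print(out_path.read_text())
```

Re-parsing the written file with the same grammar gives back the updated dicts, which is a cheap way to check that the output is still valid markup.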

【Discussion】: