【Question title】: Parsing structured text data
【Posted】: 2015-07-26 05:28:25
【Question】:

I extracted a blob field from a MySQL table as text:

CAST(orders AS CHAR(10000) CHARACTER SET utf8)

Each field now looks like this:

a:2:{s:4:"Cart";a:5:{s:4:"cart";a:2:{i:398;a:7:{s:2:"id";s:3:"398";s:4:"name";s:14:"Some product 1";s:5:"price";i:780;s:3:"uid";s:5:"FN-02";s:3:"num";s:1:"1";s:6:"weight";s:1:"0";s:4:"user";s:1:"4";}i:379;a:7:{s:2:"id";s:3:"379";s:4:"name";s:14:"Some product 2";s:5:"price";i:750;s:3:"uid";s:5:"FR-01";s:3:"num";s:1:"1";s:6:"weight";s:1:"0";s:4:"user";s:1:"4";}}s:3:"num";i:2;s:3:"sum";s:7:"1530.00";s:6:"weight";i:160;s:8:"dostavka";s:3:"180";}s:6:"Person";a:17:{s:4:"ouid";s:6:"103-47";s:4:"data";s:10:"1278090513";s:4:"time";s:8:"21:33 pm";s:4:"mail";s:15:"mail@mailer.com";s:11:"name_person";s:8:"John Doe";s:8:"org_name";s:13:"John Doe Inc.";s:7:"org_inn";s:12:"667110804509";s:7:"org_kpp";s:0:"";s:8:"tel_code";s:3:"343";s:8:"tel_name";s:7:"2670039";s:8:"adr_name";s:26:"London, 221b, Baker street";s:14:"dostavka_metod";s:1:"8";s:8:"discount";s:0:"";s:7:"user_id";s:2:"13";s:6:"dos_ot";s:0:"";s:6:"dos_do";s:0:"";s:11:"order_metod";s:1:"1";}}

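I can see that the text follows the pattern [type]:[length]:[data];, where the [type] s stands for string and a stands for array (a dictionary, in Python terms). There are also i:number; groups, which carry no [length].

However, it is not clear to me how to parse the nested dictionaries (in Python terms), and I don't see a better solution than running regular expressions over the text several times.

Question: is this a standard data format that already has a parser?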

【Discussion】:

    Tags: text-parsing


    【Solution 1】:

    This looks like the output of PHP's serialize() function (you need to unserialize it):

    http://php.net/manual/en/function.serialize.php

    If you are working in Python, there is a port of the serialize and unserialize functions:

    https://pypi.python.org/pypi/phpserialize

    Anatomy of a serialize()'ed value:
    
    String
    s:size:value;
    
    Integer
    i:value;
    
    Boolean
    b:value; (does not store "true" or "false", does store '1' or '0')
    
    Null
    N;
    
    Array
    a:size:{key definition;value definition;(repeated per element)}
    
    Object
    O:strlen(object name):object name:object size:{s:strlen(property name):property name:property definition;(repeated per property)}
    
    String values are always in double quotes
    Array keys are always integers or strings
        "null => 'value'" equates to 's:0:"";s:5:"value";',
        "true => 'value'" equates to 'i:1;s:5:"value";',
        "false => 'value'" equates to 'i:0;s:5:"value";',
        "array(whatever the contents) => 'value'" equates to an "illegal offset type" warning because you can't use an array as a key;
        however, if you use a variable containing an array as a key, it will equate to 's:5:"Array";s:5:"value";',
        and attempting to use an object as a key will result in the same behavior as using an array will.
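
To make the anatomy above concrete, here is a minimal hand-rolled sketch of a recursive parser in Python. It is not the phpserialize package's implementation, and the names `php_unserialize`, `_parse` and `_read_until` are hypothetical helpers invented for this illustration. It assumes the input is available as bytes (the `size` in `s:size:"value";` counts bytes, not characters) and that strings are UTF-8, and it covers only strings, integers, booleans, null and arrays. For real data, an existing library such as the one linked above is likely the safer choice.

```python
def _read_until(data, pos, delim):
    """Return the bytes between pos and the next delim, plus the offset after delim."""
    end = data.index(delim, pos)
    return data[pos:end], end + 1


def _parse(data, pos):
    """Parse one value starting at pos; return (value, offset after the value)."""
    kind = data[pos:pos + 1]
    if kind == b"N":                                  # N;
        return None, pos + 2
    if kind in (b"i", b"b"):                          # i:value;  or  b:value;
        raw, pos = _read_until(data, pos + 2, b";")
        return (int(raw) if kind == b"i" else raw == b"1"), pos
    if kind == b"s":                                  # s:size:"value";
        size, pos = _read_until(data, pos + 2, b":")
        start = pos + 1                               # skip the opening quote
        end = start + int(size)                       # size is a byte count
        if data[end:end + 2] != b'";':
            raise ValueError(f"bad string terminator at offset {end}")
        return data[start:end].decode("utf-8"), end + 2
    if kind == b"a":                                  # a:size:{key;value;...}
        count, pos = _read_until(data, pos + 2, b":")
        pos += 1                                      # skip "{"
        result = {}
        for _ in range(int(count)):
            key, pos = _parse(data, pos)              # keys are ints or strings
            value, pos = _parse(data, pos)            # values may be nested arrays
            result[key] = value
        if data[pos:pos + 1] != b"}":
            raise ValueError(f"expected '}}' at offset {pos}")
        return result, pos + 1
    raise ValueError(f"unsupported type {kind!r} at offset {pos}")


def php_unserialize(data: bytes):
    """Parse a complete serialize() payload into nested Python dicts and scalars."""
    value, pos = _parse(data, 0)
    if pos != len(data):
        raise ValueError(f"trailing data at offset {pos}")
    return value


# A shortened fragment in the same shape as the order data in the question.
sample = b'a:1:{s:4:"Cart";a:2:{s:3:"num";i:2;s:3:"sum";s:7:"1530.00";}}'
order = php_unserialize(sample)
print(order["Cart"]["sum"])
```

Each call consumes exactly one value and returns the offset where the next one starts, which is what lets nested arrays be handled by plain recursion instead of repeated regular-expression passes.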
    

    【Comments】:

    • Nice! Yes, I'm using Python, so your link is very helpful. Thanks!