【Question title】: Perl regex & data extraction/manipulation
【Posted】: 2011-12-02 04:16:41
【Question】:

I don't know where to start... My client gets stock data from his supplier, but it is now being sent in a different format. Here is a sample snippet:

[["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"],["L","1119"],["XL","1069"],["XXL","412"],["3XL","171"]]],["BOT",[["Bottle","15451A"]],[["S","226"],["M","425"],["L","772"],["XL","509"],["XXL","163"]]],["BUR",[["Burgundy","73002E"]],[["S","402"],["M","530"],["L","356"],["XL","257"],["XXL","79"]]],["DNA",[["Deep Navy","000F33"]],[["S","699"],["M","1161"],["L","1645"],["XL","1032"],["XXL","350"]]],["EME",[["Emerald","0DAB5E"]],[["S","392"],["M","567"],["L","613"],["XL","431"],["XXL","97"]]],["HEA",[["Heather","C0D4D7"]],[["S","374"],["M","447"],["L","731"],["XL","386"],["XXL","115"],["3XL","26"]]],["KEL",[["Kelly","0FFF00"]],[["S","167"],["M","285"],["L","200"],["XL","98"],["XXL","45"]]],["NAV",[["Navy","002466"]],[["S","451"],["M","1389"],["L","1719"],["XL","1088"],["XXL","378"],["3XL","177"]]],["NPU",[["Purple","560D55"]],[["S","347"],["M","553"],["L","691"],["XL","230"],["XXL","101"]]],["ORA",[["Orange","FF4700"]],[["S","125"],["M","273"],["L","158"],["XL","98"],["XXL","98"]]],["RED",[["Red","FF002E"]],[["S","972"],["M","1186"],["L","1246"],["XL","889"],["XXL","184"]]],["ROY",[["Royal","1500CE"]],[["S","1078"],["M","1346"],["L","1102"],["XL","818"],["XXL","135"]]],["SKY",[["Sky","91E3FF"]],[["S","567"],["M","919"],["L","879"],["XL","498"],["XXL","240"]]],["SUN",[["Sunflower","FFC700"]],[["S","843"],["M","1409"],["L","1032"],["XL","560"],["XXL","53"]]],["WHI",[["White","FFFFFF"]],[["S","631"],["M","2217"],["L","1666"],["XL","847"],["XXL","410"],["3XL","74"]]]]

First, the leading [ and the trailing ] can be stripped.

Then it needs to be broken up into color segments, i.e.:

["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"],["L","1119"],["XL","1069"],["XXL","412"],["3XL","171"]]]

Here I need BLK; the next block, [["Black","0F1315"]], can be ignored.

Next I need to get the stock data for each size: ["S","813"] and so on.

So I should end up with data such as:

 $col = BLK
 $size = S
 $qty = 813

 $col = BLK
 $size = M
 $qty = 1378

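and this repeats for every color segment in the data.

The number of color segments in the data will vary, and so will the number of size segments within each one. The number of size segments also differs from color to color, i.e. BLK may have 6 sizes while RED has only 5.

The data will be written out in a loop, so something like print "$col:$size:$qty" would do, as that is a format that can be processed.

Sorry for the long message, I just can't seem to get my head around this today!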

Regards,

Stu

【Question comments】:

    Tags: regex json perl


    【Answer 1】:

    This looks like valid JSON to me, so why not use a JSON parser instead of trying to solve this with a regex?

    use JSON;
    my $json_string = '[["BLK",[["Black","0F1315"]],[["S","813"...<snip>';
    my $deserialized = from_json( $json_string );
    

    Then you can iterate over the array and extract the information you need.
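
    For readers who have not worked with decoded JSON in Perl before, the sketch below shows how the nested array references might be indexed. It assumes the core JSON::PP module (whose decode_json stands in for the JSON module's from_json used above) and a shortened copy of the question's data; the variable names are illustrative only.

```perl
use strict;
use warnings;
use JSON::PP;   # assumption: core JSON::PP stands in for the JSON module used above

# Shortened sample taken from the question's data.
my $json_string = '[["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"]]]]';
my $data = decode_json($json_string);

# The top level is an array reference with one entry per color.
my $first      = $data->[0];
my $code       = $first->[0];        # the color code, e.g. "BLK"
my $name       = $first->[1][0][0];  # the display name (the block the question ignores)
my $size_pairs = $first->[2];        # array reference of [size, quantity] pairs

print "color code: $code\n";
print "number of sizes: ", scalar(@$size_pairs), "\n";
print "first size: $size_pairs->[0][0], qty $size_pairs->[0][1]\n";
```

    Every level is an array reference, so each step down is just another ->[index].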

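    【Comments】:

    • Thanks Tim. Any chance you could show me how to process the result? I have no previous experience with JSON.
    • @StuAyton: It also (probably by coincidence) looks like valid Perl code, so if you really trust the supplier, you could feed it through eval and see what comes out...
    • @IlmariKaronen But if I just eval it, then it isn't in the format we need.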
    【Answer 2】:

    Building on Tim Pietzcker's answer:

    ...
    my $deserialized = from_json( $json_string );
    foreach my $group ( @$deserialized ) {
        my ( $color, undef, $sizes ) = @$group;
        print join( ":", $color, @$_ ), "\n" for @$sizes;
    }
    

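    (Yes, for this particular format, eval should work just as well as from_json, although the latter is safer. However, you really should try to find an official specification of the format: is it really JSON, or something else?)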
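
    Since the format was not yet confirmed at this point, a more defensive variant might trap parse errors and skip entries that do not have the expected shape. The following is only a sketch under that assumption: stock_lines is a hypothetical helper, not part of any module, and it uses the core JSON::PP module rather than JSON.

```perl
use strict;
use warnings;
use JSON::PP;   # assumption: core module used here in place of JSON

# Hypothetical helper: turn the supplier's JSON text into a list of
# "color:size:qty" strings, dying with a readable message on bad input.
sub stock_lines {
    my ($json_text) = @_;

    # decode_json dies on malformed input, so trap that and report it.
    my $data = eval { decode_json($json_text) };
    die "Could not parse stock data: $@" unless defined $data;
    die "Expected a top-level array\n" unless ref($data) eq 'ARRAY';

    my @lines;
    for my $group (@$data) {
        # Each group should be [ code, [[name, hex]], [[size, qty], ...] ].
        next unless ref($group) eq 'ARRAY' && ref( $group->[2] ) eq 'ARRAY';
        my ( $color, undef, $sizes ) = @$group;
        for my $pair (@$sizes) {
            my ( $size, $qty ) = @$pair;
            push @lines, "$color:$size:$qty";
        }
    }
    return @lines;
}

# Shortened sample taken from the question's data.
my $sample = '[["BLK",[["Black","0F1315"]],[["S","813"],["M","1378"]]],'
           . '["RED",[["Red","FF002E"]],[["S","972"]]]]';
print "$_\n" for stock_lines($sample);
```

    Returning the lines instead of printing them directly keeps the parsing separate from whatever the loop that writes the data out needs to do.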

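    【Comments】:

    • Thanks, that works great. I checked with the supplier and it is definitely JSON, but as far as eval goes, I always take the untrusting option, just in case something bad happens!
    【Answer 3】: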

    Assuming your data is in $str, eval(EXPR) it (Danger, Will Robinson!) and process the resulting data structure:

    my $struct = eval $str;
    
    foreach my $cref (@$struct) {
        my($color, undef, $sizerefs) = @$cref; # 3 elements in each top level
        foreach my $sizeref (@$sizerefs) {
            my($size, $qty) = @$sizeref;
            print "$color:$size:$qty\n";
        }
    }
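
    If the eval route is taken anyway, it may be worth at least checking that the string compiled and produced an array reference. The sketch below uses a hypothetical eval_stock helper for that. Note that this only catches failures; it does not make string eval safe, because any code in the input still runs.

```perl
use strict;
use warnings;

# Hypothetical helper: string-eval the text, but fail loudly if it did not
# compile or did not yield an array reference. Untrusted input is still executed.
sub eval_stock {
    my ($str) = @_;
    my $struct = eval $str;
    die "eval failed: $@" if $@;
    die "expected an array reference\n" unless ref($struct) eq 'ARRAY';
    return $struct;
}

# Shortened sample taken from the question's data.
my $str = '[["KEL",[["Kelly","0FFF00"]],[["S","167"],["M","285"]]]]';
for my $cref ( @{ eval_stock($str) } ) {
    my ( $color, undef, $sizerefs ) = @$cref;
    print "$color:$_->[0]:$_->[1]\n" for @$sizerefs;
}
```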
    

    【Comments】:
