【Question title】: Parsing JSON using PHP JSONPath
【Posted】: 2018-01-27 15:50:23
【Question description】:

I'm trying to parse JSON in PHP using JSONPath.

My JSON comes from this URL:

https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs

(It's too long to cut and paste here, but you can view it in a browser session.)

The JSON is valid (I validated it with https://jsonlint.com/).

I tested my JSONPath expression with http://www.jsonquerytool.com/ and everything seems to work, but when I put it all together in the PHP code below...

<?php  
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);

    require_once('json.php');      // JSON parser
    require_once('jsonpath-0.8.0.php');  // JSONPath evaluator

    $url = 'https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_PROXY, '');
    $data = curl_exec($ch);
    curl_close($ch);

    $parser = new Services_JSON(SERVICES_JSON_LOOSE_TYPE);
    $o = $parser->decode($data);

    $xpath_for_parsing = '$..aziende[?(@.descrizione=="A.S.U.I. - Trieste")]..prontoSoccorsi[?(@.descrizione=="Pronto Soccorso e Terapia Urgenza Trieste")]..dipartimenti[?(@.descrizione=="Pronto Soccorso Maggiore")]..codiciColore[?(@.descrizione=="Bianco")]..situazionePazienti..numeroPazientiInAttesa';

    $match1 = jsonPath($o, $xpath_for_parsing);
    //print_r($match1);
    $match1_encoded = $parser->encode($match1);
    print_r($match1_encoded);

    $match1_decoded = json_decode($match1_encoded);

    //print_r($match1_decoded);

    if ($match1_decoded[0] != '') {
     return  $match1_decoded[0];
    }
    else {
     return  "N.D.";
    }
?>

... it doesn't print any value, only "false".

My JSONPath expression fails in my PHP code; these are the errors I get:

Warning: Missing argument 3 for JsonPath::evalx(), called in /var/www/html/OpenProntoSoccorso/Test/jsonpath-0.8.0.php on line 84 and defined in /var/www/html/OpenProntoSoccorso/Test/jsonpath-0.8.0.php on line 101

Notice: Use of undefined constant descrizione - assumed 'descrizione' in /var/www/html/OpenProntoSoccorso/Test/jsonpath-0.8.0.php(104) : eval()'d code on line 1

Probably I have to escape/quote my JSONPath to use it in PHP, but I don't know how... Any suggestion is appreciated.

Note: I need to use JSONPath expressions like ?(@.descrizione=="A.S.U.I. - Trieste"); I can't use "positional" JSON paths.

I also tried jsonpath-0.8.3.php from https://github.com/ITS-UofIowa/jsonpath/blob/master/jsonpath.php, but nothing changed.

Suggestions?

Thank you in advance.

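【Question discussion】:

  • "What went wrong..?" → Perhaps you should elaborate on the debug output you apparently have, or specify exactly how it goes wrong, which simpler example works, and so on. (Not everyone will download all the dependencies just to debug your code.)
  • I've updated my question with the debug output I get when running my PHP code... Notice: Use of undefined constant descrizione - assumed 'descrizione' ... Probably I have to escape/quote my JSONPath to use it in PHP, but I don't know how... Any suggestion is appreciated...
  • Now this looks like an oversight in the jsonpath-0.8.0 library. I only took a quick look: the code is rather terse and a bit of a hodgepodge, and the eval may be problematic (though IMO it's acceptable for this use case). Notably, you'll have to debug the library, and its eval part in particular. You could try e.g. @.'descrizione'=="ASUI..." - but I doubt that fixes more than the notice. (The library's tokenizer might even choke on it.)
  • If the simple JSONPath approach doesn't help, you'll have to write a recursive function/foreach combination to extract the correct property/subtree. Maybe try RecursiveArrayIterator or similar.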
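A minimal sketch of the RecursiveArrayIterator idea from the last comment, assuming the response has been decoded with json_decode($data, true) into nested arrays. The function name collect_values_by_key and the tiny sample JSON are hypothetical. Note that this only finds a key at any depth; it does not apply the descrizione filters, which would still need explicit checks.

```php
<?php
declare(strict_types=1);

// Collects every value stored under $key, at any depth, in a decoded JSON array.
// Hypothetical helper: it finds keys only and does NOT filter on "descrizione".
function collect_values_by_key(array $data, string $key): array
{
    $found = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveArrayIterator($data),
        RecursiveIteratorIterator::SELF_FIRST // also yield array-valued entries, not just leaves
    );
    foreach ($it as $k => $v) {
        if ($k === $key) { // strict: numeric list indices never match a string key
            $found[] = $v;
        }
    }
    return $found;
}

// Hypothetical sample using field names from the question.
$json = '{"aziende":[{"descrizione":"A","prontoSoccorsi":[{"situazionePazienti":{"numeroPazientiInAttesa":3}}]}]}';
$data = json_decode($json, true);
print_r(collect_values_by_key($data, 'numeroPazientiInAttesa'));
```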

Tags: php json parsing jsonpath


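【Solution 1】:

If the library you're using has bugs while a third-party service reports no error in your query, I suggest trying a different JsonPath library.

Here are a few:

I'm pretty sure there are more. Hope it helps.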

【Discussion】:

    【Solution 2】:

    You can use json_decode to convert it into a native PHP array, then use hhb_xml_encode (from https://stackoverflow.com/a/43697765/1067003) to convert the array to XML, then load the XML into a DOMDocument with DOMDocument::loadHTML, and finally search it with XPath expressions via DOMXPath::query...

    Example:

    <?php
    declare(strict_types = 1);
    header ( "content-type: text/plain;charset=utf8" );
    require_once ('hhb_.inc.php');
    $json_raw = (new hhb_curl ( '', true ))->exec ( 'https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs' )->getStdOut ();
    $parsed = json_decode ( $json_raw, true );
    // var_dump ( $parsed );
    $xml = hhb_xml_encode ( $parsed );
    // var_dump($xml);
    // loadHTML() is an instance method; calling it statically is an Error in PHP 8
    $dom = new DOMDocument ();
    @$dom->loadHTML ( $xml );
    $dom->formatOutput = true;
    $xp = new DOMXPath ( $dom );
    $elements_for_parsing = $xp->query ( '//aziende/descrizione[text()=' . xpath_quote ( 'A.S.U.I. - Trieste' ) . ']|//prontosoccorsi/descrizione[text()=' . xpath_quote ( 'Pronto Soccorso e Terapia Urgenza Trieste' ) . ']|//dipartimenti/descrizione[text()=' . xpath_quote ( 'Pronto Soccorso Maggiore' ) . ']|//codicicolore/descrizione[text()=' . xpath_quote ( 'Bianco' ) . ']|//situazionepazienti|//numeroPazientiInAttesa' );
    // var_dump ( $elements_for_parsing,$dom->saveXML() );
    foreach ( $elements_for_parsing as $ele ) {
        var_dump ( $ele->textContent );
    }
    
    // based on https://stackoverflow.com/a/1352556/1067003
    function xpath_quote(string $value): string {
        if (false === strpos ( $value, '"' )) {
            return '"' . $value . '"';
        }
        if (false === strpos ( $value, '\'' )) {
            return '\'' . $value . '\'';
        }
        // if the value contains both single and double quotes, construct an
        // expression that concatenates all non-double-quote substrings with
        // the quotes, e.g.:
        //
        // concat("'foo'", '"', "bar")
        $sb = 'concat(';
        $substrings = explode ( '"', $value );
        for($i = 0; $i < count ( $substrings ); ++ $i) {
            $needComma = ($i > 0);
            if ($substrings [$i] !== '') {
                if ($i > 0) {
                    $sb .= ', ';
                }
                $sb .= '"' . $substrings [$i] . '"';
                $needComma = true;
            }
            if ($i < (count ( $substrings ) - 1)) {
                if ($needComma) {
                    $sb .= ', ';
                }
                $sb .= "'\"'";
            }
        }
        $sb .= ')';
        return $sb;
    }
    function hhb_xml_encode(array $arr, string $name_for_numeric_keys = 'val'): string {
        if (empty ( $arr )) {
            // avoid having a special case for <root/> and <root></root> i guess
            return '';
        }
        $is_iterable_compat = function ($v): bool {
            // php 7.0 compat for php7.1+'s is_iterable
            return is_array ( $v ) || ($v instanceof \Traversable);
        };
        $isAssoc = function (array $arr): bool {
            // thanks to Mark Amery for this
            if (array () === $arr)
                return false;
            return array_keys ( $arr ) !== range ( 0, count ( $arr ) - 1 );
        };
        $endsWith = function (string $haystack, string $needle): bool {
            // thanks to MrHus
            $length = strlen ( $needle );
            if ($length == 0) {
                return true;
            }
            return (substr ( $haystack, - $length ) === $needle);
        };
        $formatXML = function (string $xml) use ($endsWith): string {
            // there seems to be a bug with formatOutput on DOMDocuments that have used importNode with $deep=true
            // on PHP 7.0.15...
            $domd = new DOMDocument ( '1.0', 'UTF-8' );
            $domd->preserveWhiteSpace = false;
            $domd->formatOutput = true;
            $domd->loadXML ( '<root>' . $xml . '</root>' );
            $ret = trim ( $domd->saveXML ( $domd->getElementsByTagName ( "root" )->item ( 0 ) ) );
            assert ( 0 === strpos ( $ret, '<root>' ) );
            assert ( $endsWith ( $ret, '</root>' ) );
            $full = trim ( substr ( $ret, strlen ( '<root>' ), - strlen ( '</root>' ) ) );
            $ret = '';
            // ... seems each line except the first line starts with 2 ugly spaces,
            // presumably its the <root> element that starts with no spaces at all.
            foreach ( explode ( "\n", $full ) as $line ) {
                if (substr ( $line, 0, 2 ) === '  ') {
                    $ret .= substr ( $line, 2 ) . "\n";
                } else {
                    $ret .= $line . "\n";
                }
            }
            $ret = trim ( $ret );
            return $ret;
        };
    
        // $arr = new RecursiveArrayIterator ( $arr );
        // $iterator = new RecursiveIteratorIterator ( $arr, RecursiveIteratorIterator::SELF_FIRST );
        $iterator = $arr;
        $domd = new DOMDocument ();
        $root = $domd->createElement ( 'root' );
        foreach ( $iterator as $key => $val ) {
            // var_dump ( $key, $val );
            $ele = $domd->createElement ( is_int ( $key ) ? $name_for_numeric_keys : $key );
            if (! empty ( $val ) || $val === '0') {
                if ($is_iterable_compat ( $val )) {
                    $asoc = $isAssoc ( $val );
                    $tmp = hhb_xml_encode ( $val, is_int ( $key ) ? $name_for_numeric_keys : $key );
                    // var_dump ( $tmp );
                    // die ();
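                    // loadXML() must be called on an instance (static calls fail in PHP 8)
                    $tmpdoc = new DOMDocument ();
                    @$tmpdoc->loadXML ( '<root>' . $tmp . '</root>' );
                    $tmp = $tmpdoc;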
                    foreach ( $tmp->getElementsByTagName ( "root" )->item ( 0 )->childNodes ?? [ ] as $tmp2 ) {
                        $tmp3 = $domd->importNode ( $tmp2, true );
                        if ($asoc) {
                            $ele->appendChild ( $tmp3 );
                        } else {
                            $root->appendChild ( $tmp3 );
                        }
                    }
                    unset ( $tmp, $tmp2, $tmp3 );
                    if (! $asoc) {
                        // echo 'REMOVING';die();
                        // $ele->parentNode->removeChild($ele);
                        continue;
                    }
                } else {
                    $ele->textContent = $val;
                }
            }
            $root->appendChild ( $ele );
        }
        $domd->preserveWhiteSpace = false;
        $domd->formatOutput = true;
        $ret = trim ( $domd->saveXML ( $root ) );
        assert ( 0 === strpos ( $ret, '<root>' ) );
        assert ( $endsWith ( $ret, '</root>' ) );
        $ret = trim ( substr ( $ret, strlen ( '<root>' ), - strlen ( '</root>' ) ) );
        // seems to be a bug with formatOutput on DOMDocuments that have used importNode with $deep=true..
        $ret = $formatXML ( $ret );
        return $ret;
    }
    

    PS: the lines require_once ('hhb_.inc.php'); $json_raw = (new hhb_curl ( '', true ))->exec ( 'https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs' )->getStdOut (); just fetch the URL and put the JSON into $json_raw (using a gzip-compressed transfer to speed things up). Replace them with whatever you like for fetching it into $json_raw; the curl library I actually used is from https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php#L477
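If you don't want to pull in hhb_.inc.php, a plain ext/curl fetch is one possible replacement. This is a sketch, assuming the curl extension is installed; fetch_url is a hypothetical name. Setting CURLOPT_ENCODING to an empty string asks curl to advertise every content encoding it supports (gzip included) and to decompress the response transparently.

```php
<?php
declare(strict_types=1);

// Hypothetical replacement for the hhb_curl line: fetch a URL with plain ext/curl.
// CURLOPT_ENCODING = '' makes curl send Accept-Encoding with every encoding it
// supports (e.g. gzip, deflate) and decompress the body transparently.
function fetch_url(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '',
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl request failed: ' . curl_error($ch));
    }
    return $body; // the handle is released when $ch goes out of scope
}

// Usage (needs network access):
// $json_raw = fetch_url('https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs');
```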

    It currently prints:

    string(18) "A.S.U.I. - Trieste"
    string(41) "Pronto Soccorso e Terapia Urgenza Trieste"
    string(9) "121200:14"
    string(10) "181400:254"
    string(6) "Bianco"
    string(7) "200:292"
    string(5) "00:00"
    string(24) "Pronto Soccorso Maggiore"
    string(7) "3300:15"
    string(6) "Bianco"
    string(8) "6200:584"
    string(5) "00:00"
    string(5) "00:00"
    string(8) "4100:353"
    string(6) "Bianco"
    string(7) "100:051"
    string(5) "00:00"
    string(5) "00:00"
    string(7) "1100:15"
    string(8) "6402:012"
    string(6) "Bianco"
    string(7) "402:274"
    string(5) "00:00"
    string(9) "11900:202"
    string(9) "11401:427"
    string(6) "Bianco"
    string(8) "2102:051"
    string(5) "00:00"
    string(7) "3300:08"
    string(8) "7401:423"
    string(6) "Bianco"
    string(8) "8402:104"
    string(5) "00:00"
    string(6) "Bianco"
    string(5) "00:00"
    string(5) "00:00"
    string(5) "00:00"
    string(5) "00:00"
    string(7) "1100:04"
    string(10) "121000:512"
    string(6) "Bianco"
    string(8) "5400:461"
    string(5) "00:00"
    string(5) "00:00"
    string(5) "00:00"
    string(6) "Bianco"
    string(5) "00:00"
    string(5) "00:00"
    string(9) "121200:18"
    string(9) "11800:593"
    string(6) "Bianco"
    string(8) "6401:272"
    string(5) "00:00"
    string(6) "Bianco"
    string(7) "1100:04"
    string(5) "00:00"
    string(5) "00:00"
    string(5) "00:00"
    string(7) "2200:05"
    string(9) "10801:102"
    string(6) "Bianco"
    string(8) "8201:166"
    string(5) "00:00"
    string(8) "3200:071"
    string(7) "100:261"
    string(6) "Bianco"
    string(5) "00:00"
    string(5) "00:00"
    string(7) "1100:00"
    string(9) "151500:26"
    string(10) "161301:123"
    string(6) "Bianco"
    string(8) "9500:434"
    string(7) "1100:00"
    string(7) "2200:13"
    string(6) "Bianco"
    string(7) "200:342"
    string(5) "00:00"
    string(6) "Bianco"
    string(7) "1100:24"
    string(5) "00:00"
    string(5) "00:00"
    string(5) "00:00"
    string(7) "1100:04"
    string(8) "9700:222"
    string(10) "171500:582"
    string(6) "Bianco"
    string(7) "200:512"
    string(7) "1100:40"
    string(7) "1100:22"
    string(6) "Bianco"
    string(8) "3100:062"
    string(5) "00:00"
    string(5) "00:00"
    string(5) "00:00"
    string(6) "Bianco"
    string(5) "00:00"
    string(5) "00:00"
    string(7) "1100:22"
    string(8) "7500:302"
    string(6) "Bianco"
    string(5) "00:00"
    string(5) "00:00"
    string(7) "1100:06"
    string(6) "Bianco"
    string(7) "1100:00"
    string(5) "00:00"
    string(5) "00:00"
    

    Hope this is what you're looking for; I guessed based on the "xpath" you provided.

    【Discussion】:

      【Solution 3】:

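      XPath is too complex for your task, and overkill in general...

      Just use the standard json_decode() to get the equivalent PHP object, and navigate it with standard for/while loops and regular expressions.

      Also, I think your question is misleading: your problem is not parsing JSON (json_decode() does that for you), it's extracting some data from it using XPath. I suggest restructuring your question around exactly what goes wrong and what you're trying to achieve.

      If you need to descend to a precise JSON node (or set of nodes), why not do it with for loops and regular expressions?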
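A sketch of the loop-based approach, assuming each level (aziende, prontoSoccorsi, dipartimenti, codiciColore, situazionePazienti) is a direct child of the previous one. That nesting is an assumption: the question's path uses "..", so the real JSON may nest them differently. Exact string comparison is enough for these filters, so no regular expressions are needed here; find_waiting and the sample JSON are hypothetical.

```php
<?php
declare(strict_types=1);

// Loop-based lookup; ASSUMES each level is a direct child of the previous one.
// Returns numeroPazientiInAttesa for the matching path, or null if nothing matches.
function find_waiting(array $data, string $azienda, string $ps, string $dip, string $colore)
{
    foreach ($data['aziende'] ?? [] as $a) {
        if (($a['descrizione'] ?? null) !== $azienda) {
            continue;
        }
        foreach ($a['prontoSoccorsi'] ?? [] as $p) {
            if (($p['descrizione'] ?? null) !== $ps) {
                continue;
            }
            foreach ($p['dipartimenti'] ?? [] as $d) {
                if (($d['descrizione'] ?? null) !== $dip) {
                    continue;
                }
                foreach ($d['codiciColore'] ?? [] as $c) {
                    if (($c['descrizione'] ?? null) === $colore) {
                        return $c['situazionePazienti']['numeroPazientiInAttesa'] ?? null;
                    }
                }
            }
        }
    }
    return null;
}

// Hypothetical sample, trimmed to the fields used above.
$json = '{"aziende":[{"descrizione":"A.S.U.I. - Trieste","prontoSoccorsi":[{"descrizione":"Pronto Soccorso e Terapia Urgenza Trieste","dipartimenti":[{"descrizione":"Pronto Soccorso Maggiore","codiciColore":[{"descrizione":"Bianco","situazionePazienti":{"numeroPazientiInAttesa":4}}]}]}]}]}';
$data = json_decode($json, true);

$n = find_waiting($data, 'A.S.U.I. - Trieste', 'Pronto Soccorso e Terapia Urgenza Trieste', 'Pronto Soccorso Maggiore', 'Bianco');
echo $n ?? 'N.D.', "\n";
```

Returning null instead of "false" makes the "N.D." fallback from the question a simple null-coalescing check.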

      【Discussion】:

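      • Yes, my question is about using xpath to extract some data from my JSON... I started using jsonpath in PHP and it worked fine, but now I have to use more complex expressions and some problems have come up... If possible, I'd rather not rewrite/refactor my business logic. As you can see, the JSON I have to parse is quite long, and I have to parse not only this one but others that may be even larger; the jsonpath approach helps me there... Anyway, I can consider trying the standard approach starting from json_decode().
      • json_decode() gives you a PHP object; once you have it, you shouldn't have any trouble iterating over the fields and getting what you need. I really don't see the need for xpath...
      • @GianlucaGhettini XPath makes it easy to search an entire document without knowing (or even caring about) its exact structure. Say you know that somewhere there's a foobar with an attribute named baz whose text contains lal, but you aren't sure where exactly the foobar element is: //foobar[@baz and contains(text(),"lal")]. Doing the same on a native PHP array would be considerably harder, though you're welcome to prove me wrong.
      • Correct, but you're over-engineering. Did you notice what the OP is trying to do? It's a very simple traversal, and the JSON structure is known a priori.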
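To give a feel for what the structure-agnostic alternative costs in plain PHP, here is a rough, hypothetical emulation of a single JSONPath step such as ..aziende[?(@.descrizione=="...")] on arrays from json_decode(..., true). It is a sketch, not a JSONPath engine: descend_filter and the sample data are made up, it supports only equality on descrizione, and it does not deduplicate overlapping matches.

```php
<?php
declare(strict_types=1);

// Rough emulation of ONE JSONPath step "..$key[?(@.descrizione==$value)]":
// from each start node, search all descendants for lists stored under $key and
// keep the items whose "descrizione" equals $value. Not a full JSONPath engine.
function descend_filter(array $nodes, string $key, string $value): array
{
    $out = [];
    $walk = function ($node) use (&$walk, &$out, $key, $value): void {
        if (!is_array($node)) {
            return;
        }
        foreach ($node as $k => $child) {
            if ($k === $key && is_array($child)) {
                foreach ($child as $item) {
                    if (is_array($item) && ($item['descrizione'] ?? null) === $value) {
                        $out[] = $item;
                    }
                }
            }
            $walk($child); // keep descending, like JSONPath ".."
        }
    };
    foreach ($nodes as $node) {
        $walk($node);
    }
    return $out;
}

// Hypothetical sample: "codiciColore" sits under an extra "dettaglio" level,
// which the chain below finds without knowing about it.
$json = '{"aziende":[{"descrizione":"A.S.U.I. - Trieste","prontoSoccorsi":[{"descrizione":"Pronto Soccorso e Terapia Urgenza Trieste","dipartimenti":[{"descrizione":"Pronto Soccorso Maggiore","dettaglio":{"codiciColore":[{"descrizione":"Bianco","situazionePazienti":{"numeroPazientiInAttesa":4}},{"descrizione":"Verde","situazionePazienti":{"numeroPazientiInAttesa":7}}]}}]}]}]}';
$step = [json_decode($json, true)];
$filters = [
    ['aziende', 'A.S.U.I. - Trieste'],
    ['prontoSoccorsi', 'Pronto Soccorso e Terapia Urgenza Trieste'],
    ['dipartimenti', 'Pronto Soccorso Maggiore'],
    ['codiciColore', 'Bianco'],
];
foreach ($filters as [$key, $value]) {
    $step = descend_filter($step, $key, $value);
}

// Final "..situazionePazienti..numeroPazientiInAttesa" part, assumed to be direct children here.
$counts = [];
foreach ($step as $colore) {
    $counts[] = $colore['situazionePazienti']['numeroPazientiInAttesa'] ?? null;
}
print_r($counts);
```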

      【Solution 4】:

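      I solved the problem by switching JsonPath library implementations: I now use the Skyscanner JsonPath implementation (ref. https://github.com/Skyscanner/JsonPath-PHP).

      I had some trouble installing it (I had never used Composer before...), but the Skyscanner team helped me (ref. https://github.com/Skyscanner/JsonPath-PHP/issues/6), and now I have this PHP code...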

      <?php
          ini_set('display_errors', 'On');
          error_reporting(E_ALL);
      
          include "./tmp/vendor/autoload.php";
      
          $url = 'https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs';
      
          //#Set CURL parameters: pay attention to the PROXY config !!!!
          $ch = curl_init();
          curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
          curl_setopt($ch, CURLOPT_HEADER, 0);
          curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
          curl_setopt($ch, CURLOPT_URL, $url);
          curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
          curl_setopt($ch, CURLOPT_PROXY, '');
          $data = curl_exec($ch);
          curl_close($ch);
      
          $jsonObject = new JsonPath\JsonObject($data);
      
          $jsonPathExpr = "$..aziende[?(@.descrizione==\"A.S.U.I. - Trieste\")]..prontoSoccorsi[?(@.descrizione==\"Pronto Soccorso e Terapia Urgenza Trieste\")]..dipartimenti[?(@.descrizione==\"Pronto Soccorso Maggiore\")]..codiciColore[?(@.descrizione==\"Verde\")]..situazionePazienti..numeroPazientiInAttesa";
      
          $r = $jsonObject->get($jsonPathExpr);
      
          //print json_encode($r);
      
          print json_encode($r[0]);
      ?>
      

      In ./tmp I have what I got from Composer.

      ... it works fine, and this way I can run my JSON queries without knowing the JSON's exact structure.
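A side note on the expression above: in a single-quoted PHP string the double quotes need no \" escaping (and in a double-quoted string "$." is not interpolated anyway), and the chain of filters can be generated from (key, descrizione) pairs. The helper below is a hypothetical sketch; it assumes the values contain neither double quotes nor backslashes, because how the Skyscanner parser handles escapes inside a filter literal was not checked.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds "$..key[?(@.descrizione=="value")]..." from
// [key => descrizione] pairs, followed by a fixed tail.
// ASSUMPTION: values contain no double quotes or backslashes (rejected below),
// since escaping inside a filter literal was not checked against the library.
function build_filter_path(array $steps, string $tail): string
{
    $path = '$'; // single quotes: no interpolation, no escaping needed
    foreach ($steps as $key => $descrizione) {
        if (strpbrk($descrizione, '"\\') !== false) {
            throw new InvalidArgumentException("Unsupported character in: $descrizione");
        }
        $path .= sprintf('..%s[?(@.descrizione=="%s")]', $key, $descrizione);
    }
    return $path . $tail;
}

$expr = build_filter_path([
    'aziende'        => 'A.S.U.I. - Trieste',
    'prontoSoccorsi' => 'Pronto Soccorso e Terapia Urgenza Trieste',
    'dipartimenti'   => 'Pronto Soccorso Maggiore',
    'codiciColore'   => 'Verde',
], '..situazionePazienti..numeroPazientiInAttesa');

echo $expr, "\n";
// Then, as in the answer: $r = $jsonObject->get($expr);
```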

      【Discussion】:

        【Solution 5】:

        <?php // PRINT THE ORIGINAL JSON (decoded)
        define("DIRPATH", dirname($_SERVER["SCRIPT_FILENAME"]) . '/');
        define("WEBPATH", 'http://' . $_SERVER['SERVER_ADDR'] . dirname($_SERVER['PHP_SELF']) . '/');
        //define("WEBPORT", 'http://' . $_SERVER['SERVER_ADDR'] . ':' . $_SERVER['SERVER_PORT'] . dirname($_SERVER['PHP_SELF']) . '/');
        //define("imgpath", DIRPATH . 'image/');
        //$png = file_get_contents('iptv.kodi.al/images/');
        $jsondata = file_get_contents('https://servizionline.sanita.fvg.it/tempiAttesaService/tempiAttesaPs');
        // print_r() output is not JSON, so send it as plain text
        header("Content-type: text/plain; charset=utf-8");
        $print = json_decode($jsondata);
        print_r($print);
        ?>

        <?php // PRINT BY CATEGORY
        // Reuses $print decoded above. The define() and header() calls are not repeated:
        // redefining a constant, or sending a header after output has started, fails.
        $items = '';
        // THE LIST STARTS HERE: one id + descrizione entry per "aziende" item
        foreach ($print->aziende as $item) {
            $items .= "\n" . $item->id . "\n" . $item->descrizione . "\n";
        }
        ?>
        <?php echo $items; ?>

        【Discussion】: