【Question title】: How to do this Regex with JavaScript
【Posted】: 2020-01-13 22:02:53
【Question】:

I'm trying to write a regular expression that extracts the value of "uploadFinish". I want to do it with a regex. The content sits inside a huge HTML document:

MORE HTML
<meta property="al:android:app_name" content="I" />
<meta property="al:android:package" content="" />
<meta property="al:android:url" content="https://" />


<meta name="medium" content="image" />
<meta property="og:type" content="" />



        <script type="application/ld+json">
            {"@context":"http:\/\/schema.org","@type":"ImageObject","caption":"011 de 366\nMagali \ud83c\udf49 \n#magali #TurmadaMonica #illustration #ilustra\u00e7\u00e3o #art #drawing.","representativeOfPage":"http:\/\/schema.org\/True","uploadFinish":"2020-01-11T22:08:58","author":{"@type":"Person","alternateName":"@luis","mainEntityofPage":{"@type":"ProfilePage","@id":"https:\/\/www.example.com\/luis\/"}},"comment":[{"@type":"Comment","text":"\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e","author":{"@type":"Person","alternateName":"@katiagualtieri985","mainEntityofPage":{"@type":"ProfilePage","@id":"https:\/\/www.example.com\/katiagualtieri985\/"}}}],"commentCount":"1","contentLocation":{"@type":"Place","name":"Florian\u00f3polis, Santa Catarina","mainEntityofPage":{"@type":"CollectionPage","@id":"https:\/\/www.example.com\/explore\/locations\/213145014\/A-B-C-D\/"},"address":{"@type":"PostalAddress","addressLocality":"Florian\u00f3polis, Santa Catarina","addressCountry":{"@type":"Country","name":"BR"}}},"interactionStatistic":{"@type":"InteractionCounter","interactionType":{"@type":"LikeAction"},"userInteractionCount":"225"},"mainEntityofPage":{"@type":"ItemPage","@id":"https:\/\/www.example.com\/p\/XDFASDFSAD\/"},"description":"225 Me gusta, 1 comentarios - Lu\u00eds (@luasdf) en Example: &quot;011 de 366\nMagali \ud83c\udf49 \n#magali #TurmadaMonica #illustration #ilustra\u00e7\u00e3o #art #drawing.&quot;","name":"Lu\u00eds en example: \u201c011 de 366\nMagali \ud83c\udf49 \n#magali #TurmadaMonica #illustration #ilustra\u00e7\u00e3o #art #drawing.\u201d"}
        </script>


<link rel="alternate" href="https://www.example.com/p/B7MhI_XgtYm/" hreflang="x-default" />

MORE HTML

I tried to put together the following regular expression, without success:

.w+(.*)+S\w*(uploadFinish)\w*

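Thanks.

【Comments】:

  • You should use an HTML and JSON parser, not a regex.
  • I'm doing web scraping, which is why I can't change it.
  • You can use a parser and still do web scraping.
  • But I don't want to load a library just for this data...

Tags: javascript html regex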


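【Solution 1】:

Whether or not you are web scraping, you should treat JSON as JSON. Trying to interpret JSON as plain text with a regex is bound to break sooner or later.

Here is an example that retrieves the part of the JSON you want. The JavaScript comes first, followed by the HTML it runs against.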

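// Locate the JSON-LD script element and parse its text content as JSON.
const jsonLdElement = document.querySelector("[type='application/ld+json']")
const jsonLd = JSON.parse(jsonLdElement.textContent)
console.log(jsonLd.uploadFinish) // the requested value
console.log(jsonLd) // the whole parsed object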
<script type="application/ld+json">
  {
    "@context": "http:\/\/schema.org",
    "@type": "ImageObject",
    "caption": "011 de 366\nMagali \ud83c\udf49 \n#magali #TurmadaMonica #illustration #ilustra\u00e7\u00e3o #art #drawing.",
    "representativeOfPage": "http:\/\/schema.org\/True",
    "uploadFinish": "2020-01-11T22:08:58",
    "author": {
      "@type": "Person",
      "alternateName": "@luis",
      "mainEntityofPage": {
        "@type": "ProfilePage",
        "@id": "https:\/\/www.example.com\/luis\/"
      }
    },
    "comment": [{
      "@type": "Comment",
      "text": "\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e\ud83d\udc9e",
      "author": {
        "@type": "Person",
        "alternateName": "@katiagualtieri985",
        "mainEntityofPage": {
          "@type": "ProfilePage",
          "@id": "https:\/\/www.example.com\/katiagualtieri985\/"
        }
      }
    }],
    "commentCount": "1",
    "contentLocation": {
      "@type": "Place",
      "name": "Florian\u00f3polis, Santa Catarina",
      "mainEntityofPage": {
        "@type": "CollectionPage",
        "@id": "https:\/\/www.example.com\/explore\/locations\/213145014\/A-B-C-D\/"
      },
      "address": {
        "@type": "PostalAddress",
        "addressLocality": "Florian\u00f3polis, Santa Catarina",
        "addressCountry": {
          "@type": "Country",
          "name": "BR"
        }
      }
    },
    "interactionStatistic": {
      "@type": "InteractionCounter",
      "interactionType": {
        "@type": "LikeAction"
      },
      "userInteractionCount": "225"
    },
    "mainEntityofPage": {
      "@type": "ItemPage",
      "@id": "https:\/\/www.example.com\/p\/XDFASDFSAD\/"
    },
    "description": "225 Me gusta, 1 comentarios - Lu\u00eds (@luasdf) en Example: &quot;011 de 366\nMagali \ud83c\udf49 \n#magali #TurmadaMonica #illustration #ilustra\u00e7\u00e3o #art #drawing.&quot;",
    "name": "Lu\u00eds en example: \u201c011 de 366\nMagali \ud83c\udf49 \n#magali #TurmadaMonica #illustration #ilustra\u00e7\u00e3o #art #drawing.\u201d"
  }
</script>
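
The snippet above relies on a DOM (`document.querySelector`). When scraping outside a browser without any library, the page is usually just a string. The following is a minimal sketch under that assumption: `extractJsonLd` and `sampleHtml` are hypothetical names, not part of the original answer, and the sample input is a shortened stand-in for the real page. A regex is used only to locate the script element; the JSON itself is still handed to `JSON.parse`.

```javascript
// Hypothetical helper: pull every JSON-LD block out of a raw HTML string.
function extractJsonLd(html) {
  // Matches <script type="application/ld+json"> ... </script> and captures the body.
  const scriptPattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const results = [];
  for (const match of html.matchAll(scriptPattern)) {
    try {
      results.push(JSON.parse(match[1]));
    } catch (err) {
      // Skip blocks that are not valid JSON rather than aborting the whole scrape.
    }
  }
  return results;
}

// Sample input standing in for the fetched page (shortened from the question).
const sampleHtml = `
<meta name="medium" content="image" />
<script type="application/ld+json">
  {"@type":"ImageObject","uploadFinish":"2020-01-11T22:08:58","commentCount":"1"}
</script>
<link rel="alternate" href="https://www.example.com/p/B7MhI_XgtYm/" hreflang="x-default" />`;

const [jsonLd] = extractJsonLd(sampleHtml);
console.log(jsonLd.uploadFinish); // 2020-01-11T22:08:58
```

Locating the tag with a regex is still a shortcut: unusual attribute quoting or markup could defeat it, so a real HTML parser remains the more robust choice where one is available.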

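    【Solution 2】:

    If you really do need to extract it with a regular expression, you can use a positive lookbehind, (?<=PATTERN). It matches text that is preceded by the given pattern, without including the pattern itself in the match.

    In this case, the pattern would be "uploadFinish":"

    The positive lookbehind should be followed by a pattern that matches everything up to the next double quote, for example [^"]*

    This lets you extract the value without the surrounding quotes.

    Putting the whole pattern together: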

    (?<="uploadFinish":")[^"]*
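
As a rough illustration of how this pattern could be applied in JavaScript, here is a small sketch. The variable names and the shortened sample string are assumptions made for the example. Lookbehind assertions were added to the language in ES2018, so older engines may reject the regex literal.

```javascript
// Sample input standing in for the page source (shortened from the question).
const pageSource =
  '{"@type":"ImageObject","uploadFinish":"2020-01-11T22:08:58","commentCount":"1"}';

// The lookbehind asserts the key prefix without consuming it;
// [^"]* then consumes the value up to (not including) the closing quote.
const pattern = /(?<="uploadFinish":")[^"]*/;

const found = pageSource.match(pattern);
// match() returns null when the key is absent, so guard before indexing.
const uploadFinish = found ? found[0] : null;
console.log(uploadFinish); // 2020-01-11T22:08:58
```

Because the prefix sits inside the lookbehind, the whole match (`found[0]`) is already the bare value and no capture group is needed.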
    

      【Solution 3】:

      If you can assume the value is always wrapped in double quotes, a simple expression like this will give you the value you need:

      html.match(/"uploadFinish":"(.*?)"/)
      

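      Here html is the response body. The (.*?) group matches any characters, but as few as possible, so it stops at the first double quote it encounters. The captured value is at index 1 of the returned array.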
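
A short sketch of how the result of that call might be consumed, assuming the response body is already available as a string. `getUploadFinish` is a hypothetical wrapper and the sample body is a shortened stand-in for the real page.

```javascript
// Hypothetical helper wrapping the expression above.
function getUploadFinish(html) {
  const match = html.match(/"uploadFinish":"(.*?)"/);
  // match[0] is the whole matched text; match[1] is the captured value.
  // match() returns null when nothing matches, so guard before indexing.
  return match ? match[1] : null;
}

// Sample response body (shortened from the question).
const html =
  '<script type="application/ld+json">{"@type":"ImageObject","uploadFinish":"2020-01-11T22:08:58"}</script>';

console.log(getUploadFinish(html)); // 2020-01-11T22:08:58
console.log(getUploadFinish("<html></html>")); // null
```

The lazy group stops at the first double quote, so a value containing an escaped quote (`\"`) would be cut short. For a timestamp like this one that should not come up.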
