【Posted】: 2021-02-19 14:16:10
【Question】:
I'm trying to target the entire script tag whose content includes "@type": "NewsArticle".
Something like:
<script type="application\/ld\+json">[^\{]*?{(.*?)\}[^\}]*?<\/script>
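I can use the regex above to target the topmost script tag. But I'm after the NewsArticle JSON, which in this case is the second tag. Some pages have 4+ application/ld+json tags, but "@type": "NewsArticle" is always present on every page regardless. So I'm looking for a regex that can target that specific script.
Thanks for your help.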
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Organization",
"@id": "https://www.givemesport.com/#gms",
"name": "GiveMeSport",
"url": "https://www.givemesport.com",
"logo": {
"@type": "ImageObject",
"url": "https://gmsrp.cachefly.net/v4/images/logo-gms-black.png"
},
"sameAs":[
"https://www.facebook.com/GiveMeSport",
"https://www.instagram.com/givemesport",
"https://twitter.com/GiveMeSport",
"https://www.youtube.com/user/GiveMeSport"
]
}
</script>
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "NewsArticle",
"mainEntityOfPage": "https://www.givemesport.com/1612447-man-uniteds-scott-mctominay-delighted-fans-with-reaction-after-third-goal-vs-rb-leipzig",
"url": "https://www.givemesport.com/1612447-man-uniteds-scott-mctominay-delighted-fans-with-reaction-after-third-goal-vs-rb-leipzig",
"headline": "Man United's Scott McTominay delighted fans with reaction after third goal vs RB Leipzig",
"datePublished": "2020-10-30T21:52:48.3510000Z",
"dateModified": "2020-10-30T21:52:48.3510000Z",
"description": "Man United's Scott McTominay delighted fans with reaction after third goal vs RB Leipzig",
"articleSection": "Football",
"keywords": ["Football","Manchester United","Marcus Rashford","RB Leipzig","Scott McTominay","UEFA Champions"],
"creator": ["Scott Wilson"],
"thumbnailUrl": "https://gmsrp.cachefly.net/images/20/10/30/03a426c8204af5c8d02282afaeed6189/144.jpg",
"author": {
"@type": "Person",
"name": "Scott Wilson",
"sameAs": "https://www.givemesport.com/scott-wilson-1"
},
"publisher": {
"@id": "https://www.givemesport.com/#gms"
},
"image": {
"@type": "ImageObject",
"url": "https://gmsrp.cachefly.net/images/20/10/30/03a426c8204af5c8d02282afaeed6189/960.jpg",
"height": 620,
"width": 960
}
}
</script>
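A plain lazy `.*?` is not enough here, because the match starts at the first `<script type="application/ld+json">` and simply extends into the next block until it finds "NewsArticle". A single regex can work if the text between the opening tag and "NewsArticle" is never allowed to cross a closing `</script>`. The minimal JavaScript sketch below does this with a "tempered" token, `(?:(?!<\/script>)[\s\S])*?`. It assumes the opening tag is written exactly as in the sample markup above (no extra attributes), and the helper name `matchNewsArticleScript` is hypothetical.

```javascript
// Tempered token (?:(?!<\/script>)[\s\S])*? : any character, newlines included,
// as long as it does not start "</script>". This keeps every match inside one
// script block, so a match cannot start in the Organization block and end in
// the NewsArticle block.
const NEWS_ARTICLE_SCRIPT =
  /<script type="application\/ld\+json">(?:(?!<\/script>)[\s\S])*?"@type"\s*:\s*"NewsArticle"(?:(?!<\/script>)[\s\S])*?<\/script>/;

// Hypothetical helper: returns the whole <script>...</script> tag that
// contains "@type": "NewsArticle", or null if there is none.
function matchNewsArticleScript(html) {
  const m = html.match(NEWS_ARTICLE_SCRIPT);
  return m ? m[0] : null;
}
```

Because a match must start at an opening tag and cannot pass a closing tag, the Organization block fails on its own and the engine moves on to the next script tag, so it does not matter whether the NewsArticle block is the second or the fourth one on the page.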
【Discussion】:
-
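It doesn't sound like regex is the right tool for this; it will be slow and inaccurate. If there are no < or > characters inside the script tags, you could try something like this. -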
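The assumption in that comment (no `<` inside the JSON) allows a simpler pattern, because `[^<]*` can never run past the next tag. The linked pattern is not reproduced in this thread, so the following is only a hedged sketch in the same spirit, with a hypothetical `findNewsArticleTag` helper. It breaks as soon as any JSON string value contains a `<` character (for example, embedded HTML in a description).

```javascript
// Assumes the JSON inside the script tags contains no "<" characters, so
// [^<]* (which also matches newlines) stops at the closing </script> of the
// current block and cannot leak into the next one.
const NEWS_ARTICLE_NO_LT =
  /<script type="application\/ld\+json">[^<]*"@type"\s*:\s*"NewsArticle"[^<]*<\/script>/;

// Hypothetical helper: returns the matching <script> tag, or null.
function findNewsArticleTag(html) {
  const m = NEWS_ARTICLE_NO_LT.exec(html);
  return m ? m[0] : null;
}
```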
Parsing HTML with RegEx is not good practice: stackoverflow.com/questions/1732348/… You can parse the HTML string in JS and then use getElementsByTagName or CSS selectors: stackoverflow.com/questions/10585029/… -
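A sketch of the parsing approach that comment suggests: select the JSON-LD script elements with a CSS selector, run `JSON.parse` on each one, and return the object whose "@type" is "NewsArticle". The helper name `findNewsArticle` is hypothetical. It needs a DOM (a browser `document`, or the result of `DOMParser`), and it only checks a top-level object or a top-level array, not JSON-LD `@graph` containers.

```javascript
// Hypothetical helper: returns the parsed NewsArticle object from the first
// JSON-LD script that contains one, or null. `root` is anything with
// querySelectorAll, e.g. `document` in a browser or a parsed Document.
function findNewsArticle(root) {
  const scripts = root.querySelectorAll('script[type="application/ld+json"]');
  for (const script of scripts) {
    let data;
    try {
      data = JSON.parse(script.textContent);
    } catch (err) {
      continue; // skip blocks that are not valid JSON
    }
    // A JSON-LD block may hold a single object or an array of objects.
    const items = Array.isArray(data) ? data : [data];
    const article = items.find((item) => item && item["@type"] === "NewsArticle");
    if (article) return article;
  }
  return null;
}

// Browser usage: parse an HTML string without inserting it into the page.
// const doc = new DOMParser().parseFromString(htmlString, "text/html");
// const article = findNewsArticle(doc);
// console.log(article && article.headline);
```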
@mkczyk I have no other option; in this case I'm not worried about good practice.
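If the regex has to stay but some JavaScript can run around it, a middle ground is to use the regex only to cut out each application/ld+json block and let `JSON.parse` pick the NewsArticle one, so the regex never has to understand the JSON. A minimal sketch, assuming a runtime with `String.prototype.matchAll` (ES2020) and the exact opening tag from the sample; `extractNewsArticle` is a hypothetical name.

```javascript
// Global regex that captures the body of every JSON-LD script tag.
// [\s\S]*? is lazy, so each match stops at the first </script>.
const LD_JSON_BLOCK = /<script type="application\/ld\+json">([\s\S]*?)<\/script>/g;

// Hypothetical helper: returns { tag, data } for the first block whose
// top-level "@type" is "NewsArticle", or null.
function extractNewsArticle(html) {
  for (const match of html.matchAll(LD_JSON_BLOCK)) {
    let data;
    try {
      data = JSON.parse(match[1]);
    } catch (err) {
      continue; // ignore blocks that are not valid JSON
    }
    if (data && data["@type"] === "NewsArticle") {
      return { tag: match[0], data };
    }
  }
  return null;
}
```

`matchAll` works on an internal copy of the regex, so reusing the shared global `LD_JSON_BLOCK` across calls does not leave a stale `lastIndex` behind.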
Tags: javascript json regex