【Posted】: 2019-04-09 05:05:01
【Question】:
Here is the HTML code I am working with:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>sdasdsadsad</title>
<link rel="alternate" media="only screen and (max-width: 640px)" href="local:80" />
<meta name="description" content="sdddsdsdsdsdsd">
<meta name="keywords" content="3333333333333333">
<meta property="og:title" content="444444444444444444444444">
<meta property="og:type" content="article">
<meta property="og:description" content="dsdsdsdsddsds">
</head>
<body></body>
</html>
I want to get the line containing the <meta name="description"> tag, which has no closing </meta> element. Here is my code:
# Only BeautifulSoup is needed; the other imports were unused
# (and urllib2 does not exist on Python 3).
from bs4 import BeautifulSoup
html_doc = """
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>sdasdsadsad</title>
<link rel="alternate" media="only screen and (max-width: 640px)" href="local:80" />
<meta name="description" content="sdddsdsdsdsdsd">
<meta name="keywords" content="3333333333333333">
<meta property="og:title" content="444444444444444444444444">
<meta property="og:type" content="article">
<meta property="og:description" content="dsdsdsdsddsds">
</head>
<body></body>
</html>
"""
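# No parser is named, so bs4 uses the best one installed:
# lxml, then html5lib, then Python's built-in html.parser.
soup = BeautifulSoup(html_doc)
aa = soup.find("meta", {"name": "description"})
print(aa)  # works on both Python 2 and Python 3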
When I run the Python code, the console shows:
<meta content="sdddsdsdsdsdsd" name="description">
<meta content="3333333333333333" name="keywords">
<meta content="444444444444444444444444" property="og:title">
<meta content="article" property="og:type">
<meta content="dsdsdsdsddsds" property="og:description">
</meta></meta></meta></meta></meta>
But if <meta content="sdddsdsdsdsdsd" name="description"> has a closing </meta> element, I get exactly the line I want:
<meta content="sdddsdsdsdsdsd" name="description"> </meta>
Can you tell me why BeautifulSoup returns all the HTML tags after <meta name="description">, and how I can get only the line containing <meta name="description">?
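The output above suggests (this is a likely cause, not a confirmed diagnosis) that the parser in use treated <meta> as a container element, so every following tag up to </head> was nested inside the description tag. A minimal sketch, assuming a current bs4 4.x release in which the built-in "html.parser" builder knows that <meta> is a void element (lxml and html5lib should handle it as well), is to name the parser explicitly and check that the found tag has no children. The html_doc below is a trimmed copy of the question's markup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's <head> section.
html_doc = """<html><head>
<meta name="description" content="sdddsdsdsdsdsd">
<meta name="keywords" content="3333333333333333">
<meta property="og:type" content="article">
</head><body></body></html>"""

# Name the parser explicitly instead of letting bs4 guess one.
soup = BeautifulSoup(html_doc, "html.parser")
tag = soup.find("meta", attrs={"name": "description"})

# A void element should have no children, so printing it shows only
# the description tag and none of the <meta> tags that follow it.
print(tag)
print(tag.contents)
print(tag["content"])
```

If the nesting still appears with "html.parser", upgrading bs4 (pip install -U beautifulsoup4) or trying "lxml" would be the next thing to test.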
Thanks.
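Since the question asks for the line itself, note that printing a Tag gives bs4's re-serialized form, which can reorder attributes (as in the output above) and may render the void element differently from the source. One option, assuming bs4 4.8.1 or later, is the Tag.sourceline attribute that the html.parser and html5lib builders record; it is a 1-based line number into the original string. The helper name description_line is hypothetical, chosen for this sketch.

```python
from bs4 import BeautifulSoup

html_doc = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>sdasdsadsad</title>
<meta name="description" content="sdddsdsdsdsdsd">
<meta name="keywords" content="3333333333333333">
</head>
<body></body>
</html>
"""


def description_line(doc):
    """Hypothetical helper: return the raw source line of <meta name="description">."""
    # html.parser records sourceline; lxml does not, so it is named explicitly.
    soup = BeautifulSoup(doc, "html.parser")
    tag = soup.find("meta", attrs={"name": "description"})
    if tag is None or tag.sourceline is None:
        return None
    # sourceline is 1-based, list indices are 0-based.
    return doc.splitlines()[tag.sourceline - 1]


line = description_line(html_doc)
print(line)
```

This returns the line exactly as written in the input, so attribute order and the missing </meta> are preserved; it assumes the tag starts on its own line, as it does in the question's markup.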
【Discussion】:
Tags: python beautifulsoup