[Posted]: 2015-10-01 11:42:48
[Question]:
I've run into a problem with the following example:
for item in g_data:
    Header = item.find_all("div", {"class": "InnprodInfos"})
    print(Header[0].contents[0].text.strip())
Output:
DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour
DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour
As you can see above, the output is printed twice, so the second, duplicate copy should be removed.
The result should look like this:
DMZ 3rd Tunnel - Korean Demilitarized Zone Day Tour from Seoul
Panmunjeom Day Tour
Seoul City Half Day Private Tour
The Soul of Seoul - Small Group Tour
Seoul Helicopter Tour
Seoul City Full Day Tour
Seoul City Half Day Tour
The Street Museum in the Urban Core - Small Group Tour
Korean Folk Village Day Tour
Can anyone tell me how to remove the duplicates? Any feedback is appreciated.
[Comments]:
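One common way to get this result is to remember which titles have already been seen and skip any that come up again, which keeps the first occurrence and the original order. The sketch below is only an illustration under assumptions: `unique_in_order` is a hypothetical helper name, and the `titles` list is a shortened stand-in for the strings that `Header[0].contents[0].text.strip()` produces in the loop above.

```python
def unique_in_order(titles):
    """Yield each title the first time it appears, skipping later repeats."""
    seen = set()  # titles already yielded
    for title in titles:
        if title not in seen:
            seen.add(title)
            yield title


# Hypothetical sample standing in for the scraped titles (the list repeats once).
titles = [
    "Panmunjeom Day Tour",
    "Seoul Helicopter Tour",
    "Korean Folk Village Day Tour",
    "Panmunjeom Day Tour",
    "Seoul Helicopter Tour",
    "Korean Folk Village Day Tour",
]

unique_titles = list(unique_in_order(titles))
for title in unique_titles:
    print(title)
```

In the original loop, the same idea would mean collecting the stripped text of each item into a list (or checking it against the `seen` set directly) rather than printing it immediately. The set holds one entry per distinct title, so memory grows with the number of unique lines, not with the total number of lines.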
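- What is g_data? What happens if you remove the print statement?
- The type of g_data doesn't matter for answering the question.
- How large can the set of non-duplicate lines grow?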
Tags: python web-crawler