[Posted]: 2022-01-13 12:15:34
[Question]:
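I want to scrape data for my project from cimri.com, and I am trying to collect the detailed technical specifications of mobile phones. The problem appears when I want one specific specification, for example the processor model or the RAM size: as the attached screenshot shows, all technical specifications share the same span class value.
When I run the following code: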
import requests
from bs4 import BeautifulSoup as bts

def getAndParseURL(url):
    # Fetch the page with a browser-like User-Agent and parse the HTML
    result = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    soup = bts(result.text, 'html.parser')
    return soup

html = getAndParseURL("https://www.cimri.com/cep-telefonlari/en-ucuz-oppo-a74-128gb-4gb-ram-6-43-inc-48mp-akilli-cep-telefonu-siyah-fiyatlari,775993409")
# Walk container div -> ul -> li -> span and print every value span
for i in html.find_all("div", {"class": "s10v53f3-0 bfgzQt"}):
    for b in i.find_all("ul", {"class": "s10v53f3-2 goYFek"}):
        for c in b.find_all("li", {"class": "s10v53f3-4 rKbMg"}):
            for d in c.find_all("span", {"class": "s10v53f3-6 geozbR"}):
                print(d)
it gives me all the technical specifications, as follows:
<span class="s10v53f3-6 geozbR">6.43 inç</span>
<span class="s10v53f3-6 geozbR">AMOLED</span>
<span class="s10v53f3-6 geozbR">FHD+</span>
<span class="s10v53f3-6 geozbR">1080x2400 Piksel</span>
<span class="s10v53f3-6 geozbR">84.4 %</span>
<span class="s10v53f3-6 geozbR">409 PPI</span>
<span class="s10v53f3-6 geozbR">Kapasitif Ekran</span>
<span class="s10v53f3-6 geozbR">800</span>
<span class="s10v53f3-6 geozbR">1000000:1</span>
<span class="s10v53f3-6 geozbR">Qualcomm SM6115 Snapdragon 662</span>
<span class="s10v53f3-6 geozbR">2.0 GHz</span>
<span class="s10v53f3-6 geozbR">Adreno 610</span>
<span class="s10v53f3-6 geozbR">4 GB RAM</span>
<span class="s10v53f3-6 geozbR">Android 11</span>
<span class="s10v53f3-6 geozbR">Android</span>
<span class="s10v53f3-6 geozbR">8 Çekirdek</span>
<span class="s10v53f3-6 geozbR">11 nm</span>
<span class="s10v53f3-6 geozbR">64 bit</span>
<span class="s10v53f3-6 geozbR">950 MHz</span>
<span class="s10v53f3-6 geozbR">LPDDR4x</span>
<span class="s10v53f3-6 geozbR">Çift Kanal</span>
<span class="s10v53f3-6 geozbR">48 MP</span>
<span class="s10v53f3-6 geozbR">F2.4</span>
<span class="s10v53f3-6 geozbR">16 MP</span>
<span class="s10v53f3-6 geozbR">F1.7</span>
<span class="s10v53f3-6 geozbR">F2.4</span>
<span class="s10v53f3-6 geozbR">1080p (Full HD)</span>
<span class="s10v53f3-6 geozbR">30 FPS</span>
<span class="s10v53f3-6 geozbR">2 MP</span>
<span class="s10v53f3-6 geozbR">LED</span>
<span class="s10v53f3-6 geozbR">73.8 mm</span>
<span class="s10v53f3-6 geozbR">160.3 mm</span>
<span class="s10v53f3-6 geozbR">8 mm</span>
<span class="s10v53f3-6 geozbR">175 gr</span>
<span class="s10v53f3-6 geozbR">Siyah</span>
<span class="s10v53f3-6 geozbR">USB Type-C</span>
<span class="s10v53f3-6 geozbR">Li-Po</span>
<span class="s10v53f3-6 geozbR">5000 mAh</span>
<span class="s10v53f3-6 geozbR">128 GB</span>
<span class="s10v53f3-6 geozbR">5.0</span>
<span class="s10v53f3-6 geozbR">3.5 mm</span>
<span class="s10v53f3-6 geozbR">Wi-Fi 5</span>
<span class="s10v53f3-6 geozbR">42.2 Mbps</span>
<span class="s10v53f3-6 geozbR">5.76 Mbps</span>
<span class="s10v53f3-6 geozbR">2021</span>
<span class="s10v53f3-6 geozbR">Ekran İçinde</span>
<span class="s10v53f3-6 geozbR">Nano-SIM (4FF)</span>
<span class="s10v53f3-6 geozbR">30</span>
<span class="s10v53f3-6 geozbR">1080p</span>
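I have already collected all the features into a dict. However, when I look across all phone brands and models, each brand and model has a different number of features. To build a data frame, every brand and model must have the same columns, so I have decided to keep only some of these features in the data frame.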
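One way to get identical columns from dicts of different sizes is to pick a fixed list of wanted keys and fill the gaps with None. The sketch below is only an illustration under assumptions: the key names (model, cpu, ram, battery), the second phone and the helper select_features are hypothetical and not taken from the site or from the code above.

```python
# Hypothetical per-phone dicts; the key names are illustrative only.
phones = [
    {
        "model": "Oppo A74",
        "cpu": "Qualcomm SM6115 Snapdragon 662",
        "ram": "4 GB RAM",
        "battery": "5000 mAh",
    },
    # A made-up second phone that exposes fewer features.
    {"model": "Phone B", "cpu": "Some CPU"},
]

# The fixed set of features that every row should have.
WANTED = ["model", "cpu", "ram"]


def select_features(specs, wanted):
    """Return a dict with exactly the wanted keys; missing ones become None."""
    return {key: specs.get(key) for key in wanted}


rows = [select_features(phone, WANTED) for phone in phones]
for row in rows:
    print(row)
```

Every dict in rows now has the same keys in the same order, so the list can be passed to pandas.DataFrame(rows). Passing the original dicts with pandas.DataFrame(phones, columns=WANTED) should give the same columns, with missing values shown as NaN.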
[Comments]:
-
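Maybe you should group the data inside the for-loops, i.e. collect all values from one pass of the for b loop or of the for c loop, and then use an index to get a single value from that group.
-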
It would be better to show minimal working code with the real URL, so we can simply copy and run it. Also show what you want to get.
-
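Or you should write more complex code: get all li elements and check whether each one has a span. If it has no span, you have a header/title that can be used as a key in a dictionary, keeping the following li and span items as the values in that group. Alternatively, you can use this title to identify the section.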
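The suggestions in the comments could be sketched roughly as follows. This is not the actual cimri.com markup: SAMPLE_HTML is a simplified, hypothetical structure, the section titles Ekran and İşlemci are assumed, and group_by_header is an illustrative helper name. Only the span values are copied from the output in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
SAMPLE_HTML = """
<ul>
  <li>Ekran</li>
  <li><span>6.43 inç</span></li>
  <li><span>AMOLED</span></li>
  <li>İşlemci</li>
  <li><span>Qualcomm SM6115 Snapdragon 662</span></li>
  <li><span>2.0 GHz</span></li>
</ul>
"""


def group_by_header(soup):
    """Map each header li (one without a span) to the span texts that follow it."""
    groups = {}
    current = None
    for li in soup.find_all("li"):
        span = li.find("span")
        if span is None:
            # No span: treat this li as a section title and start a new group.
            current = li.get_text(strip=True)
            groups[current] = []
        elif current is not None:
            # Value li: store its text under the most recent title.
            groups[current].append(span.get_text(strip=True))
    return groups


groups = group_by_header(BeautifulSoup(SAMPLE_HTML, "html.parser"))
# Pick one value from one group by index.
print(groups["İşlemci"][0])
```

With the values grouped under a title, a single specification is addressed by section name plus position rather than by the shared span class. Whether positions are stable across phone models is something to verify on the real pages.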
Tags: python pandas selenium web-scraping beautifulsoup