【Question title】: Selenium: how to print the elements of this HTML in the order they appear?
【Posted】: 2021-10-21 21:39:57
【Question】:

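If this is the HTML of a WhatsApp message ("???? how ???? are you ????"), how can I iterate over the elements of that message with Selenium and get (print) them in the order they appear?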

    <span dir="ltr" class="i0jNr selectable-text copyable-text">
        <span>
            <img crossorigin="anonymous"
                src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" alt="????"
                draggable="false" class="b75 emoji wa i0jNr selectable-text copyable-text" data-plain-text="????"
                style="background-position: -60px -40px;">
            " how "
            <img crossorigin="anonymous"
                src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" alt="????"
                draggable="false" class="b60 emoji wa i0jNr selectable-text copyable-text" data-plain-text="????"
                style="background-position: -60px -40px;">
            " are you"
            <img crossorigin="anonymous"
                src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" alt="????"
                draggable="false" class="b25 emoji wa i0jNr selectable-text copyable-text" data-plain-text="????"
                style="background-position: -40px -40px;">
        </span>
    </span>

The output should be:

????
 how
????
 are you
????

or the output could also look like this:

???? how ???? are you ????

I tried:

chats = driver.find_elements_by_class_name("message-in")
for i in range(0,len(chats)):
    messages = chats[i].find_elements_by_class_name("i0jNr")
    for j in range(0,len(messages)):
        if messages[j].text == "" :        
            emojis = chats[i].find_elements_by_class_name("emoji")
            for emoji in emojis:
                print(emoji.get_attribute('alt'))
                break
        else:
            print(messages[j].text)

This is the output it gives:

 how
 are you
????
????
???? 

So how do I get the elements of this message in the order they appear?
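A likely reason the attempt above loses the order: Selenium's `find_element(s)` calls return elements only, never the bare text nodes between them, so the text is read through `.text` and the emoji through separate `find_elements` lookups, and their relative positions are lost. One Selenium-only way around this is to let the browser walk the span's `childNodes` with `execute_script`. The sketch below assumes the `message-in` container and the `span.selectable-text > span` structure shown in the question (WhatsApp's class names may change); `ORDERED_PARTS_JS`, `ordered_parts` and `print_messages` are hypothetical names. It uses `find_elements(by, value)` with the string values of `By.CLASS_NAME` / `By.CSS_SELECTOR`, because the `find_elements_by_*` helpers from the question were removed in Selenium 4.3.

```python
# JavaScript run in the browser. arguments[0] is the WebElement passed from
# Python; its child nodes are visited in DOM order, keeping trimmed text nodes
# and the alt text of <img> (emoji) nodes.
ORDERED_PARTS_JS = """
const parts = [];
for (const node of arguments[0].childNodes) {
  if (node.nodeType === Node.TEXT_NODE) {
    const text = node.textContent.trim();
    if (text) parts.push(text);
  } else if (node.nodeName === 'IMG') {
    parts.push(node.getAttribute('alt') || '');
  }
}
return parts;
"""


def ordered_parts(driver, span):
    """Return the text and emoji alt strings of `span` in document order."""
    return driver.execute_script(ORDERED_PARTS_JS, span)


def print_messages(driver):
    # "class name" and "css selector" are the values of By.CLASS_NAME and
    # By.CSS_SELECTOR; the selectors are assumptions based on the question's HTML.
    for chat in driver.find_elements("class name", "message-in"):
        for span in chat.find_elements("css selector", "span.selectable-text > span"):
            print(" ".join(ordered_parts(driver, span)))
```

Joining with a space gives the one-line form ("???? how ???? are you ????"); printing each item of `ordered_parts(...)` on its own line gives the multi-line form.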

【Discussion】:

    Tags: javascript python selenium selenium-webdriver automation


    【Solution 1】:

    You can iterate over the children of the inner span element and print the text when the child is a string, or the alt text when it is an img tag.

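    from bs4 import BeautifulSoup as bs4
    from bs4 import NavigableString, Tag
    
    # `html` must hold the page markup first, e.g. html = driver.page_source
    soup = bs4(html, 'html.parser')
    
    s = soup.find('span', attrs={'class':'i0jNr'})  # outer message span
    s = s.find('span')                               # inner span with the text and emoji
    for i in s.children:
        if isinstance(i, NavigableString):
            text = i.strip()
            if text:                  # skip whitespace-only nodes between tags
                print(text)
        elif isinstance(i, Tag):
            print(i.get('alt'))       # emoji <img>: print its alt text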
    

    Here is the code for your use case; its output for this message is:

    ?
    how
    ?
    are you
    ?
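If you would rather not add bs4 as a dependency, the same children-in-order idea can be sketched with Python's standard-library `html.parser.HTMLParser` (a swapped-in technique, not what the answer uses). It records text chunks and `<img>` alt text in the order the parser meets them, so it should be fed only one message's markup (for example an element's `outerHTML`), not the whole page. `OrderedPartsParser` and `ordered_message_parts` are hypothetical names.

```python
from html.parser import HTMLParser


class OrderedPartsParser(HTMLParser):
    """Collect text chunks and <img> alt text in the order they appear."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Emoji are rendered as <img> tags whose alt attribute holds the character.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_data(self, data):
        # Keep text nodes, dropping whitespace-only runs between tags.
        text = data.strip()
        if text:
            self.parts.append(text)


def ordered_message_parts(message_html):
    """Return a single message's text and emoji alt text in document order."""
    parser = OrderedPartsParser()
    parser.feed(message_html)
    parser.close()
    return parser.parts
```

`" ".join(ordered_message_parts(html))` produces the one-line form; looping over the list and printing each item produces the line-per-part form.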
    

    【Discussion】:

    • Actually I don't know much about bs4. How do I combine Selenium and bs4? Here is the full code: from bs4 import BeautifulSoup as bs4 from bs4 import NavigableString, Tag from selenium import webdriver from selenium.webdriver.common.keys import Keys from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions
    • driver = webdriver.Chrome(r'C:\Users\PRANAV PATIL\Downloads\chromedriver.exe') driver.get(r'https://web.whatsapp.com/') input("enter any key :") searchbox = WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located((By.XPATH, "//div[@id='side']//div//div//label//div//div[@contenteditable='true']"))) searchbox.send_keys('diksha') # enter your sender's name searchbox.send_keys(Keys.RETURN) input("enter any key :")
    • soup = bs4(html, 'html.parser') s = soup.find('span', attrs={'class':'i0jNr'}) s = s.find('span') for i in s.children: if isinstance(i, NavigableString): print(i.strip()) elif isinstance(i, Tag): print(i.attrs['alt'])
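    • This is the whole code @Gr8ayu, but it gives an error at this point. I don't really know bs4: it raises an error on the word html in soup = bs4(html, 'html.parser'). What should I do with that html? The error is that html is not defined. @Gr8ayu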
    • You need to initialize html with your HTML code; you can use something like html = driver.page_source.
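On combining Selenium and bs4: parsing all of `driver.page_source` works, but `soup.find` then returns only the first matching message. One option is to hand bs4 each message's own markup via `get_attribute("outerHTML")`. A minimal sketch, assuming the `message-in` container and `i0jNr` class from the question still match the live page; `message_parts` and `print_incoming_messages` are hypothetical names, and `"class name"` is the string value of Selenium's `By.CLASS_NAME`.

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def message_parts(message_html):
    """Return one message's text chunks and emoji alt text, in order."""
    soup = BeautifulSoup(message_html, "html.parser")
    # Class name taken from the question's HTML; WhatsApp may change it.
    outer = soup.find("span", class_="i0jNr")
    inner = outer.find("span") if outer else None
    if inner is None:
        return []
    parts = []
    for child in inner.children:
        if isinstance(child, NavigableString):
            text = child.strip()
            if text:  # skip whitespace-only nodes between tags
                parts.append(text)
        elif isinstance(child, Tag) and child.name == "img":
            parts.append(child.get("alt", ""))
    return parts


def print_incoming_messages(driver):
    """Parse each incoming message's own markup instead of the whole page."""
    # "class name" is the value of By.CLASS_NAME (Selenium 4 find_elements form).
    for chat in driver.find_elements("class name", "message-in"):
        html = chat.get_attribute("outerHTML")
        print(" ".join(message_parts(html)))
```

Parsing per message keeps each message's parts together and avoids re-parsing the whole page for every lookup.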