【Question title】: Trying to make a loop with Selenium in Python
【Posted】: 2021-05-26 14:22:36
【Question】:

I have code that searches this site --> https://osu.ppy.sh/beatmapsets?m=0 for only the maps with the difficulty I want, but I can't get the loop right.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep

# Set link and path
driver = webdriver.Chrome(executable_path=r"C:\Users\Gabri\anaconda3\chromedriver.exe")
driver.get("https://osu.ppy.sh/beatmapsets?m=0")
wait = WebDriverWait(driver, 20)

# Variables, lists and counters
lista = {}
links, difficulty, maps2, final = [], [], [], []
line, column, = 1, 1
link_test = ''

n = int(input('insert how many maps do you want: '))
c = 1

# Open link in Chrome and search map by map
while True:
    if c > n:
        break
    sleep(1)
    wait.until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, f".beatmapsets__items-row:nth-of-type(1)>.beatmapsets__item:nth-of-type(1)")))
    games = driver.find_element_by_css_selector(
        f".beatmapsets__items-row:nth-of-type({line}) .beatmapsets__item:nth-of-type({column}) .beatmapset-panel__info-row--extra")
    actions = ActionChains(driver)
    actions.move_to_element(games).perform()
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".beatmaps-popup__group")))
    scores = driver.find_elements_by_css_selector(
        ".beatmaps-popup__group .beatmaps-popup-item__col.beatmaps-popup-item__col--difficulty")

    # This is the part I can't automate: for example, to show 6 maps I would have to add 2 more ifs,
    # changing the variables (line) and (column) accordingly.

    # I would like a 'while' or 'for ... in' loop, but I don't know how to write it.
    # I tried asking 'how many maps do you want?' before the code starts, so that number would be how many times the code runs,
    # but it didn't work =(

    if c % 2 != 0:
        column = 2
        if c % 2 == 0:
            line += 1
    else:
        line += 1
        column = 1

    # Convert string to float (difficulty numbers)
    for score in scores:
        a = score.text
        b = a.replace(',', '.')
        difficulty.append(float(b))

    # Save in list 'links' each link corresponding of map that is printing
    games.click()
    sleep(3)
    link_test = driver.current_url
    links.append(link_test)
    link_test = ''
    driver.back()

    # Dict with map, link and difficulty
    lista = {
        'map': f"{c}",
        'link': f"{links}",
        'difficulty': f"{difficulty}"}
    c += 1
    # Print each map in dict 'lista'
    print(f"Map: {lista['map']}\nLink: {links}\nDifficulty: {lista['difficulty']}\n")

    # This part is my filter, if map have difficulty 6.00 or more, it's add to list 'final' for download
    for b in difficulty:
        if b >= 6.00:
            # This slice, the link had printing error 'TypeError: unhashable type: 'list'', i found this way to solve it
            # I know that is not the best way to solve this error, but at least i tried =,)
            xam = str(links[0])
            xam1 = xam.replace("'", '')
            xam2 = xam1.replace("[", '')
            xam3 = xam2.replace("]", '')
            final.append(xam3)

    # Clean all lists for no have duplicate items in dict 'lista' when next map is selected
    difficulty.clear()
    lista.clear()
    links.clear()

# Print how many maps with difficulty 6.00 has been found
print(f'There are {len(sorted(set(final)))} maps to download')

# This question is for future download, im still coding this part, so u can ignore this =3
pergunta = input('Do you want to download them? \n[ Y ]\n[ N ]\n>>> ').lower().strip()

# Clean duplicate links and show all links already filtered
if pergunta == 'y':
    for x in final:
        maps2.append(x)
    print(sorted(set(maps2)))

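The 'if' part is where I need help making it automatic; the way I did it, adding more and more 'if's isn't practical. Maybe use a variable with 'v += n'? IDK ;-;

PS: If you spot any logic errors or ways to optimize my code, I'll be happy to learn and fix it.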

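【Comments】:

  • I've seen this post at least 3 times now. What didn't work for you in the last 2 attempts?
  • @cruisepandey The last 2 times I didn't try anything on this part, because I was focused on solving other problems. I had been waiting for someone to help me, but today I'm trying to solve this one myself. Right now I'm trying a simple self-incrementing variable with +=; if I make any progress, I'll edit the code to explain what I did.
  • Is there any pattern to line and column as the maps_quantity number increases? From the code snippet above it looks random.
  • @JD2775 Yes, it's like a coordinate: line 1 column 1 is the 1st map, line 1 column 2 is the 2nd map, line 2 column 1 is the 3rd map... I followed the page layout, which has two columns and several lines.
  • @JD2775 See this example --> imgur.com/a/NtbBxXL

Tags: python selenium selenium-webdriver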


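【Solution 1】:

You're doing more work than you have to. If you visit the page in a browser and log your network traffic, you'll see that every time you scroll down to load more beatmaps, XHR (XMLHttpRequest) HTTP GET requests are made to a REST API whose response is JSON and contains all the beatmap information you could want. All you need to do is imitate that HTTP GET request; no Selenium required: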

def get_beatmaps():
    import requests

    url = "https://osu.ppy.sh/beatmapsets/search"

    params = {
        "m": "0",
        "cursor[approved_date]": "0",
        "cursor[_id]": "0"
    }

    while True:
        # Pass the cursor parameters so each request fetches the next page
        response = requests.get(url, params=params)
        response.raise_for_status()

        data = response.json()

        cursor_id = data["cursor"]["_id"]
        if cursor_id == params["cursor[_id]"]:
            break
        
        yield from data["beatmapsets"]
        params["cursor[approved_date]"] = data["cursor"]["approved_date"]
        params["cursor[_id]"] = cursor_id


def main():
    from itertools import islice

    num_beatmaps = 10 # Get info for first ten beatmaps

    beatmaps = list(islice(get_beatmaps(), num_beatmaps))

    for beatmap in beatmaps:
        print("{} - {}".format(beatmap["artist"], beatmap["title"]))
        for version in beatmap["beatmaps"]:
            print("    [{}]: {}".format(version["version"], version["difficulty_rating"]))
        print()

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

Aitsuki Nakuru - Monochrome Butterfly
    [Gibune's Insane]: 4.55
    [Excitement]: 5.89
    [Collab Extra]: 5.5
    [Hard]: 3.54
    [Normal]: 2.38

Sweet Trip - Chocolate Matter
    [drops an end to all this disorder]: 4.15
    [spoken & serafeim's hard]: 3.12

Aso Natsuko - More-more LOVERS!!
    [SS!]: 5.75
    [Sonnyc's Expert]: 5.56
    [milr_'s Hard]: 3.56
    [Dailycare's Insane]: 4.82

Takayan - Jinrui Mina Menhera
    [Affection]: 4.43
    [Normal]: 2.22
    [Narrative's Hard]: 3.28

Asaka - Seize The Day (TV Size)
    [Beautiful Scenery]: 3.7
    [Kantan]: 1.44
    [Seren's Oni]: 3.16
    [XK's Futsuu]: 2.01
    [ILOVEMARISA's Muzukashii]: 2.71
    [Xavy's Seize The Moment]: 4.06

Swimy - Acchi Muite (TV Size)
    [Look That Way]: 4.91
    [Azu's Cup]: 1.72
    [Platter]: 2.88
    [Salad]: 2.16
    [Sya's Rain]: 4.03

Nakazawa Minori (CV: Hanazawa Kana) - Minori no Zokkon Mirai Yohou (TV Size)
    [Expert]: 5.49
    [Normal]: 2.34
    [Suou's Hard]: 3.23
    [Suou's Insane]: 4.38
    [Another]: 4.56

JIN - Children Record (Re:boot)
    [Collab Hard]: 3.89
    [Maki's Normal]: 2.6
    [hypercyte & Seto's Insane]: 5.01
    [Kagerou]: 6.16

Coalamode. - Nemophila (TV Size)
    [The Hidden Dungeon Only I Can Enter]: 3.85
    [Silent's Hard]: 3
    [Normal]: 2.29

MISATO - Necro Fantasia
    [Lunatic]: 6.06

>>>

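As written, this example fetches the first ten beatmaps from the API and prints the artist and title, along with the name and difficulty of each version of that beatmap. You can change it as needed and filter the output by difficulty.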
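As a rough sketch of that filtering step, the snippet below keeps only the beatmap sets that have at least one version rated 6.0 or higher. It runs on a small hard-coded sample instead of the live API; the sample values are copied from the output above, the keys (`artist`, `title`, `beatmaps`, `version`, `difficulty_rating`) are the ones the example already reads, and the helper name `filter_by_difficulty` is hypothetical.

```python
# Sample data shaped like the "beatmapsets" entries printed above
# (values copied from that output; this is not a live API response).
sample_beatmapsets = [
    {"artist": "JIN", "title": "Children Record (Re:boot)",
     "beatmaps": [{"version": "Collab Hard", "difficulty_rating": 3.89},
                  {"version": "Kagerou", "difficulty_rating": 6.16}]},
    {"artist": "Coalamode.", "title": "Nemophila (TV Size)",
     "beatmaps": [{"version": "Silent's Hard", "difficulty_rating": 3},
                  {"version": "Normal", "difficulty_rating": 2.29}]},
]


def filter_by_difficulty(beatmapsets, minimum=6.0):
    """Keep only the sets that have at least one version rated `minimum` or higher."""
    return [
        beatmapset for beatmapset in beatmapsets
        if any(v["difficulty_rating"] >= minimum for v in beatmapset["beatmaps"])
    ]


hard_sets = filter_by_difficulty(sample_beatmapsets)
for beatmapset in hard_sets:
    print("{} - {}".format(beatmapset["artist"], beatmapset["title"]))
```

The same function should work on the items yielded by `get_beatmaps()`, assuming the live response keeps the fields used here.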

That said, I know nothing about OSU or beatmaps. If you can describe what the final output should actually look like, I can tailor my solution.

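【Comments】:

  • Wow, I was trying something completely different, but I really like your code haha. In the final code (once I've solved all the problems) I'll use the links in the list 'maps2' for downloading, only for maps with difficulty 6.00 or higher. I could have ended up with a piece of code like yours, but I did it this way because I was trying to keep it as clean as possible so it's easier to read ;). I don't think you need to change your code, since I already know how to write the filter, but if you want to, I always appreciate the help =)
  • I was slow to answer because I don't like just doing ctrl+c, ctrl+v into my code without understanding every line; I prefer to go through each line and understand how and why that part is used. So please don't be upset that I took so long to show signs of life hehe, I'm just learning and absorbing.
  • No worries. Take a look at this answer I posted on another question, where I go into more depth on how to log your network traffic, find API endpoints and imitate requests.
【Solution 2】: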

After a lot of testing, I solved all the problems (for now, hehe). Just add

    if c % 2 != 0:
        column = 2
        if c % 2 == 0:
            line += 1
    else:
        line += 1
        column = 1

I'm very grateful to everyone who helped me =)))
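For anyone reading later: in the block above, the nested `if c % 2 == 0` sits inside the branch where `c` is odd, so it can never be true. The update simply alternates between column 1 and column 2 and moves to the next line after every second map. A possible alternative, sketched here under the assumption that the page keeps its two-column layout, is to compute the position straight from the counter with `divmod`. The function name `grid_position` is hypothetical.

```python
def grid_position(c, columns=2):
    """Return the 1-based (line, column) of the c-th map in a grid read left to right."""
    row, col = divmod(c - 1, columns)
    return row + 1, col + 1


# Show where the first six maps sit in a two-column grid.
for c in range(1, 7):
    line, column = grid_position(c)
    print(f"map {c}: line {line}, column {column}")
```

With this, `line` and `column` no longer need to be updated by hand inside the loop; they can be derived from `c` at the top of each iteration.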

【Comments】:
