【Question Title】: Python nested for loop runs only once
【Posted】: 2020-09-01 13:16:41
【Question】:

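I'm running into a problem with a nested for loop (for doc in query) that runs only once. It sits inside for item in news_items, which I have verified iterates 10 times, and the for doc in query loop should iterate 9 times. When I print the docs, it prints 9 docs, but when I try to do an if/else check on the doc contents, the loop runs only once. (I expect 9 x 10 outputs, since each item from the parent loop is checked against the query docs, but all I get is 9 outputs.) I searched Stack Overflow but found nothing that seemed relevant, and from the other programming languages I've used I don't understand why this doesn't work. Maybe I'm missing something, since I'm still very new to Python (1 week).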

def scrape(url):
    # GET DATE AT THE TIME OF CRAWL START
    today = date.today()
    d1 = today.strftime("%d/%m/%Y")
    # D2 is used for query only
    d2 = today.strftime("%Y%m%d")
    # LOAD URL IN DRIVER
    driver.get(url)
    try:
        news_container = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "FlashNews-Box-Root"))
        )
        # array of items
        news_items = news_container.find_elements_by_class_name("FlashNews-Box-Item")

        refresher_ref = db.collection(u'news').document('sources').collection('refresher_news')

        # query for last article
        query = refresher_ref.order_by(u'article_timestamp', direction=firestore.Query.DESCENDING).limit(10).stream()

        for item in news_items:
            print("News items found: " + str(len(news_items)))
            try:
                # image is optional so we need to try it
                try:
                    item_image = item.find_element_by_class_name("FlashNews-Box-ItemImage").find_element_by_tag_name(
                        "img").get_attribute("src")
                except Exception as e:
                    item_image = "unavailable"

                # time will be added to the same day as when this was ran, since this will run often and compare
                # article texts, we won't have issue with wrong dates
                item_time = item.find_element_by_class_name("FlashNews-Box-ItemTime").text + " " + d1
                item_time_query_temp = item.find_element_by_class_name("FlashNews-Box-ItemTime").text.replace(":", "")
                # normalize timestamp for sorting
                if len(item_time_query_temp) == 3:
                    item_time_query_temp = "0" + item_time_query_temp

                item_time_query = d2 + item_time_query_temp
                item_text = item.find_element_by_class_name("FlashNews-Box-ItemText").text
                item_redirect = item.find_element_by_class_name("FlashNews-Box-ItemText").find_element_by_tag_name(
                    "a").get_attribute("href")
                result = {"article_time": item_time, "article_url": item_redirect, "article_image": item_image,
                          "article_text": item_text, "article_timestamp": item_time_query}
                # print(result)
                # save data to firestore - check for last item in firestore, then add this article
                is_new = True

                print("Printing 10x")
                # THIS EXECUTES ONLY ONCE?
                for doc in query:
                    # print(str(len(query)))
                    current_doc = doc.to_dict()
                    # print(current_doc)
                    # print(current_doc)
                    # print("Iteration: " + current_doc['article_text'])
                    # print("Old: " + current_doc["article_text"] + " New: " + item_text)
                    if current_doc['article_text'] == item_text:
                        print("Match")
                        # print(current_doc['article_text'] + item_text)
                        # print("Old: " + current_doc['article_text'] + " New: " + item_text)
                    else:
                        print("Mismatch")
                        # print(current_doc['article_text'] + item_text)
                        # print("Skipping article as the text exists in last 10")
                    # else:
                    # print("Old: " + current_doc['article_text'] + " New: " + item_text)
                    # print(str(is_new))

                # if is_new:
                # refresher_ref.add(result)
                # print("Adding document")

            except Exception as e:
                print(e)

    except Exception as e:
        # HANDLE ERRORS
        print(e)

    print("Completed running.")
    # quit driver at the end of function run
    driver.quit()

【Comments】:

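  • query is a one-shot iterable, and it is consumed by the first for doc in query. Try query = list(query) before starting the outer loop so that you can iterate over its contents repeatedly.
  • @chepner You're right! That fixed it completely. I tried to look up the generator type in the Firebase docs but couldn't find much about it. Feel free to submit it as an answer and I'll accept it.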
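
The effect described in the comments can be reproduced without Firestore. The following is a minimal sketch, assuming a hypothetical generator fake_stream() as a stand-in for the one-shot object returned by .stream(); the helper count_inner_iterations() is also made up for illustration.

```python
def fake_stream():
    """Hypothetical stand-in for a one-shot result stream (not the Firestore API)."""
    for text in ("a", "b", "c"):
        yield text


def count_inner_iterations(docs, outer_count):
    """Return how many times the inner loop body runs in total."""
    runs = 0
    for _ in range(outer_count):
        for _doc in docs:
            runs += 1
    return runs


# The generator is exhausted during the first outer pass,
# so the remaining nine passes see an empty iterable.
one_shot_runs = count_inner_iterations(fake_stream(), outer_count=10)

# A list can be iterated from the start on every outer pass.
list_runs = count_inner_iterations(list(fake_stream()), outer_count=10)

print(one_shot_runs)  # 3
print(list_runs)      # 30
```

This matches the symptom in the question: the inner loop produces output only during the first outer iteration, then silently does nothing.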

Tags: python firebase google-cloud-firestore


【Solution 1】:

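query is not a list but some other iterable type that can only be consumed once (similar to a generator). To use it multiple times inside the outer loop, you need to create a list that holds its contents in memory. For example,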

# query for last article
query = refresher_ref.order_by(u'article_timestamp', direction=firestore.Query.DESCENDING).limit(10).stream()

query = list(query)

for item in news_items:
    ...
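
If the goal is only to decide whether a scraped item already exists among the recent documents, one possible variation is to materialize the texts into a set once and test membership for each item. This is a sketch rather than the asker's actual code: find_new_items() is a hypothetical helper, and plain dicts stand in for the result of doc.to_dict().

```python
def find_new_items(scraped_texts, recent_docs):
    """Return the scraped texts that do not appear in recent_docs.

    recent_docs may be a one-shot iterable, so it is consumed exactly once
    here and its texts are kept in a set for repeated lookups.
    """
    known_texts = {doc["article_text"] for doc in recent_docs}
    return [text for text in scraped_texts if text not in known_texts]


# iter(...) makes the sample data one-shot, like a streamed query result.
recent = iter([{"article_text": "old 1"}, {"article_text": "old 2"}])
scraped = ["old 2", "fresh 1", "old 1", "fresh 2"]

new_items = find_new_items(scraped, recent)
print(new_items)  # ['fresh 1', 'fresh 2']
```

The set lookup replaces the inner loop entirely, so the one-shot iterable is read only once no matter how many items are scraped.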

【Comments】:
