Python: find a value in an array by filter (answers)

【Question title】: Python find value in array by filter
【Posted】: 2018-09-27 07:27:32
【Question description】:

I'm using a Python script to scrape Google, and this is what I get once the script finishes. Imagine I have 100 results (I'm showing 2 here as an example).

{'query_num_results_total': 'Око 64 резултата (0,54 секунде/и)\xa0', 'query_num_results_page': 77, 'query_page_number': 1, 'query': 'example', 'serp_rank': 1, 'serp_type': 'results', 'serp_url': 'example2.com', 'serp_rating': None, 'serp_title': '', 'serp_domain': 'example2.com', 'serp_visible_link': 'example2.com', 'serp_snippet': '', 'serp_sitelinks': None, 'screenshot': ''}
{'query_num_results_total': 'Око 64 резултата (0,54 секунде/и)\xa0', 'query_num_results_page': 77, 'query_page_number': 1, 'query': 'example', 'serp_rank': 2, 'serp_type': 'results', 'serp_url': 'example.com', 'serp_rating': None, 'serp_title': 'example', 'serp_domain': 'example.com', 'serp_visible_link': 'example.com', 'serp_snippet': '', 'serp_sitelinks': None, 'screenshot': ''}

This is the code the script uses:

import serpscrap
import pprint
import sys

config = serpscrap.Config()
config_new = {
    'cachedir': '/tmp/.serpscrap/',
    'clean_cache_after': 24,
    'sel_browser': 'chrome',
    'chrome_headless': True,
    'database_name': '/tmp/serpscrap',
    'do_caching': True,
    'num_pages_for_keyword': 2,
    'scrape_urls': False,
    'search_engines': ['google'],
    'google_search_url': 'https://www.google.com/search?num=100',
    'executable_path': '/usr/local/bin/chromedriver',
    'headers': {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Connection': 'keep-alive',
    },
}

arr = sys.argv

keywords = ['example']

config.apply(config_new)
scrap = serpscrap.SerpScrap()
scrap.init(config=config.get(), keywords=keywords)
results = scrap.run()


for result in results:
    print(result)

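I want to stop the script if one of the results has a URL I'm looking for, for example "example.com".

If a result has https, e.g. 'serp_url': 'https://example2.com', I want to detect it and stop the script when I pass the argument without https, just example2.com. If the check can't be done while the script is running, I need an explanation of how to find the serp_url from the argument I provide.

I'm not familiar with Python, but I'm building a PHP application that will run this Python script and output the results. I don't want to process the results in PHP (extracting by serp_url and so on); I want everything to be done in Python.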

【Question discussion】:

Tags: python

【Solution 1】:

You can do it like this:

# needs `import sys` (already present in the question's script)
for result in results:
    # Substring test: with my_url = 'example.com' this matches
    # 'http://example.com', 'https://example.com/whatever',
    # and also 'http://myexample.com'
    if my_url in result['serp_url']:
        sys.exit()

Using any is another option:

if any(my_url in result['serp_url'] for result in results):
    sys.exit()
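
To connect this to the question's use case (the URL comes in as a command-line argument), here is a minimal sketch. It assumes `results` is a list of dicts shaped like the output shown in the question. The names `find_first_match` and `main` and the sample data are made up for illustration and are not part of serpscrap.

```python
def find_first_match(results, wanted):
    """Return the first result whose 'serp_url' contains `wanted`, else None."""
    for result in results:
        # `or ''` guards against a missing or None 'serp_url'
        if wanted in (result.get('serp_url') or ''):
            return result
    return None


def main(argv, results):
    """Return 0 if argv[1] occurs in some serp_url, otherwise 1."""
    if len(argv) < 2:
        return 1
    match = find_first_match(results, argv[1])
    if match is None:
        return 1
    print(match['serp_rank'], match['serp_url'])
    return 0


# Sample data standing in for the scraper output (hypothetical values)
sample_results = [
    {'serp_rank': 1, 'serp_url': 'https://example2.com'},
    {'serp_rank': 2, 'serp_url': 'https://example.com/page'},
]

exit_code = main(['script.py', 'example.com'], sample_results)
print(exit_code)
```

In the real script you would probably end with something like `sys.exit(main(sys.argv, results))`, so the calling PHP application can read the exit status.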

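【Discussion】:

【Solution 2】:

First you need to access the value of serp_url.

Since the result variable is a dictionary, result['serp_url'] returns the URL of each result.

Inside the for loop where you print the results, add an if statement that compares result['serp_url'] with a variable holding the URL you want (I don't think you provided that in your code). It could look something like this: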

# needs `import sys` (already present in the question's script)
for result in results:
    print(result)
    if my_url == result['serp_url']:
        sys.exit()

The same idea applies to the https case, but now we need the startswith() method:

for result in results:
    print(result)
    if my_url == result['serp_url']:
        sys.exit()
    # stops at the first result whose URL begins with 'https'
    if result['serp_url'].startswith('https'):
        sys.exit()
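
If the goal is to treat 'example.com' and 'https://example.com' as the same site, one possible approach is to compare host names instead of whole strings. The sketch below uses the standard library's `urllib.parse.urlsplit`. The helper names `hostname_of` and `same_site` are hypothetical, and dropping a leading 'www.' is an assumption about what should count as a match.

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the lowercase host of `url`, whether or not it has a scheme."""
    if '://' not in url:
        # Without '//' urlsplit would treat 'example.com/x' as a path
        url = '//' + url
    host = urlsplit(url).hostname or ''
    # Assumption: 'www.example.com' and 'example.com' count as the same site
    return host[4:] if host.startswith('www.') else host


def same_site(wanted, serp_url):
    """True if both values point at the same host."""
    return hostname_of(wanted) == hostname_of(serp_url)


print(same_site('example.com', 'https://example.com/whatever'))  # True
print(same_site('example.com', 'https://myexample.com'))         # False
```

Unlike a plain substring test, this does not match 'myexample.com' when you ask for 'example.com'.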

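Hope this helps.

【Discussion】:

Thank you very much, this will be useful! However, I need my argument not to match exactly (==); instead, serp_url should contain my argument. If serp_url is example.com with https:// in front and my argument is example.com, the statement should find a match. Can that be done?
I didn't realize you wanted to match more than one URL. In that case, Tzomas's answer does the trick.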