Python search loop slow: answers

【Question Title】: Python search loop slow
【Posted】: 2016-05-16 13:09:27
【Question】:

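I'm running a search over a list of ads (adscrape). Each ad is a dict in adscrape (see the example ad below). For each ad, the loop searches a list of IDs (database_ids) that may hold anywhere from 200,000 to 1,000,000 items. I want to find every ad in adscrape whose ID is not in database_ids.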

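My current code is below. Scanning database_ids takes a long time, several seconds per ad. Is there a more efficient/faster way to do this (find which items of one large list are in another large list)?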

database_ids = ['id1', 'id2', 'id3']  # in practice 200,000 to 1,000,000 IDs
ad = {'body': u'\xa0SUV', 'loc': u'SA', 'last scan': '06/02/16', 'eng': u'\xa06cyl 2.7L ', 'make': u'Hyundai', 'year': u'2006', 'id': u'OAG-AD-12371713', 'first scan': '06/02/16', 'odo': u'168911', 'active': 'Y', 'adtype': u'Dealer: Used Car', 'model': u'Tucson Auto 4x4 ', 'trans': u'\xa0Automatic', 'price': u'9990'}

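# adscrape, date, adscrape_ids and newads are defined earlier in the script (not shown)
for ad in adscrape:
    ad['last scan'] = date
    ad['active'] = 'Y'
    adscrape_ids.append(ad['id'])
    if ad['id'] not in database_ids:  # list membership: scans the whole list for every ad
        ad['first scan'] = date
        print('new ad:', ad)
        newads.append(ad)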

【Question Discussion】:

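Making database_ids a set would be much faster.
@xfx Building a set from the list above takes a while, and I thought it already was a set since the IDs are unique... As far as I know, the fastest way to check whether an item is in a list is the way you're doing it: if item in list ...
@AndriyIvaneyko A list is not a set, even if its items are unique. Use database_ids = set(database_ids)
@AndriyIvaneyko You will spend more time on repeated list lookups than on building a set in the first place.
Is this hard-coded? Why does "making a set take a while"? If it is hard-coded (which seems absurd), change the source to database_ids = {'id1', 'id2', 'id3', ...}.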
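Following the set suggestion in the comments, here is a minimal sketch of the question's loop with database_ids converted to a set once, before the loop. The function name find_new_ads and its parameters are hypothetical; the field names ('id', 'last scan', 'first scan', 'active') come from the question's example ad. Building the set is a single pass over the list, after which each `not in` check is an average constant-time hash lookup instead of a scan of up to a million items.

```python
from typing import Iterable, List, Tuple


def find_new_ads(adscrape: List[dict], database_ids: Iterable[str],
                 date: str) -> Tuple[List[str], List[dict]]:
    """Hypothetical helper mirroring the question's loop.

    Updates the scan fields on every ad in place and returns
    (all scraped IDs, ads whose ID is not in database_ids).
    """
    known_ids = set(database_ids)  # built once, one pass over the list
    adscrape_ids = []
    newads = []
    for ad in adscrape:
        ad['last scan'] = date
        ad['active'] = 'Y'
        adscrape_ids.append(ad['id'])
        if ad['id'] not in known_ids:  # average O(1) hash lookup
            ad['first scan'] = date
            newads.append(ad)
    return adscrape_ids, newads


# Toy data; the real database_ids list holds 200,000+ IDs
database_ids = ['id1', 'id2', 'id3']
adscrape = [{'id': 'id2'}, {'id': 'OAG-AD-12371713'}]
ids, new = find_new_ads(adscrape, database_ids, '06/02/16')
print(ids)                       # ['id2', 'OAG-AD-12371713']
print([ad['id'] for ad in new])  # ['OAG-AD-12371713']
```

The set costs one extra pass and some memory, but that is paid once, whereas the list version pays a full scan for every ad.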

Tags: python search

【Solution 1】:

You can build ids_map as a dict and check whether an id is in the list by looking up that key in ids_map, as in the code snippet below:

database_ids = ['id1','id2','id3']
ad = {'id': u'OAG-AD-12371713', 'body': u'\xa0SUV', 'loc': u'SA', 'last scan': '06/02/16', 'eng': u'\xa06cyl 2.7L ', 'make': u'Hyundai', 'year': u'2006', 'first scan': '06/02/16', 'odo': u'168911', 'active': 'Y', 'adtype': u'Dealer: Used Car', 'model': u'Tucson Auto 4x4 ', 'trans': u'\xa0Automatic', 'price': u'9990'}

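adscrape = [ad]  # hypothetical list of scraped ads, just for this example

# build ids map: {id: position in database_ids}
ids_map = dict((k, v) for v, k in enumerate(database_ids))

for ad in adscrape:
    # some logic before checking whether id is in database_ids
    try:
        ids_map[ad['id']]  # dict key lookup is a hash lookup, not a list scan
    except KeyError:
        pass  # id is not in database_ids
    else:
        # no KeyError raised: perform logic for existing ids
        print('id %s in list' % ad['id'])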
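A possible simplification of the dict approach above, sketched under the assumption that the stored indices are never used: the `in` operator on a dict already tests its keys with a hash lookup, so the try/except is not needed, and dict.fromkeys builds the same keys without enumerate. The toy adscrape list is hypothetical.

```python
database_ids = ['id1', 'id2', 'id3']
adscrape = [{'id': 'id1'}, {'id': 'OAG-AD-12371713'}]  # hypothetical toy data

# Keys are the IDs; values default to None because they are never read
ids_map = dict.fromkeys(database_ids)

existing = []
for ad in adscrape:
    if ad['id'] in ids_map:  # `in` on a dict checks keys via hashing
        existing.append(ad['id'])
        print('id %s in list' % ad['id'])

print(existing)  # ['id1']
```

Since only the keys matter here, `set(database_ids)` gives the same lookup behaviour and states the intent more directly.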

【Discussion】:

【Solution 2】:

You can use a list comprehension, as in the code below, with the existing database_ids list and adscrape list given above.

Code: `new_adds_ids = [ad for ad in adscrape if ad['id'] not in database_ids]`
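The comprehension in this answer still tests membership against database_ids as a list, so on its own it does the same per-ad linear scan as the original loop. A minimal sketch combining it with a set built once (toy data, hypothetical names known_ids, new_ads and new_ids), plus a set-difference variant for when only the new IDs are needed and order does not matter:

```python
database_ids = ['id1', 'id2', 'id3']
adscrape = [{'id': 'id3'}, {'id': 'OAG-AD-12371713'}, {'id': 'OAG-AD-99'}]

known_ids = set(database_ids)  # convert once so each lookup is a hash lookup

# Ads whose ID is not already in the database, in scrape order
new_ads = [ad for ad in adscrape if ad['id'] not in known_ids]

# IDs only, order not preserved: set difference
new_ids = {ad['id'] for ad in adscrape} - known_ids

print([ad['id'] for ad in new_ads])  # ['OAG-AD-12371713', 'OAG-AD-99']
print(sorted(new_ids))               # ['OAG-AD-12371713', 'OAG-AD-99']
```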

【Discussion】:

Please use the code sample button to format any code in your answer.