How to filter a dictionary value (within another dictionary): answers

【Question title】: How to filter a dictionary value (within another dictionary)
【Posted】: 2015-01-13 11:44:12
【Question】:

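I'll explain this as best I can, so apologies in advance for the long post.

First, I have an API here (http://dev.c0l.in:5984/income_statements/_all_docs). This dictionary contains 5000 other dictionaries, which I can access by their ID (http://dev.c0l.in:5984/income_statements/30e901a7b7d8e98328dcd77c369b6ad7).

So far I have written a program that goes through these dictionaries and prints out (to CSV) only the ones related to a sector the user enters (e.g. healthcare).

However, I would like to implement filtered searches, so that the program only prints statements that are above or below a value the user enters, e.g. only retrieve data for a (user-input) closing stock, and only retrieve values below (

My problem is that I'm not sure how.

I understand how to get user input and how to access a dictionary within a dictionary, but I don't know how to filter for values above or below a user-entered value.

Here is a copy of my code; any pointers would be appreciated!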

import urllib #Imports the url - library module (older the urllib2 but has some useful decodes if needed)
import urllib2 #Imports the Url- Library module (Most recently updated + used)
import csv #Imports the commands that allows for csv writing/reading
import json #Imports the ability to read/use Json data
import time #Imports the time module - allows the developer to examine benchmarks (How long did it take to fetch data)
import os


income_csv = csv.writer(open("Income Statement_ext.csv", "wb")) #This creates a CSV file and writes functions to it
financial_csv = csv.writer(open("Statement of financial position_ext.csv", "wb"))

#The two csv 'writers' create the headers for the information within the CSV file before the information from the api is added to it
financial_csv.writerow([
    ('Company name'),
    ('Non Current Assets'),
    ('Current Assets'),
    ('Equity'),
    ('Non-Current Assets'),
    ('Current Liabilities')])

income_csv.writerow([
    ('Company name'),
    ('Sales'),
    ('Opening Stock'),
    ('Purchases'),
    ('Closing Stock'),
    ('Expenses'),
    ('Interest payable'),
    ('Interest receivable')])

income_url = "http://dev.c0l.in:5984/income_statements/_all_docs"
income_request = urllib2.urlopen(income_url).read()
income_response = json.loads(income_request)
#defines the income url

financial_url = "http://dev.c0l.in:5984/financial_positions/_all_docs"
financial_request = urllib2.urlopen(financial_url).read()
financial_response = json.loads(financial_request)
#defines the financial postion url
count = 0
#sets the count for documents printed to 0
def income_statement_fn():
    global count #allows for the count to be kept globally
    print ("(Type help if you would like to see the available choices)")
    income_user_input = raw_input("Which sector would you like to iterate through in Income Statement?: ").lower()# Asks the user which sector within the chosen statement he/she would like to examine
    if income_user_input == "help":
        print ("Available sectors are: ")
        print ("Technology")
        print ("Healthcare")
        print ("Industrial goods")
        print ("Financial")
        print ("Utilities")
        print ("Basic materials")
        print ("Services") 
        income_statement_fn()

    elif income_user_input == "technology" or income_user_input == "healthcare" or income_user_input == "industrial goods" or income_user_input == "financial" or income_user_input == "utilities" or income_user_input == "basic materials" or income_user_input == "services":
        print 'Starting...' # I use this print to set a milestone (if it prints this, everything before it has worked without error)
        start = time.clock()
        start
        for item in income_response['rows']:
            is_url = "http://dev.c0l.in:5984/income_statements/" + item['id'] #This combines the api with the array's ID's allowing us to access every document automatically
            is_request = urllib2.urlopen(is_url).read() #Opens is_url and reads the data
            is_response = json.loads(is_request) #loads the data in json format
            if is_response.get ('sector') == income_user_input: #matches the sector the user inputed - allows us to access that dictionary
                income_csv.writerow([
                 is_response['company']['name'],
                 is_response['company']['sales'],
                 is_response['company']['opening_stock'],
                 is_response['company']['purchases'],
                 is_response['company']['closing_stock'],
                 is_response['company']['expenses'],
                 is_response['company']['interest_payable'],
                 is_response['company']['interest_receivable']]) # The lines of code above write the chosen fields to the csv file
            count +=1
            print ("filtering statements") + ("( "+" %s "+" )") % count
        start
        print start
        restart_fn()
    else:
        print ("Invalid input!")
        income_statement_fn()





def financial_statement_fn(): # Within this function is the code required to fetch information related to the financial position statement
    global count # Allows for the count to be kept globally (outside the function)
    print ("(Type help if you would like to see the available choices)")
    financial_user_input = raw_input("Which sector would you like to iterate through in financial statements?: ").lower()
    if financial_user_input == "help":
        print ("Available sectors are: ")
        print ("Technology")
        print ("Healthcare")
        print ("Industrial goods")
        print ("Financial")
        print ("Utilities")
        print ("Basic materials")
        print ("Services")
        financial_statement_fn()

    elif financial_user_input == "technology" or financial_user_input == "healthcare" or financial_user_input == "industrial goods" or financial_user_input == "financial" or financial_user_input == "utilities" or financial_user_input == "basic materials" or financial_user_input == "services":
        print 'Starting'
        for item in financial_response['rows']:
            fs_url = "http://dev.c0l.in:5984/financial_positions/" + item['id']#This combines the api with the array's ID's allowing us to access every document automatically
            fs_request = urllib2.urlopen(fs_url).read()
            fs_response = json.loads(fs_request)
            if fs_response.get ('sector') == financial_user_input:
                financial_csv.writerow([
                    fs_response['company']['name'],
                    fs_response['company']['non_current_assets'],
                    fs_response['company']['current_assets'],
                    fs_response['company']['equity'],
                    fs_response['company']['non_current_liabilities'],
                    fs_response['company']['current_liabilities']])
                count +=1
                print ("printing statements") + ("( "+" %s "+" )") % count
        print ("---------------------------------------------------------------------")
        print ("finished fetching data")
        print ("---------------------------------------------------------------------")
        restart_fn()

    else:
        print ("Invalid Input!")
        financial_statement_fn()


def launch_fn():
    print ("Please type 'help' if you would like to examine all available options")
    launch_user_input = raw_input("Welcome, Which statement would you like to examine?: ").lower()
    if launch_user_input == "income" or launch_user_input == "income statement":
        income_statement_fn()
    elif launch_user_input == "financial" or launch_user_input == "financial statement":
        financial_statement_fn()
    elif launch_user_input == "help" :
        print ("You can use the following commands on this menu: ")
        print ("---------------------------------------------------------------------")
        print ("Income or Income statement")
        print ("Will allow you to retrieve data relating to financial Income statements")
        print ("---------------------------------------------------------------------")
        print ("Financial or Financial statement")
        print ("Will allow you to retrieve data relating to the statement of financial position")
        print ("---------------------------------------------------------------------")
        launch_fn()
    else:
        print ("If you would like to look at the available options please type help")
        launch_fn()

def restart_fn():
    restart_prompt = raw_input("Would you like to examine another statement?: ").lower()
    if restart_prompt == 'y' or restart_prompt == 'yes':
        launch_fn()
        count = 0
    elif restart_prompt == 'n' or restart_prompt == 'no':
        raise SystemExit("Shutting down....")

def restart_api_down_fn():
    print ("Type 'y' or 'yes' to continue, 'n' or 'no' to exit or 'r' or 'reconnect' to test servers again")
    restart_prompt_api = raw_input("Would you like to continue anyway?: ").lower()
    if restart_prompt_api == 'r' or restart_prompt_api == 'reconnect' or restart_prompt_api == 'test':
        api_status_fn()
        count = 0
    elif restart_prompt_api == 'n' or restart_prompt_api == 'no':
        raise SystemExit("Shutting down....")
    elif restart_prompt_api == 'y' or restart_prompt_api == 'yes':
        print (" Continuing... Programme performance may be severely affected")
        launch_fn()
    else:
        print ("Invalid input...")
        restart_api_down_fn()

def api_status_fn():
    hostname_income = "http://dev.c0l.in:5984/income_statements" 
    response_income = os.system("ping -c 1 " + hostname_income)
    hostname_financial = "http://dev.c0l.in:5984/financial_positions"
    response_financial = os.system("ping -c 1 " + hostname_financial)
    global count
    count = 0

    if response_income == 0:
        print hostname_income, 'is up!'
        count +=1
    else:
        print hostname_income, 'is experiencing connection issues!'        

    if response_financial == 0:
        print hostname_financial, 'is up!'
        count +=1

    else:
        print hostname_financial, 'is experiencing connection issues!'

    if count == 2:
        launch_fn()

    elif count == 0:
        restart_api_down_fn() # Code only for UNIX SYSTEMS?

#def api_status_fn():
 #   hostname = "http://dev.c0l.in:5984/income_statements"
  #  ping = urllib.urlopen(hostname).getcode()
   # if ping == "200":
     #   print 'oh no!'
# add filtering & sorting







api_status_fn()

Let me know if you need any further explanation,

Cheers!

【Question discussion】:

I haven't read your code, but would something like filtered_d = {id: subdict for id, subdict in d.iteritems() if subdict["Closing Stock"] <= 40000} help?

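Tags: python python-2.7 csv dictionary

【Solution 1】:

I'd say your code is quite muddled, and you may have more luck if you try to break it up a little. I'll make some suggestions towards the end of this answer.

Fundamentally, you need to filter the specific results that you get. Looking at your code, I can see the following: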

elif financial_user_input == "technology" or financial_user_input == "healthcare" or financial_user_input == "industrial goods" or financial_user_input == "financial" or financial_user_input == "utilities" or financial_user_input == "basic materials" or financial_user_input == "services":
    print 'Starting'
    for item in financial_response['rows']:
        fs_url = "http://dev.c0l.in:5984/financial_positions/" + item['id']#This combines the api with the array's ID's allowing us to access every document automatically
        fs_request = urllib2.urlopen(fs_url).read()
        fs_response = json.loads(fs_request)
        if fs_response.get ('sector') == financial_user_input:

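This code mixes the following responsibilities:

Validating user input
Requesting records
Filtering records

If you split these responsibilities out into separate methods, you will find your code easier to reason about. In addition, as I'll show shortly, splitting things up this way lets you recombine the pieces to customise how records are filtered, and so on.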
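As a small illustration of isolating just the validation responsibility, here is a minimal sketch. The names VALID_SECTORS and is_valid_sector are made up for this example; the sector list itself is copied from the help text in the question's code.

```python
# Hypothetical helper names; the sectors come from the question's help text.
VALID_SECTORS = frozenset([
    "technology", "healthcare", "industrial goods", "financial",
    "utilities", "basic materials", "services",
])

def is_valid_sector(user_input):
    """Return True if the input names a known sector, ignoring case and padding."""
    return user_input.strip().lower() in VALID_SECTORS
```

With a helper like this, the long chain of `or` comparisons could shrink to a single membership check, and the request and filter steps would no longer need to know how input is validated.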

If it were split up a little:

import json
import urllib2

def _get_single_record(record_id):
    """ Request an individual income statement.
        This does not filter """
    # Same request pattern as the question's code, one document per ID
    url = "http://dev.c0l.in:5984/income_statements/" + record_id
    return json.loads(urllib2.urlopen(url).read())

def _record_matches_sector(record, sector):
    """ Determine if the record provided matches the sector """
    return record['sector'] == sector

def _record_meets_closing_stock_limit(record, limit):
    """ Determine if the record provided has a
        closing stock of at least limit """
    # closing_stock sits under 'company' in the question's documents
    return record['company']['closing_stock'] >= limit

def _get_all_filtered_records(ids, sector, limit):
    """ Return all income statement records that
        match the sector and closing stock limit """
    record_generator = (_get_single_record(record_id) for record_id in ids)
    return (
        record for record in record_generator
        if _record_matches_sector(record, sector)
        and _record_meets_closing_stock_limit(record, limit)
    )

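This obviously just returns a generator that yields the records matching your sector and limit. You can add more tests and so on, but updating the code to check each test is still fairly manual. What you need is a way to apply a set of optional tests to record_generator and return the results that match.

This is quite simple in Python, because Python treats functions as first-class objects (meaning you can assign them to variables) and you can use lambdas to create custom functions quickly. That means you can restate _get_all_filtered_records as: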

def _make_limit_test(limit):
    """ This returns a function which accepts records that meet the limit """
    # closing_stock sits under 'company' in the question's documents
    return lambda record: record['company']['closing_stock'] >= limit

def _make_sector_test(sector):
    """ This returns a function which accepts records that match the sector """
    return lambda record: record['sector'] == sector

def _filter_records_by_tests(ids, tests):
    """ Returns all the records that pass all the tests """
    record_generator = (_get_single_record(record_id) for record_id in ids)
    for record in record_generator:
        if all(test(record) for test in tests):
            yield record
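To try the test-list idea without hitting the remote API, a sketch along these lines may help. Everything here is illustrative: FAKE_RECORDS holds invented values, fake_get_single_record stands in for the HTTP request, and the record layout (a top-level 'sector' plus a nested 'company' dictionary) is assumed from the question's code.

```python
# Invented stand-in data, shaped like the documents in the question.
FAKE_RECORDS = {
    "a1": {"sector": "healthcare",
           "company": {"name": "Acme Health", "closing_stock": 52000}},
    "b2": {"sector": "healthcare",
           "company": {"name": "Beta Care", "closing_stock": 12000}},
    "c3": {"sector": "technology",
           "company": {"name": "Gamma Tech", "closing_stock": 90000}},
}

def fake_get_single_record(record_id):
    """Stand-in for the HTTP request: look the record up in memory."""
    return FAKE_RECORDS[record_id]

def filter_records_by_tests(ids, tests, get_record=fake_get_single_record):
    """Yield every record that passes all of the tests."""
    for record_id in ids:
        record = get_record(record_id)
        if all(test(record) for test in tests):
            yield record

tests = [
    lambda record: record["sector"] == "healthcare",
    lambda record: record["company"]["closing_stock"] >= 40000,
]
matches = list(filter_records_by_tests(sorted(FAKE_RECORDS), tests))
print([record["company"]["name"] for record in matches])  # prints ['Acme Health']
```

Passing the fetch function in as a parameter is one possible design choice: it keeps the filter logic checkable offline, and the real request function can be swapped in later.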

You can then build the list of tests to pass by asking the user. This demo is enough to verify that the approach works:

def demo_filtering_by_healthcare_and_40k(ids):
    tests = [_make_sector_test("healthcare"), _make_limit_test(40000)]
    return _filter_records_by_tests(ids, tests)
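The demo hard-codes its tests. For the "above or below a user-entered value" case from the question, the tests might be built from raw input strings roughly as follows. This is only a sketch: COMPARISONS, make_field_test and build_tests are hypothetical names, and it assumes the field values in the records are numeric.

```python
import operator

# Hypothetical mapping from words a user might type to comparison functions.
COMPARISONS = {"above": operator.gt, "below": operator.lt}

def make_field_test(field, direction, limit):
    """Return a test accepting records whose company[field] is above/below limit."""
    compare = COMPARISONS[direction]  # raises KeyError for an unknown direction
    return lambda record: compare(record["company"][field], limit)

def build_tests(sector, field, direction, limit_text):
    """Turn raw user answers (strings) into a list of test functions."""
    limit = float(limit_text)  # raises ValueError for non-numeric input
    return [
        lambda record: record["sector"] == sector,
        make_field_test(field, direction, limit),
    ]

tests = build_tests("healthcare", "closing_stock", "below", "40000")
record = {"sector": "healthcare", "company": {"closing_stock": 12000}}
print(all(test(record) for test in tests))  # prints True
```

The strings passed to build_tests would come from raw_input in the question's Python 2 program; they are literals here so the sketch runs on its own.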

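As you can see, my method names are quite long and the methods themselves are quite short. This is really a matter of personal style, but I find that it makes it obvious what a method does and lets you quickly read the code to verify that it matches the name.

So, to wrap up: you are requesting records from a remote API. You can filter them using list comprehensions. List comprehensions are extremely powerful and let you take source data and transform and filter it. Reading up on them would help you a lot.

【Discussion】: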
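To make the list comprehension point concrete, here is a minimal sketch on in-memory data. The records are invented, and the nested 'company' layout is assumed from the question's code.

```python
# Invented sample records shaped like the question's documents.
records = [
    {"sector": "healthcare",
     "company": {"name": "Acme Health", "closing_stock": 52000}},
    {"sector": "healthcare",
     "company": {"name": "Beta Care", "closing_stock": 12000}},
    {"sector": "technology",
     "company": {"name": "Gamma Tech", "closing_stock": 90000}},
]

names_above_limit = [
    record["company"]["name"]                        # transform: keep only the name
    for record in records                            # source data
    if record["company"]["closing_stock"] >= 40000   # filter
]
print(names_above_limit)  # prints ['Acme Health', 'Gamma Tech']
```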

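I have to admit, that list comprehension looks really slick, and your explanation is great. Unfortunately I can't get it working yet, but I think I just need to practise my Python, as I'm fairly sure I'm doing something wrong. I'll let you know how it goes once I've rewritten my code in this format. Cheers!
Let me know if you have any problems with it. One thing that isn't obvious is whether a comprehension returns a list or a generator. If the comprehension is surrounded by (), it returns a generator, whereas [] returns a list. If you try to index something and it fails, check whether the object is actually a generator. See wiki.python.org/moin/Generators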
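A short sketch of that difference, using made-up numbers rather than API records:

```python
numbers = [1, 2, 3, 4]

as_list = [n * n for n in numbers]       # square brackets: builds a list now
as_generator = (n * n for n in numbers)  # parentheses: a lazy generator

print(as_list[0])  # lists support indexing; prints 1

try:
    as_generator[0]
except TypeError:
    # Generators cannot be indexed; iterate over them or convert to a list.
    print("generators do not support indexing")

print(list(as_generator))  # prints [1, 4, 9, 16]
```

Note that a generator can only be consumed once, so calling list() on the same generator a second time would give an empty list.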