【Question title】: Pandas: Create one function to read JSON, then another function to create the DataFrame
【Posted】: 2021-05-10 10:00:18
【Question description】:

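I want to create one function that fetches data from an API, and then another function that builds and cleans the corresponding DataFrame for later use.

The first function is shown below, and it works fine: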

def get_data():

    print('start download the 1st set')
    confirm_details = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the 1st set')

    print('start download the 2nd set')
    latest_situ = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Flatest_situation_of_reported_cases_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the 2nd set')

    print('start download the final set')
    residential = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fbuilding_list_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the final set')

get_data()

The second function is shown below, but it fails with the error "NameError: name 'confirm_details' is not defined":

def clean_confirm_df():
    confirm_df = pd.read_json(io.StringIO(confirm_details.decode('utf-8')))
    confirm_df.columns = confirm_df.columns.str.replace(" ", "_" )
    confirm_df.columns = confirm_df.columns.str.replace('/', "_")
    confirm_df.columns = confirm_df.columns.str.replace("*", "")
    confirm_df.columns = confirm_df.columns.str.strip()
    confirm_df['Report_date'] = pd.to_datetime(confirm_df['Report_date'], dayfirst=True)
    confirm_df.rename(columns = {'Confirmed_probable': 'Confirmed'}, inplace = True)
    confirm_df = confirm_df.drop(['Name_of_hospital_admitted', 'Date_of_onset'], axis = 1)
    confirm_df['HK_Non-HK_resident'] = confirm_df['HK_Non-HK_resident'].str.upper()
    confirm_df.head()
    
clean_confirm_df()

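I checked the first function and I can see that "confirm_details" is defined there. The code that creates each DataFrame (confirm_df, latest_situ_df and residential_df) works fine when I run it on its own, outside the functions.

I am teaching myself Python and pandas, so I would appreciate any advice on how to change my code to make it work.

Thanks.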

【Comments】:

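  • It is all about scope: these variables are defined in the scope of get_data(), not globally. You mention functions, but you have not really defined functions, because nothing is returned. I suggest returning a dict of references to the JSON from get_data() and passing it as an argument to clean_df(). It is worth doing a bit more online learning on the basics of Python programming.
  • Thanks @RobRaymond, it works. Yes, I agree with your suggestion that I should do more online learning. Sometimes, after watching a few demos on YouTube, I struggle to figure things out. I don't understand your comment "You mention functions, but you have not really defined functions". I thought a function was defined when I used def. Thanks for your help, and have a nice day.
  • This is a bit old school; I have used many programming languages. I like the definition that a function returns something, whereas a subroutine just does something and returns nothing. I do think these concepts are useful to understand.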
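The scope point raised in the comments can be illustrated without any network access. The following is a minimal sketch, not the asker's code: the names download, describe and payload are hypothetical, and the byte string is a made-up stand-in for an API response.

```python
def download():
    # 'payload' is a local name: it exists only while download() is running.
    payload = b'[{"Case no.": 1}]'  # stand-in bytes, not real API output
    return payload  # returning the value is what hands it to the caller


def describe(raw):
    # The data arrives as a parameter instead of being looked up as a global.
    return f"received {len(raw)} bytes"


raw_bytes = download()       # keep the returned value under a module-level name
print(describe(raw_bytes))   # pass it on explicitly

try:
    payload  # this name was only ever bound inside download()
except NameError as exc:
    print("NameError:", exc)
```

The try/except at the end reproduces the same kind of NameError as in the question: a name assigned inside one function is not visible at module level or inside another function unless it is returned and passed along.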

Tags: python pandas function dataframe


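【Solution 1】:

As per the comments: structure your code so that you are aware of the scope of each variable. You are assuming that everything is global, which would be a very bad thing...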

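import io

import pandas as pd
import requests

def get_data():
    # Collect the raw response bodies in a dict so they can be returned to the caller
    ret = {}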

    print('start download the 1st set')
    ret["confirm_details"] = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the 1st set')

    print('start download the 2nd set')
    ret["latest_situ"] = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Flatest_situation_of_reported_cases_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the 2nd set')

    print('start download the final set')
    ret["residential"] = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fbuilding_list_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
    print('complete download the final set')

    return ret

def clean_confirm_df(data):
    confirm_df = pd.read_json(io.StringIO(data["confirm_details"].decode('utf-8')))
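    # regex=False treats " ", "/" and "*" as literal characters, not regex patterns
    confirm_df.columns = confirm_df.columns.str.replace(" ", "_", regex=False)
    confirm_df.columns = confirm_df.columns.str.replace("/", "_", regex=False)
    confirm_df.columns = confirm_df.columns.str.replace("*", "", regex=False)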
    confirm_df.columns = confirm_df.columns.str.strip()
    confirm_df['Report_date'] = pd.to_datetime(confirm_df['Report_date'], dayfirst=True)
    confirm_df.rename(columns = {'Confirmed_probable': 'Confirmed'}, inplace = True)
    confirm_df = confirm_df.drop(['Name_of_hospital_admitted', 'Date_of_onset'], axis = 1)
    confirm_df['HK_Non-HK_resident'] = confirm_df['HK_Non-HK_resident'].str.upper()
    return confirm_df

mydata = get_data()
df = clean_confirm_df(mydata)
print(df.head().to_markdown())  # to_markdown() needs the optional tabulate package

Output:

start download the 1st set
complete download the 1st set
start download the 2nd set
complete download the 2nd set
start download the final set
complete download the final set
|    |   Case_no. | Report_date         | Gender   |   Age | Hospitalised_Discharged_Deceased   | HK_Non-HK_resident   | Case_classification   | Confirmed   |
|---:|-----------:|:--------------------|:---------|------:|:-----------------------------------|:---------------------|:----------------------|:------------|
|  0 |          1 | 2020-01-23 00:00:00 | M        |    39 | Discharged                         | NON-HK RESIDENT      | Imported case         | Confirmed   |
|  1 |          2 | 2020-01-23 00:00:00 | M        |    56 | Discharged                         | HK RESIDENT          | Imported case         | Confirmed   |
|  2 |          3 | 2020-01-24 00:00:00 | F        |    62 | Discharged                         | NON-HK RESIDENT      | Imported case         | Confirmed   |
|  3 |          4 | 2020-01-24 00:00:00 | F        |    62 | Discharged                         | NON-HK RESIDENT      | Imported case         | Confirmed   |
|  4 |          5 | 2020-01-24 00:00:00 | M        |    63 | Discharged                         | NON-HK RESIDENT      | Imported case         | Confirmed   |
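The cleaning steps can also be checked offline, without calling the API. The sketch below is only an illustration under stated assumptions: the sample records are made up, the raw column names and the day/month/year date format are guesses based on the cleaned output above, and tidy_columns is a hypothetical helper that covers only part of clean_confirm_df().

```python
import io
import json

import pandas as pd

# Made-up sample records; the raw column names are an assumption
# inferred from the cleaned names in the output above.
sample = [
    {"Case no.": 1, "Report date": "23/01/2020", "HK/Non-HK resident": "Non-HK resident"},
    {"Case no.": 2, "Report date": "24/01/2020", "HK/Non-HK resident": "HK resident"},
]
raw = json.dumps(sample).encode("utf-8")  # bytes, like the .content of a response


def tidy_columns(raw_bytes):
    # Decode the bytes and parse the JSON records into a DataFrame.
    df = pd.read_json(io.StringIO(raw_bytes.decode("utf-8")))
    # Replace spaces and slashes literally, then trim stray whitespace.
    df.columns = (
        df.columns.str.replace(" ", "_", regex=False)
        .str.replace("/", "_", regex=False)
        .str.strip()
    )
    # Parse day-first date strings into datetime values.
    df["Report_date"] = pd.to_datetime(df["Report_date"], dayfirst=True)
    return df


tidy = tidy_columns(raw)
print(tidy.columns.tolist())
print(tidy["Report_date"].dt.strftime("%Y-%m-%d").tolist())
```

Because the helper takes the raw bytes as an argument, the same function can be fed either sample data like this or one entry of the dict returned by get_data().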
