【Question title】: One attribute in JSON in two separate columns
【Posted】: 2020-01-15 22:53:32
【Question】:

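I am currently trying to work out how to split values into two columns in a CSV file, which looks like this:

I would like to show the standard price and the convertible price in separate columns. However, both sit under a single attribute called "aws:offerTermOfferingClass". Do you know how to get separate columns for the convertible and standard prices under one instance type? I am trying to do it with these ifs, but it stops with an error. Thank you very much in advance for your help!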

import requests
import warnings
import pandas as pd
import numpy as np
warnings.filterwarnings('ignore')



regions=['ap-northeast-1','ap-south-1','ap-southeast-1','ap-southeast-2','eu-central-1','eu-west-1','eu-west-2','us-east-1','us-east-2','us-west-1','us-west-2']
OS=['linux','rhel','windows']

links=[]
for region in regions:
    for system in OS:
        links.append("https://a0.p.awsstatic.com/pricing/1.0/ec2/region/" + region + "/reserved-instance/" + system + "/index.json?")

superdict=[]

for link in links:
    print("Downloading data from: " + link)
    res=requests.get(link,verify=False).json()
    superdict.append(res)


df={"Region":[],"System":[],"Type":[],"Standard":[],"Convertible":[],"On demand":[]}



for res in superdict:
    for item in res['prices']:
        if item['attributes']['aws:offerTermLeaseLength']=="3yr" \
        and item['attributes']['aws:offerTermPurchaseOption']=="No Upfront":
            if item['attributes']['aws:ec2:operatingSystem']=="Linux" \
            and item['attributes']['aws:ec2:instanceType'].endswith('.large'):
                df["Region"].append(item['attributes']['aws:region'])
                df["System"].append("Linux/UNIX")
                df["Type"].append(item['attributes']['aws:ec2:instanceType'])
                df["On demand"].append(item['calculatedPrice']['onDemandRate']['USD'])
                if item['attributes']['aws:offerTermOfferingClass'] =="standard":
                    df["Standard"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
                    df["Convertible"].append(np.NaN)
                elif item['attributes']['aws:offerTermOfferingClass'] =="convertible":
                    df["Convertible"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
                    df["Standard"].append(np.NaN)





            elif item['attributes']['aws:ec2:operatingSystem']=="RHEL":
                df["Region"].append(item['attributes']['aws:region'])
                df["System"].append("Red Hat Enterprise Linux")
                df["Type"].append(item['attributes']['aws:ec2:instanceType'])
                df["On demand"].append(item['calculatedPrice']['onDemandRate']['USD'])
                if item['attributes']['aws:offerTermOfferingClass'] =="standard":
                    df["Standard"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
                    df["Convertible"].append(np.NaN)
                elif item['attributes']['aws:offerTermOfferingClass'] =="convertible":
                    df["Convertible"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
                    df["Standard"].append(np.NaN)

            elif item['attributes']['aws:ec2:operatingSystem']=="Windows":
                df["Region"].append(item['attributes']['aws:region'])
                df["System"].append("Windows")
                df["Type"].append(item['attributes']['aws:ec2:instanceType'])
                df["On demand"].append(item['calculatedPrice']['onDemandRate']['USD'])
                if item['attributes']['aws:offerTermOfferingClass'] =="standard":
                    df["Standard"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
                    df["Convertible"].append(np.NaN)
                elif item['attributes']['aws:offerTermOfferingClass'] =="convertible":
                    df["Convertible"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
                    df["Standard"].append(np.NaN)




data=pd.DataFrame.from_dict(df)
data.to_csv(r'path_to_file.csv',index=False)

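This is what I have now:

And this is what I want:

【Comments】:

Can you give the error you get?
I get something like this: ValueError: arrays must all be same length

Tags: python json csv

【Solution 1】:

Your problem is these lines inside the if: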

if item['attributes']['aws:offerTermOfferingClass'] =="standard":
    df["Standard"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
elif item['attributes']['aws:offerTermOfferingClass'] =="convertible":
    df["Convertible"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))

This way, each iteration appends an element to only one of the two lists. That means that after one iteration, your dict df might look like this:

{"Region":["EU"],"System":["Windows"], \
  "Type":[3],"Standard":[12.00],"Convertible":[],"On demand":[8]}

After two iterations:

{"Region":["EU", "JAP"],"System":["Windows", "Linux/UNIX"], \
  "Type": [3,4],"Standard":[12.00],"Convertible":[18.00],"On demand":[8,13]}

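So what should the DataFrame look like? Standard and Convertible have only one element each, while all the other columns have two. You cannot build a DataFrame like this. That is what the error is telling you: ValueError: arrays must all be same length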
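The mismatch can be reproduced in isolation. The snippet below is a minimal sketch that reuses the illustrative values from the two-iteration dict above (they are placeholders, not real AWS data) and prints the length of each list next to the error pandas raises. The exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Placeholder values copied from the two-iteration example above.
columns = {
    "Region": ["EU", "JAP"],
    "System": ["Windows", "Linux/UNIX"],
    "Type": [3, 4],
    "Standard": [12.00],     # only one entry
    "Convertible": [18.00],  # only one entry
    "On demand": [8, 13],
}

# Length of every list: the two price columns are shorter than the rest.
lengths = {name: len(values) for name, values in columns.items()}
print(lengths)

error = None
try:
    pd.DataFrame.from_dict(columns)
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```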

So basically the fix is this:

if item['attributes']['aws:offerTermOfferingClass'] =="standard":
    df["Standard"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
    df["Convertible"].append(np.nan) # or another default value (np.NaN was removed in NumPy 2.0)
elif item['attributes']['aws:offerTermOfferingClass'] =="convertible":
    df["Convertible"].append(float(item['calculatedPrice']['effectiveHourlyRate']['USD']))
    df["Standard"].append(np.nan)
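One way to make this kind of length mismatch impossible is to collect one dict per row instead of keeping six parallel lists. The following is only a sketch of that alternative, not the approach used in the question: `to_row` is a hypothetical helper, and `sample` is a made-up item that merely imitates the keys the question's loop reads (the prices are invented).

```python
import numpy as np
import pandas as pd

def to_row(item):
    """Turn one price item into a row dict in which every column is present."""
    attrs = item["attributes"]
    rate = float(item["calculatedPrice"]["effectiveHourlyRate"]["USD"])
    offering = attrs["aws:offerTermOfferingClass"]
    return {
        "Region": attrs["aws:region"],
        "Type": attrs["aws:ec2:instanceType"],
        # Exactly one of the two price columns gets the rate, the other NaN.
        "Standard": rate if offering == "standard" else np.nan,
        "Convertible": rate if offering == "convertible" else np.nan,
        "On demand": item["calculatedPrice"]["onDemandRate"]["USD"],
    }

# Hypothetical item with invented prices, shaped like the question's data.
sample = {
    "attributes": {
        "aws:region": "eu-west-1",
        "aws:ec2:instanceType": "m5.large",
        "aws:offerTermOfferingClass": "standard",
    },
    "calculatedPrice": {
        "effectiveHourlyRate": {"USD": "0.12"},
        "onDemandRate": {"USD": "0.20"},
    },
}

rows = [to_row(sample)]
data = pd.DataFrame(rows)
print(data)
```

Because every row dict carries all columns, the resulting DataFrame always has columns of equal length, whichever offering class an item belongs to.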

Merge rows if some values are blank

You can try the following after creating the DataFrame:

# `data` is the DataFrame built in the question (not the dict named `df`)
df_ = data.replace('', np.nan).ffill().bfill()
merged = pd.concat([
        df_[df_.duplicated()],
        data.loc[df_.drop_duplicates(keep=False).index]
    ])

Reference: Python Pandas - merge rows if some values are blank

Group by

You can also solve it with groupby.

data = data.groupby(["Region","System","Type","On demand"]).sum().replace(0,np.nan)
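To see what the grouping does, here is a small sketch on invented data: two rows for the same instance type, one carrying only the Standard price and one only the Convertible price. The region, instance type, and prices are made up for illustration. It uses `sum(min_count=1)`, a variation on the line above that leaves a group as NaN when it has no price at all, so the `replace(0, np.nan)` step is not needed.

```python
import numpy as np
import pandas as pd

# Invented rows: same instance, prices split across two rows.
data = pd.DataFrame({
    "Region": ["eu-west-1", "eu-west-1"],
    "System": ["Windows", "Windows"],
    "Type": ["m5.large", "m5.large"],
    "On demand": ["0.20", "0.20"],
    "Standard": [0.12, np.nan],
    "Convertible": [np.nan, 0.15],
})

# Group on the identifying columns and add up the price columns.
# NaN values are skipped, so each group collapses into a single row.
merged = (
    data.groupby(["Region", "System", "Type", "On demand"], as_index=False)
        .sum(min_count=1)
)
print(merged)
```

The two input rows end up as one row holding both the Standard and the Convertible price, which appears to be the layout asked for in the comments below.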

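【Comments】:

Thanks for the reply. It works better, but I would like these values to sit under a single instance type. Right now they are still split, with empty values in either the Standard or the Convertible column.
@Lukasz I'm not sure what you mean by "under one instance".
@Lukasz What should the DataFrame look like? What do you expect instead of the empty cells? You should provide sample input together with the desired output.
With the code you sent, I now get, for example, one row with a specific instance name and a price in only one column, say Standard. In the row below it, I have the same instance type, but the price is only in the Convertible column. I want the prices in one row instead of two.
@Lukasz That is what your code is doing. Without sample input and output, it is hard to understand what you really want.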