【Question title】: Import nested dict containing a list into a csv
【Posted】: 2020-06-09 07:38:42
【Question】:

I am trying to import the following data into a CSV:

{'test.foo.com': {'domain': 'foo.com','FQDN': 'test.foo.com', 'AS': 'AS1111', 'ressource_type': 'A', \
'nb_ip': '1', 'IP': '1.1.1.1', 'service': ['UNKNOWN'], 'port': '[443, 8443]'}}

I almost got there with this code:

#!/bin/python3

## Import ##
# Official
import csv

### Main ###
if __name__ == '__main__':
  ## Variables
  csv_headers = ['domain', 'FQDN', 'AS', 'ressource_type', 'nb_ip', 'IP', 'service', 'port']
  final_data = {'test.foo.com': {'domain': 'foo.com','FQDN': 'test.foo.com', 'AS': 'AS1111', 'ressource_type': 'A', \
  'nb_ip': '1', 'IP': '1.1.1.1', 'service': ['UNKNOWN'], 'port': '[443, 8443]'}}

  # Open the csv file in "write mode"
  with open(file_name, mode='w') as file:
      # Prepare the writer to add a dict into the csv file
      csv_writer = csv.DictWriter(file, fieldnames=headers)

      # Write the columns header into the csv file
      csv_writer.writeheader()
      # Write the dict into the file
      for key, val in nest_dict.items():
          row = {'FQDN': key}
          row.update(val)
          csv_writer.writerow(row)

The result is:

domain,FQDN,AS,ressource_type,nb_ip,IP,service,port
foo.com,test.foo.com,AS1111,A,1,1.1.1.1,['UNKNOWN'],"[443, 8443]"

But I want:

domain,FQDN,AS,ressource_type,nb_ip,IP,service,port
foo.com,test.foo.com,AS1111,A,1,1.1.1.1,'UNKNOWN','443'
foo.com,test.foo.com,AS1111,A,1,1.1.1.1,'UNKNOWN','8443'

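See the difference? I have a 'service' list (which needs no processing here) and a 'port' list. If the 'port' column holds more than one port, I need to write a new row for each port in the list.

I'm struggling to do this because I don't fully understand this code: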

# Write the dict into the file
      for key, val in nest_dict.items():
          row = {'FQDN': key}
          row.update(val)
          csv_writer.writerow(row)

Can you help me solve this?

【Comments】:

    Tags: python csv dictionary import


    【Solution 1】:

    This gives the desired result with the given data:

    import csv

    ### Main ###
    if __name__ == '__main__':
      ## Variables
      csv_headers = ['domain', 'FQDN', 'AS', 'ressource_type', 'nb_ip', 'IP', 'service', 'port']
      final_data = {'test.foo.com': {'domain': 'foo.com','FQDN': 'test.foo.com', 'AS': 'AS1111', 'ressource_type': 'A', \
      'nb_ip': '1', 'IP': '1.1.1.1', 'service': ['UNKNOWN'], 'port': '[443, 8443]'}}

      # Open the csv file in "write mode" (newline='' is what the csv docs recommend)
      with open('out.csv', mode='w', newline='') as file:
          # Prepare the writer to add a dict into the csv file
          csv_writer = csv.DictWriter(file, fieldnames=csv_headers)

          # Write the columns header into the csv file
          csv_writer.writeheader()
          # Write the dict into the file
          for key, val in final_data.items():
              row = {'FQDN': key}
              # Assume that service is always a list of one value and replace it with the one value
              # it contains.
              val['service'] = val.pop('service')[0]
              row.update(val)
              # Since the value of port is quoted it will be a string, but we want a list. Remove the
              # value of 'port' from the dict and put it in 'port_string' (= '[443, 8443]')
              port_string = val.pop('port')
              # Remove the opening and closing brackets from the port_string (= '443, 8443').
              port_string = port_string.replace('[', '')
              port_string = port_string.replace(']', '')
              # Now we can split the string into a python list (= ['443', ' 8443'])
              port_list = port_string.split(',')
              # Write a csv row for each value in the port list
              for port in port_list:
                  # strip() removes the leading space left over from the split
                  row['port'] = port.strip()
                  csv_writer.writerow(row)
    

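    (By the way, the code in the original post did not run. This code includes the edits needed to make it run.)

    Note that because the value of 'port' is quoted (unlike the value of 'service'), it is read in as a string, so it must first be converted to a list. If you remove the single quotes around [443, 8443], the port part of the code simplifies to: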

              port_list = val.pop('port')
              # Write a csv row for each value in the port list
              for port in port_list:
                  row['port'] = port
                  csv_writer.writerow(row)
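
If the quotes around the port value cannot be removed at the source, one alternative to stripping brackets by hand is to parse the string with `ast.literal_eval` from the standard library. The sketch below is only an illustration: `normalize_ports` is a hypothetical helper name, not part of the original code, and it assumes the string is always a valid Python literal such as '[443, 8443]' or '443'.

```python
import ast


def normalize_ports(value):
    """Return a list of ports from either a real list or its string form."""
    if isinstance(value, str):
        # literal_eval only evaluates Python literals, so '[443, 8443]'
        # becomes the list [443, 8443] without running arbitrary code.
        value = ast.literal_eval(value)
    if isinstance(value, (list, tuple)):
        return list(value)
    # A single bare port such as 443 (or the string '443' parsed above).
    return [value]


print(normalize_ports('[443, 8443]'))
print(normalize_ports([443, 8443]))
```

With this helper, the loop could iterate over `normalize_ports(val.pop('port'))` whether or not the value is quoted. A string that is not a valid literal raises an exception, so real data may need extra error handling.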
    

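    Another potential problem is 'service'. It is a list, so can it have more than one value? If so, the code needs to be modified to handle that.

    Finally, the code I've shown here could be more Pythonic, but I wanted to make sure it is as readable as possible for a beginner. Once it works exactly as needed, it can be made more Pythonic.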
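
As one possible reading of "more Pythonic", the row expansion can be separated from the writing with a generator. This is a sketch, not the answerer's code: `expand_rows` is a hypothetical name, 'port' is assumed to be a real list (quotes removed), 'service' is joined with spaces, and the rows go to `sys.stdout` so that no file name has to be assumed.

```python
import csv
import sys


def expand_rows(data):
    """Yield one flat CSV row per port for every FQDN in the nested dict."""
    for fqdn, record in data.items():
        # Copy the scalar fields; 'service' and 'port' are handled separately.
        base = {k: v for k, v in record.items() if k not in ('service', 'port')}
        base['FQDN'] = fqdn
        base['service'] = ' '.join(record['service'])
        for port in record['port']:
            yield {**base, 'port': port}


csv_headers = ['domain', 'FQDN', 'AS', 'ressource_type', 'nb_ip', 'IP', 'service', 'port']
# Assumption: 'port' is a real list here, not the quoted string from the question.
final_data = {'test.foo.com': {'domain': 'foo.com', 'FQDN': 'test.foo.com', 'AS': 'AS1111',
                               'ressource_type': 'A', 'nb_ip': '1', 'IP': '1.1.1.1',
                               'service': ['UNKNOWN'], 'port': [443, 8443]}}

writer = csv.DictWriter(sys.stdout, fieldnames=csv_headers)
writer.writeheader()
writer.writerows(expand_rows(final_data))
```

Unlike the version with `val.pop(...)`, this sketch builds a fresh dict for every row and leaves `final_data` unmodified, which helps if the data is reused afterwards.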

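    【Comments】:

    • Damn, thanks for the detailed answer! You saved me a lot of research time. I'll try it, understand it, and then adapt it to my needs. I'll also try removing the quotes around 'port' so that it is treated as a list (it should return a list anyway). Yes, 'service' can have multiple values, but I only need to convert it to a string (I don't need one row per service). Thanks again! PS: Sorry the code didn't run; I modified it a bit (for readability, and to obfuscate some values).
    【Solution 2】:

    So here is the final code.

    I'm not sure this is the best way to do it, but it looks good enough to me.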

    #!/bin/python3

    ## Import ##
    # Official
    import csv

    ### Main ###
    if __name__ == '__main__':
      ## Variables
      csv_headers = ['domain', 'FQDN', 'AS', 'ressource_type', 'nb_ip', 'IP', 'service', 'port']
      # 'port' is now a real list (quotes removed), so it can be iterated directly
      final_data = {'test.foo.com': {'domain': 'foo.com','FQDN': 'test.foo.com', 'AS': 'AS1111', 'ressource_type': 'A', \
      'nb_ip': '1', 'IP': '1.1.1.1', 'service': ['UNKNOWN'], 'port': [443, 8443]}}

      # Open the csv file in "write mode"
      with open('out.csv', mode='w', newline='') as file:
          # Prepare the writer to add a dict into the csv file
          csv_writer = csv.DictWriter(file, fieldnames=csv_headers)

          # Write the columns header into the csv file
          csv_writer.writeheader()

          for key, val in final_data.items():
              # Start the row with the FQDN, which is the key of the outer dict
              row = {'FQDN': key}
              # Update the row with all column values
              row.update(val)
              # Join the service list into one string: several elements are separated by a space,
              # a single element is written as-is
              row['service'] = ' '.join(val['service'])

              # Write a row for each value in the port list
              for port in val['port']:
                  row['port'] = port
                  csv_writer.writerow(row)
    

    Resulting output:

    domain,FQDN,AS,ressource_type,nb_ip,IP,service,port
    foo.com,test.foo.com,AS1111,A,1,1.1.1.1,'UNKNOWN','443'
    foo.com,test.foo.com,AS1111,A,1,1.1.1.1,'UNKNOWN','8443'
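
One way to check the one-row-per-port behaviour without creating a file is to write into an in-memory buffer and read it back. The sketch below assumes the rows have already been expanded as described above. The names `base`, `rows`, `buffer` and `read_back` are illustrative only. With the default csv dialect, a field is quoted only when it contains a delimiter, a quote character or a newline, and every value comes back from the reader as a string.

```python
import csv
import io

csv_headers = ['domain', 'FQDN', 'AS', 'ressource_type', 'nb_ip', 'IP', 'service', 'port']
# Illustrative, already expanded rows: same fields, one dict per port.
base = {'domain': 'foo.com', 'FQDN': 'test.foo.com', 'AS': 'AS1111',
        'ressource_type': 'A', 'nb_ip': '1', 'IP': '1.1.1.1', 'service': 'UNKNOWN'}
rows = [{**base, 'port': port} for port in (443, 8443)]

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=csv_headers)
writer.writeheader()
writer.writerows(rows)

# Rewind and parse the CSV text again to inspect what was written.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
ports = [r['port'] for r in read_back]
print(ports)  # ['443', '8443']
```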
    

    Please don't upvote my answer; I posted it for reference only. All credit should go to @bartonstanley.

    【Comments】:
