【Question title】: Creating multiple nested dictionaries from .txt file
【Posted】: 2016-11-10 03:28:29
【Question】:

I am trying to build a dictionary that contains several nested dictionaries. I am building it from a .txt file:

chrY 6 8 +
chrY 3 5 +
chrX 10 11 +
chrX 13 15 -

The output I want is:

{'chrY': {'+' : {'start': [3 , 6], 'end': [5, 8]}}, 'chrX': {'+' : {'start': [10], 'end': [11]} , '-': {'start' : [13], 'end' : [15]}}}

So far, my code is:

import sys
first_dict = {}
intron_dict = {}
def main():
    with open(sys.argv[1], 'r') as intron:
        for line in intron.readlines():
            line = line.split()
            chromosome = line[0]
            start = line[1]
            end = line[2]
            strand = line[3]
            first_dict = {chromosome : (strand, start, end)}

            for k, v in first_dict.items():
                intron_dict.setdefault(k, []).append(v)
        print (intron_dict)
if __name__=='__main__':
    main()

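This code lets me group the values under the chrY and chrX keys without overwriting them. I am having trouble merging the '+' and '-' keys and getting the data into the format I want. So far, my output looks like: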

{'chrY': [('+', '6', '8'), ('+', '3', '5')], 'chrX': [('+', '10', '11'), ('-', '13', '15')]}

    Tags: python dictionary nested


    【Solution 1】:

    You can simplify your code considerably by using a nested defaultdict whose third-level values are lists:

    from collections import defaultdict
    
    result = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    
    with open('test.txt') as f:
        for row in f:
            ch, start, end, op = row.split()
            result[ch][op]['start'].append(start)
            result[ch][op]['end'].append(end)
    
    import json
    print(json.dumps(result, indent=4))
    

    Output:

    {
        "chrY": {
            "+": {
                "start": [
                    "6", 
                    "3"
                ], 
                "end": [
                    "8", 
                    "5"
                ]
            }
        }, 
        "chrX": {
            "+": {
                "start": [
                    "10"
                ], 
                "end": [
                    "11"
                ]
            }, 
            "-": {
                "start": [
                    "13"
                ], 
                "end": [
                    "15"
                ]
            }
        }
    }
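
The output above keeps the coordinates as strings in file order, while the question asks for sorted integers in plain dictionaries. The following is a minimal sketch of one way to get there, not the answerer's code: it stores each interval as a `(start, end)` tuple so the two values stay paired while sorting, then splits them into the `start` and `end` lists. The function name `build_intron_dict` and the inline `sample` list are assumptions made for illustration.

```python
from collections import defaultdict


def build_intron_dict(lines):
    """Group (start, end) pairs by chromosome and strand, as sorted ints."""
    # chromosome -> strand -> list of (start, end) tuples
    pairs = defaultdict(lambda: defaultdict(list))
    for row in lines:
        if not row.strip():
            continue  # skip blank lines
        ch, start, end, strand = row.split()
        pairs[ch][strand].append((int(start), int(end)))

    # Convert to plain dicts with separate, aligned start/end lists
    result = {}
    for ch, strands in pairs.items():
        result[ch] = {}
        for strand, intervals in strands.items():
            intervals.sort()  # sorts by start, then end, keeping pairs together
            result[ch][strand] = {
                'start': [s for s, _ in intervals],
                'end': [e for _, e in intervals],
            }
    return result


# Hypothetical in-memory stand-in for the lines of the .txt file
sample = ["chrY 6 8 +", "chrY 3 5 +", "chrX 10 11 +", "chrX 13 15 -"]
print(build_intron_dict(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample`.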
    

      【Solution 2】:

      One approach is to use defaultdict. For example:

      from pprint import pprint
      from collections import defaultdict
      
      # Outer level: chromosome -> plain dict keyed by strand
      first_dict = defaultdict(dict)
      
      
      def main():
          with open('test.csv', 'r') as intron:
              for line in intron:
                  chromosome, start, end, strand = line.split()
      
                  if strand not in first_dict[chromosome]:
                      first_dict[chromosome][strand] = defaultdict(list)
      
                  first_dict[chromosome][strand]['start'].append(start)
                  first_dict[chromosome][strand]['end'].append(end)
      
          pprint(first_dict)
      
      if __name__=='__main__':
          main()
      

      Result:

      defaultdict(<class 'dict'>,
                  {'chrX': {'+': defaultdict(<class 'list'>,
                                             {'end': ['11'],
                                              'start': ['10']}),
                            '-': defaultdict(<class 'list'>,
                                             {'end': ['15'],
                                              'start': ['13']})},
                   'chrY': {'+': defaultdict(<class 'list'>,
                                             {'end': ['8', '5'],
                                              'start': ['6', '3']})}})
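
The `defaultdict(<class ...>` wrappers in the result above are only how `pprint` displays the container types; the data itself is already nested correctly. If a plain-dict display is preferred, one option is a small recursive conversion. This is a sketch under that assumption, and `to_plain_dict` is a hypothetical helper name, not part of the standard library.

```python
from collections import defaultdict
from pprint import pprint


def to_plain_dict(obj):
    """Recursively replace dict subclasses (such as defaultdict) with dict."""
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj  # lists and scalars are returned unchanged


# Small nested structure shaped like the result above
nested = defaultdict(dict)
nested['chrY']['+'] = defaultdict(list)
nested['chrY']['+']['start'].extend(['6', '3'])
nested['chrY']['+']['end'].extend(['8', '5'])

plain = to_plain_dict(nested)
pprint(plain)
```

The converted copy no longer creates missing keys on lookup, so it is best applied once the file has been fully read.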
      

        【Solution 3】:

        Here is another approach that does not use defaultdict. It just uses if ... else:

        import sys
        intron_dict = dict()
        def main():
            with open(sys.argv[1], 'r') as intron:
                for line in intron.readlines():
                    line = line.split()
                    chromosome = line[0]
                    start = int(line[1]) # converted to int to avoid quotes in result
                    end = int(line[2])
                    strand = line[3]
                    first_dict = {strand : {'start' : [start], 'end' : [end]}}
        
                    # dict.has_key() was removed in Python 3; use the `in` operator
                    if chromosome in intron_dict:
                        if strand in intron_dict[chromosome]:
                            intron_dict[chromosome][strand]['start'].append(start)
                            intron_dict[chromosome][strand]['end'].append(end)
                        else:
                            intron_dict[chromosome][strand] = first_dict[strand]
                    else:
                        intron_dict[chromosome] = first_dict
        
                print (intron_dict)
        
        if __name__=='__main__':
            main()
        

        Output:

        {'chrY': {'+': {'start': [6, 3], 'end': [8, 5]}}, 'chrX': {'+': {'start': [10], 'end': [11]}, '-': {'start': [13], 'end': [15]}}}
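
The if/else branching above can also be written with chained `dict.setdefault` calls, which the question's own code already uses at the top level. The sketch below is an alternative formulation rather than the answerer's method; `add_interval` and the inline `sample_lines` list are hypothetical names used only for illustration.

```python
def add_interval(intron_dict, chromosome, strand, start, end):
    """Append one interval, creating the nested levels on first use."""
    by_strand = intron_dict.setdefault(chromosome, {})
    entry = by_strand.setdefault(strand, {'start': [], 'end': []})
    entry['start'].append(start)
    entry['end'].append(end)


# Hypothetical in-memory stand-in for the lines of the input file
sample_lines = ["chrY 6 8 +", "chrY 3 5 +", "chrX 10 11 +", "chrX 13 15 -"]

intron_dict = {}
for line in sample_lines:
    chromosome, start, end, strand = line.split()
    add_interval(intron_dict, chromosome, strand, int(start), int(end))

print(intron_dict)
```

`setdefault` returns the existing value when the key is present and stores the default otherwise, so each level is created at most once and no explicit membership test is needed.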
        
