【Question Title】: Scrape data from PubMed
【Posted】: 2018-02-03 02:03:01
【Question】:

I wrote the following function to extract data from PubMed using Entrez:

from Bio import Entrez, Medline

def getFromPubMed(id):
    handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=str(id))
    records = Medline.parse(handle)
    for record in records:
        abstract = str(record["AB"])
        mesh = str(record["MH"]).replace("'", "").replace("[", "").replace("]", "")
        pmid = str(record["PMID"])
        title = str(record["TI"]).replace("'", "").replace("[", "").replace("]", "")
        pt = str(record["PT"]).replace("'", "").replace("[", "").replace("]", "")
        au = str(record["AU"])
        dp = str(record["DP"])
        la = str(record["LA"])
        pmc = str(record["PMC"])
        si = str(record["SI"])
        try:
            doi=str(record["AID"])
        except:
            doi = str(record["SO"]).split('doi:',1)[1]
        return pmid, title, abstract, au, mesh, doi, pt, la, pmc

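However, this function does not always work, because not every MEDLINE record contains every field. For example, a given PMID may not have any MeSH headings, in which case the lookup of "MH" fails.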
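
The failure can be reproduced without a network call. The following is a minimal sketch, assuming a plain dict stands in for a parsed record (Bio.Medline.Record is a dict subclass, so a missing key behaves the same way); the sample values are made up for illustration.

```python
# Hypothetical record: it has a PMID and a title, but no "MH" (MeSH) field.
record = {"PMID": "12345", "TI": "An example title"}

missing = None
try:
    mesh = str(record["MH"])  # "MH" is absent, so this lookup raises KeyError
except KeyError as exc:
    missing = exc.args[0]     # the name of the field that was not found

print("missing field:", missing)
```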

I could wrap each item in a try-except statement, for example for abstract:

try:
  abstract = str(record["AB"])
except:
  abstract = ""

But this seems like a clumsy way to do it. Is there a more elegant solution?

【Question Comments】:

    Tags: python web-scraping pubmed rentrez


    【Solution 1】:

    You can split the field extraction into a separate function, along these lines:

    def get_record_attributes(record, attr_details):
        attributes = {}
    
        for attr_name, details in attr_details.items():
            value = ""
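            try:
                # str() lets list-valued fields such as "MH" be cleaned too
                value = str(record[details["key"]])
    
                # "chars_to_remove" is optional; default to removing nothing
                for char in details.get("chars_to_remove", ""):
                    value = value.replace(char, "")
            except (KeyError, AttributeError):
                # the record lacks this field; keep the empty default
                pass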
    
            attributes[attr_name] = value
    
        return attributes
    
    def getFromPubMed(id):
        handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=str(id))
        records = Medline.parse(handle)
        for record in records:
            attr_details = {
                "abstract" : {"key" : "AB"},
                "mesh" : { "key" : "MH", "chars_to_remove" : "'[]"},
                #...
                "aid" : {"key" : "AID"},
                "so" : {"key" : "SO"},
            }
    
            attributes = get_record_attributes(record, attr_details)
    
            #...
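
A variant of the same idea can rely on dict.get instead of exception handling. This is only a sketch under stated assumptions: the FIELDS mapping, the extract_fields helper and the sample record are hypothetical names introduced here, and a plain dict stands in for a parsed MEDLINE record. List-valued fields are joined with a separator rather than cleaned up with str().replace().

```python
# Hypothetical mapping from output name to MEDLINE field key.
FIELDS = {
    "pmid": "PMID",
    "title": "TI",
    "abstract": "AB",
    "authors": "AU",
    "mesh": "MH",
}

def extract_fields(record, fields=FIELDS, sep="; "):
    """Return a dict of strings, using "" for any field the record lacks."""
    result = {}
    for name, key in fields.items():
        value = record.get(key, "")  # no KeyError when the field is missing
        if isinstance(value, (list, tuple)):
            # fields such as "AU" or "MH" hold several values; join them
            value = sep.join(str(item) for item in value)
        result[name] = str(value)
    return result

# Hypothetical record with no abstract and no MeSH headings.
sample = {"PMID": "12345", "TI": "An example title", "AU": ["Doe J", "Roe R"]}
row = extract_fields(sample)
print(row)
```

Keeping the field list in one mapping means a new field is added with one line, and every missing field falls back to the same default.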
    

    【Comments】:

    • This works well, thanks. Note that it should be attributes[attr_name] = value rather than attribute[attr_name] = value.
    【Solution 2】:

    How about:

    mesh = str(record["MH"] or '')
    

    This works because, as this post suggests, an empty dictionary evaluates to False.
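
One caveat may be worth spelling out: the `or ''` fallback only applies when the key exists but holds an empty (falsy) value. If the key is absent altogether, the subscript raises KeyError before `or` is evaluated. A minimal sketch, assuming a plain dict stands in for the record and using made-up sample values, shows both cases and how dict.get covers the missing-key case:

```python
# Hypothetical record: "MH" is present but empty, "AB" is absent.
record = {"PMID": "12345", "MH": []}

mesh = str(record["MH"] or "")          # empty list is falsy, so "" is used
abstract = str(record.get("AB") or "")  # get() returns None for a missing key

try:
    record["AB"] or ""                  # subscript fails before `or` runs
    raised = False
except KeyError:
    raised = True

print(repr(mesh), repr(abstract), raised)
```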

    【Comments】:
