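【Question title】: Can you search more than one database at a time using Biopython?
【Posted】: 2016-05-09 21:21:53
【Question】:

My task is to use NCBI's E-utilities to retrieve the number of papers about the CRISPR/Cas9 system submitted in each of the past 10 years. How would I search more than one database at a time? My code so far: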

from Bio import Entrez


Entrez.email = "example@gmail.com"
# Dates are passed as strings; an unquoted 2016/01/01 is parsed as arithmetic, not as a date
handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2016/01/01", maxdate="2016/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2016 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2015/01/01", maxdate="2015/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2015 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2014/01/01", maxdate="2014/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2014 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2013/01/01", maxdate="2013/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2013 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2012/01/01", maxdate="2012/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2012 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2011/01/01", maxdate="2011/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2011 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2010/01/01", maxdate="2010/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2010 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2009/01/01", maxdate="2009/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2009 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2008/01/01", maxdate="2008/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2008 is:", record["Count"])

handle = Entrez.esearch(db="pubmed", term="Crispr/Cas9 system", mindate="2007/01/01", maxdate="2007/12/31", datetype="pdat")
record = Entrez.read(handle)
print("Number of papers in 2007 is:", record["Count"])

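【Comments】:

  • Which databases are you interested in?
  • @Ashafix Well, the task is to use E-utilities at NCBI to retrieve the number of papers submitted about the Crispr/Cas9 system, so I guess I could use all of them?
  • @Azaro In that case you only need one database: pubmed. I suspect what you want is to make multiple concurrent requests to pubmed.
  • Or run one big search, then parse the results and build a summary aggregated by year?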
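
The last suggestion in the comments (one large search, then a per-year summary) could look roughly like the sketch here. It assumes the PubMed document summaries returned by `Entrez.esummary` carry a `PubDate` string that starts with the year, such as "2016 May 9". The helper names `tally_by_year` and `fetch_pub_dates` and the `retmax` value are made up for illustration, and a result set larger than `retmax` would need batching, which is not shown.

```python
from collections import Counter


def tally_by_year(pub_dates):
    """Count records per year from PubDate-style strings such as '2016 May 9'."""
    years = Counter()
    for pub_date in pub_dates:
        year = pub_date.strip()[:4]  # assumption: the string starts with a 4-digit year
        if len(year) == 4 and year.isdigit():
            years[int(year)] += 1
    return dict(years)


def fetch_pub_dates(term, email, retmax=200):
    """Hypothetical helper: one esearch, then one esummary for the returned IDs."""
    # Imported here so that tally_by_year can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return []
    handle = Entrez.esummary(db="pubmed", id=",".join(ids))
    summaries = Entrez.read(handle)
    handle.close()
    return [summary["PubDate"] for summary in summaries]
```

With network access, `tally_by_year(fetch_pub_dates("Crispr Cas9", "you@example.org"))` would give a year-to-count dictionary for the first `retmax` hits only. For plain yearly totals, one `esearch` per year is cheaper because only the `Count` field is needed.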

Tags: python database search biopython ncbi


【Answer 1】:

As you have probably noticed, your code is highly redundant, which makes it a typical case for a for loop:

from Bio import Entrez

years = range(2016, 2006, -1)  # Counts down from 2016 to 2007

Entrez.email = "Example@mail.org"

for year in years:  # Assign each value in 'years' to the variable 'year' in turn
    handle = Entrez.esearch(db="pubmed", term="Crispr Cas9",
                            mindate=year, maxdate=year, datetype="pdat")
    record = Entrez.read(handle)
    handle.close()
    print("Number of papers in %d is %s" % (year, record["Count"]))  # 'Old' string formatting
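
On the title question itself: as far as I know, `esearch` accepts a single `db` value per call, so one way to cover several databases is to repeat the search once per database. A minimal sketch of that idea follows. The names `count_per_database` and `entrez_count` are hypothetical helpers, and the database list in the usage note is only an example.

```python
def count_per_database(term, databases, search):
    """Return {database: hit count}, issuing one search per database.

    `search` is any callable taking (db, term) and returning the hit count.
    """
    return {db: int(search(db, term)) for db in databases}


def entrez_count(db, term):
    """Hit count for `term` in one Entrez database (needs Biopython and network)."""
    # Imported here so that count_per_database can be tried with a stub search.
    from Bio import Entrez

    Entrez.email = "Example@mail.org"  # placeholder, replace with your own address
    handle = Entrez.esearch(db=db, term=term)
    record = Entrez.read(handle)
    handle.close()
    return record["Count"]
```

A call such as `count_per_database("Crispr Cas9", ["pubmed", "pmc"], entrez_count)` would then return one count per database. Passing the search function as an argument keeps the loop easy to try out without hitting NCBI.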

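It is also unlikely that every paper mentioning the CRISPR/Cas9 system uses the exact phrase "Crispr/Cas9" together with the word "system". The search term "Crispr Cas9" returns more results.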
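
To check how much the wording of the term matters, the hit counts for several candidate terms can be compared side by side. This is only a sketch: `compare_terms` and `pubmed_count` are hypothetical names, and the actual counts depend on the state of PubMed at the time of the query.

```python
def compare_terms(terms, search):
    """Return (term, count) pairs sorted from most to fewest hits.

    `search` is any callable taking a term and returning its hit count.
    """
    counts = [(term, int(search(term))) for term in terms]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


def pubmed_count(term):
    """Hit count for `term` in PubMed (needs Biopython and network)."""
    # Imported here so that compare_terms can be tried with a stub search.
    from Bio import Entrez

    Entrez.email = "Example@mail.org"  # placeholder, replace with your own address
    handle = Entrez.esearch(db="pubmed", term=term)
    record = Entrez.read(handle)
    handle.close()
    return record["Count"]
```

For example, `compare_terms(["Crispr/Cas9 system", "Crispr Cas9"], pubmed_count)` lists the broader term first if it really does match more papers.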
