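【Posted】: 2016-04-19 23:08:42
【Question】:
I am trying to search for papers that have a specific word in the title. More precisely, I want papers published between 2010 and 2015 whose title contains the word viral or virus. Here is my code: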
import re
from Bio import Entrez  # added: Entrez is used below but was never imported
from Bio import Medline

handle = Entrez.esearch(db="pubmed",  # database to search
                        term="2010[Date - Publication]:2015[Date - Publication]"
                        )
record = Entrez.read(handle)
handle.close()
pmid_list = record["IdList"]  # list of PMIDs returned by the search
handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text")
records = Medline.parse(handle)
titles = []  # start with empty list of titles
for record in records:
    ti_list = record['TI']  # titles
    for title in ti_list:
        if title == "virus" and title not in titles:  # searching viral/virus
            titles.append(title)
print('Publications with viral or virus in the title:')
for record in records:
    print(" ", title)
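If I simply print(record['TI']), I get all the titles returned by my search query. However, I cannot work out how to search those titles for a specific word. I think my mistake may be in the line if title == "virus" (since obviously no paper will have that single word as its entire title).
I'm stuck. Is there a better way to search for this word in the titles of the papers I queried?
Thanks.
Edit: updated code (still no luck)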
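To illustrate the suspicion about the equality test: == only succeeds when the whole title is exactly the word, whereas a regular-expression search looks for the word anywhere in the title. This is a minimal sketch, not tied to Biopython; the sample titles are made up for illustration, and mentions_virus is a hypothetical helper name.

```python
import re

# Hypothetical sample titles, for illustration only (not fetched from PubMed).
sample_titles = [
    "Viral load dynamics in chronic infection",
    "A survey of soil bacteria",
    "Structure of an influenza virus protein",
    "virus",
]

# \b marks word boundaries; IGNORECASE also matches "Viral" or "Virus".
pattern = re.compile(r"\bvir(?:al|us)\b", re.IGNORECASE)


def mentions_virus(title):
    """Return True if the title contains the word 'viral' or 'virus'."""
    return pattern.search(title) is not None


# Equality only matches a title that is exactly the word "virus".
exact = [t for t in sample_titles if t == "virus"]
# The regex search matches the word anywhere inside the title.
matched = [t for t in sample_titles if mentions_virus(t)]

print(exact)
print(matched)
```

Because of the word boundaries, this pattern would not match the plural "viruses"; the pattern would need to be widened if plurals should count.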
import re
from Bio import Entrez  # added: Entrez is used below but was never imported
from Bio import Medline

handle = Entrez.esearch(db="pubmed",  # database to search
                        term="2010[Date - Publication]:2015[Date - Publication]"
                        )
record = Entrez.read(handle)
handle.close()
pmid_list = record["IdList"]  # list of PMIDs returned by the search
handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text")
records = Medline.parse(handle)
r = re.compile(r"\bvir(al|us)\b")
titles = set()  # start with empty set of titles
for record in records:
    ti_list = record['TI']  # titles
    for title in ti_list:
        if r.search(title):
            titles.add(title)
print('Publications with viral or virus in the title:')
for record in records:
    print(" ", title)
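Two details in the updated code may explain why nothing useful is printed. The sketch below rests on assumptions: it assumes record['TI'] is a single string per record (so looping over it yields characters rather than titles), and it uses a plain Python generator, fake_records, as a hypothetical stand-in for Medline.parse, assuming the parser likewise can only be iterated once.

```python
import re


def fake_records():
    """Hypothetical stand-in for Medline.parse: yields dict-like records once."""
    yield {"TI": "Viral evolution in bats"}
    yield {"TI": "Soil chemistry review"}


# 1) Iterating over a string yields single characters, not titles.
chars = [c for c in "Viral"]

# 2) A generator is exhausted after one pass; a second loop sees nothing.
records = fake_records()
first_pass = [r["TI"] for r in records]
second_pass = [r["TI"] for r in records]

# Treat TI as one string and collect the matches during a single pass.
pattern = re.compile(r"\bvir(?:al|us)\b", re.IGNORECASE)
titles = [r["TI"] for r in fake_records() if pattern.search(r["TI"])]

print(chars)
print(first_pass, second_pass)
for title in titles:
    print(" ", title)
```

The final loop prints from the collected titles rather than iterating over the records a second time.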
New code:
import re
from Bio import Entrez  # added: Entrez is used below but was never imported
from Bio import Medline

# pmid_list is assumed to come from the esearch step shown earlier
handle = Entrez.efetch(db="pubmed", id=pmid_list, rettype="medline", retmode="text",
                       term="2010[Date - Publication]:2015[Date - Publication]")
records = Medline.parse(handle)
titles = []
for record in records:
    ti_list = record['TI']
    for title in ti_list:
        titles.append(title)
handle.close()
for title in titles:
    print(title)
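An alternative to filtering titles after download is to let PubMed restrict the search itself. The sketch below only builds the search string and makes no network call; build_title_query is a hypothetical helper, and the [Title] and [Date - Publication] field tags are assumed to behave as in PubMed's standard search syntax. The resulting string would be passed as term= to Entrez.esearch rather than to Entrez.efetch.

```python
def build_title_query(words, start_year, end_year):
    """Build a PubMed search term that matches any of `words` in the title
    for publications dated between start_year and end_year (inclusive)."""
    # Each word is restricted to the title field and the alternatives are OR-ed.
    title_part = " OR ".join(f"{w}[Title]" for w in words)
    # Same date-range form as used in the question's esearch call.
    date_part = f"{start_year}[Date - Publication]:{end_year}[Date - Publication]"
    return f"({title_part}) AND {date_part}"


term = build_title_query(["virus", "viral"], 2010, 2015)
print(term)
```

With the filtering done by the search term, the fetched records would not need a second title check, though a local regex pass can still be kept as a safeguard.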
【Discussion】: