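【Question title】: python - Regular expression to extract certain text data from a file
【Posted】: 2019-08-07 18:27:58
【Question】:

I have a text file that was converted from a PDF. From the text data, I want to extract each description that follows the string "FIGURE". Below are some sample lines of the text data: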

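FIGURE 1-1. An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).

Derendorf5e_CH01.indd 4Derendorf5e_CH01.indd 4 5/25/19 11:07 PM5/25/19 11:07 PM

CHAPTER 1 • Therapeutic Relevance 5

Another way of looking at these two subdisciplines is that pharmacokinetics deals with what the body does to the drug (absorption, distribution, metabolism, excretion), whereas pharmacodynamics describes what the drug does to the body (both desired and undesired effects). From this definition, one could wrongly conclude that these are opposite disci- plines, whereas in reality, they go hand-in-hand. Figure 1-3 shows that pharmacokinetics deals with concentration–time relationships, whereas pharmacodynamics describes the relationship between drug concentration and both good (desired) and bad (adverse) effects. Each of these two puzzle pieces by itself is insufficient to guide therapy and optimize dosing; only when pharmacokinetics and pharmacodynamics are linked (PK/PD) and integrated do they become therapeutically useful. This integration is commonly achieved by developing mathematical models (PK/PD models) that capture the observed relationships and allow prediction and identification of optimum dosing regimens.

FIGURE 1-2. A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account.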

I have read the PDF file into text and tried applying re.search to the text data with several regular-expression combinations, but with no luck.

# Get files text content
text = file_data['content']
#print(text)
text1 = re.search('FIGURE[ ]*[0-9]-[0-9]. (.*)',text,re.MULTILINE)

【Question comments】:

    Tags: python


    【Solution 1】:
    text1 = re.findall(r'FIGURE\s*[0-9]+-[0-9]+\. (.*)', text, re.MULTILINE)
    >>> import re
    >>> t="""FIGURE 1-1. An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).
    ...
    ... Derendorf5e_CH01.indd 4Derendorf5e_CH01.indd 4 5/25/19 11:07 PM5/25/19 11:07 PM
    ...
    ... CHAPTER 1 • Therapeutic Relevance 5
    ...
    ... Another way of looking at these two subdisciplines is that pharmacokinetics deals with what the body does to the drug (absorption, distribution, metabolism, excretion), whereas pharmacodynamics describes what the drug does to the body (both desired and undesired effects). From this definition, one could wrongly conclude that these are opposite disci- plines, whereas in reality, they go hand-in-hand. Figure 1-3 shows that pharmacokinetics deals with concentration–time relationships, whereas pharmacodynamics describes the relationship between drug concentration and both good (desired) and bad (adverse) effects. Each of these two puzzle pieces by itself is insufficient to guide therapy and optimize dosing; only when pharmacokinetics and pharmacodynamics are linked (PK/PD) and integrated do they become therapeutically useful. This integration is commonly achieved by developing mathematical models (PK/PD models) that capture the observed relationships and allow prediction and identification of optimum dosing regimens.
    ...
    ... FIGURE 1-2. A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account."""
    >>> re.findall(r'FIGURE\s*[0-9]+-[0-9]+\. (.*)', t, re.MULTILINE)
    ['An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).', 'A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account.']
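One caveat about this pattern, shown as a small sketch: `.` does not match a newline by default, and `re.MULTILINE` only changes how `^` and `$` behave. `(.*)` therefore stops at the end of the line on which `FIGURE` appears. The `sample` string below is made up for illustration, with a caption wrapped across two lines; it is not taken from the asker's file.

```python
import re

# Hypothetical caption wrapped across two lines, as PDF-to-text output often is.
sample = (
    "FIGURE 1-1. An empirical approach to the design\n"
    "of a dosage regimen.\n"
    "\n"
    "CHAPTER 1"
)

pattern = r'FIGURE\s*[0-9]+-[0-9]+\. (.*)'

# re.MULTILINE affects only ^ and $, so it makes no difference for this pattern.
with_flag = re.findall(pattern, sample, re.MULTILINE)
without_flag = re.findall(pattern, sample)

print(with_flag)     # only the first physical line of the caption
print(without_flag)  # identical result
```

If the real file wraps captions this way, the pattern would return only the first line of each description.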
    

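    【Comments】:

    • It outputs only the first line of each figure's description. I want the complete text description.
    • What do you mean by "complete text description"? Can you show me the expected output?
    • I want output like the following: An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).
    • I'm not sure what format your data is in. If it is laid out across multiple lines as in the interpreter session above, the code will work.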
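If the captions are wrapped across several lines, as the first comment suggests, one possible approach is to let the match run across newlines with `re.DOTALL` and stop at the first blank line or at the end of the text. This is a sketch only: the blank-line separator, the sample text, and the helper name `extract_figure_captions` are assumptions, not something confirmed in the question.

```python
import re


def extract_figure_captions(text):
    """Return the caption text following each 'FIGURE n-m.' label.

    Assumes (hypothetically) that every caption ends at a blank line
    or at the end of the text.
    """
    pattern = re.compile(
        r'^FIGURE\s*[0-9]+-[0-9]+\.\s+'  # label at line start, e.g. "FIGURE 1-1. "
        r'(.*?)'                         # caption body, as short as possible
        r'(?=\n\s*\n|\Z)',               # stop before a blank line or end of text
        re.MULTILINE | re.DOTALL,
    )
    # Collapse line breaks and repeated spaces inside each caption.
    return [' '.join(body.split()) for body in pattern.findall(text)]


# Made-up sample with captions wrapped across lines.
sample = (
    "FIGURE 1-1. An empirical approach to the design\n"
    "of a dosage regimen.\n"
    "\n"
    "CHAPTER 1 Therapeutic Relevance 5\n"
    "\n"
    "FIGURE 1-2. A rational approach to the design\n"
    "of a dosage regimen.\n"
)

for caption in extract_figure_captions(sample):
    print(caption)
```

Here `re.MULTILINE` lets `^` anchor the label at the start of any line, so an in-text mention such as "Figure 1-3 shows" is skipped, while `re.DOTALL` lets `.` cross line breaks. If the real file does not separate captions with blank lines, the lookahead would need a different stop condition.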