[Posted]: 2020-01-29 14:12:46
[Question]:
Suppose I have a text file with the following contents: (content added after the original answer)
Quetiapine fumarate Drug substance This document
Povidone Binder USP
This line doesn't contain any medicine name.
This line contains Quetiapine fumarate which shouldn't be extracted as it not present at the
beginning of the line.
Dibasic calcium phosphate dihydrate Diluent USP is not present in the csv
Lactose monohydrate Diluent USNF
Magnesium stearate Lubricant USNF
Lactose monohydrate, CI 77491
0.6
Colourant
E 172
Some lines to break the group.
Silicon dioxide colloidal anhydrous
(0.004
Gliding agent
Ph Eur
Adding some random lines.
Povidone
(0.2
Lubricant
Ph Eur
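I have a csv containing a list of medicine names that I want to match in the .txt file, extracting all the data that lies between 2 unique medicines (when the medicine name is at the start of a line). (Examples of medicines in the csv file are 'Quetiapine fumarate', 'Povidone', 'Magnesium stearate', 'Lactose monohydrate', etc.)
I want to iterate over each line of the text file and create groups that run from one medicine to the next.
This should happen only when the medicine name appears at the start of a new line, not in the middle of a line.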
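The two ingredients described above (names taken from a csv, and a match that only counts at the start of a line) could be sketched roughly as follows. The helper names `load_medicines` and `starts_with_medicine` are hypothetical, and the sketch assumes the csv has one medicine name per row in its first column and no header row, which the question does not confirm.

```python
import csv
import io


def load_medicines(csv_file):
    """Return the non-empty first-column values of an open CSV file as a tuple.

    Assumption: one medicine name per row, first column, no header row.
    """
    reader = csv.reader(csv_file)
    return tuple(row[0].strip() for row in reader if row and row[0].strip())


def starts_with_medicine(line, medicines):
    """True only when the line begins with one of the medicine names."""
    # str.startswith accepts a tuple of prefixes, so no explicit loop is needed.
    return line.startswith(tuple(medicines))


# Stand-in for the real csv file, using names quoted in the question.
fake_csv = io.StringIO("Quetiapine fumarate\nPovidone\nMagnesium stearate\nLactose monohydrate\n")
medicines = load_medicines(fake_csv)

print(starts_with_medicine("Povidone Binder USP", medicines))
print(starts_with_medicine("This line contains Quetiapine fumarate which", medicines))
```

With a real file, `load_medicines` would be called on the object returned by `open(path, newline='', encoding='utf8')`.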
Expected output:
['Quetiapine fumarate Drug substance This document'],
['Povidone Binder USP'],
['Lactose monohydrate Diluent USNF'],
['Magnesium stearate Lubricant USNF'],
[Lactose monohydrate, CI 77491
0.6
Colourant
E 172],
[Povidone
(0.2
Lubricant
Ph Eur]
Can someone help me do this in Python?
What I have tried so far:
    with open('C:/Users/test1.txt', 'r', encoding='utf8') as file:
        data = file.read()

    medicines = ('Quetiapine fumarate', 'Povidone', 'Magnesium stearate', 'Lactose monohydrate')
    result = []
    # with open(r'C:\Users\substancecopy.csv') as f:
    for line in data.splitlines():  # iterating over data directly would yield single characters
        if any(line.startswith(med) for med in medicines):
            result.append(line.strip())
I need to capture all of the text from one medicine to the next, as shown in the expected output, but this code does not do that.
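One possible way to collect such groups is sketched below. It assumes a simple rule that is only one reading of the question: a group opens at a line that starts with a medicine name and runs until the next such line. The function name `group_by_medicine` is hypothetical. Under this rule, filler lines such as "Some lines to break the group." stay attached to the preceding group, whereas the expected output above leaves them out, so an extra stopping rule (not specified in the question) would still be needed to reproduce that output exactly.

```python
def group_by_medicine(lines, medicines):
    """Split lines into groups, each opened by a line that starts with a medicine name.

    Assumption: a group ends only where the next medicine line begins.
    Lines seen before the first medicine line are discarded.
    """
    names = tuple(medicines)
    groups = []
    current = None  # no group is open until the first medicine line is seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(names):
            current = [line]  # open a new group
            groups.append(current)
        elif current is not None:
            current.append(line)  # continuation of the open group
    return groups


# A few lines taken from the sample text in the question.
sample = [
    "Magnesium stearate Lubricant USNF",
    "Lactose monohydrate, CI 77491",
    "0.6",
    "Colourant",
    "E 172",
    "Some lines to break the group.",
]
medicines = ("Quetiapine fumarate", "Povidone", "Magnesium stearate", "Lactose monohydrate")

for group in group_by_medicine(sample, medicines):
    print(group)
```

On this sample the sketch yields two groups: the single Magnesium stearate line, and the Lactose monohydrate line followed by the four lines after it, including the filler line.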
[Discussion]:
Tags: python regex python-3.x string pattern-matching