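【Posted】: 2019-09-17 10:51:11
【Question】:
I am trying to strip from an answer the substrings that add no value to it. I tried an n-gram approach, but the results were not satisfactory.
I am using Google's Universal Sentence Encoder to measure the similarity between two texts, and I have observed that I get better results if I clean the text before passing it to the encoder. I want to remove the part of the answer that merely repeats the question, because it adds nothing to the answer, and keep the rest.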
def extract_answer(question, answer):
    # << some code goes here >>
    return extracted_text
Question = "Why is the plasma membrane called a selectively permeable membrane?"
Answer = "The cell membrane or the plasma membrane is known as a selectively permeable membrane because it regulates the movement of substances in and out of the cell. This means that the plasma membrane allows the entry of only some substances and prevents the movement of some other materials."
extracted_answer = extract_answer(Question,Answer)
print(extracted_answer)
Sample 1
---------
Input
-------
Question: Why is the plasma membrane called a selectively permeable membrane?
Answer: The cell membrane or the plasma membrane is known as a selectively permeable membrane because it regulates the movement of substances in and out of the cell. This means that the plasma membrane allows the entry of only some substances and prevents the movement of some other materials.
Expected Output
---------------
Output: it regulates the movement of substances in and out of the cell. This means that the plasma membrane allows the entry of only some substances and prevents the movement of some other materials.
Sample 2
----------
Input
-------
Question: Why is the diver able to cross the river?
Answer: The swimmer is able to cross the river because the particles of matter have space between them.
Expected Output
---------------
Output: particles of matter have space between them.
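One possible starting point for the two samples above is sketched below. It is a heuristic, not a general solution: it assumes the useful part of the answer follows a causal connective such as "because", and it only cuts there when the text before the connective mostly repeats words from the question. The connective list, the `tokens` helper, and the `min_overlap` threshold are all illustrative assumptions, not part of the original question.

```python
import re

# Assumed list of connectives that introduce the informative clause.
CONNECTIVES = ("because", "due to")
_CONNECTIVE_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, CONNECTIVES)) + r")\b\s*",
    re.IGNORECASE,
)
_TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in text."""
    return set(_TOKEN_RE.findall(text.lower()))


def extract_answer(question, answer, min_overlap=0.5):
    """Drop the leading part of the answer that restates the question.

    The answer is cut after its first connective, but only if at least
    min_overlap of the distinct tokens before the connective also occur
    in the question. Otherwise the answer is returned unchanged.
    """
    match = _CONNECTIVE_RE.search(answer)
    if match is None:
        return answer  # no connective found, nothing to strip

    prefix_tokens = tokens(answer[:match.start()])
    if not prefix_tokens:
        return answer

    overlap = len(prefix_tokens & tokens(question)) / len(prefix_tokens)
    if overlap < min_overlap:
        return answer  # the prefix does not look like a restatement

    return answer[match.end():]
```

On Sample 1 this returns the expected text. On Sample 2 it returns "the particles of matter have space between them.", so the leading article is kept, unlike in the expected output. Answers without one of the assumed connectives come back unchanged, which means other question types would need a different cue.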
【Comments】:
Tags: regex python-3.x machine-learning nlp