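[Posted]: 2019-03-21 10:32:00
[Question]:
I was recently set a HackerRank challenge, and I could not cleanly strip the tags out of a block of text in Python 3 without mangling the text itself.
Two sample inputs were provided (below), and the challenge was to sanitize them into safe, plain blocks of text. The time to complete the challenge has passed, but I am baffled as to how I got something so simple wrong. Any help on how I should have approached it would be greatly appreciated.
Test input one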
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. <script>
var y=window.prompt("Hello")
window.alert(y)
</script>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage.
Test input two
In-text references or citations are used to acknowledge the work or ideas of others. They are placed next to the text that you have paraphrased or quoted, enabling the reader to differentiate between your writing and other people’s work. The full details of your in-text references, <script language="JavaScript">
document.write("Page. Last update:" + document.lastModified); </script>When quoting directly from the source include the page number if available and place quotation marks around the quote, e.g.
The World Health Organisation defines driver distraction ‘as when some kind of triggering event external to the driver results in the driver shifting attention away from the driving task’.
Suggested test output 1
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage.
Suggested test output 2
In-text references or citations are used to acknowledge the work or ideas of others. They are placed next to the text that you have paraphrased or quoted, enabling the reader to differentiate between your writing and other people’s work. The full details of your in-text references, When quoting directly from the source include the page number if available and place quotation marks around the quote, e.g. The World Health Organisation defines driver distraction ‘as when some kind of triggering event external to the driver results in the driver shifting attention away from the driving task’.
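For reference, one way to get from the sample inputs to the suggested outputs is to drop each `<script>` element together with its contents and then turn line breaks into single spaces. The sketch below is only an illustration under those assumptions, not the challenge's reference solution: the name `strip_script_blocks` and the pattern are hypothetical, and a regular expression is not a robust HTML sanitizer (a `>` inside an attribute value, for example, would defeat it).

```python
import re

# Opening <script ...> tag, everything up to the nearest closing tag, and the
# closing tag itself. DOTALL lets "." match across line breaks.
SCRIPT_BLOCK = re.compile(
    r"<script\b[^>]*>.*?</script\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_script_blocks(text):
    """Remove <script> elements (hypothetical helper) and normalise whitespace."""
    without_scripts = SCRIPT_BLOCK.sub("", text)
    # Collapse every run of whitespace, including newlines, into one space.
    return " ".join(without_scripts.split())


sample = (
    'in-text references, <script language="JavaScript">\n'
    'document.write("Page. Last update:" + document.lastModified); </script>'
    "When quoting, e.g.\n"
    "The World Health Organisation"
)
print(strip_script_blocks(sample))
```

Anything that has to be safe against hostile input would normally go through a real HTML parser or a dedicated sanitizing library instead.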
Thanks in advance!
Edit (using @YakovDan's sanitizer): Code:
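def sanitize(inp_str):
    ignore_flag = False       # True while we are inside a tag or its contents
    close_tag_count = 0       # number of '>' characters still to be skipped
    out_str = ""
    for c in inp_str:
        if not ignore_flag:
            if c == '<':
                # Expect two '>' characters: the opening and the closing tag
                close_tag_count = 2
                ignore_flag = True
            else:
                out_str += c
        else:
            if c == '>':
                close_tag_count -= 1
                if close_tag_count == 0:
                    ignore_flag = False
    return out_str


inp = input()  # reads a single line from standard input
print(sanitize(inp))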
Input:
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. <script>
var y=window.prompt("Hello")
window.alert(y)
</script>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage.
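Output:
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy.
What the output should be:
It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English. Many desktop publishing packages and web page editors now use Lorem Ipsum as their default model text, and a search for 'lorem ipsum' will uncover many web sites still in their infancy. Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old. Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage.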
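One possible explanation for the truncated output, offered as a guess rather than a confirmed diagnosis: `input()` returns a single line, while the sample input spans several lines, so `sanitize` may only ever see the text up to `<script>` and never reach the closing tag. The sketch below compares the two ways of reading, using `io.StringIO` as a stand-in for standard input; the helper names `read_first_line` and `read_everything` and the shortened sample text are made up for illustration.

```python
import io


def sanitize(inp_str):
    # Same character-level state machine as in the post.
    ignore_flag = False
    close_tag_count = 0
    out_str = ""
    for c in inp_str:
        if not ignore_flag:
            if c == '<':
                close_tag_count = 2
                ignore_flag = True
            else:
                out_str += c
        elif c == '>':
            close_tag_count -= 1
            if close_tag_count == 0:
                ignore_flag = False
    return out_str


def read_first_line(stream):
    # Roughly what input() does: take one line and drop the newline.
    return stream.readline().rstrip("\n")


def read_everything(stream):
    # Takes every remaining line, newlines included.
    return stream.read()


sample = (
    "still in their infancy. <script>\n"
    'var y=window.prompt("Hello")\n'
    "window.alert(y)\n"
    "</script>Contrary to popular belief."
)

first_line_only = sanitize(read_first_line(io.StringIO(sample)))
whole_input = sanitize(read_everything(io.StringIO(sample)))
print(repr(first_line_only))  # 'still in their infancy. '
print(repr(whole_input))      # 'still in their infancy. Contrary to popular belief.'
```

If that is indeed the cause, replacing `inp = input()` with `inp = sys.stdin.read()` (after `import sys`) should pass the whole block to `sanitize`, assuming the input arrives on standard input.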
[Discussion]:
-
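Please clarify what needs to be done. Can you provide a sample output? Can you explain what you have already tried? If I understand correctly, you have some text with <script> tags mixed in, and you need to strip the tags out?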
-
Works fine for me. Can you provide a test case?
-
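@YakovDan Thanks again for your reply! I have edited the main post with the code, the input, the output, and what I think the output should be. The problem is that after clearing the <script> tag, it seems to also remove the rest of the text that follows it, even though that text is perfectly fine and not malicious.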
-
I can't reproduce the issue. The same code runs fine on my end. Can you add the code you use to call the function?
-
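@YakovDan Thanks for getting back to me. You can see how I am running it here; if you paste the input from the main post, you should get the same output I get - repl.it/repls/FormalStiffPipelining
Tags: python python-3.x sanitization input-sanitization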