【Posted at】: 2021-08-08 15:46:51
【Question】:
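I am a machine learning beginner exploring a dataset for my NLP project. I got the data from http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I am trying to build a pandas DataFrame from the parsed XML data, and I also want to add a label (1) to the positive reviews. Can someone help me with the code? Here is what I have so far, followed by sample output: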
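from bs4 import BeautifulSoup

# Open the review file in a context manager so it is closed after reading.
with open('/content/drive/MyDrive/sorted_data_acl/electronics/positive.review', encoding='utf-8') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')  # name the parser explicitly
positive_reviews = soup.find_all('review_text')  # find_all is the current name for findAll
positive_reviews[0]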
<review_text>
I purchased this unit due to frequent blackouts in my area and 2 power supplies going bad. It will run my cable modem, router, PC, and LCD monitor for 5 minutes. This is more than enough time to save work and shut down. Equally important, I know that my electronics are receiving clean power.
I feel that this investment is minor compared to the loss of valuable data or the failure of equipment due to a power spike or an irregular power supply.
As always, Amazon had it to me in <2 business days
</review_text>
【Discussion】:
Tags: python pandas beautifulsoup nlp
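One possible way to get from the parsed `review_text` tags to a labelled DataFrame is sketched below. It is only a sketch under a few assumptions: the helper name `reviews_to_frame`, the column names `review_text` and `label`, and the inline `sample` string are all made up for illustration, and `html.parser` is chosen because it ships with Python (the question does not say which parser was in use).

```python
from bs4 import BeautifulSoup
import pandas as pd


def reviews_to_frame(markup, label):
    """Hypothetical helper: one row per <review_text> element, tagged with `label`."""
    soup = BeautifulSoup(markup, "html.parser")
    # get_text() drops the surrounding tags; strip() removes the leading/trailing newlines.
    texts = [tag.get_text().strip() for tag in soup.find_all("review_text")]
    return pd.DataFrame({"review_text": texts, "label": [label] * len(texts)})


# Tiny inline sample standing in for the contents of positive.review (assumed layout).
sample = """
<review>
<review_text>
Works well and arrived quickly.
</review_text>
</review>
<review>
<review_text>
Keeps my router running during blackouts.
</review_text>
</review>
"""

positive_df = reviews_to_frame(sample, label=1)
print(positive_df)
```

With the real data, the contents of `positive.review` (read with `open(..., encoding='utf-8')`) would be passed in place of `sample`. If negative reviews are available as well, the same helper could presumably be called with `label=0` and the two frames combined with `pd.concat`.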