[Machine Learning Open Project] Email Annotation Dataset

The datasets linked below are collections of emails.

The goal is to identify which parts of each email refer to a person's name.

This task is an example of the general problem area of Information Extraction.

Project idea:

Model the task as a sequence labeling problem, where each email is a sequence of tokens and each token carries one of two labels: "person-name" or "not-a-person-name".
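To make the sequence labeling formulation concrete, here is a minimal Python sketch of how one email could be turned into a token sequence with one label per token. The email text, the names in it, and the helpers `tokenize` and `label_tokens` are all hypothetical and chosen only for illustration. The regex tokenizer is an assumption as well, not the preprocessing used for the datasets above.

```python
import re

def tokenize(text):
    # Assumed tokenizer: runs of word characters, or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def label_tokens(tokens, names):
    # Start with every token labeled as not part of a person name.
    labels = ["not-a-person-name"] * len(tokens)
    for name in names:
        name_tokens = tokenize(name)
        n = len(name_tokens)
        if n == 0:
            continue
        # Mark every position where the full name appears as a token run.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == name_tokens:
                for j in range(i, i + n):
                    labels[j] = "person-name"
    return labels

# Hypothetical email text and hypothetical annotated names.
email = "Hi Alice, please send the report to Bob Smith."
tokens = tokenize(email)
labels = label_tokens(tokens, ["Alice", "Bob Smith"])

for token, label in zip(tokens, labels):
    print(token, label)
```

Pairs of this kind (a token sequence and a label sequence of the same length) are the typical training input for a sequence labeling model. The string matching above only builds labels from names that are already known. It is not a name recognizer.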

Email dataset URL:

http://www.cs.cmu.edu/~einat/datasets.html

Paper: Extracting Personal Names from Email: Applying Named Entity Recognition to Informal Text

Paper download link:

http://page2.dfpan.com/fs/blcaj2921529716f8d4/
