【Question title】: Python: How to parse things such as From, To, and Body from a raw email source with Python [duplicate]
【Posted】: 2013-07-26 04:14:47
【Question】:

A raw email usually looks something like this:

From root@a1.local.tld Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: root@a1.local.tld
Subject: ooooooooooooooooooooooo
To: ooo@a1.local.tld
Cc: 
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--

So if I want to write a Python script that extracts the

From
To
Subject
Body

is this the code I should build on, or is there a better way?

a='<title>aaa</title><title>aaa2</title><title>aaa3</title>'

import re
a1 = re.findall(r'<(title)>(.*?)<(/title)>', a)

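【Comments】:

  • Have you heard of PLY or, especially, PyParsing? If you are going to process a lot of emails that may contain characters that would break a hand-written parser, these two are excellent Python packages designed for parsing files. You may want to try PyParsing first; it is the simpler of the two.

Tags: python regex python-2.7 mod-wsgi wsgi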


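【Solution 1】:

I don't really understand what your final code snippet has to do with anything. You hadn't mentioned HTML until that point, so I don't know why you would suddenly give an example of parsing HTML (which you should never do with a regex anyway).

In any case, to answer your original question about getting the headers from an email, Python includes code to do that in the standard library: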

import email
msg = email.message_from_string(email_string)
msg['from']  # 'root@a1.local.tld'
msg['to']    # 'ooo@a1.local.tld'
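
To make the snippet above runnable on its own, here is a minimal sketch written for Python 3. The `email_string` value is a shortened copy of the question's sample, and the variable names are only illustrative. It also shows two properties of header lookup: names are matched case-insensitively, and a missing header gives `None` rather than raising.

```python
import email

# Shortened copy of the sample message from the question (assumed input).
email_string = (
    "From: root@a1.local.tld\n"
    "Subject: ooooooooooooooooooooooo\n"
    "To: ooo@a1.local.tld\n"
    "\n"
    "ooooooooooooooooooooooooooooooo\n"
)

msg = email.message_from_string(email_string)

sender = msg['from']        # header names are matched case-insensitively
recipient = msg['To']
subject = msg['SUBJECT']
missing = msg['Reply-To']   # an absent header yields None, not a KeyError

print(sender, recipient, subject, missing)
```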

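【Comments】:

  • I picked this answer because the approach is direct rather than indirect (no importing a parser, etc.), which is welcome. - Sumer Kolcak
  • How do I get the body?
【Solution 2】: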

Fortunately, Python makes this much simpler: http://docs.python.org/2.7/library/email.parser.html#email.parser.Parser

from email.parser import Parser
parser = Parser()

emailText = """PUT THE RAW TEXT OF YOUR EMAIL HERE"""
email = parser.parsestr(emailText)

# Parenthesized single-argument print works on both Python 2 and Python 3.
print(email.get('From'))
print(email.get('To'))
print(email.get('Subject'))

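The body is more complicated. Call email.is_multipart(). If that is false, you can call email.get_payload() to get the body. If it is true, however, email.get_payload() returns a list of messages, so you have to call get_payload() on each of them: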

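if email.is_multipart():
    # Multipart message: the payload is a list of sub-messages.
    for part in email.get_payload():
        print(part.get_payload())
else:
    # Single-part message: the payload is the body itself.
    print(email.get_payload())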
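
The loop above handles only one level of nesting and prints payloads without undoing their transfer encoding. As a hedged sketch for Python 3, `walk()` visits every part at any depth, and `get_payload(decode=True)` undoes base64 or quoted-printable. The helper name `extract_text_body` and the fallback to UTF-8 when no charset is declared are assumptions made for illustration.

```python
import email

# Shortened copy of the multipart sample from the question (assumed input).
raw = (
    "From: root@a1.local.tld\n"
    "To: ooo@a1.local.tld\n"
    "Subject: ooooooooooooooooooooooo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="bound1374805739"\n'
    "\n"
    "This is a multi-part message in MIME format.\n"
    "\n"
    "--bound1374805739\n"
    "Content-Type: text/plain\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "ooooooooooooooooooooooooooooooo\n"
    "\n"
    "--bound1374805739--\n"
)


def extract_text_body(message):
    """Return the first text/plain part as str, or None if there is none."""
    for part in message.walk():           # visits the message and all subparts
        if part.get_content_type() != 'text/plain':
            continue
        payload = part.get_payload(decode=True)   # bytes, transfer encoding undone
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or 'utf-8'
        return payload.decode(charset, errors='replace')
    return None


msg = email.message_from_string(raw)
body = extract_text_body(msg)
print(body)
```

A part that declares a charset Python does not know would still raise `LookupError` here, so real mail may need an extra guard.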

【Comments】:

    【Solution 3】:

    There is no "Body" field in your sample email.

    You can use the email module:

    import email
    msg = email.message_from_string(email_message_as_text)
    

    Then use:

    # Look headers up on the parsed message, not on the email module.
    print(msg['To'])
    print(msg['From'])
    

    ... and so on.
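
On Python 3 the raw message often arrives as bytes rather than text. A minimal sketch, assuming Python 3.6 or newer: `email.message_from_bytes` combined with `policy.default` returns an `EmailMessage`, whose `get_body()` and `get_content()` methods find and decode the text part. The sample bytes are a shortened copy of the question's message.

```python
from email import message_from_bytes, policy

# Shortened copy of the question's sample, as bytes (assumed input).
raw_bytes = (
    b"From: root@a1.local.tld\n"
    b"To: ooo@a1.local.tld\n"
    b"Subject: ooooooooooooooooooooooo\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="bound1374805739"\n'
    b"\n"
    b"--bound1374805739\n"
    b"Content-Type: text/plain\n"
    b"Content-Transfer-Encoding: 7bit\n"
    b"\n"
    b"ooooooooooooooooooooooooooooooo\n"
    b"\n"
    b"--bound1374805739--\n"
)

# policy.default selects the modern EmailMessage API.
msg = message_from_bytes(raw_bytes, policy=policy.default)

sender = str(msg['From'])
recipient = str(msg['To'])
subject = str(msg['Subject'])

# get_body() returns the best matching part, or None if there is no such part.
body_part = msg.get_body(preferencelist=('plain',))
body_text = body_part.get_content() if body_part is not None else None

print(sender, recipient, subject)
print(body_text)
```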

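    【Comments】:

    • I have been trying to build something similar but ran into a lot of problems on Python 3. What is the current approach? I get None back with this solution.
    【Solution 4】:

    You should use email.parser: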

    s = """\
    From root@a1.local.tld Thu Jul 25 19:28:59 2013
    Received: from a1.local.tld (localhost [127.0.0.1])
        by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
        for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
    Received: (from root@localhost)
        by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
        Thu, 25 Jul 2013 19:28:59 -0700
    From: root@a1.local.tld
    Subject: ooooooooooooooooooooooo
    To: ooo@a1.local.tld
    Cc: 
    X-Originating-IP: 192.168.15.127
    X-Mailer: Webmin 1.420
    Message-Id: <1374805739.3861@a1>
    Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="bound1374805739"
    
    This is a multi-part message in MIME format.
    
    --bound1374805739
    Content-Type: text/plain
    Content-Transfer-Encoding: 7bit
    
    ooooooooooooooooooooooooooooooo
    ooooooooooooooooooooooooooooooo
    ooooooooooooooooooooooooooooooo
    
    --bound1374805739--
    """
    
    import email.parser
    
    msg = email.parser.Parser().parsestr(s)
    help(msg)
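
Instead of reading through `help(msg)`, a few calls on the parsed message show what it contains. This is a sketch for Python 3 using a shortened copy of the sample and the placeholder body `body text`. Note that the string starts directly after the opening quotes: a leading blank line would be read as the end of the headers and every lookup would return `None`.

```python
import email.parser

# The backslash keeps the first line of the string from being blank.
s = """\
From root@a1.local.tld Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: root@a1.local.tld
Subject: ooooooooooooooooooooooo
To: ooo@a1.local.tld

body text
"""

msg = email.parser.Parser().parsestr(s)

unix_from = msg.get_unixfrom()        # the mbox-style "From " envelope line
header_names = msg.keys()             # header names in order, repeats included
received = msg.get_all('Received')    # every value of a repeated header

print(unix_from)
print(header_names)
print(len(received))
```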
    

    【Comments】:

      【Solution 5】:

      You can write the raw content to a file

      and then read the file like this:

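      with open('in.txt', 'r') as f:
          raw = f.readlines()
      
      # Header prefixes to look for at the start of each line.
      get_list = ['From:', 'To:', 'Subject:']
      info_list = []
      
      for line in raw:
          for word in get_list:
              if line.startswith(word):
                  # Drop the trailing newline that readlines() keeps.
                  info_list.append(line.rstrip('\n'))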
      

      info_list will now be:

      ['From: root@a1.local.tld', 'Subject: ooooooooooooooooooooooo', 'To: ooo@a1.local.tld']
      

      I don't see a Body: field in your raw content.
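
Scanning lines by prefix works for the sample, but it can go wrong on other mail: header names are case-insensitive, long headers may be folded across lines, and a body line can also start with `From:`. The sketch below, for Python 3, compares the prefix scan with the standard library's `HeaderParser` on a made-up message. The message text is hypothetical and exists only to show the difference.

```python
from email.parser import HeaderParser

# Hypothetical message: lowercase "to:", a folded Subject, and a body line
# that happens to start with "From:".
raw = (
    "From: root@a1.local.tld\n"
    "to: ooo@a1.local.tld\n"
    "Subject: a subject that is folded\n"
    " across two lines\n"
    "\n"
    "From: this line is in the body, not a header\n"
)

# Prefix scan: misses "to:", truncates the Subject, and picks up the body line.
wanted = ('From:', 'To:', 'Subject:')
scanned = [line for line in raw.splitlines() if line.startswith(wanted)]

# HeaderParser reads only the header block and leaves the body unparsed.
headers = HeaderParser().parsestr(raw)
parsed = {name: headers[name] for name in ('From', 'To', 'Subject')}

print(scanned)
print(parsed)
```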

      【Comments】:
