【Question title】: Converting cookie string into Python dict
【Posted】: 2023-10-28 22:55:01
【Question description】:

In Fiddler, I captured an HTTPS request that was sent from the client with the following cookie string (visible under Inspectors > Raw):

Cookie: devicePixelRatio=1; ident=exists; __utma=13103r6942.2918; __utmc=13103656942; __utmz=13105942.1.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); mp_3cb27825a6612988r46d00tinct_id%22%3A%201752338%2C%22%24initial_referrer%22%3A%20%22https%3A%2F%2Fwww.pion_created_at%22%3A%20%222015-08-03%22%2C%22platform%22%3A%20%22web%22%2C%%22%3A%20%%22%7D; t_session=BAh7DUkiD3Nlc3NpbWVfZV9uYW1lBjsARkkiH1BhY2lmaWMgVGltZSAoVVMgJiBDYW5hZGEpBjsAVEkiFXNpZ25pbl9wZXJzb25faWQGOwBGaQMSvRpJIhRsYXN0X2xvZ2luX2RhdGUGOwBGVTogQWN0aXZlU3VwcG9ydDo6VGltZVdpdGhab25lWwhJdToJVGltZQ2T3RzAAABA7QY6CXpvbmVJIghVVEMGOwBUSSIfUGFjaWZpZWRfZGFzaGJvYXJkX21lc3NhZ2UGOwBGVA%3D%3D--6ce6ef4bd6bc1a469164b6740e7571c754b31cca

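I would like to use this cookie in a Python Requests request. (I modified the cookie slightly so that readers cannot use it for malicious purposes!)

However, Requests appears to use a dictionary format for sending cookies, and I have not been able to convert the string/blob above into a dictionary.

My question is:

  • Is there a way in Python to automatically convert a string (such as the cookie I captured in Fiddler) into a dictionary?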

【Question discussion】:

    Tags: python dictionary cookies fiddler


    【Solution 1】:

    You should be able to use SimpleCookie, which is available in the Python standard library:

    from http.cookies import SimpleCookie
    
    # Load only the header value: the leading "Cookie: " is the header name, not part of the cookie.
    rawdata = 'devicePixelRatio=1; ident=exists; __utma=13103r6942.2918; __utmc=13103656942; __utmz=13105942.1.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); mp_3cb27825a6612988r46d00tinct_id%22%3A%201752338%2C%22%24initial_referrer%22%3A%20%22https%3A%2F%2Fwww.pion_created_at%22%3A%20%222015-08-03%22%2C%22platform%22%3A%20%22web%22%2C%%22%3A%20%%22%7D; t_session=BAh7DUkiD3Nlc3NpbWVfZV9uYW1lBjsARkkiH1BhY2lmaWMgVGltZSAoVVMgJiBDYW5hZGEpBjsAVEkiFXNpZ25pbl9wZXJzb25faWQGOwBGaQMSvRpJIhRsYXN0X2xvZ2luX2RhdGUGOwBGVTogQWN0aXZlU3VwcG9ydDo6VGltZVdpdGhab25lWwhJdToJVGltZQ2T3RzAAABA7QY6CXpvbmVJIghVVEMGOwBUSSIfUGFjaWZpZWRfZGFzaGJvYXJkX21lc3NhZ2UGOwBGVA%3D%3D--6ce6ef4bd6bc1a469164b6740e7571c754b31cca'
    cookie = SimpleCookie()
    cookie.load(rawdata)
    
    # Even though SimpleCookie is dictionary-like, it internally uses a Morsel object
    # which is incompatible with requests. Manually construct a dictionary instead.
    cookies = {}
    for key, morsel in cookie.items():
        cookies[key] = morsel.value
    

    If you are using Python 2, you must import from Cookie instead of http.cookies.

    Documentation:

    https://docs.python.org/2/library/cookie.html

    https://docs.python.org/3/library/http.cookies.html
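
The sample string in the question contains one item with no "=" at all (the mangled mp_... entry), and recent versions of SimpleCookie can reject an entire string when any item is malformed, leaving the dictionary empty. As a fallback, here is a minimal sketch of a tolerant manual split. The function name cookie_header_to_dict and the shortened sample string are illustrative assumptions, not part of the original answer, and the sketch does no unquoting or URL-decoding of values.

```python
def cookie_header_to_dict(raw: str) -> dict:
    """Split a Cookie request-header value into a {name: value} dict."""
    # Drop the header name if the whole line was pasted from a capture tool.
    if raw.lower().startswith("cookie:"):
        raw = raw[len("cookie:"):]
    cookies = {}
    for item in raw.split(";"):
        # Split on the first "=" only, so values may contain "=" themselves.
        name, sep, value = item.strip().partition("=")
        # Skip empty or malformed items that are not of the form name=value.
        if not sep or not name:
            continue
        cookies[name] = value
    return cookies


raw = "Cookie: devicePixelRatio=1; ident=exists; broken%22item; __utmz=1.utmcsr=google|utmcmd=organic"
cookies = cookie_header_to_dict(raw)
print(cookies)
```

The resulting plain dict has the shape that Requests accepts through its cookies= keyword argument.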

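    【Discussion】:

    • In the cookie above, what is mp_3cb27825a6612988r46d00tinct_id%22%3A%201752338%2C%22%24initial_referrer%22%3A%20%22https%3A%2F%2Fwww.pion_created_at%22%3A%20%222015-08-03%22%2C%22platform%22%3A%20%22web%22%2C%%22%3A%20%%22%7D; and why does it not follow the key=value format?
    • Not sure. I just copied and pasted what the OP provided.
    • When you use the Cookie module, that part seems to get dropped, which makes me wonder about the cookie's format...
    • @zyxue My example cookie is probably messed up because, for security reasons, I randomly removed characters from it before posting.
    • @PeterRichter You also have to remove the "Cookie:" part of the string. It is not part of the cookie; it is the header name.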
    【Solution 2】:

    Try this function; it should help :). It supports more cookie attributes, such as Comment, Priority, Version and Max-Age.

    import re
    
    
    def Robotcookie(cookie: str, parent_domain: str):
        """Parse one Set-Cookie style string into a dict holding the cookie and its attributes."""
        items = cookie.split(';')
        SameSite = HttpOnly = Secure = Domain = Path = Expires = Comment = MaxAge = CookieName = CookieValue = Size = Sessionkey = Version = Priority = None
        idx = len(items) - 1
        while idx >= 0:
            item = items[idx].strip()
            idx -= 1
            if not item:
                continue
            # Test the item against each known attribute in turn. Because `or`
            # short-circuits, the first successful match is carried forward and
            # the remaining patterns are not tried.
            SameSiteMatched = re.match(r'^SameSite(.*)?', item, re.I)
            HttpOnlyMatched = SameSiteMatched or re.match(r'^HttpOnly(.*)$', item, re.I)
            SecureMatched = HttpOnlyMatched or re.match(r'^Secure(.*)$', item, re.I)
            DomainMatched = SecureMatched or re.match(r'^Domain(.*)?', item, re.I)
            PathMatched = DomainMatched or re.match(r'^Path(.*)?', item, re.I)
            ExpiresMatched = PathMatched or re.match(r'^Expires(.*)?', item, re.I)
            CommentMatched = ExpiresMatched or re.match(r'^Comment(.*)?', item, re.I)
            MaxAgeMatched = CommentMatched or re.match(r'^Max-Age=(.*)?', item, re.I)
            VersionMatched = MaxAgeMatched or re.match(r'^Version=(.*)?', item, re.I)
            PriorityMatched = VersionMatched or re.match(r'^priority=(.*)?', item, re.I)
            matched = SameSiteMatched or HttpOnlyMatched or SecureMatched or DomainMatched or PathMatched or ExpiresMatched or CommentMatched or MaxAgeMatched or VersionMatched or PriorityMatched
            if matched:
                val = matched.groups(0)[0].lstrip('=')
                if matched == SameSiteMatched:
                    SameSite = val if val.lower() in ['strict', 'lax', 'none'] else None
                elif matched == HttpOnlyMatched:
                    HttpOnly = True
                elif matched == SecureMatched:
                    Secure = True
                elif matched == DomainMatched:
                    Domain = val
                elif matched == PathMatched:
                    Path = val
                elif matched == ExpiresMatched:
                    Expires = val
                elif matched == CommentMatched:
                    Comment = val
                elif matched == MaxAgeMatched:
                    MaxAge = val
                elif matched == VersionMatched:
                    Version = val
                elif matched == PriorityMatched:
                    Priority = val
            else:
                CookieMatched = re.match(r'^(.[^=]*)=(.*)?', item, re.I)
                if CookieMatched:
                    CookieName, CookieValue = CookieMatched.groups(0)
    
        Sessionkey = True if not Expires else False
        Size = (len(CookieName) if CookieName else 0) + (len(CookieValue) if CookieValue else 0)
    
        Domain = parent_domain if CookieName and not Domain else Domain
        Path = '/' if CookieName and not Path else Path
        Priority = 'Medium' if CookieName and not Priority else Priority.title() if Priority else 'Medium'
    
        Cookie = {
            CookieName: CookieValue,
            'Domain': Domain,
            'Path': Path,
            'Expires': Expires,
            'Comment': Comment,
            'MaxAge': MaxAge,
            'SameSite': SameSite,
            'HttpOnly': HttpOnly,
            'Secure': Secure,
            'Size': Size,
            'Sessionkey': Sessionkey,
            'Version': Version,
            'Priority': Priority
        }
        return Cookie if CookieName else None
        
    
    cookie = 'name=bijaya; comment=Comment1; expires=Mon, 26-Jul-2021 06:34:02 GMT; path=/; domain=.google.com; Secure; HttpOnly; SameSite=none; Max-Age=244114; Version=1.2; priority=high;'
    CookieDict = Robotcookie(cookie, parent_domain='google.com')
    print(CookieDict)
    # {'name': 'bijaya', 'Domain': '.google.com', 'Path': '/', 'Expires': 'Mon, 26-Jul-2021 06:34:02 GMT', 'Comment': 'Comment1', 'MaxAge': '244114', 'SameSite': 'none', 'HttpOnly': True, 'Secure': True, 'Size': 10, 'Sessionkey': False, 'Version': '1.2', 'Priority': 'High'}
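
As a standard-library alternative to the hand-written regexes, SimpleCookie also exposes the attributes of a Set-Cookie value through its Morsel objects. The following is only a sketch: the function name set_cookie_to_dict and the sample header are illustrative assumptions, and attributes that SimpleCookie does not recognise (such as priority) are not treated as attributes here.

```python
from http.cookies import SimpleCookie


def set_cookie_to_dict(header_value: str) -> dict:
    """Map each cookie name to its value plus any attributes that were set."""
    jar = SimpleCookie()
    jar.load(header_value)
    result = {}
    for name, morsel in jar.items():
        # A Morsel behaves like a dict keyed by lower-cased attribute names;
        # attributes that were not sent are left as empty strings.
        attrs = {key: val for key, val in morsel.items() if val}
        result[name] = {"value": morsel.value, **attrs}
    return result


parsed = set_cookie_to_dict(
    "name=bijaya; Path=/; Domain=.google.com; Secure; HttpOnly; Max-Age=244114"
)
print(parsed["name"])
```

Compared with the function above, this keeps the parsing rules in the standard library, at the cost of lower-cased attribute names and no derived fields such as Size or Sessionkey.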
    

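    【Discussion】:

    • This old question already has an accepted answer. Could you explain (by editing your answer) how your answer differs from the others? Also note that code-only answers are not useful in the long run.
    • @7uc1f3r Thanks!! Yes, you are right, but this function can also extract additional cookie attributes such as Comment, Version, Max-Age, session key and size.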