【Posted】: 2014-01-04 03:04:21
【Question】:
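I'm currently writing a program that will help users determine the best time to post on tumblr. As with Twitter, most followers have so many subscriptions that they can't keep up, so it helps to know when one's own particular followers are (mostly) online. On tumblr this can be determined in two ways: first, whether they have recently reblogged anything that was posted recently, and second, whether they have recently added anything to their list of liked posts.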
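To make the goal concrete, here is a minimal sketch of the tallying step, assuming the like and reblog timestamps have already been collected as Unix epoch seconds. The function name `busiest_hours` and the sample timestamps are made up for illustration, and hours are counted in UTC.

```python
import time
from collections import Counter


def busiest_hours(timestamps, top=3):
    """Return up to `top` (hour, count) pairs, most active hour first.

    `timestamps` is an iterable of Unix epoch seconds, one per observed
    like or reblog. Hours are 0-23 in UTC; ties go to the earlier hour.
    """
    counts = Counter(time.gmtime(ts).tm_hour for ts in timestamps)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top]


# Hypothetical sample: three events in the 20:00 hour and two in the
# 03:00 hour, spread over 2014-01-01 and 2014-01-02 (UTC).
sample = [1388606400, 1388607000, 1388694600, 1388545200, 1388631600]
print(busiest_hours(sample, top=2))
```

Bucketing by hour alone ignores weekdays and time zones; a real version would probably need both.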
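Frustratingly, even when it is set to "public", the stream of liked posts of an arbitrary user (other than oneself) is only available to logged-in entities. As far as I know, that means I either have to upload a login cookie to the application every so often, or get this POST request working.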
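For the first option, a minimal sketch of moving cookies between sessions, assuming they can be exported from a logged-in session as a JSON blob. The helper names `export_cookies` and `import_cookies` and the cookie name `example_login_cookie` are placeholders, not tumblr's actual cookie names; nothing is sent over the network.

```python
import json

import requests


def export_cookies(session):
    """Serialize the session's cookies to a JSON string (name -> value)."""
    return json.dumps(requests.utils.dict_from_cookiejar(session.cookies))


def import_cookies(session, blob):
    """Load cookies from a JSON string produced by export_cookies()."""
    jar = requests.utils.cookiejar_from_dict(json.loads(blob))
    session.cookies.update(jar)


# Hypothetical round trip between two sessions.
source = requests.Session()
source.cookies.set("example_login_cookie", "PLACEHOLDER")
blob = export_cookies(source)

target = requests.Session()
import_cookies(target, blob)
print(target.cookies.get("example_login_cookie"))
```

A plain name-to-value dict drops each cookie's domain, path and expiry, so this only suits a single-site session like this one.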
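I've looked at a number of successful outbound requests through Opera's inspector, but I must still be missing something, or perhaps requests is doing something that the server rejects no matter what I do.
The essence of the problem is below. It is currently written in Python 2.7 and uses Python requests and BeautifulSoup. To run it yourself, update the e and p pair at the top of get_login_response() to a real set of values.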
import requests
from bs4 import BeautifulSoup
class Login:
    def __init__(self):
        self.session = requests.session()

    def get_hidden_fields(self):
        """ -> dict. tumblr dynamically generates a key for its login forms
        This should extract that key from the form so that the POST-data to
        login will be accepted.
        """
        pageRequest = requests.Request("GET","https://www.tumblr.com/login")
        received = self.session.send( pageRequest.prepare() )
        html = BeautifulSoup(received.content)
        hiddenFieldDict = {}
        hiddenFields = html.find_all("input",type="hidden")
        for x in hiddenFields:
            hiddenFieldDict[x["name"]]=x["value"]
        return hiddenFieldDict

    def get_login_response(self):
        e = u"dead@live.com"
        p = u"password"
        endpoint = u"https://tumblr.com/login"
        payload = { u"user[email]": e,
                    u"user[password]": p,
                    u"user[age]":u"",
                    u"tumblelog[name]": u"",
                    u"host": u"www.tumblr.com",
                    u"Connection:":u"keep-alive",
                    u"Context":u"login",
                    u"recaptcha_response_field":u""
                  }
        payload.update( self.get_hidden_fields() )
        ## headers = {"Content-Type":"multipart/form-data"}
        headers = {u"Content-Type":u"application/x-www-form-urlencoded",
                   u"Connection:":u"keep-alive",
                   u"Origin":u"https://tumblr.com",
                   u"Referer": u"https://www.tumblr.com/login",
                   u"User-Agent":u"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 OPR/18.0.1284.68",
                   u"Accept":u"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
                   u"Accept-Encoding":u"gzip,deflate,sdch",
                   u"Accept-Language":u"en-US,en;q=0.8",
                   u"Cache-Control":u"max-age=0"
                   #"Content-Length":VALUE is still needed
                  }
        # this cookie is stale but it seems we get these for free anyway,
        # so I'm not sure whether it's actually needed. It's mostly
        # google analytics info.
        sendCookie = {"tmgioct":"52c720e28536530580783210",
                      "__qca":"P0-1402443420-1388781796773",
                      "pfs":"POIPdNt2p1qmlMGRbZH5JXo5k",
                      "last_toast":"1388783309",
                      "capture":"GDTLiEN5hEbMxPzys1ye1Gf4MVM",
                      "logged_in":"0",
                      "_ga":"GA1.2.2064992906.1388781797",
                      "devicePixelRatio":"1",
                      "documentWidth":"1280",
                      "anon_id":"VNHOJWQXGTQXHNCFKYJQUMUIVQBRISPR",
                      "__utma":"189990958.2064992906.1388781797.1388781797.1388781797.1",
                      "__utmb":"189990958.28.10.1388781797",
                      "__utmc":"189990958",
                      "__utmz":"189990958.1388781797.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)"}
        loginRequest = requests.Request("POST",
                                        endpoint,
                                        headers,
                                        data=payload,
                                        cookies=sendCookie # needed?
                                        ## ,auth=(e,p) # may not be needed
                                        )
        contentLength = len(loginRequest.prepare().body)
        loginRequest.data.update({u"Content-Length":unicode(contentLength)})
        return self.session.send( loginRequest.prepare() )

l = Login()
res = l.get_login_response()
print "All cookies: ({})".format(len(l.session.cookies))
print l.session.cookies # has a single generic cookie from the initial GET query
print "Celebrate if non-empty:"
print res.cookies # this should theoretically contain the login cookie
Output on my end:
All cookies: (1)
<<class 'requests.cookies.RequestsCookieJar'>[<Cookie tmgioct=52c773ed65cfa30622446430 for www.tumblr.com/>]>
Celebrate if non-empty:
<<class 'requests.cookies.RequestsCookieJar'>[]>
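Bonus points if my code is insecure and you have other pointers on that. I chose the requests module because it is simple, but if it lacks functionality and my goal is achievable with httplib2 or something else, I'm willing to switch.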
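Before switching libraries, one detail of requests may be relevant to the output above: `Request.prepare()` builds the request without consulting the session, whereas `Session.prepare_request()` merges the session's cookie jar and default headers into it. The sketch below makes no network calls; the cookie value and the form field are placeholders. It also shows that requests fills in `Content-Length` from the encoded body on its own, so there should be no need to add it to the form data by hand.

```python
import requests

session = requests.Session()
# Stand-in for the cookie the initial GET of the login page would store.
session.cookies.set("tmgioct", "placeholder-value",
                    domain="www.tumblr.com", path="/")

request = requests.Request("POST", "https://www.tumblr.com/login",
                           data={"field": "value"})  # placeholder form field

bare = request.prepare()                   # built without the session
merged = session.prepare_request(request)  # session cookies and headers applied

print(bare.headers.get("Cookie"))
print(merged.headers.get("Cookie"))
print(merged.headers.get("Content-Length"))
```

Whether this is what the server objects to is only a guess; it would just explain why cookies gathered by the first request do not automatically travel with the second.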
【Discussion】:
Tags: python post http-post tumblr python-requests