Python POST request returns 404 status code, but GET request returns 200

【Question title】: Python POST request returns 404 status code but GET request returns 200
【Posted】: 2020-02-11 18:48:45
【Question description】:

I am trying to log in to this website. When I submit a simple GET request, I get the expected 200 response.

import requests

login_url = 'https://urs.earthdata.nasa.gov/'

sess = requests.Session()
sess.headers ={'User-Agent':'Mozilla/5.0'}
sess.verify=False
r1 = sess.get(login_url)
print (r1.status_code)
>>>200

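However, when I try to actually log in with a POST request, I get a 404 error. (The request payload was assembled from the page's HTML and from the page inspector in Firefox.)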

import requests
import re

product_url = 'https://datapool.asf.alaska.edu/L1.5/A3/ALPSRP171431190-L1.5.zip'
login_url = 'https://urs.earthdata.nasa.gov/'
username = 'username'
password = 'password'

sess = requests.Session()
sess.headers ={'User-Agent':'Mozilla/5.0'}
sess.verify=False

r1 = sess.get(product_url)

tkn_ptn = '<meta name="csrf-token" content="(.*==)" />'
tkn = re.search(tkn_ptn,r1.text).group(1)
print('CSRF Token: {}'.format(tkn))
>>>'CSRF Token: CDOX5tOhBtX2vvZn/c/MLRaYJtW7hzeQLm/eEVn09cHosnlsR/5P8a+k4YEaAzYQZRxCgNf9evDqyhWiZiefmQ=='

cli_ptn = '<input type="hidden" name="client_id" id="client_id" value="(.*)" />'
cli = re.search(cli_ptn,r1.text).group(1)
print('Client ID: {}'.format(cli))
>>>'Client ID: BO_n7nTIlMljdvU6kRRB3g'

redir_ptn = '<input type="hidden" name="redirect_uri" id="redirect_uri" value="(.*?)" />'
redir = re.search(redir_ptn,r1.text).group(1)
print ('Redirect URL: {}'.format(redir))
>>>'Redirect URL: https://auth.asf.alaska.edu/login'

payload = {'username':username, 
            'password':password,
            'authenticity_token' : tkn,
            'client_id' : cli, 
            'redirect_uri': redir,
            'response_type' : 'code', 
            'stay_in' : '1', 
            'commit':'Log in'}

r2 = sess.post(login_url, data=payload)
print (r2.status_code)
>>>404

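Why does the page not accept my payload and log me in?

【Comments】:

Why search with re when you are already using requests?
Also double-check the cookies. Sometimes, if cookies are set by JS in the browser, they will not be set in your Session object.
You need to try r2 = sess.post(login_url + 'login', data=payload), because the credentials are validated on the /login view.
@SuperStew There is nothing wrong with using re here; there is no need to parse the whole page just to extract a few tokens.
@Xosrov It's not a "problem", but it's not as pythonic.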

Tags: python python-3.x python-requests

【Solution 1】:

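The payload data comes from the login page itself, so the page to watch in the browser's network tab is https://urs.earthdata.nasa.gov/.
I entered a random username and password and, looking at the network tab, saw a POST being sent to https://urs.earthdata.nasa.gov/login, not to the root URL. This is the format of its payload: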

utf8: ✓
authenticity_token: ...token base64...
username: 123
password: 123
client_id: 
redirect_uri: 
commit: Log in

So we only need to extract the authenticity_token from the source. Looking at the source of the login page, we see this:

<form id="login" action="/login" accept-charset="UTF-8" method="post"><input name="utf8" type="hidden" value="&#x2713;" /><input type="hidden" name="authenticity_token" value="...token base64..." />

So we just use a regular expression to extract that value, since it is quicker for a one-off like this (you can use any other method or regex form you like):

# webpage is the response from a GET request to the login page
token = re.search(r'authenticity_token".*?"(.*?)"', webpage.text).group(1)
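
The pattern can be checked offline before sending any request. The following is a minimal sketch that applies it to markup modelled on the form quoted above; SAMPLE_HTML, its placeholder token, and the helper names extract_token and extract_form_action are made up for illustration and are not part of the site or of requests. It also reads the form's action attribute, which is where the /login target comes from.

```python
import re
from urllib.parse import urljoin

# Sample markup modelled on the login form quoted above.
# The token value is a placeholder, not a real token.
SAMPLE_HTML = (
    '<form id="login" action="/login" accept-charset="UTF-8" method="post">'
    '<input name="utf8" type="hidden" value="&#x2713;" />'
    '<input type="hidden" name="authenticity_token" value="PLACEHOLDER_TOKEN==" />'
)


def extract_token(html):
    """Return the authenticity_token value, or None if the field is missing."""
    match = re.search(r'authenticity_token".*?"(.*?)"', html)
    return match.group(1) if match else None


def extract_form_action(html, base_url):
    """Return the absolute URL the first form posts to, or None if absent."""
    match = re.search(r'<form[^>]*\baction="(.*?)"', html)
    return urljoin(base_url, match.group(1)) if match else None


print(extract_token(SAMPLE_HTML))
print(extract_form_action(SAMPLE_HTML, "https://urs.earthdata.nasa.gov/"))
```

Returning None instead of calling .group(1) directly avoids an AttributeError if the page layout changes and the field is no longer found.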

Finally, build the data and pass it to the POST method:

data = {
    "utf8": "✓",
    "authenticity_token": token,
    "username": username,
    "password": password,
    "client_id": "",
    "redirect_uri": "",
    "commit": "Log in",
}
login = requests.post(
    "https://urs.earthdata.nasa.gov/login",
    headers={'User-Agent': 'Mozilla/5.0'},
    data=data,
)
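
Putting the steps together, one possible arrangement is sketched below. It assumes (this is not confirmed above) that the server ties the authenticity_token to the cookies set when the login page is fetched, so the GET and the POST go through the same session object. The names build_payload and log_in are hypothetical helpers, and sess is expected to be a requests.Session created by the caller.

```python
import re

BASE_URL = "https://urs.earthdata.nasa.gov/"
LOGIN_URL = BASE_URL + "login"  # the form's action, not the page URL
TOKEN_PATTERN = re.compile(r'authenticity_token".*?"(.*?)"')


def build_payload(token, username, password):
    """Build the form fields in the shape seen in the browser's network tab."""
    return {
        "utf8": "✓",
        "authenticity_token": token,
        "username": username,
        "password": password,
        "client_id": "",
        "redirect_uri": "",
        "commit": "Log in",
    }


def log_in(sess, username, password):
    """Fetch the login page and POST the credentials through the same session.

    sess is assumed to be a requests.Session, so cookies set by the GET
    are sent again with the POST.
    """
    page = sess.get(BASE_URL)
    match = TOKEN_PATTERN.search(page.text)
    if match is None:
        raise RuntimeError("authenticity_token not found in the login page")
    payload = build_payload(match.group(1), username, password)
    return sess.post(LOGIN_URL, data=payload)
```

With requests installed, this could be called as log_in(sess, username, password) after creating sess = requests.Session() and setting its User-Agent header as in the question. Whether the login succeeded still has to be judged from the returned response.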
