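【Posted】: 2016-11-21 16:50:46
【Question】:
I was going to delete this question, but on reflection I'll keep it as a live demonstration that, as a developer, I should pay more attention to detail.
I'm trying to fetch some data from a website. The requested URL looks at the request's content type and responds accordingly.
This is the curl command I tried: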
curl --header "Accept: application/json, text/javascript, */*; q=0.01\r\nX-Requested-With: XMLHttpRequest\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\n" http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/margin_bal_result.php\?l\=en-us\&d\=2016/11/15\&_\=1479700586981 -v
* About to connect() to www.tpex.org.tw port 80 (#0)
* Trying 210.63.162.130... connected
> GET /web/stock/margin_trading/margin_balance/margin_bal_result.php?l=en-us&d=2016/11/15&_=1479700586981 HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: www.tpex.org.tw
> Accept: application/json, text/javascript, */*; q=0.01\r\nX-Requested-With: XMLHttpRequest\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\nAccept-Encoding: gzip,deflate,sdch\r\n
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Mon, 21 Nov 2016 07:35:56 GMT
< Server: Apache
< Content-Type: text/html; charset=utf-8
< X-Cache: MISS from localhost
< X-Cache-Lookup: MISS from localhost:3128
< Via: 1.0 localhost (squid/3.1.19)
< Connection: close
<
{"reportDate":"2016\/11\/15","iTotalRecords":610,"aaData":[["006201","YA HORNG ELECTRONIC CO.","6","0","0","0","6","0","0.09","6,361","0","0","0","0","0","0","0.0","6,361","0",""],...}
The response is truncated here, but it is essentially JSON.
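Note that the trace labels this body `Content-Type: text/html` even though it is JSON, so the header alone does not tell the two kinds of response apart. A minimal sketch of a check that looks at the body itself follows; `parse_json_body` is a hypothetical helper name, and both sample bodies are shortened stand-ins written for illustration, not real server output.

```python
import json


def parse_json_body(body):
    """Return the decoded object if body is valid JSON, otherwise None."""
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# Stand-in for the JSON body; "aaData" is emptied here for brevity.
json_body = '{"reportDate":"2016\\/11\\/15","iTotalRecords":610,"aaData":[]}'
# Stand-in for the start of an HTML page.
html_body = "<!DOCTYPE HTML>\n<html>\n<head>"

for name, body in (("json_body", json_body), ("html_body", html_body)):
    parsed = parse_json_body(body)
    print(name, "is JSON" if parsed is not None else "is not JSON")
```

With `requests`, the same check could be applied to `res.text` before relying on the response being JSON.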
However, here is my Python code, which I don't think is much different. Yet the response is HTML...
import requests

g_tpex_headers = {
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'User-Agent': (
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        ' (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120'
        ' Chrome/37.0.2062.120 Safari/537.36'
    ),
    'X-Requested-With': 'XMLHttpRequest',
}
data_link = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal.php?l=en-us&d={}&_=1479700586981'
)
data = []
with requests.Session() as session:
    # Replace the session's default headers with the ones above
    session.headers = g_tpex_headers
    # target_dt is a date object defined elsewhere
    res = session.get(
        data_link.format(target_dt.strftime('%Y/%m/%d'))
    )
    print(res.content[:400])
Log:
send: 'GET /web/stock/margin_trading/margin_balance/margin_bal.php?l=en-us&d=2016/11/18&_=1479700586981 HTTP/1.1\r\nHost: www.tpex.org.tw\r\nX-Requested-With: XMLHttpRequest\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept: application/json, text/javascript, */*; q=0.01\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\n\r\n'
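A `send:` line like the one above is what the standard library's HTTP client prints when its debug level is raised. A minimal sketch of how such a trace might be switched on, assuming Python 3 (where the module is `http.client`; on Python 2 it is `httplib`); `enable_http_debug` is a hypothetical helper name, not part of any library.

```python
import http.client


def enable_http_debug(level=1):
    """Raise the debug level for all http.client connections; return the old level."""
    previous = http.client.HTTPConnection.debuglevel
    http.client.HTTPConnection.debuglevel = level
    return previous


previous_level = enable_http_debug()
# HTTP requests made after this point print "send:" and "reply:" lines to stdout.
```

Because `requests` builds on `http.client` through urllib3, plain-HTTP requests made after this call should produce the same kind of output.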
And the response:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title> HOME > Mainboard > Margin Trading > Margin Balance</title>
<link rel="icon" type="image/ico" href="/web/images/favicon.ic
I can't see any difference. So why doesn't the Python request get a JSON response?
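One way to look for a difference is to compare the two requests mechanically rather than by eye. The sketch below takes the request targets exactly as they appear in the curl trace and in the Python log above, and reports which parts differ; `diff_targets` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qsl

# Request targets copied from the curl trace and the Python debug log.
CURL_TARGET = (
    "/web/stock/margin_trading/margin_balance/margin_bal_result.php"
    "?l=en-us&d=2016/11/15&_=1479700586981"
)
PYTHON_TARGET = (
    "/web/stock/margin_trading/margin_balance/margin_bal.php"
    "?l=en-us&d=2016/11/18&_=1479700586981"
)


def diff_targets(a, b):
    """Return a dict of the parts (path and query parameters) that differ."""
    pa, pb = urlsplit(a), urlsplit(b)
    diffs = {}
    if pa.path != pb.path:
        diffs["path"] = (pa.path, pb.path)
    qa, qb = dict(parse_qsl(pa.query)), dict(parse_qsl(pb.query))
    for key in sorted(set(qa) | set(qb)):
        if qa.get(key) != qb.get(key):
            diffs["query:" + key] = (qa.get(key), qb.get(key))
    return diffs


diffs = diff_targets(CURL_TARGET, PYTHON_TARGET)
for part, (curl_value, python_value) in diffs.items():
    print(part, "| curl:", curl_value, "| python:", python_value)
```

Apart from the date parameter `d`, the comparison flags the path: curl requested `margin_bal_result.php`, while the Python code requested `margin_bal.php`.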
【Discussion】:
Tags: python json curl python-requests