【Question title】: Erlang: get the final redirected URL
【Posted】: 2017-07-11 02:08:43
【Question】:

As suggested in this question, I am using the following method to find the final redirected URL:

%% Example call: perform_request("http://mail.google.com").
perform_request(URL) ->
    %% Turn off httpc's automatic redirect handling so each hop is visible.
    HTTPOpts = [{autoredirect, false}],
    case httpc:request(get, {URL, [{"User-Agent", "Mozilla"}]}, HTTPOpts, []) of
        {ok, {{_, 200, _}, _Headers, _Body}} ->
            %% code_to_process_the_URL goes here
            {ok, URL};
        {ok, {{_, Code, _}, Headers, _}} when Code >= 300, Code < 310 ->
            %% httpc returns header names in lower case.
            NewURL = proplists:get_value("location", Headers),
            perform_request(NewURL)
    end.

This works for other URLs, but it fails for URL = http://mail.google.com, because the first Location header it returns is location = /mail/, which is not a valid URL on its own, and I get an empty page.

I also ran the GET command in a terminal to verify this.

Output:

mandeep@mandeep-Inspiron-5447:~$ GET -S -d -e http://mail.google.com
GET http://mail.google.com
301 Moved Permanently
Cache-Control: private, max-age=0
Connection: close
Date: Tue, 21 Feb 2017 11:56:14 GMT
Accept-Ranges: none
Location: /mail/
Server: GSE
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
Expires: Tue, 21 Feb 2017 11:56:14 GMT
Client-Date: Tue, 21 Feb 2017 11:55:30 GMT
Client-Peer: 216.58.197.69:80
Client-Response-Num: 1
Title: Moved Permanently
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

GET http://mail.google.com/mail/
302 Moved Temporarily
Cache-Control: private, max-age=0
Connection: close
Date: Tue, 21 Feb 2017 11:56:15 GMT
Accept-Ranges: none
Location: https://mail.google.com/mail/
Server: GSE
Vary: Accept-Encoding 
Content-Type: text/html; charset=UTF-8
Expires: Tue, 21 Feb 2017 11:56:15 GMT
Client-Date: Tue, 21 Feb 2017 11:55:30 GMT
Client-Peer: 172.217.26.165:80
Client-Response-Num: 1
Title: Moved Temporarily
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

GET https://mail.google.com/mail/
302 Moved Temporarily
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Connection: close
Date: Tue, 21 Feb 2017 11:56:21 GMT
Pragma: no-cache
Accept-Ranges: none
Location: https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2&emr=1&osid=1
Server: GSE
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
Expires: Mon, 01 Jan 1990 00:00:00 GMT
Alt-Svc: quic=":443"; ma=2592000; v="35,34"
Client-Date: Tue, 21 Feb 2017 11:55:36 GMT
Client-Peer: 216.58.197.69:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: /C=US/O=Google Inc/CN=Google Internet Authority G2
Client-SSL-Cert-Subject: /C=US/ST=California/L=Mountain View/O=Google Inc/CN=mail.google.com
Client-SSL-Cipher: ECDHE-ECDSA-AES128-GCM-SHA256
Client-SSL-Socket-Class: IO::Socket::SSL
Title: Moved Temporarily
X-Content-Type-Options: nosniff 
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block

GET https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2&emr=1&osid=1
200 OK
Cache-Control: no-cache, no-store
Connection: close
Date: Tue, 21 Feb 2017 11:56:22 GMT
Pragma: no-cache
Server: GSE
Content-Type: text/html; charset=UTF-8
Expires: Mon, 01-Jan-1990 00:00:00 GMT
Alt-Svc: quic=":443"; ma=2592000; v="35,34"
Client-Date: Tue, 21 Feb 2017 11:55:38 GMT
Client-Peer: 172.217.26.173:443
Client-Response-Num: 1
Client-SSL-Cert-Issuer: /C=US/O=Google Inc/CN=Google Internet Authority G2
Client-SSL-Cert-Subject: /C=US/ST=California/L=Mountain View/O=Google Inc/CN=accounts.google.com
Client-SSL-Cipher: ECDHE-RSA-AES128-SHA
Client-SSL-Socket-Class: IO::Socket::SSL
Client-Transfer-Encoding: chunked
Link: <https://www.google.com/gmail/>; rel="canonical"
Set-Cookie: GAPS=1:uiUyF1S0WckgUUlRhZmrUeuRVCgCiA:9vKBlzT8ecd7l7Ob;Path=/;Expires=Thu, 21-Feb-2019 11:56:22 GMT;Secure;HttpOnly;Priority=HIGH
Set-Cookie: GALX=mfArYRFLcco;Path=/;Secure
Strict-Transport-Security: max-age=10893354; includeSubDomains
Title: Gmail
X-Auto-Login:realm=com.google&args=service%3Dmail%26continue%3Dhttps%253A%252F%252Fmail.google.com%252Fmail%252F
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Meta-Charset: utf-8
X-Meta-Description: Gmail is email that's intuitive, efficient, and useful. 15 GB of storage, less spam, and mobile access.
X-Meta-Google-Site-Verification: LrdTUW9psUAMbh4Ia074-BPEVmcpBxF6Gwf0MSgQXZs
X-Meta-Viewport: width=300, initial-scale=1
X-XSS-Protection: 1; mode=block

How can I solve this?

【Comments】:

  • What command-line command is that? Link?

Tags: get erlang url-redirection httpc


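【Solution 1】:

You can't change what the server sends back, so you will have to deal with it.

As long as NewURL holds an absolute URL (for example, it starts with http), you can follow the redirect as is. Otherwise, if you get a relative URL, build a "NewURL2" by combining the URL used for the initial request with the location in NewURL (URL + NewURL). One way to implement this might be: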

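case proplists:get_value("location", Headers) of
  %% Absolute URL: the string starts with "http". Note the `|` before _Rest;
  %% with a comma the pattern would only match five-character strings.
  NewURLAbsolute = [$h, $t, $t, $p | _Rest] ->
                   perform_request(NewURLAbsolute);
  %% Relative URL: append it to the URL that was just requested.
  NewURLRelative -> perform_request(URL ++ NewURLRelative)
end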
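Plain concatenation happens to work for "http://mail.google.com" plus "/mail/", but it produces a wrong URL as soon as the request URL already has a path or query (for example "http://example.com/a/b" ++ "/mail/"). A more general option is to resolve the Location value against the request URL as an RFC 3986 reference. The following is only a sketch, assuming OTP 22.3 or later, where uri_string:resolve/2 is available; the module name redirect_util and the function resolve_location/2 are made up for illustration.

```erlang
-module(redirect_util).
-export([resolve_location/2]).

%% Resolve a Location header value against the URL that was requested.
%% An absolute Location comes back unchanged; a relative one is merged
%% with the base URL using RFC 3986 reference resolution.
%% Assumes OTP 22.3+ (uri_string:resolve/2) and string (list) arguments.
resolve_location(Location, BaseURL) ->
    case uri_string:resolve(Location, BaseURL) of
        {error, Reason, Detail} ->
            %% The Location or the base URL could not be parsed as a URI.
            {error, {Reason, Detail}};
        Resolved ->
            {ok, Resolved}
    end.
```

With this helper, the redirect clause could call resolve_location(Location, URL) and recurse on the result instead of pattern matching on "http". For the first hop in the output above, resolving "/mail/" against "http://mail.google.com" should give the same URL that GET requested next, http://mail.google.com/mail/.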

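【Comments】:

  • Is there any way to get the final URL after all redirects without writing the recursive loop by hand, the way GET does it, or through some BIF in Erlang?
  • There may be more URLs like http://mail.google.com. Are you sure this solution works for all of them?
  • The code I posted only ensures that if the location contains http, you can use it as is (it is an absolute URL). Otherwise, you need to append it to the original URL, because it is relative. I understood your problem to be that the '/mail/' location is not a valid URL on its own.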
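
On the question of avoiding the manual loop: as far as I know, httpc follows redirects on its own when autoredirect is left at its default of true, but the result it returns does not include the URL it ended up at, so some loop still seems necessary if the final URL is what you want. Below is a hedged sketch of such a loop with a hop limit so a redirect cycle cannot recurse forever. The module name final_url, the default limit of 10, and the error tuples are assumptions chosen for illustration, and uri_string:resolve/2 requires OTP 22.3 or later.

```erlang
-module(final_url).
-export([final_url/1, final_url/2]).

%% Follow redirects by hand and return the last URL reached.
%% The default limit of 10 hops is an arbitrary choice for this sketch.
final_url(URL) ->
    final_url(URL, 10).

final_url(_URL, 0) ->
    {error, too_many_redirects};
final_url(URL, HopsLeft) when HopsLeft > 0 ->
    HTTPOpts = [{autoredirect, false}],
    Request = {URL, [{"User-Agent", "Mozilla"}]},
    case httpc:request(get, Request, HTTPOpts, []) of
        {ok, {{_, Code, _}, Headers, _Body}}
          when Code =:= 301; Code =:= 302; Code =:= 303;
               Code =:= 307; Code =:= 308 ->
            follow(proplists:get_value("location", Headers), URL, HopsLeft);
        {ok, {{_, _Code, _}, _Headers, _Body}} ->
            %% Any non-redirect response ends the chain at this URL.
            {ok, URL};
        {error, Reason} ->
            {error, Reason}
    end.

%% Resolve the Location value (absolute or relative) and take the next hop.
follow(undefined, URL, _HopsLeft) ->
    {error, {no_location, URL}};
follow(Location, URL, HopsLeft) ->
    case uri_string:resolve(Location, URL) of
        {error, Reason, _Detail} -> {error, {bad_location, Reason}};
        Next -> final_url(Next, HopsLeft - 1)
    end.
```

Before calling final_url:final_url("http://mail.google.com"), the inets application needs to be running (inets:start()), and ssl as well (ssl:start()) because the chain shown in the question moves to https.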