【Question title】: How to send form data with a POST request?
【Posted】: 2014-07-14 13:42:03
【Question】:

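I want to view and scrape the job listings on https://www.akzonobel.com/nl/careers/vacatures/. The country must be "Netherlands" and the job level must be "Entry level".

I am using httparty to send a POST request, but it keeps returning the initial 10 job listings. The correct response should contain 3 job listings.

Here is the code I am using: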

require 'httparty'
require 'nokogiri'

@base_url = 'https://www.akzonobel.com'

url = "#{@base_url}/careers/vacatures/"

data = {
  'ctl00$contentLeft$ctl01$ddlCountryExt' => 'NLD',
  'ctl00$contentLeft$ctl01$ddlJobLevelExt' => 'ENTRY_LEVEL'
}

response = HTTParty.post("#{@base_url}/nl/careers/vacatures/", :body => data)

html = Nokogiri::HTML(response)

jobs = html.xpath('//h3//a')

jobs.each do |job|
  puts job.text
end

puts jobs.size

This returns:

Regional Demand Planner Nordeuropa (m,w)
Forecast Analyst - TiO2 Spend Area
PS Regional Manager APAC
Production leader
Engineering Administrator - Temporary
Procurement Manager EMEA
Business Analyst, Americas
HR Business Partner Supply Chain and R&D
AS Regional Manager
Business Information Manager
10

How do I send the required form data to the site so that I get the correct response?


Update:

I tried the following:

require 'httparty'
require 'nokogiri'

@base_url = 'https://www.akzonobel.com'

url = "#{@base_url}/careers/vacatures/"

data = {
  'ctl00$contentLeft$ctl01$ddlCountryExt' => 'NLD',
  'ctl00$contentLeft$ctl01$ddlJobLevelExt' => 'ENTRY_LEVEL',
  'ctl00$contentLeft$ctl01$ddlContinentExt' => 1,
  'ctl00$contentLeft$ctl01$ddlRegionEx' => 4,
  'ctl00$contentLeft$ctl01$ddlJobFamilyEx' => 45,
  'ctl00$contentLeft$ctl01$ddlBusinessUnitExt' => 22,
  'ctl00$contentLeft$ctl01$ddlJobLevelExt' => 1,
  'ctl00$contentLeft$ctl01$ddlCountryExt' => 1,
}

response = HTTParty.post("#{@base_url}/nl/careers/vacatures/", :body => data)

html = Nokogiri::HTML(response)

jobs = html.xpath('//h3//a')

jobs.each do |job|
  puts job.text
end

puts jobs.size

Unfortunately, the result is exactly the same.


Update 2:

Here is the updated code:

require 'httparty'
require 'nokogiri'

@base_url = 'https://www.akzonobel.com'

url = "#{@base_url}/careers/vacatures/"

data = {
  'contentLeft_ctl01_ddlContinentExt' => 'C_EUROPE',
  'contentLeft_ctl01_ddlCountryExt' => 'NLD',
  'contentLeft_ctl01_ddlRegionExt' => 'Gelderland',
  'contentLeft_ctl01_ddlRegionExt' => 'Limburg',
  'contentLeft_ctl01_ddlRegionExt' => 'North Holland',
  'contentLeft_ctl01_ddlRegionExt' => 'South Holland',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'General Management',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Integrated Supply Chain',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Sales & Marketing',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'RD&I',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Support',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Other',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Lvl2_General Management',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Manufacturing',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'HSE',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Engineering',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Procurement',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Distribution & Logistics',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Sales',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Marketing',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Lvl2_RD&I',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Finance',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'IM',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'HR',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Legal, IP & Compliance',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Facilities',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Lvl2_Other',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80200000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80300000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81900000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81100000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '82000000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81200000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80700000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80400000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80500000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80800000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '80900000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '82100000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '82200000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81010000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81020000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81030000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81040000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81300000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81410000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81420000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81430000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81600000',
  'contentLeft_ctl01_ddlJobFamilyExt' => '81700000',
  'contentLeft_ctl01_ddlJobFamilyExt' => 'Lvl3_Other',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '52000100',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '52000200',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '52000300',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '52000900',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000010',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000013',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000020',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000022',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000026',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000033',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000038',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000041',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000054',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000055',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000056',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000061',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000063',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000100',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000300',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000900',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '53000901',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '51000000',
  'contentLeft_ctl01_ddlJobLevelExt' => 'ENTRY_LEVEL'
}

response = HTTParty.post("#{@base_url}/nl/careers/vacatures/", :body => data)

html = Nokogiri::HTML(response)

jobs = html.xpath('//h3//a')

jobs.each do |job|
  puts job.text
end

puts jobs.size

This gives me the same result as before.

【Comments】:

  • HTTParty is not the right tool for this kind of scraping. Unless JavaScript needs to be executed, I would use Mechanize.

Tags: ruby post http-post httparty open-uri


【Solution 1】:

I think the problem can be solved by changing this code into a loop that prints job.text only 3 times.

So change this,

jobs.each do |job|
  puts job.text
end

to this,

jobs.first(3).each do |job|
  puts job.text
end

【Comments】:

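    【Solution 2】:

    Setting the country or job level in the GUI triggers a JavaScript call. You have to explicitly set the values of all the other drop-downs (Continent, Region, Job Family, Business Unit) to the values they are given after NLD/EntryLevel is selected: 1, 4, 45 and 22 respectively.

    Another thing is that the real controls are hidden, as you can see with Chrome Inspector. The real controls have ids that look like: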

    contentLeft_ctl01_ddlCountryExt 
    

    Hope this helps.
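As a rough illustration of "set every drop-down explicitly", the sketch below builds a POST request with Ruby's standard library and prints the encoded body without sending anything. The field names are hypothetical: they are modelled on the id shown above, and the names the server actually expects have to be read from the page. The values 1, 4, 45 and 22 are the ones quoted in this answer.

```ruby
require 'net/http'
require 'uri'

uri = URI('https://www.akzonobel.com/nl/careers/vacatures/')

# Hypothetical field names (assumption): modelled on the id pattern
# contentLeft_ctl01_ddlCountryExt. Check the real names in the page source.
form = {
  'contentLeft_ctl01_ddlCountryExt'      => 'NLD',
  'contentLeft_ctl01_ddlJobLevelExt'     => 'ENTRY_LEVEL',
  'contentLeft_ctl01_ddlContinentExt'    => '1',
  'contentLeft_ctl01_ddlRegionExt'       => '4',
  'contentLeft_ctl01_ddlJobFamilyExt'    => '45',
  'contentLeft_ctl01_ddlBusinessUnitExt' => '22'
}

# Build the request only; nothing is sent over the network here.
request = Net::HTTP::Post.new(uri)
request.set_form_data(form)

# Inspect what would be sent as the form-encoded body.
puts request['Content-Type']
puts request.body
```

Printing the body first makes it easy to compare it with the request the browser sends (visible in the Network tab of the developer tools) before pointing the script at the live site.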

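    【Comments】:

    • I have updated the question with the new code I am using. Unfortunately, the result is the same.
    • I don't quite understand the answer. Could you elaborate and/or provide an example?
    • Sorry, I can't, simply because there is a lot of work involved. Open the page in Chrome Inspector and search for contentLeft_ctl01_ddlCountryExt. You will find 18 controls with names like BLAH_contentLeft_ctl01_ddlCountryExt_BLAH. You have to pass their values to your script, rather than the container's ctl00$contentLeft$ctl01$ddlCountryExt. The same goes for all six drop-downs.
    • I have updated the code and it is still the same. Could you take a closer look when you have time?
    • You don't seem to understand what you are doing. Repeating a => b lines in the data hash just overwrites the previous declaration. You should set the correct ...-option-N values.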
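The point in the last comment about repeated keys can be checked in isolation. This minimal sketch uses field names from the question and converts a list of pairs into a hash, which keeps only the last value per key, just as a hash literal with repeated keys does. It also shows that `URI.encode_www_form` keeps every pair when given an array instead of a hash. Whether the site wants repeated fields at all is an assumption that is not verified here.

```ruby
require 'uri'

# Each pair mirrors one line of a hash literal that repeats a key.
pairs = [
  ['contentLeft_ctl01_ddlRegionExt', 'Gelderland'],
  ['contentLeft_ctl01_ddlRegionExt', 'Limburg'],
  ['contentLeft_ctl01_ddlRegionExt', 'North Holland'],
  ['contentLeft_ctl01_ddlRegionExt', 'South Holland'],
  ['contentLeft_ctl01_ddlJobLevelExt', 'ENTRY_LEVEL']
]

# A hash holds one value per key, so later pairs overwrite earlier ones.
data = pairs.to_h
puts data.size
puts data['contentLeft_ctl01_ddlRegionExt']

# Encoding the array of pairs directly preserves every repeated key.
encoded = URI.encode_www_form(pairs)
puts encoded
```

So the long hash in "Update 2" collapses to a handful of fields before HTTParty ever sees it.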