[Posted]: 2023-03-09 19:28:01
[Question]:
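I'm scraping a website for work. The script has to look up a list of numbers called "cpf" (stored in Google Sheets) and then write the information shown at the URL back to the sheet, but I'm now having trouble with the sheet update.
I'm getting one character in each cell instead of the full value. Do you know why this happens?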
class CpfSearch(object):
    def __init__(self, spreadsheet_name):
        self.cpf_col = 1
        self.nome_col = 2
        self.idade_col = 3
        self.beneficio_col = 4
        self.concessao_col = 5
        self.salario_col = 6
        self.consig_col = 9
        self.card_col = 15
        scope = ['https://www.googleapis.com/auth/spreadsheets',
                 'https://www.googleapis.com/auth/drive.readonly']
        creds = ServiceAccountCredentials.from_json_keyfile_name('CONSULTAS.json', scope)
        client = gspread.authorize(creds)
        self.sheet = client.open(spreadsheet_name).sheet1

    def process_cpf_list(self):
        cpfs = self.sheet.col_values(self.cpf_col)[1:]
        bot_url = BOT(cpfs)
        nomes, idades, beneficios, concessoes, salarios, consigs, cards = bot_url.search_cpfs()
        print("Atualizando...")
        for i in range(len(nomes)):
            self.sheet.update_cell(i+2, self.nome_col, nomes[i])
            self.sheet.update_cell(i+2, self.idade_col, idades[i])
            self.sheet.update_cell(i+2, self.beneficio_col, beneficios[i])
            self.sheet.update_cell(i+2, self.concessao_col, concessoes[i])
            self.sheet.update_cell(i+2, self.salario_col, salarios[i])
            self.sheet.update_cell(i+2, self.consig_col, consigs[i])
            self.sheet.update_cell(i+2, self.card_col, cards[i])

cpf_updater = CpfSearch('TESTE')
cpf_updater.process_cpf_list()
Here is search_cpfs():
def search_cpfs(self):
    nomes = []
    idades = []
    beneficios = []
    concessoes = []
    salarios = []
    bancoss = []
    bancoscard = []
    consigs = []
    cards = []
    for cpf in self.cpfs:
        print(f"Procurando {cpf}.")
        self.driver.get(self.bot_url)
        cpf_input = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[1]/input')
        cpf_input.send_keys(cpf)
        time.sleep(2)
        cpfButton = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[2]/button')
        cpfButton.click()
        time.sleep(2)
        nome = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2").text
        idade = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/ul/li[2]").text
        beneficio = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/div[5]/span/b").text
        concessao = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/div[2]/span").text
        salario = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[2]/div/div[3]/div[1]/div[1]/span").text
        consig = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[3]/div[2]/span").text
        card = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[3]/div[3]/span").text
        nomes.append(nome)
        idades.append(idade)
        beneficios.append(beneficio)
        concessoes.append(concessao)
        salarios.append(salario)
        consigs.append(consig)
        cards.append(card)
        print(nome, idade, beneficio, concessao, salario, consig, card)
    return nome, idade, beneficio, concessao, salario, consig, card
[Comments]:
-
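Could you post the code for search_cpfs()? If you run `print(nomes, idades, beneficios, concessoes, salarios, consigs, cards)` you can check whether it returns strings rather than lists; it looks like it returns strings.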
-
Thanks for looking into this, Dan. I've posted search_cpfs().
-
In search_cpfs() you are returning the strings nome, idade, beneficio, concessao, salario, consig, card. Try returning the lists nomes, idades, beneficios, concessoes, salarios, consigs, cards instead.
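To illustrate why returning the strings puts one character in each cell, here is a minimal sketch that needs neither Selenium nor gspread. The helpers `search_buggy`, `search_fixed` and `cells_written`, and the sample records, are made up for this example. `cells_written` only imitates the `for i in range(len(nomes))` loop from the question by listing the (row, value) pairs that `update_cell` would receive.

```python
def search_buggy(records):
    """Collects values into lists but returns the last scalar strings (the bug)."""
    nomes, idades = [], []
    for nome, idade in records:
        nomes.append(nome)
        idades.append(idade)
    return nome, idade  # strings from the final iteration, not the lists


def search_fixed(records):
    """Same loop, but returns the accumulated lists."""
    nomes, idades = [], []
    for nome, idade in records:
        nomes.append(nome)
        idades.append(idade)
    return nomes, idades


def cells_written(column):
    """Imitates the update loop: row i+2 receives column[i]."""
    return [(i + 2, column[i]) for i in range(len(column))]


# Hypothetical sample data standing in for the scraped page values.
records = [("Ana", "61"), ("Bruno", "58")]

buggy_nomes, _ = search_buggy(records)
fixed_nomes, _ = search_fixed(records)

# Indexing a string yields single characters, so each row gets one letter.
print(cells_written(buggy_nomes))  # [(2, 'B'), (3, 'r'), (4, 'u'), (5, 'n'), (6, 'o')]
# Indexing a list yields whole values, one per row.
print(cells_written(fixed_nomes))  # [(2, 'Ana'), (3, 'Bruno')]
```

With the strings, `len(nomes)` is the length of the last name and `nomes[i]` is its i-th character, which matches the symptom described in the question.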
-
Thank you, it worked like a charm! Hugs from Brazil.
Tags: python web-scraping google-sheets