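【Posted】: 2023-04-02 21:00:01
【Question】:
I want to format a string as CSV. I use BeautifulSoup to scrape data from a website, and I get the whole text back as a single string.
Scraped result: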
Business Objective\n
464 Wholesale of household goods\n
Main Business Activities\n
46493 Wholesale of stationery, books, magazines and newspapers\n
I have tried several approaches:
1. A regular expression: result = re.findall(r'(?==Business Objective=)(.*)(?=Main Business Activities=)', string)
2. Using join
3. Using string replacement
Code:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import requests
import time
import re
import numpy
import csv
companyName = "MONUMENT BOOKS CO LTD"
SourceAppCode = "-- Any register --"
# Raw string so the backslashes in the Windows path are not treated as escape sequences
browser = webdriver.Chrome(r"D:\KHIHORT_PROJECTS\YUON_LOTO\chromedriver_win32\chromedriver")
browser.get('https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-companies%26service%3DregisterItemSearch&target=cambodia-master')
browser.find_elements_by_xpath("//input[@name='QueryString']")[0].send_keys(companyName)
time.sleep(0.5)
browser.find_elements_by_xpath("//select[@name='SourceAppCode']")[0].send_keys(SourceAppCode)
time.sleep(0.5)
browser.find_elements_by_xpath("/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/a[3]")[0].click()
time.sleep(0.5)
browser.find_elements_by_xpath("//a[@class='registerItemSearch-results-page-line-ItemBox-resultLeft-viewMenu appMenu appMenuItem appMenuDepth0 noSave appItemSearchResult viewInstanceUpdateStackPush appReadOnly appIndex0']")[0].click()
time.sleep(0.5)
ww=browser.find_elements_by_xpath("/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[7]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]")
time.sleep(0.5)
My expected result is:
Business Objective,Main Business Activities
464 Wholesale of household goods,"46493 Wholesale of stationery, books, magazines and newspapers"
"581 Publishing of books, periodicals and other publishing activities","58110 Publishing of books, brochures and other publications(2)"
【Discussion】:
Tags: python-3.x selenium beautifulsoup
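One possible way to get from the scraped text to the expected CSV is to treat the text as alternating label and value lines, and let the standard csv module handle the quoting of values that contain commas. The sketch below is only an illustration under a few assumptions: the scraped string contains real newline characters, each label line is immediately followed by its value line, and the second label/value pair in the sample input is reconstructed from the expected output shown in the question. The names HEADERS, text_to_rows and rows_to_csv are hypothetical helpers, not part of any library.

```python
import csv
import io

# Column names, taken from the label lines in the scraped text.
HEADERS = ["Business Objective", "Main Business Activities"]


def text_to_rows(text):
    """Pair each known label line with the value line that follows it."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    rows = []
    current = {}
    label = None
    for line in lines:
        if line in HEADERS:
            # Seeing a label that is already filled means a new record starts.
            if line in current:
                rows.append(current)
                current = {}
            label = line
        elif label is not None:
            current[label] = line
            label = None
    if current:
        rows.append(current)
    return rows


def rows_to_csv(rows):
    """Serialize the rows; csv quotes only the fields that contain commas."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Sample input (assumed shape; second pair rebuilt from the expected output).
scraped = (
    "Business Objective\n"
    "464 Wholesale of household goods\n"
    "Main Business Activities\n"
    "46493 Wholesale of stationery, books, magazines and newspapers\n"
    "Business Objective\n"
    "581 Publishing of books, periodicals and other publishing activities\n"
    "Main Business Activities\n"
    "58110 Publishing of books, brochures and other publications(2)\n"
)

csv_text = rows_to_csv(text_to_rows(scraped))
print(csv_text)
```

To write a file instead of a string, the same DictWriter can be given a file opened with open("result.csv", "w", newline="") (the file name is only an example). In the Selenium script from the question, the input text would presumably come from something like ww[0].text, assuming that element holds the label and value lines.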