Unable to download a PDF file in Chrome using Selenium Java: answers

【Question title】: Unable to download pdf file in Chrome Browser using Selenium java
【Posted】: 2018-05-21 00:04:20
【Question description】:

My use case: I have to read the data from a PDF without opening it in the Chrome browser, and then check whether some specific data is present in the PDF.

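Since I could not achieve that, I decided to download the file to my machine and validate it with PDFBox. I created a Chrome profile configured to download PDF files directly (Settings > Content settings > PDF documents) and passed it to my Selenium script as a Chrome option. The test runs, but when the PDF is opened it does not start downloading; the PDF opens in the Chrome browser instead. Chrome version: 62.0.3202.94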

I took my Chrome profile path from:

chrome://version/

I am not sure what is going wrong. Please help.

    @Before
      public void beforeTest() throws MalformedURLException{

          System.setProperty("webdriver.chrome.driver","path to chromedriver\\chromedriver.exe"); 
          ChromeOptions options = new ChromeOptions();
          String chromeProfilePath="path to custom chrome profile";
          options.addArguments("user-data-dir="+chromeProfilePath);
          HashMap<String, Object> chromeOptionsMap = new HashMap<String, Object>();
          DesiredCapabilities cap = DesiredCapabilities.chrome();
          cap.setCapability(ChromeOptions.CAPABILITY, chromeOptionsMap);
          cap.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
          cap.setCapability(ChromeOptions.CAPABILITY, options);
          driver = new ChromeDriver(cap);
          //Browser is maximized
          driver.manage().window().maximize();
}

【Question comments】:

Possible duplicate of Auto-download in firefox browser with java-selenium not working

Tags: java google-chrome selenium pdf

【Solution 1】:

Using Chrome options:
disable the plugin - Chrome PDF Viewer,
enable the preference - plugins.always_open_pdf_externally,
set your own download path - download.default_directory

ChromeOptions options = new ChromeOptions();
HashMap<String, Object> chromeOptionsMap = new HashMap<String, Object>();
// Disable the built-in PDF viewer plugin
chromeOptionsMap.put("plugins.plugins_disabled", new String[] {
        "Chrome PDF Viewer"
});
// Download PDFs instead of opening them in the browser
chromeOptionsMap.put("plugins.always_open_pdf_externally", true);

// Folder that downloaded files are saved to
String downloadFilepath = "D:\\Lime Doc";
chromeOptionsMap.put("download.default_directory", downloadFilepath);

// Register the preferences once the map is complete
options.setExperimentalOption("prefs", chromeOptionsMap);

ChromeDriver driver = new ChromeDriver(options);
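
Once Chrome saves the file instead of displaying it, the test still has to wait for the download to finish before handing the file to PDFBox. The following is a minimal sketch of one way to do that, not part of the original answer. It assumes the download folder holds only the file under test and relies on Chrome keeping in-progress downloads under a `.crdownload` extension. `DownloadWaiter` and `waitForPdf` are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DownloadWaiter {

    /**
     * Polls the directory until it contains a .pdf file and no in-progress
     * Chrome download (*.crdownload), or until timeoutMillis has elapsed.
     * Returns the path of a PDF that was found, or null on timeout.
     */
    public static Path waitForPdf(Path dir, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            Path pdf = null;
            boolean inProgress = false;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                for (Path p : files) {
                    String name = p.getFileName().toString().toLowerCase();
                    if (name.endsWith(".crdownload")) {
                        inProgress = true;   // Chrome is still writing a file
                    } else if (name.endsWith(".pdf")) {
                        pdf = p;
                    }
                }
            }
            if (pdf != null && !inProgress) {
                return pdf;
            }
            if (System.currentTimeMillis() >= deadline) {
                return null;
            }
            Thread.sleep(200);               // polling interval (assumed value)
        }
    }
}
```

A test could call `DownloadWaiter.waitForPdf(Paths.get(downloadFilepath), 30000)` right after triggering the download and fail if the result is null.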

【Discussion】:

【Solution 2】:

Just add this registry entry:

[HKEY_LOCAL_MACHINE\Software\Policies\Google\Chrome] "AlwaysOpenPdfExternally"=dword:00000001

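【Discussion】:

【Solution 3】:

I was able to download the PDF in Chrome without creating a new user profile. I am posting it here in case anyone is looking for a similar answer: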

@Before
      public void beforeTest() throws Exception{

          System.setProperty("webdriver.chrome.driver","path to chromedriver.exe");
          ChromeOptions options = new ChromeOptions();
          HashMap<String, Object> chromeOptionsMap = new HashMap<String, Object>();
          // Disable the built-in PDF viewer plugin
          chromeOptionsMap.put("plugins.plugins_disabled", new String[] {
                    "Chrome PDF Viewer"
                });
          // Download PDFs instead of opening them in the browser
          chromeOptionsMap.put("plugins.always_open_pdf_externally", true);
          String downloadFilepath = "download folder path";
          chromeOptionsMap.put("download.default_directory", downloadFilepath);
          // Register the preferences once the map is complete
          options.setExperimentalOption("prefs", chromeOptionsMap);
          DesiredCapabilities cap = DesiredCapabilities.chrome();
          cap.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
          // Only the ChromeOptions object goes under ChromeOptions.CAPABILITY;
          // the prefs map is already carried inside it
          cap.setCapability(ChromeOptions.CAPABILITY, options);
          driver = new ChromeDriver(cap);
          //Browser is maximized
          driver.manage().window().maximize();
          driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
          //Browser navigates to the url
          driver.navigate().to("URL");
      }

【Discussion】:

【Solution 4】:

Try this; it downloads the PDF without problems:

package testing;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;

public class pdfdownload {
 static String urls ="http://www.staff.amu.edu.pl/~zcht/pliki/Databases%20for%20beginners.pdf";

 public static void main(String[] args) throws IOException {
  URL url = verify(urls);
  if (url == null) {
   throw new IllegalArgumentException("Not a valid http:// URL: " + urls);
  }

  HttpURLConnection connection = (HttpURLConnection) url.openConnection();
  // Use the last path segment of the URL as the local file name
  String filename = url.getFile();
  filename = filename.substring(filename.lastIndexOf('/')+1);

  // try-with-resources closes both streams even if the copy fails
  try (InputStream inputStream = connection.getInputStream();
       FileOutputStream outputStream = new FileOutputStream("D:\\HELLO/java" + File.separator+ filename)) {
   int read;
   byte[] buffer = new byte[4096];

   // Copy the response body to disk in 4 KB chunks
   while((read = inputStream.read(buffer))!= -1){
    outputStream.write(buffer,0,read);
   }
  }
 }

 // Returns null when the string is not an http:// URL or cannot be parsed
 private static URL verify(String url){
  if(!url.toLowerCase().startsWith("http://")){
   return null;
  }
  try{
   return new URL(url);
  }catch(MalformedURLException e){
   return null;
  }
 }}
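
Note that the file name taken from `url.getFile()` keeps the URL's percent-escapes, so the file above is saved as `Databases%20for%20beginners.pdf`. If a readable name is preferred, a small sketch like the one below may help. It assumes a hierarchical `http` URL whose path ends in the file name. `PdfFileName` and `fromUrl` are hypothetical names, not part of the answer.

```java
import java.net.URI;

public class PdfFileName {

    /** Returns the last path segment of the URL with percent-escapes decoded. */
    public static String fromUrl(String url) {
        // URI.getPath() returns the decoded path component
        String path = URI.create(url).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

For the URL used in this answer, `PdfFileName.fromUrl(urls)` yields `Databases for beginners.pdf`.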

To verify the PDF content, use this:

package pdf;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
import org.testng.Assert;


public class pdfreader {

    public static void main(String[] args) throws IOException {

        File file = new File("D://study video tutorials//database testing//Database Testing Quick Guide.pdf");

        // try-with-resources closes the input stream even if parsing fails
        try (FileInputStream fis = new FileInputStream(file)) {
            PDFParser parser = new PDFParser(fis);
            parser.parse();

            COSDocument cosDoc = parser.getDocument();
            PDDocument pddoc = new PDDocument(cosDoc);
            try {
                // Extract the text of the whole document
                PDFTextStripper strip = new PDFTextStripper();
                String data = strip.getText(pddoc);
                System.out.println(data);

                Assert.assertTrue(data.contains("keys"));
            } finally {
                // Closing the PDDocument also closes the underlying COSDocument
                pddoc.close();
            }
        }
    }

}

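【Discussion】:

Is there a way to extract the data from the PDF as continuous paragraphs, rather than line by line as it appears in the PDF?
@heardm Both answers helped me, but I can only mark one as accepted. Is there a way to mark multiple answers as accepted?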
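
On the question about continuous paragraphs: one possible approach is to post-process the extracted string rather than change the extraction itself. The sketch below is an illustration only. It assumes paragraphs in the extracted text are separated by blank lines, which may not hold for every PDF. `ParagraphJoiner` and `toParagraphs` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphJoiner {

    /**
     * Joins consecutive non-blank lines with a single space.
     * A blank line ends the current paragraph (assumed convention).
     */
    public static List<String> toParagraphs(String text) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // Blank line: flush the paragraph collected so far
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(trimmed);
            }
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }
}
```

Passing the `data` string from the reader above to `ParagraphJoiner.toParagraphs(data)` would then give one string per paragraph, under the blank-line assumption.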

【Solution 5】:

You should disable the PDF viewer plugin to prevent PDF files from opening in Chrome. Add this Chrome option.

ChromeOptions options = new ChromeOptions();
Map<String, Object> preferences = new Hashtable<String, Object>();

// disable the PDF viewer
preferences.put("plugins.plugins_disabled", new String[] {
    "Chrome PDF Viewer"
});

// register the preferences once the map is complete
options.setExperimentalOption("prefs", preferences);

// launch the browser
ChromeDriver driver = new ChromeDriver(options);

【Discussion】: