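【Posted】: 2020-06-23 02:38:57
【Question】:
I am trying to verify the content of a PDF. I read the URL from an href and pass it to the code below. The URL uses HTTPS, and I run into the error shown further down. Can anyone help me get past it and read the PDF data? Thanks in advance.
The URL I am trying is https://XXXXXXXXXXXXXXXXX/apex/DA_ViewArchive?docType=pdf&docid=2229123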
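import java.io.InputStream;
import java.net.URL;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

URL pdfUrl = new URL(url);
// openStream() is the call that throws the ConnectException in the trace.
// PDDocument.load (PDFBox 2.x assumed) replaces the original (RandomAccessRead) cast of the stream, which would throw a ClassCastException.
try (InputStream in = pdfUrl.openStream(); PDDocument document = PDDocument.load(in)) {
    String testText = new PDFTextStripper().getText(document);
    System.out.println("Document Text is " + testText);
}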
The error is:
java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at sun.security.ssl.SSLSocketImpl.connect(Unknown Source)
at sun.security.ssl.BaseSSLSocketImpl.connect(Unknown Source)
at sun.net.NetworkClient.doConnect(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.http.HttpClient.openServer(Unknown Source)
at sun.net.www.protocol.https.HttpsClient.<init>(Unknown Source)
at sun.net.www.protocol.https.HttpsClient.New(Unknown Source)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
【Discussion】:
- I found a similar question at stackoverflow.com/questions/4784825/…; hopefully it helps.
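- The trace in the question ends in URL.openStream with "Connection timed out: connect", so the failure happens while opening the TCP connection, before any PDF parsing starts. That usually points at the network path (firewall, VPN, or a required proxy) rather than at PDFBox, although the post does not confirm the cause. Below is a minimal sketch, assuming the JVM may need to go through an HTTP proxy to reach the host. The class name PdfFetcher, the helpers fetch and looksLikePdf, and the 15-second timeout are hypothetical names and values chosen for illustration; only JDK classes are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Hypothetical helper, not part of the original post.
class PdfFetcher {

    // Downloads the resource at `url` through `proxy` (use Proxy.NO_PROXY for a
    // direct connection), failing after `timeoutMillis` instead of hanging.
    static byte[] fetch(String url, Proxy proxy, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection(proxy);
        conn.setConnectTimeout(timeoutMillis); // limit for establishing the connection
        conn.setReadTimeout(timeoutMillis);    // limit for each blocking read
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            conn.disconnect();
        }
    }

    // Cheap sanity check: a PDF file starts with the bytes "%PDF-".
    // A login page or an error page returned by the server would fail this check.
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = {'%', 'P', 'D', 'F', '-'};
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Usage: java PdfFetcher <url> [proxyHost proxyPort]
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: PdfFetcher <url> [proxyHost proxyPort]");
            return;
        }
        Proxy proxy = args.length >= 3
                ? new Proxy(Proxy.Type.HTTP,
                        new InetSocketAddress(args[1], Integer.parseInt(args[2])))
                : Proxy.NO_PROXY;
        byte[] data = fetch(args[0], proxy, 15_000);
        System.out.println("Fetched " + data.length + " bytes; PDF header present: "
                + looksLikePdf(data));
    }
}
```

If the download succeeds and the header check passes, the byte array can be handed to PDFBox (in PDFBox 2.x, PDDocument.load accepts a byte[]) and then to PDFTextStripper. If the same URL opens in the browser that Selenium drives but still times out from plain Java, comparing the browser's proxy settings with the JVM's is a reasonable next step.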
Tags: java html selenium selenium-webdriver pdf-parsing