【Question title】: Parse HTML String for extracting a value
【Posted】: 2020-03-16 02:24:02
【Question】:

The HTML document I am working with contains the following tag:

<script type="text/javascript">
  var ReportViewer1 = new ReportViewer('ReportViewer1', 'ReportViewer1_ReportToolbar', 'ReportViewer1_ReportArea_WaitControl', 'ReportViewer1_ReportArea_ReportCell', 'ReportViewer1_ReportArea_PreviewFrame', 'ReportViewer1_ParametersAreaCell', 'ReportViewer1_ReportArea_ErrorControl', 'ReportViewer1_ReportArea_ErrorLabel', 'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', 'e0f6bb5061864d63b59a18d8187eed21', 'Percent', '100', '', 'ReportViewer1_EditorPlaceholder', 'ReportViewer1_CalendarFrame', 'ReportViewer1_ReportArea_DocumentMapCell',
  {
    CurrentPageToolTip: 'STR_TELERIK_MSG_CUR_PAGE_TOOL_TIP',
    ExportButtonText: 'Export',
    ExportToolTip: 'Export',
    ExportSelectFormatText: 'Export to the selected format',
    FirstPageToolTip: 'First page',
    LabelOf: 'of',
    LastPageToolTip: 'Last Page',
    ProcessingReportMessage: 'Generating report...',
    NoPageToDisplay: 'No page to display.',
    NextPageToolTip: 'Next page',
    ParametersToolTip: 'Click to close parameters area|Click to open parameters area',
    DocumentMapToolTip: 'Hide document map|Show document map',
    PreviousPageToolTip: 'Previous page',
    TogglePageLayoutToolTip: 'Switch to interactive view|Switch to print preview',
    SessionHasExpiredError: 'Session has expired.',
    SessionHasExpiredMessage: 'Please, refresh the page.',
    PrintToolTip: 'Print',
    RefreshToolTip: 'Refresh',
    NavigateBackToolTip: 'Navigate back',
    NavigateForwardToolTip: 'Navigate forward',
    ReportParametersSelectAllText: '<select all>',
    ReportParametersSelectAValueText: '<select a value>',
    ReportParametersInvalidValueText: 'Invalid value.',
    ReportParametersNoValueText: 'Value required.',
    ReportParametersNullText: 'NULL',
    ReportParametersPreviewButtonText: 'Preview',
    ReportParametersFalseValueLabel: 'False',
    ReportParametersInputDataError: 'Missing or invalid parameter value. Please input valid data for all parameters.',
    ReportParametersTrueValueLabel: 'True',
    MissingReportSource: 'The source of the report definition has not been specified.',
    ZoomToPageWidth: 'Page Width',
    ZoomToWholePage: 'Full Page'
}, 'ReportViewer1_ReportArea_ReportArea', 'ReportViewer1_ReportArea_SplitterCell', 'ReportViewer1_ReportArea_DocumentMapCell', true, true, 'PDF', 'ReportViewer1_RSID', true);
    </script>

I want to extract the value e0f6bb5061864d63b59a18d8187eed21 from the body shown above. I wrote the following code, using a regular expression, for this purpose:

final String BEFORE_INSTANCE_ID = "/app/Telerik.ReportViewer.axd";
final String AFTER_INSTANCE_ID = "Percent";

Pattern pattern = Pattern.compile("(" + BEFORE_INSTANCE_ID + ")(.*?)(" + AFTER_INSTANCE_ID + ")");
Matcher matcher = pattern.matcher(body);

String instanceId = null;

while (matcher.find()) {
    String temp = matcher.group(0);
    instanceId = StringUtils.substringBetween(temp, BEFORE_INSTANCE_ID, AFTER_INSTANCE_ID).replaceAll("[,;'\\s]", "").trim();
}

Is there a better way to code this?

【Comments】:

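  • JSoup is probably the most suitable API for you.
  • I asked this question before and someone answered it using JSoup. That code did not work and I could not fix it either, but I believe it is the better approach.
  • Getting it through JavaScript may be the simplest and hackiest solution. See this.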

Tags: java regex


【Solution 1】:

Assuming str is the given string, a simple regular expression should be enough to extract the value:

Pattern pattern = Pattern.compile(",\\s*'([0-9a-f]{32})'\\s*,", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
String result = null;
if(matcher.find()) {
    result = matcher.group(1);
}
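
The pattern above matches the first quoted 32-character hex string anywhere in the input. As a possible variation, the match can be tied to the handler URL that precedes the ID in the question's markup, which combines the asker's anchor idea with the capture group used here. This is only a sketch: the class name InstanceIdExtractor, the method extractInstanceId, and the constant HANDLER_URL are made up for illustration, and it assumes the ID always follows the quoted URL as the very next argument.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InstanceIdExtractor {

    // Assumption: the ID is the argument right after this quoted URL,
    // as in the sample markup from the question.
    private static final String HANDLER_URL = "/app/Telerik.ReportViewer.axd";

    // Pattern.quote escapes the dots in the URL so they match literally.
    // Group 1 captures the 32 hex characters between the single quotes.
    private static final Pattern INSTANCE_ID = Pattern.compile(
            Pattern.quote("'" + HANDLER_URL + "'") + "\\s*,\\s*'([0-9a-fA-F]{32})'");

    // Returns the instance ID, or an empty Optional when the input is null
    // or does not contain the expected sequence.
    public static Optional<String> extractInstanceId(String body) {
        if (body == null) {
            return Optional.empty();
        }
        Matcher matcher = INSTANCE_ID.matcher(body);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the script body from the question.
        String body = "'ReportViewer1_CP', '/app/Telerik.ReportViewer.axd', "
                + "'e0f6bb5061864d63b59a18d8187eed21', 'Percent', '100', ''";
        System.out.println(extractInstanceId(body).orElse("not found"));
        // prints e0f6bb5061864d63b59a18d8187eed21
    }
}
```

Returning an Optional avoids the null check on the caller's side and the NullPointerException that chaining calls onto a null StringUtils.substringBetween result could cause when the text is missing.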

【Comments】:
