Isolating the body of a fetched page in Google Apps Script: answers

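【Question title】: Isolate the body of a fetched page in Google Apps Script
【Posted】: 2019-09-22 18:10:24
【Question description】:

I only need to keep the body content of a page after fetching it. The following code does not work (that is, the html variable is unchanged after the .replace lines, as I can see from the log). What is wrong?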

var response = UrlFetchApp.fetch('https://stackoverflow.com/questions/58049531/another-importxml-returning-empty-content');

var html=response.getContentText();
html=html.replace(/.*(<body[^>]*)/m, '$1');  
html=html.replace(/<\/body>.*/m, '</body>');  

Logger.log(html);

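【Question comments】:

Please explain what "does not work" means.
I have updated the question. The two replace calls do not change the html variable, as if they could not find the opening and closing body tags.
Try [^] instead of .
What about something like html = html.match(/<body[\s\S]+<\/body>/)[0]?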
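
A note on why the comments above point at the dot: in JavaScript regular expressions, `.` does not match line terminators, and the `m` flag only changes where `^` and `$` anchor. So on a multi-line page `.*` cannot reach back past the line that contains `<body`, and the first replace swaps the match for itself. The sketch below illustrates this on a made-up `sample` string (the string and variable names are assumptions, not part of the question), then applies the `[\s\S]` substitution suggested in the comments.

```javascript
// Hypothetical multi-line page standing in for response.getContentText().
var sample = '<html><head></head>\n<body class="x">\n<p>Hello</p>\n</body>\n</html>';

// Original pattern: "." never crosses a newline and the m flag only affects ^ and $.
// The match starts on the <body line itself, so it is replaced by identical text.
var unchanged = sample.replace(/.*(<body[^>]*)/m, '$1');

// [\s\S] matches any character, newlines included, so everything
// before <body and everything after </body> is removed.
var trimmed = sample
  .replace(/[\s\S]*(<body[^>]*)/, '$1')
  .replace(/<\/body>[\s\S]*/, '</body>');
```

Here `unchanged` is identical to `sample`, while `trimmed` keeps only the text from `<body` through `</body>`.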

Tags: regex google-apps-script urlfetch

【Solution 1】:

Try this:

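function getBody(html) {
  // Find the opening <body ...> tag and skip to just past its closing '>',
  // so that attributes on the tag are not left in the result.
  var start = html.indexOf('>', html.indexOf('<body')) + 1;
  // Take everything up to the closing </body> tag.
  var body = html.slice(start, html.indexOf('</body'));
  Logger.log(body);
  return body;
}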
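
If the page might have no body element, or uses uppercase tags, a regex-based variant may be easier to guard. The following is only a sketch: `extractBodyOrNull` is a hypothetical name, not part of the original answer, and it assumes the first `<body ...>` and the last `</body>` in the string delimit the content you want.

```javascript
// Hypothetical helper: returns the inner HTML of the body element,
// or null when no complete <body>...</body> pair is found.
function extractBodyOrNull(html) {
  // [\s\S]* matches any character including newlines (greedy, so it runs
  // to the last </body>); the i flag also accepts <BODY> and </BODY>.
  var match = /<body\b[^>]*>([\s\S]*)<\/body\s*>/i.exec(html);
  return match ? match[1] : null;
}
```

A caller can then test the result against null before logging or parsing it further.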

【Comments】: