Remove http: from img src [closed]

【Question title】: Remove http: from img src [closed]
【Posted】: 2015-03-05 23:02:11
【Question】:

Is it possible to remove the http: protocol from an img src using PHP?

So that the img src would be:

<img src="//www.example.com/image.jpg" />

instead of

<img src="http://www.example.com/image.jpg" />

Would str_replace be a good option? I know I can write:

$contentImg = str_replace(array('http', 'https'), '', $filter);

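I'm just not sure how to define $filter.

【Comments】:

$filter would be your src string. Where does it come from?
Why do you want to do this?
Probably for protocol-relative links. I've run into trouble with this on a server that mixes http and https.
1) Check the documentation for str_replace, 2) $filter would be whatever text you're trying to modify (i.e. your HTML), 3) using str_replace is simple, but it may be too simple (i.e. it would mangle URLs like https://example.com/docs/http/tutorial.html)
@CommuSoft Removing the protocol and leaving // tells the browser to request the static file using the same protocol as the source page.

Tags: php http src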

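【Solution 1】:

Assuming $filter works and the source is fetched correctly, you can also use a regex replacement:

$contentImg = preg_replace('/^https?:/','', $filter);

Here '/^https?:/' is a regular expression:
- The ^ character means the start of the string, so you only remove a potential protocol at the front.
- The ? is a special character specifying that the s is optional, so the pattern matches both http: and https:.

With regular expressions you can write more compact patterns. Say (for the sake of argument) you also want to remove ftp and sftp; you can use: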

'/^(https?|s?ftp):/'

Here | means "or", and the parentheses are used for grouping.

You also forgot to remove the colon (:).
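To make the two patterns above concrete, here is a minimal sketch. The helper name stripScheme and the sample URLs are made up for illustration and are not part of the original answer; it assumes PHP 7 or later for the type hints.

```php
<?php
// Hypothetical helper (not from the original answer): removes a leading
// scheme so that the URL becomes protocol-relative.
function stripScheme(string $url): string
{
    // ^ anchors the match at the start of the string;
    // (https?|s?ftp) matches http, https, ftp or sftp; the colon is removed too.
    return preg_replace('/^(https?|s?ftp):/', '', $url);
}

echo stripScheme('http://www.example.com/image.jpg'), "\n";
echo stripScheme('https://example.com/docs/http:/tutorial.html'), "\n";
echo stripScheme('sftp://files.example.com/a.png'), "\n";
```

Because of the ^ anchor, the http: that appears later in the second URL is left alone; only the leading scheme is stripped.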

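But I'm more worried that your $filter will contain the entire page source. In that case the replacement does more harm than good, because ordinary text that contains http: may be removed as well. To parse and process XML/HTML, it is better to use a DOM parser (in PHP, DOMDocument). This introduces some overhead, but as some software engineers say: "Software engineering is engineering systems against fools; the universe currently produces more and more fools, so a little extra overhead is justified."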
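A small sketch of that failure mode, using a made-up page fragment (the $html content here is an assumption for illustration only):

```php
<?php
// Hypothetical page fragment: one image plus prose that mentions "http:".
$html = '<p>Type http: before the host name.</p>'
      . '<img src="http://www.example.com/image.jpg" />';

// Blind replacement over the entire source, in the spirit of the question.
$naive = str_replace(array('http:', 'https:'), '', $html);

// The image src is now protocol-relative, but the sentence in the
// paragraph has lost its "http:" as well.
echo $naive, "\n";
```

The src attribute ends up as intended, yet the visible text has been altered too, which is the risk described above.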

Example:

You should definitely use the DOM parser discussed above (since this approach is safer):

$dom = new DOMDocument;
$dom->loadHTML($html); //$html is the input of the document
foreach ($dom->getElementsByTagName('img') as $image) {
    $image->setAttribute('src',preg_replace('/^https?:/','',$image->getAttribute('src')));
}
$html = $dom->saveHTML(); // $html now stores the new version

(Running this in php -a gives the expected output for your test example.)

Or as a post-processing step:

$html = get_the_content();
$dom = new DOMDocument;
$dom->loadHTML($html); //$html is the input of the document
foreach ($dom->getElementsByTagName('img') as $image) {
    $image->setAttribute('src',preg_replace('/^https?:/','',$image->getAttribute('src')));
}
$html = $dom->saveHTML();
echo $html;
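One thing to be aware of: when given a fragment, loadHTML normally wraps it in a doctype and html/body elements. The following is a hedged sketch of one way around that, not part of the original answer. The function name makeImgSrcProtocolRelative is hypothetical, and it assumes the DOM extension, PHP 7 or later, and a libxml recent enough to support the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags.

```php
<?php
// Hypothetical helper: rewrites only <img src> attributes in an HTML fragment.
function makeImgSrcProtocolRelative(string $html): string
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root element; the
    // flags ask libxml not to add <html>/<body> or a doctype around it.
    $dom->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        $image->setAttribute('src', preg_replace('/^https?:/', '', $src));
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo makeImgSrcProtocolRelative(
    '<p>Type http: before the host name.</p>'
    . '<img src="http://www.example.com/image.jpg">'
), "\n";
```

Unlike a blind string replacement, only the src attributes are touched, so the http: in the paragraph text survives.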

Performance:

Performance was tested in the php -a interactive shell (10,000,000 calls per variant, as in the loops below):

$ php -a
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { str_replace(array('http:', 'https:'), '', 'http://www.google.com'); }; echo (microtime(true)-$timea);  echo "\n";
5.4192590713501
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { preg_replace('/^https?:/','', 'http://www.google.com'); }; echo (microtime(true)-$timea);  echo "\n";
5.986407995224
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { preg_replace('/https?:/','', 'http://www.google.com'); }; echo (microtime(true)-$timea);  echo "\n";
5.8694758415222
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { preg_replace('/(https?|s?ftp):/','', 'http://www.google.com'); }; echo (microtime(true)-$timea);  echo "\n";
6.0902049541473
php > $timea=microtime(true); for($i = 0; $i < 10000000; $i++) { str_replace(array('http:', 'https:','sftp:','ftp:'), '', 'http://www.google.com'); }; echo (microtime(true)-$timea);  echo "\n";
7.2881300449371

In summary (total time, and time per call over 10,000,000 calls):

str_replace:           5.4193 s     0.00000054193 s/call
preg_replace (with ^): 5.9864 s     0.00000059864 s/call
preg_replace (no ^):   5.8695 s     0.00000058695 s/call

With more alternatives (including sftp and ftp):

str_replace:           7.2881 s     0.00000072881 s/call
preg_replace (no ^):   6.0902 s     0.00000060902 s/call
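For anyone who wants to repeat the comparison outside the interactive shell, here is a rough sketch. The helper timeCalls, the labels, and the iteration count are illustrative assumptions; the closure call adds its own overhead to every measurement, so the absolute numbers will not match the ones above.

```php
<?php
// Hypothetical timing helper: runs $fn $iterations times and returns
// the elapsed wall-clock time in seconds.
function timeCalls(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$url = 'http://www.google.com';
$n = 100000; // far fewer than in the answer, to keep the run short

$candidates = array(
    'str_replace' => function () use ($url) {
        return str_replace(array('http:', 'https:'), '', $url);
    },
    'preg_replace' => function () use ($url) {
        return preg_replace('/^https?:/', '', $url);
    },
);

foreach ($candidates as $label => $fn) {
    $seconds = timeCalls($fn, $n);
    printf("%-13s %.4f s  %.10f s/call\n", $label, $seconds, $seconds / $n);
}
```

Results depend heavily on the PHP version and the machine, so treat any single run as indicative only.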

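【Comments】:

With DOMParser I have foreach($html->find('img[src]') as $element) - can I use a regex with this to remove http: and https:?
@brandozz: Yes, but note that you need to set the attribute (using the proper call on the DOM parser). Will update the answer.
DomParser works when I pull the HTML from another document. I need to remove the protocol from images on the page the script itself is on. Sorry if I wasn't clear.
@brandozz: Is that script a terminating script? What you could do is write an .htaccess handler to post-process the generated document.
CommuSoft - I'm actually using WordPress, and I think I may have found a solution: $content = get_the_content(); $content = str_replace(array('http:', 'https:'), '', $content); echo $content;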

【Solution 2】:

Yes, str_replace is where it's at. The result will be a protocol-relative link.

<?php echo str_replace(array('http:', 'https:'), '', 'http://www.google.com'); ?>

Output

//www.google.com

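As expected. Otherwise, you could use preg_replace, which lets you use regular expressions. CommuSoft posted a good example.

【Comments】:

This answer is better than CommuSoft's because it uses str_replace, which is much faster than preg_replace. If you can, always use str_replace instead of a regular expression.
@newz2000: A regex can be matched against the input in linear time, whereas str_replace depends on the implementation: if it uses plain string matching, it has to perform two matches (http and https), so it does not scale well to more protocols. The alternative is to convert it into a regex... Also, one minor aspect that could cause problems is that it can replace an https: in the middle of a (broken) URL.
@newz2000: I ran some benchmarks. I don't know what your definition of "much faster" is, but this looks quite competitive, and (as expected) once the number of search strings increases, the regex performs better.
Keep in mind that this is a very simple text replacement. If things were more complex, I probably wouldn't have made that statement. In simple scenarios like this one, str_replace is usually 6-20 times faster.
@newz2000: In a more complex test the number of strings increases as well (and, as mentioned, str_replace is also linear in the number of strings). Can you give a reasonable test case with a relevant number of strings (say 6+)? Even for a single string, Knuth's algorithm has to be run. Knuth's algorithm is a special case of a regex; it does somewhat better in practice, but the time complexity is the same, so for large inputs the two scale the same way.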