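Parsing multiple html / text files: answers

【Question title】: Parsing multiple html / text files
【Posted】: 2011-01-12 02:46:49
【Question】:

Hello, I need help with the following situation: I have a folder named "slides" that contains multiple text/html files, for example slide1.html, slide2.html, slide3.html, and so on...

The files are structured like this: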

<h2>Title of the Slide</h2>
<p><a href="http://mydomain.com"><img src="tick_icon.jpg" width="227" height="227" alt="icon" longdesc="http://longdescription" /></a></p>
<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>

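There are 3 properties: title, image, and description, one per line.

I have about 10 to 12 of these files. I want a function that loops over and parses all the files in the folder named 'slides' and returns the value of each line (3 lines) as variables, so that I can place them in my code for layout.

【Comments】:

Could you be more precise about how you want the HTML to be processed? In particular, I'm not sure what "the value of each line" means. Also, do you have a preferred language to write this in?

Tags: html parsing function

【Solution 1】:

You can use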

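$slides = array();
foreach (glob('slides/*.html') as $fileName) {
    $fname = basename($fileName);
    // Read via the full path returned by glob(); the bare file name only resolves from inside 'slides'.
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each line.
    $curArr = file($fileName, FILE_IGNORE_NEW_LINES);
    $slides[$fname]['title'] = $curArr[0];
    $slides[$fname]['image-links'] = $curArr[1];
    $slides[$fname]['description'] = $curArr[2];
}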

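You end up with one large $slides array that is keyed by file name, with 3 sub-keys per file: title, image-links, and description. This assumes that every "slide" has the .html extension and that the content of each slide is exactly 3 lines.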
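To illustrate how the resulting array might be consumed for layout, here is a minimal sketch. The helper names `loadSlides` and `renderSlides`, the `<div class="slide">` wrapper, and the `data-file` attribute are assumptions made for this example and are not part of the original answer. Files with fewer than three non-empty lines are skipped here, which is also a choice of this sketch.

```php
<?php
// Hypothetical helper: collects the three lines of every .html file in $dir.
function loadSlides(string $dir): array
{
    $slides = [];
    // glob() can return false on error, so fall back to an empty list.
    foreach (glob($dir . '/*.html') ?: [] as $fileName) {
        // Strip newlines and drop empty lines so indexes 0..2 hold the three fields.
        $lines = file($fileName, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        if ($lines === false || count($lines) < 3) {
            continue; // unreadable file, or not the expected three lines
        }
        $slides[basename($fileName)] = [
            'title'       => $lines[0],
            'image-links' => $lines[1],
            'description' => $lines[2],
        ];
    }
    return $slides;
}

// Hypothetical layout: wraps each slide's three lines in one <div>.
function renderSlides(array $slides): string
{
    $html = '';
    foreach ($slides as $fname => $slide) {
        $html .= '<div class="slide" data-file="' . htmlspecialchars($fname) . '">' . "\n"
               . $slide['title'] . "\n"
               . $slide['image-links'] . "\n"
               . $slide['description'] . "\n"
               . "</div>\n";
    }
    return $html;
}

echo renderSlides(loadSlides('slides'));
```

Keeping the loading and the rendering in separate functions means the markup can change without touching the file handling.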

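【Comments】:

Hey @JMC, that is perfect. It's exactly what I was looking for. Thanks for your help. And yes, the HTML files have only 3 lines, with one field per line.

【Solution 2】:

Which language do you want this in? HTML is not a programming language. You also can't do this in browser-side JavaScript, because it has no file-system handling routines and would almost certainly not be allowed to poke around in the server's directory structure anyway.

You can do it in PHP with something like this: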

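<?php
    $filelist = glob("/path/to/files/slide*.html");
    foreach ($filelist as $file) {
        // glob() returns full filesystem paths; keep only the file name for the URL.
        $name = basename($file);
        // The closing heredoc marker must end with a semicolon.
        echo <<<EOL
<a href="/url/to/files/$name">$name</a><br />
EOL;
    }
?>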

【Comments】:

Hi Marc, thanks for your help. I am using PHP. But how do I get the value of each line as a separate variable, e.g. $title, $imglink, $excerpt (the 3 lines in the HTML file)?
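One possible way to get the three lines into separate variables is array destructuring, sketched below. The helper name `readSlide` is an assumption made for this example, and the `[$a, $b, $c] = ...` syntax assumes PHP 7.1 or later (older versions would use `list(...)` instead). Missing lines are padded with empty strings, which is a choice of this sketch and not something the thread specifies.

```php
<?php
// Hypothetical helper: returns exactly three strings (title, image link, excerpt).
function readSlide(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Pad short files with '' and cut long ones, so destructuring always sees 3 items.
    return array_slice(array_pad($lines, 3, ''), 0, 3);
}

foreach (glob('slides/slide*.html') ?: [] as $path) {
    // Each line of the file lands in its own variable.
    [$title, $imglink, $excerpt] = readSlide($path);
    echo $title, "\n", $imglink, "\n", $excerpt, "\n";
}
```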