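Even though every XML file can represent a database on its own, there are normally two fundamental differences between XML and a relational SQL database.
The most obvious one is the schema. The XML you provided in your question has no schema at all. A SQL database has a schema by definition.
Not only does your XML lack a schema, you have not shared any information about what it means either. So the smartest thing to do is to ignore any schema here entirely.
So here is one example of how the XML in your question could be turned into a database table. You can create a table with two columns, path and value, and then decide to put every attribute and every leaf text node into it: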
+-------------------------------------------------------------+--------+
|path |value |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute |123 |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute2 |12345 |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/@addition |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/text() |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[1]/text() |Action |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[2]/text() |Thriller|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[1]/text()|Deutsch |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[2]/text()|Englisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[3]/text()|Polnisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/languages/language[4]/text()|Russisch|
+-------------------------------------------------------------+--------+
|/books/book[1]/author_information/name/@addition |fooabr |
+-------------------------------------------------------------+--------+
|/books/book[1]/author_information/name/text() |Mr_Ed |
+-------------------------------------------------------------+--------+
|/books/book[2]/@attribute |123 |
+-------------------------------------------------------------+--------+
|/books/book[2]/@attribute2 |12345 |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/name/@addition |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/name/text() |fooobar |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/genres/genre[1]/text() |Action |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/genres/genre[2]/text() |Thriller|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[1]/text()|Deutsch |
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[2]/text()|Englisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[3]/text()|Polnisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/basic_information/languages/language[4]/text()|Russisch|
+-------------------------------------------------------------+--------+
|/books/book[2]/author_information/name/@addition |fooabr |
+-------------------------------------------------------------+--------+
|/books/book[2]/author_information/name/text() |Mr_Ed |
+-------------------------------------------------------------+--------+
Creating such a conversion is straightforward with an XML parser that supports XPath queries (such as the DOM extension in PHP):
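// $buffer holds the XML from the question as a string.
$doc = new DOMDocument();
$result = $doc->loadXML($buffer);
if (!$result) {
    throw new UnexpectedValueException('Could not load XML');
}
$xpath = new DOMXPath($doc);
// Select every attribute and the text of every leaf element.
$nodes = $xpath->query('(//@*|(.|.//*)[not(*)]/text())');
$table = [['path', 'value']];
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path = $node->getNodePath();
    $value = $node->nodeValue;
    $table[] = [$path, $value];
}
// TextTable is a helper class that is not shown here; it prints the rows as an ASCII table.
echo new TextTable($table);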
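If you want the rows in an actual SQL table rather than printed, a minimal sketch could look like the following. It assumes the `pdo_sqlite` extension is available; the sample document, the table name `xml_values` and the in-memory database are only illustrative and not part of the question.

```php
<?php
// Hypothetical sample document; the XML from the question is not reproduced here.
$buffer = '<books>'
    . '<book attribute="123"><name>fooobar</name></book>'
    . '<book attribute="456"><name>bar</name></book>'
    . '</books>';

$doc = new DOMDocument();
if (!$doc->loadXML($buffer)) {
    throw new UnexpectedValueException('Could not load XML');
}
$xpath = new DOMXPath($doc);

// In-memory SQLite database (assumption: pdo_sqlite is installed).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE xml_values (path TEXT PRIMARY KEY, value TEXT)');

// One row per attribute and per leaf text node, keyed by its node path.
$insert = $pdo->prepare('INSERT INTO xml_values (path, value) VALUES (?, ?)');
foreach ($xpath->query('(//@*|(.|.//*)[not(*)]/text())') as $node) {
    $insert->execute([$node->getNodePath(), $node->nodeValue]);
}

$rows = $pdo->query('SELECT path, value FROM xml_values ORDER BY path')
    ->fetchAll(PDO::FETCH_NUM);
foreach ($rows as [$path, $value]) {
    echo $path, ' => ', $value, "\n";
}
```

The node path works as a primary key here because `getNodePath()` returns a distinct path for every selected node.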
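But such data is not normalized yet. There are obviously duplicate values, and they look like an easy first target for more normalization. For example, with a store that keeps track of the identity of each value: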
// IdentityStore is a helper class that is not shown here; add() returns the ID of a value.
$values = new IdentityStore('value');
$table = [['path', $values->getKey()]];
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path = $node->getNodePath();
    $value = $values->add($node->nodeValue);
    $table[] = [$path, $value];
}
echo new TextTable($table);
echo new TextTable($values);
The values are then replaced by their IDs:
+-------------------------------------------------------------+--------+
|path |value_id|
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute |1 |
+-------------------------------------------------------------+--------+
|/books/book[1]/@attribute2 |2 |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/@addition |3 |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/name/text() |3 |
+-------------------------------------------------------------+--------+
|/books/book[1]/basic_information/book_genre/genre[1]/text() |4 |
+-------------------------------------------------------------+--------+
...
And the values get a table of their own:
+--------+--------+
|value_id|value |
+--------+--------+
|1 |123 |
+--------+--------+
|2 |12345 |
+--------+--------+
|3 |fooobar |
+--------+--------+
|4 |Action |
+--------+--------+
|5 |Thriller|
+--------+--------+
|6 |Deutsch |
+--------+--------+
|7 |Englisch|
+--------+--------+
|8 |Polnisch|
+--------+--------+
|9 |Russisch|
+--------+--------+
|10 |fooabr |
+--------+--------+
|11 |Mr_Ed |
+--------+--------+
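The `IdentityStore` class is not shown in this answer. A minimal sketch of what such a store could look like, assuming it only needs to hand out one integer ID per distinct value (the method names mirror the calls used above, everything else is a guess):

```php
<?php
/**
 * Hypothetical stand-in for the IdentityStore used above: it assigns one
 * integer ID per distinct value, starting at 1.
 */
final class IdentityStore implements IteratorAggregate
{
    /** @var string column name of the stored values */
    private $name;

    /** @var array value => id */
    private $ids = [];

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    /** Name of the ID column, e.g. "value_id". */
    public function getKey(): string
    {
        return $this->name . '_id';
    }

    /** Returns the existing ID of a known value, otherwise assigns the next one. */
    public function add(string $value): int
    {
        if (!isset($this->ids[$value])) {
            $id = count($this->ids) + 1;
            $this->ids[$value] = $id;
        }
        return $this->ids[$value];
    }

    /** Yields a header row followed by one [id, value] row per distinct value. */
    public function getIterator(): Iterator
    {
        $rows = [[$this->getKey(), $this->name]];
        foreach ($this->ids as $value => $id) {
            // Numeric strings become integer array keys in PHP, so cast back.
            $rows[] = [$id, (string) $value];
        }
        return new ArrayIterator($rows);
    }
}

$values = new IdentityStore('value');
foreach (['123', '12345', 'fooobar', 'fooobar', 'Action'] as $value) {
    echo $value, ' => ', $values->add($value), "\n";
}
```

The second `fooobar` gets the same ID as the first one, which is the deduplication visible in the two tables above.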
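On its own this does not look very helpful. Even though the values are now normalized, it is probably more interesting to look at how to map the paths rather than the values.
The path encodes the table names. Each pair of square brackets denotes a record in a table, and that table is named by the path in front of the brackets. If such a table sits inside a record of a table further up the path, that makes a relation.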
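The rule above can be sketched as a small function that splits a node path into table records and a column name. The function name `splitNodePath` and the naming scheme (steps joined with `_` for tables, with `/` for columns) are assumptions for illustration only:

```php
<?php
/**
 * Splits a node path into (table, record index) pairs plus a column name.
 * Every element step that ends in "[n]" closes a table name; whatever
 * follows the last such step is treated as the column. Illustrative only.
 */
function splitNodePath(string $path): array
{
    $records = [];
    $pending = [];
    foreach (explode('/', ltrim($path, '/')) as $step) {
        // Only element steps count; "@attr" and "text()" never name a table.
        if (preg_match('/^([^\[\]()@]+)\[(\d+)\]$/', $step, $m)) {
            $pending[] = $m[1];
            $records[] = [
                'table' => implode('_', $pending),
                'index' => (int) $m[2],
            ];
            $pending = [];
        } else {
            $pending[] = $step;
        }
    }
    return ['records' => $records, 'column' => implode('/', $pending)];
}

$parts = splitNodePath(
    '/books/book[1]/basic_information/languages/language[2]/text()'
);
foreach ($parts['records'] as $record) {
    echo $record['table'], ' #', $record['index'], "\n";
}
echo 'column: ', $parts['column'], "\n";
```

One caveat of this sketch: `getNodePath()` leaves out the `[n]` when an element has no sibling of the same name, so a document with a single book would not produce a `books_book` table this way.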
So this could be an interesting approach as well:
// PathTables is a helper class that is not shown here; it groups the values into tables by path.
$tables = new PathTables();
foreach ($nodes as $node) {
    /** @var DOMNode $node */
    $path = $node->getNodePath();
    $tables->add($path, $node->nodeValue);
}
echo $tables;
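However, the values are not normalized here, and only a schema could tell whether values should be grouped or not. Note the comma-separated values, which show the shortcomings: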
=== books_book ===
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|book_id|@attribute|@attribute2|basic_information/name/@addition|basic_information/name/text()|basic_information_book_genre_genre.genre_id|basic_information_languages_language.language_id|author_information/name/@addition|author_information/name/text()|basic_information_genres_genre.genre_id|
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|1 |123 |12345 |fooobar |fooobar |1,2 |1,2,3,4 |fooabr |Mr_Ed | |
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
|2 |123 |12345 |fooobar |fooobar | |1,2,3,4 |fooabr |Mr_Ed |1,2 |
+-------+----------+-----------+--------------------------------+-----------------------------+-------------------------------------------+------------------------------------------------+---------------------------------+------------------------------+---------------------------------------+
=== basic_information_book_genre_genre ===
+--------+--------+
|genre_id|text() |
+--------+--------+
|1 |Action |
+--------+--------+
|2 |Thriller|
+--------+--------+
=== basic_information_languages_language ===
+-----------+-----------------+
|language_id|text() |
+-----------+-----------------+
|1 |Deutsch,Deutsch |
+-----------+-----------------+
|2 |Englisch,Englisch|
+-----------+-----------------+
|3 |Polnisch,Polnisch|
+-----------+-----------------+
|4 |Russisch,Russisch|
+-----------+-----------------+
=== basic_information_genres_genre ===
+--------+--------+
|genre_id|text() |
+--------+--------+
|1 |Action |
+--------+--------+
|2 |Thriller|
+--------+--------+
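So you run into the problem of the missing schema either way. With a schema for both the XML document and the SQL database, you can easily map between the two using XPath expressions that define the mapping.
Without one, it gets overly complicated. Any change in the XML changes your SQL schema, and conversion errors can go unnoticed, so the only straightforward approach is to map XPath paths to values.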
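To illustrate what such a mapping could look like once the structure is known, here is a minimal sketch. The sample document and the column names `attribute`, `title` and `author` are assumptions; the element names follow the paths shown earlier.

```php
<?php
// Hypothetical sample document; element names follow the paths shown earlier.
$buffer = <<<'XML'
<books>
  <book attribute="123">
    <basic_information><name>fooobar</name></basic_information>
    <author_information><name>Mr_Ed</name></author_information>
  </book>
</books>
XML;

$doc = new DOMDocument();
if (!$doc->loadXML($buffer)) {
    throw new UnexpectedValueException('Could not load XML');
}
$xpath = new DOMXPath($doc);

// Assumed mapping: column name => XPath expression relative to one book.
$mapping = [
    'attribute' => 'string(@attribute)',
    'title'     => 'string(basic_information/name)',
    'author'    => 'string(author_information/name)',
];

// One row per /books/book element.
$rows = [];
foreach ($xpath->query('/books/book') as $book) {
    $row = [];
    foreach ($mapping as $column => $expression) {
        // evaluate() returns a string for string(...) expressions.
        $row[$column] = $xpath->evaluate($expression, $book);
    }
    $rows[] = $row;
}
print_r($rows);
```

The mapping array plays the role of the schema here: it fixes the columns up front, so a change in the XML shows up as an empty column instead of silently changing the table layout.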
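Of course it would be interesting to see how to normalize further in a useful way, but I'd say that is more a topic for a computer science course than for a Q&A site. For further reading, here are two resources, one focused on database techniques and the other on mapping XML to SQL structures while streaming: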