Posted: 2015-07-27 16:04:53
Question:
I wrote an import snippet that populates my Neo4j database with nodes for towns and the countries they belong to. The code looks like this:

    <?php
    // Assumes neo4jphp (Everyman\Neo4j) was installed with Composer
    // and that Neo4j is running on localhost:7474
    require 'vendor/autoload.php';
    $client = new Everyman\Neo4j\Client('localhost', 7474);

    // Read the whole CSV file into an array; returns the rows and the row count
    function readCSV($csvFile){
        $file_handle = fopen($csvFile, 'r');
        $line_of_text = array();
        // fgetcsv() returns false at end of file; looping on feof() would
        // append an extra "false" row after the last line
        while (($fields = fgetcsv($file_handle, 1024, ';', '"')) !== false) {
            $line_of_text[] = $fields;
        }
        fclose($file_handle);
        return array($line_of_text, count($line_of_text));
    }

    // Create an index for Town and for Country
    $queryString = '
        CREATE INDEX ON :Country (name)
    ';
    $query = new Everyman\Neo4j\Cypher\Query($client, $queryString);
    $result = $query->getResultSet();
    $queryString = '
        CREATE INDEX ON :Town (name)
    ';
    $query = new Everyman\Neo4j\Cypher\Query($client, $queryString);
    $result = $query->getResultSet();

    // Set path to CSV file
    $importFile = 'files/import_city_country.csv';
    $completeResult = readCSV($importFile);
    $dataFile = $completeResult[0];
    $maxLines = $completeResult[1];

    // Start at 1 to skip the header row
    for ($row = 1; $row < $maxLines; ++$row) {
        // fgetcsv() returns array(null) for a blank line
        if (!is_null($dataFile[$row][0])) {
            // Define parameters for the queries
            $params = array(
                "nameCountry" => trim($dataFile[$row][0]),
                "nameTown"    => trim($dataFile[$row][1]),
                "uuid"        => uniqid(),
            );

            # Now check if we know that country already to avoid double entries
            $queryString = '
                MATCH (c:Country {name: {nameCountry}})
                RETURN c
            ';
            $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
            $result = $query->getResultSet();
            if (count($result) == 0) { // Country doesn't exist yet
                $queryString = '
                    MERGE (c:Country {name: {nameCountry}})
                    SET
                        c.uuid = {uuid},
                        c.created = timestamp()
                    RETURN c
                ';
                $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
                $result = $query->getResultSet();
            }

            # Now check if we know that town already
            $queryString = '
                MATCH (t:Town {name: {nameTown}})
                RETURN t
            ';
            $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
            $result = $query->getResultSet();
            if (count($result) == 0) { // Town doesn't exist yet
                $queryString = '
                    MERGE (t:Town {name: {nameTown}})
                    SET
                        t.created = timestamp()
                    RETURN t
                ';
                $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
                $result = $query->getResultSet();

                // Relate town to country
                $queryString = '
                    MATCH (c:Country {name: {nameCountry}}), (t:Town {name: {nameTown}})
                    MERGE (t)-[:BELONGS_TO]->(c);
                ';
                $query = new Everyman\Neo4j\Cypher\Query($client, $queryString, $params);
                $result = $query->getResultSet();
            }
        } // Row is not empty - go on
    } // Next row
    ?>

A typical CSV line looks like this:

    Country;City
    Albania;Tirana

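Because the file is read with `;` as the delimiter, the rows can also be grouped in memory before anything is sent to Neo4j, so that each country is looked up once instead of once per row. A minimal pure-PHP sketch, assuming the file starts with a `Country;City` header line; `groupTownsByCountry()` is a hypothetical helper and the sample town `Durres` is made-up illustration data:

```php
<?php
// Hypothetical helper: parse the semicolon-separated CSV text and collapse
// it into a country => list of distinct towns map, so that each country
// only has to be sent to Neo4j once. Assumes a "Country;City" header line.
function groupTownsByCountry($csvText)
{
    $grouped = array();
    $lines = preg_split('/\R/', $csvText);
    foreach ($lines as $index => $line) {
        if ($index === 0 || trim($line) === '') {
            continue; // skip the header line and blank lines
        }
        $fields = str_getcsv($line, ';', '"', '\\');
        if (count($fields) < 2) {
            continue; // skip rows without a town column
        }
        $country = trim($fields[0]);
        $town = trim($fields[1]);
        if ($country === '' || $town === '') {
            continue;
        }
        // Using the town name as a key drops duplicate rows cheaply
        $grouped[$country][$town] = true;
    }
    // Turn each inner set of keys back into a plain list of town names
    return array_map('array_keys', $grouped);
}

// Sample input; "Durres" and the repeated Tirana row are made up for illustration
$sample = "Country;City\nAlbania;Tirana\nAlbania;Durres\nAlbania;Tirana\n";
print_r(groupTownsByCountry($sample));
```

The real input could be loaded with `file_get_contents('files/import_city_country.csv')`; the result then holds one entry per country with its distinct towns.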
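This all works, but importing 9,000 lines takes more than 30 minutes on my computer. I know the system has to check for every record whether it already exists, and it also has to create the relationship between town and country, but that still seems very long for this number of CSV lines.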
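A rough back-of-the-envelope check of these numbers, assuming the time is spent almost entirely in the import loop: per row the loop sends at least two statements (the two `MATCH` lookups) and at most five (plus the country `MERGE`, the town `MERGE` and the relationship `MERGE`), each through its own `getResultSet()` call:

```latex
\frac{30 \times 60\ \mathrm{s}}{9000\ \text{rows}} = 0.2\ \mathrm{s\ per\ row},
\qquad
\frac{0.2\ \mathrm{s\ per\ row}}{5\ \text{statements per row}} = 0.04\ \mathrm{s} = 40\ \mathrm{ms\ per\ statement}
```

Since the import takes more than 30 minutes and many rows send fewer than five statements, each statement costs on average at least about 40 ms, which suggests that the per-request overhead, rather than the amount of data, dominates.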
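Do you have any suggestions on how to improve the import code?
Thanks, 巴莱尔
By the way: is there a way to paste code here without editing every line and adding 4 spaces? With longer code that gets a bit tedious...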
Comments:
-
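You could start by not looping twice: load the file into an array, then loop over the array. Or import the CSV directly. I don't have much experience with Neo4j, but have a look at the end of neo4j.com/developer/graph-db-vs-rdbms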
-
Thanks, I'll take a look at the site you linked.
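Along the lines of the first comment, the number of requests can also be cut from PHP itself. A minimal sketch, untested against a real server and assuming Neo4j 2.1 or later (for `UNWIND`) plus the same neo4jphp client as in the question: rows are sent in batches, and one statement per batch relies on `MERGE ... ON CREATE SET` for the get-or-create checks that the original loop performs with separate `MATCH` queries. `IMPORT_BATCH_QUERY`, `buildImportBatches()` and `runImportBatches()` are hypothetical names, and the `{rows}` parameter syntax matches the question's Neo4j version (newer releases write `$rows`).

```php
<?php
// Hypothetical batch import for the neo4jphp (Everyman\Neo4j) client.
// Assumes Neo4j 2.1+ (UNWIND) and the {param} syntax used in the question.

// One statement per batch: MERGE does the "does it exist already?" check,
// and ON CREATE SET only runs for nodes that are newly created.
const IMPORT_BATCH_QUERY = '
    UNWIND {rows} AS row
    MERGE (c:Country {name: row.country})
      ON CREATE SET c.uuid = row.uuid, c.created = timestamp()
    MERGE (t:Town {name: row.town})
      ON CREATE SET t.created = timestamp()
    MERGE (t)-[:BELONGS_TO]->(c)
';

// Turn [country, town] pairs into parameter maps, split into batches.
function buildImportBatches(array $pairs, $batchSize = 1000)
{
    $rows = array();
    foreach ($pairs as $pair) {
        list($country, $town) = $pair;
        $rows[] = array(
            'country' => trim($country),
            'town'    => trim($town),
            'uuid'    => uniqid(),
        );
    }
    return array_chunk($rows, $batchSize);
}

// Send each batch as a single request (needs a running Neo4j server).
function runImportBatches($client, array $batches)
{
    foreach ($batches as $batch) {
        $query = new Everyman\Neo4j\Cypher\Query($client, IMPORT_BATCH_QUERY, array('rows' => $batch));
        $query->getResultSet();
    }
}
```

With 9,000 rows and the default batch size of 1,000, `runImportBatches($client, buildImportBatches($pairs))` would send 9 requests instead of up to 45,000; `$pairs` could be the rows returned by `readCSV()` without the header and blank lines.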