Setting up the environment for the Agriculture Knowledge Graph (Agriculture_KnowledgeGraph) project

Project repository: https://github.com/qq547276542/Agriculture_KnowledgeGraph

1. Create the environment

  • Create a dedicated environment for the project with the following command:
conda create -n kg python=3.6

Some other useful commands (optional):

List the environments
conda info -e

Activate the environment
activate kg

Exit the environment
deactivate
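
To confirm that the kg environment is actually the one in use, a small check like the following can be run from inside it. This is only a sketch: the function name env_report is my own, and it relies on conda setting the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def env_report(version_info=sys.version_info, environ=os.environ):
    """Summarise which interpreter and conda environment are active."""
    return {
        # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "python": "{}.{}".format(version_info[0], version_info[1]),
        # The environment above was created with python=3.6.
        "is_python_3_6": tuple(version_info[:2]) == (3, 6),
    }


if __name__ == "__main__":
    print(env_report())
```

Inside the activated environment, conda_env should read kg and is_python_3_6 should be True.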

2. Install the required packages in the new environment

  • Install django
pip install django

Add Django's bin directory (mine is F:\anaconda3\envs\kg\Lib\site-packages\django\bin) to the Path environment variable: Computer -> Properties -> Advanced system settings -> Environment Variables -> Path

  • Install thulac
pip install thulac
  • Install py2neo
pip install py2neo
  • Install a fastText binding
pyfasttext is deprecated. Its own description reads:

Warning! pyfasttext is no longer maintained: use the official Python binding from the fastText repository: https://github.com/facebookresearch/fastText/tree/master/python
Yet another Python binding for fastText.

If you run into problems, see: https://www.jianshu.com/p/152fe77d3abc
I installed fasttext instead:

pip install fasttext
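
After the installs, it can be worth checking that every package is visible from the kg environment. The sketch below uses only the standard library; the list REQUIRED is my assumption that the import names match the pip names used above.

```python
import importlib.util

# Import names of the packages installed above (assumed to match the pip names).
REQUIRED = ["django", "thulac", "py2neo", "fasttext"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", missing if missing else "none")
```

An empty result means all four packages can be located; it does not prove that each one imports without errors.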

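3. Import the data

First create the agriculture_kg.db database, either through the neo4j configuration file or by creating a symbolic link.

3.1 Import the HudongItem node data

Put hudong_pedia.csv into the /import directory under the neo4j installation directory.
One record looks like this: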

"title","url","image","openTypeList","detail","baseInfoKeyList","baseInfoValueList"
"菊糖","http://www.baike.com/wiki/菊糖","http://a0.att.hudong.com/72/85/20200000013920144736851207227_s.jpg","健康科学##分子生物学##化学品##有机物##科学##自然科学##药品##药学名词##药物中文名称列表","[药理作用] 诊断试剂 人体内不含菊糖,静注后,不被机体分解、结合、利用和破坏,经肾小球滤过,通过测定血中和尿中的菊糖含量,可以准确计算肾小球的滤过率。菊糖广泛存在于植物组织中,约有3.6万种植物中含有菊糖,尤其是菊芋、菊苣块根中含有丰富的菊糖[6,8]。菊芋(Jerusalem artichoke)又名洋姜,多年生草本植物,在我国栽种广泛,其适应性广、耐贫瘠、产量高、易种植,一般亩产菊芋块茎为2 000~4 000 kg,菊芋块茎除水分外,还含有15%~20%的菊糖,是加工生产菊糖及其制品的良好原料。","中文名:","菊糖"
// Import hudong_pedia.csv
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS  FROM "file:///hudong_pedia.csv" AS line  
CREATE(p:HudongItem{title:line.title,image:line.image,detail:line.detail,url:line.url,openTypeList:line.openTypeList,baseInfoKeyList:line.baseInfoKeyList,baseInfoValueList:line.baseInfoValueList})  

Result: Added 113037 labels, created 113037 nodes, set 791259 properties, completed after 18105 ms.

// Import the additional file hudong_pedia2.csv
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS  FROM "file:///hudong_pedia2.csv" AS line  
CREATE(p:HudongItem{title:line.title,image:line.image,detail:line.detail,url:line.url,openTypeList:line.openTypeList,baseInfoKeyList:line.baseInfoKeyList,baseInfoValueList:line.baseInfoValueList})  

Result: Added 36892 labels, created 36892 nodes, set 258244 properties, completed after 7007 ms.
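
The two result summaries can be sanity-checked with a little arithmetic: each CREATE sets seven properties per node, so the property count should be seven times the node count. The helper below is a sketch of that check (parse_summary is my own name), applied to the two summaries reported above.

```python
import re

COLUMNS = 7  # properties set per HudongItem node in the CREATE clause above


def parse_summary(line):
    """Extract the node and property counts from a Neo4j result summary line."""
    nodes = int(re.search(r"created (\d+) nodes", line).group(1))
    props = int(re.search(r"set (\d+) properties", line).group(1))
    return nodes, props


SUMMARIES = [
    "Added 113037 labels, created 113037 nodes, set 791259 properties, completed after 18105 ms.",
    "Added 36892 labels, created 36892 nodes, set 258244 properties, completed after 7007 ms.",
]

if __name__ == "__main__":
    for line in SUMMARIES:
        nodes, props = parse_summary(line)
        print(nodes, props, props == nodes * COLUMNS)
```

Both summaries satisfy the relation exactly, which suggests that every row supplied all seven columns.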

// Add a UNIQUE constraint on the title property
// (the constraint also creates an index)
CREATE CONSTRAINT ON (c:HudongItem)
ASSERT c.title IS UNIQUE

Result: Added 1 constraint, completed after 1715 ms.
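
A uniqueness constraint can only be created when no two nodes share the same title, so if the statement fails it helps to look for duplicate titles in the CSV data. The following is a minimal standard-library sketch; duplicate_titles is a hypothetical helper, and the sample text is made up for illustration.

```python
import csv
import io
from collections import Counter


def duplicate_titles(csv_text, column="title"):
    """Return the values of `column` that occur more than once, sorted."""
    counts = Counter(row[column] for row in csv.DictReader(io.StringIO(csv_text)))
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample data, not taken from hudong_pedia.csv.
SAMPLE = '"title","url"\n"a","u1"\n"b","u2"\n"a","u3"\n'

if __name__ == "__main__":
    print(duplicate_titles(SAMPLE))
```

Because the constraint covers every HudongItem node, the check would need to run over the contents of hudong_pedia.csv and hudong_pedia2.csv together.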
[Figure: partial view of the graph]

3.2 Import the NewNode node data

Go into /wikidataSpider/wikidataProcessing and copy the three files new_node.csv, wikidata_relation.csv and wikidata_relation2.csv into neo4j's import folder.

// Import the new nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///new_node.csv" AS line
CREATE (:NewNode { title: line.title })

Result: Added 96670 labels, created 96670 nodes, set 96670 properties, completed after 5508 ms.

// Add the unique constraint (and with it the index)
CREATE CONSTRAINT ON (c:NewNode)
ASSERT c.title IS UNIQUE

Result: Added 1 constraint, completed after 1003 ms.
[Figure: partial view of the NewNode data]

3.3 Import the relationship data

Import the RELATION relationships between HudongItem nodes and NewNode nodes

// Import the RELATION relationships between HudongItem and NewNode
USING PERIODIC COMMIT 1000
LOAD CSV  WITH HEADERS FROM "file:///wikidata_relation2.csv" AS line
MATCH (entity1:HudongItem{title:line.HudongItem}) , (entity2:NewNode{title:line.NewNode})
CREATE (entity1)-[:RELATION { type: line.relation }]->(entity2)
Result: Set 166059 properties, created 166059 relationships, completed after 15865 ms.

USING PERIODIC COMMIT 1000
LOAD CSV  WITH HEADERS FROM "file:///wikidata_relation.csv" AS line
MATCH (entity1:HudongItem{title:line.HudongItem1}) , (entity2:HudongItem{title:line.HudongItem2})
CREATE (entity1)-[:RELATION { type: line.relation }]->(entity2)
Result: Set 58958 properties, created 58958 relationships, completed after 5937 ms.
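
One consequence of the MATCH ... CREATE pattern is easy to miss: a CSV row only produces a relationship when both endpoints are found, and rows with a missing endpoint are skipped without any error. The sketch below mimics that behaviour in plain Python rather than Cypher. It assumes titles are unique, and the row values are invented for illustration.

```python
def import_relations(rows, hudong_titles, newnode_titles):
    """Mimic MATCH ... CREATE: keep a row only if both endpoints exist."""
    created, skipped = [], []
    for row in rows:
        if row["HudongItem"] in hudong_titles and row["NewNode"] in newnode_titles:
            created.append((row["HudongItem"], row["relation"], row["NewNode"]))
        else:
            skipped.append(row)
    return created, skipped


# Invented rows using the column names of wikidata_relation2.csv.
ROWS = [
    {"HudongItem": "A", "relation": "r1", "NewNode": "X"},
    {"HudongItem": "B", "relation": "r2", "NewNode": "X"},  # "B" was never imported
]

if __name__ == "__main__":
    created, skipped = import_relations(ROWS, {"A"}, {"X"})
    print(len(created), "created,", len(skipped), "skipped")
```

Comparing the number of rows in the CSV file with the "created ... relationships" figure is therefore a quick way to see how many rows were dropped.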

Import the relationships between entities and attributes, adding them to the RELATION relationships.
Put attributes.csv into neo4j's import directory.

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line
MATCH (entity1:HudongItem{title:line.Entity}), (entity2:HudongItem{title:line.Attribute})
CREATE (entity1)-[:RELATION { type: line.AttributeName }]->(entity2);
Result: Set 73391 properties, created 73405 relationships, completed after 7113 ms.

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line
MATCH (entity1:HudongItem{title:line.Entity}), (entity2:NewNode{title:line.Attribute})
CREATE (entity1)-[:RELATION { type: line.AttributeName }]->(entity2);
Result: Set 11747 properties, created 11748 relationships, completed after 4101 ms.

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line
MATCH (entity1:NewNode{title:line.Entity}), (entity2:NewNode{title:line.Attribute})
CREATE (entity1)-[:RELATION { type: line.AttributeName }]->(entity2);
Result: Set 271 properties, created 271 relationships, completed after 2563 ms.

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line
MATCH (entity1:NewNode{title:line.Entity}), (entity2:HudongItem{title:line.Attribute})
CREATE (entity1)-[:RELATION { type: line.AttributeName }]->(entity2)
Result: Set 1464 properties, created 1464 relationships, completed after 2571 ms.
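
The four attribute imports differ only in the labels of the two endpoints, so they cover every combination of HudongItem and NewNode. As an illustration (not something the project itself does), the statements can be generated from one template:

```python
from itertools import product

# Doubled braces are literal braces in str.format; {src} and {dst} are the labels.
TEMPLATE = (
    'USING PERIODIC COMMIT 1000\n'
    'LOAD CSV WITH HEADERS FROM "file:///attributes.csv" AS line\n'
    'MATCH (entity1:{src}{{title:line.Entity}}), (entity2:{dst}{{title:line.Attribute}})\n'
    'CREATE (entity1)-[:RELATION {{ type: line.AttributeName }}]->(entity2)'
)

LABELS = ["HudongItem", "NewNode"]


def attribute_statements():
    """Build one import statement per (source label, target label) pair."""
    return [TEMPLATE.format(src=src, dst=dst) for src, dst in product(LABELS, repeat=2)]


if __name__ == "__main__":
    for statement in attribute_statements():
        print(statement, end="\n\n")
```

The generated order differs from the order used above, which should not matter because the four statements match disjoint label combinations.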

[Figure: partial view of the RELATION relationships]

3.4 Import the Weather node data

Put wikidataSpider/weatherData/static_weather_list.csv in the designated location (the import folder).

// Import the nodes
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///static_weather_list.csv" AS line
MERGE (:Weather { title: line.title })
Result: Added 144 labels, created 144 nodes, set 144 properties, completed after 346 ms.
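
The Weather import uses MERGE where the earlier node imports use CREATE. The difference is that CREATE always adds a node, while MERGE reuses a node that already matches the pattern. Here is a plain-Python analogy of the two behaviours, with invented titles:

```python
def create_all(titles):
    """CREATE semantics: one node per input row, duplicates included."""
    return list(titles)


def merge_all(titles):
    """MERGE semantics: reuse an existing node with the same title."""
    nodes, seen = [], set()
    for title in titles:
        if title not in seen:
            seen.add(title)
            nodes.append(title)
    return nodes


if __name__ == "__main__":
    rows = ["t1", "t2", "t1"]  # invented titles with one repeat
    print(len(create_all(rows)), "nodes with CREATE")
    print(len(merge_all(rows)), "nodes with MERGE")
```

With MERGE, repeated titles in the CSV file cannot produce duplicate nodes, so the UNIQUE constraint created next should not fail because of them.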

// Add the unique constraint (and with it the index)
CREATE CONSTRAINT ON (c:Weather)
ASSERT c.title IS UNIQUE
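Result: Added 1 constraint, completed after 322 ms.

[Figure: partial view of the Weather nodes]

3.5 Import the relationship data

Import the Weather2Plant relationships between Weather nodes and HudongItem nodes (plants)

// Put wikidataSpider/weatherData/weather_plant.csv in the designated location (the import folder)
// Import the Weather2Plant relationships between Weather and HudongItem nodes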
USING PERIODIC COMMIT 1000
LOAD CSV  WITH HEADERS FROM "file:///weather_plant.csv" AS line
MATCH (entity1:Weather{title:line.Weather}) , (entity2:HudongItem{title:line.Plant})
CREATE (entity1)-[:Weather2Plant { type: line.relation }]->(entity2)

[Figure: partial view of the Weather2Plant relationships]

Import the relationships between city nodes and weather nodes

// Put city_weather.csv in the designated location (the import folder)
// (this step takes about 15 minutes)
// Import the climate of each city
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///city_weather.csv" AS line
MATCH (city{title:line.city}) , (weather{title:line.weather})
CREATE (city)-[:CityWeather { type: line.relation }]->(weather)
Result: Set 408 properties, created 408 relationships, completed after 261938 ms.
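
This import is much slower per relationship than the earlier ones. A likely reason is that the patterns (city{...}) and (weather{...}) carry no label, so the title indexes created above cannot be used and each lookup has to consider every node. The figures below are taken from the results reported in this write-up. The sketch assumes the database contains nothing but those nodes.

```python
# Node counts reported by the import steps earlier in this write-up.
NODE_COUNTS = {
    "HudongItem (hudong_pedia.csv)": 113037,
    "HudongItem (hudong_pedia2.csv)": 36892,
    "NewNode": 96670,
    "Weather": 144,
}

RELATIONSHIPS_CREATED = 408
ELAPSED_MS = 261938


def summarize():
    """Put the CityWeather import time in relation to the size of the graph."""
    return {
        "total_nodes": sum(NODE_COUNTS.values()),
        "minutes": round(ELAPSED_MS / 60000, 1),
        "ms_per_relationship": round(ELAPSED_MS / RELATIONSHIPS_CREATED),
    }


if __name__ == "__main__":
    print(summarize())
```

On my run that works out to roughly 4.4 minutes for 408 relationships, against a graph of about 246,743 nodes.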

[Figure: partial view of the CityWeather relationships]

4. Update the Neo4j credentials

Open demo/Model/neo_models.py and change the neo4j username and password on line 9 to your own.
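
As an alternative to hard-coding credentials, they could be read from environment variables. The sketch below is not the project's code. The NEO4J_* variable names are hypothetical, and the connect helper assumes py2neo 4 or later, where Graph accepts an auth=(user, password) argument.

```python
import os


def neo4j_settings(environ=os.environ):
    """Read connection settings from (hypothetical) NEO4J_* environment variables."""
    return {
        "uri": environ.get("NEO4J_URI", "bolt://localhost:7687"),
        "user": environ.get("NEO4J_USER", "neo4j"),
        "password": environ.get("NEO4J_PASSWORD", ""),
    }


def connect(settings=None):
    """Open a py2neo connection (assumes py2neo 4 or later, which accepts auth=)."""
    from py2neo import Graph  # imported lazily so neo4j_settings works without py2neo

    settings = settings or neo4j_settings()
    return Graph(settings["uri"], auth=(settings["user"], settings["password"]))


if __name__ == "__main__":
    print({k: v for k, v in neo4j_settings().items() if k != "password"})
```

Older py2neo releases take the username and password as separate keyword arguments, so the call in connect may need adjusting to the installed version.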

5. Start the Django service

Since I have Bash installed, I can go into the demo directory and run the script:

sh django_server_start.sh

Alternatively, go into the demo directory and start the Django server directly:

python manage.py runserver

The output looks like this:
[Figure: console output of runserver]
It includes the following warning:

You have 17 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.

Reference: https://blog.csdn.net/xufeng0991/article/details/40421857
Run the command:

python manage.py migrate

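The output looks like this:
[Figure: output of python manage.py migrate]
Starting the server again shows:
[Figure: console output after restarting the server]

6. Basic features

Entity recognition

Enter the following text: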

袁隆平是杂交水稻研究领域的开创者和带头人,致力于杂交水稻的研究,先后成功研发出“三系法”杂交 水稻、“两系法”杂交水稻、超级杂交稻一期、二期,与此同时,袁隆平提出并实施“种三产四丰产工程”,运用超级杂交稻的技术成果,出版中、英文专著6部,发表论文60余篇。2017年7月,任青岛海水稻学院首席教授。2017年9月,袁隆平宣布一项剔除水稻中重金属镉的新成果。2018年4月14日,袁隆平在海南接受凤凰财经采访时发表了对转基因的看法。对于转基因大豆,袁隆平指出,只要是通过安全检测的转基因作物,都是没有问题的。袁隆平表示,转基因是农业的未来发展方向。

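The entity recognition and word segmentation results can then be viewed:
[Figure: entity recognition and word segmentation results]
Clicking a related entity shows its hyperlink:
[Figures: entity hyperlink pages]
There are many more features; once the project is deployed, you can explore them yourself.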
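
To get a feel for what the segmentation step produces, thulac can also be called directly. The sketch below is not the project's pipeline. It only keeps words that THULAC tags as proper nouns, and candidate_entities and segment are names I made up for illustration.

```python
# THULAC part-of-speech tags that mark proper nouns:
# np = person name, ns = place name, ni = organisation name, nz = other proper noun.
PROPER_NOUN_TAGS = {"np", "ns", "ni", "nz"}


def candidate_entities(pairs):
    """Return the distinct words tagged as proper nouns, in first-seen order."""
    seen, result = set(), []
    for word, tag in pairs:
        if tag in PROPER_NOUN_TAGS and word not in seen:
            seen.add(word)
            result.append(word)
    return result


def segment(text):
    """Segment and tag text with THULAC (needs thulac and its model files)."""
    import thulac  # imported lazily so candidate_entities can be used on its own

    return thulac.thulac().cut(text)


if __name__ == "__main__":
    # Hand-written (word, tag) pairs for illustration, not actual THULAC output.
    pairs = [("袁隆平", "np"), ("是", "v"), ("海南", "ns"), ("袁隆平", "np")]
    print(candidate_entities(pairs))
```

Calling candidate_entities(segment(text)) on the sample paragraph above would give a rough list of names and places, which can be compared with what the web page highlights.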
