【Question title】: neo4j, bulbs and utf8
【Posted】: 2013-02-26 12:30:07
【Question】:

I'm having trouble inserting and looking up data in neo4j using the Python library bulbs. The problem is related to character encoding. I get:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 22: ordinal not in range(128)

when trying to look up a node in the index. I've searched Google for a way to change the character encoding in either neo4j or bulbs, but I can't find one.
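For context, this kind of message is what Python 2 raises when a unicode value containing a non-ASCII character gets encoded with the ASCII codec, which is what happens implicitly when str() is applied to it. A minimal sketch of the mechanism follows; it uses an explicit encode so it behaves the same on Python 2 and 3, and the sample string is an assumption chosen only because it contains u'\xe9':

```python
# -*- coding: utf-8 -*-
# Sample value (an assumption for illustration) containing U+00E9.
title = u"montr\xe9al"

error = None
try:
    # ASCII cannot represent U+00E9, so this raises UnicodeEncodeError.
    title.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# UTF-8 can represent every code point, so this succeeds.
utf8_bytes = title.encode("utf-8")
print(repr(utf8_bytes))
```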

Edit: Here is code that reproduces the error:

from bulbs.model import Node
from bulbs.neo4jserver import Graph
from bulbs.property import String
import MySQLdb
import sys


class Topic(Node):
    element_type = 'node'
    name = String(nullable=False)


g = Graph()
g.add_proxy('topics', Topic)

con = MySQLdb.connect(host='127.0.0.1', user='root', db='wiki_new', charset='utf8')
cur = con.cursor()
cur.execute('SELECT page_title FROM page')
while True:
    row = cur.fetchone()
    if not row:
        break

    sys.stdout.write(row[0] + '\n')
    nds = g.topics.index.lookup(name=row[0])
    if not nds:
        g.topics.create(name=row[0])

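The string that causes the error is: !Xóõ.

Update

I'm now using Python's SAX parser to get the data from an XML file (a Wikipedia page dump). The code is basically the same, and this is the error I get: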

  File "graph.py", line 197, in <module>
    build_wikipedia_graph(WIKI_DUMP_PATH)
  File "graph.py", line 195, in build_wikipedia_graph
    filter_handler.parse(open(wiki_dump_path))
  File "/usr/lib/python2.7/xml/sax/saxutils.py", line 255, in parse
    self._parent.parse(source)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.7/xml/sax/expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "/home/pedro/wiki/1.0/page_parser.py", line 55, in method
    getattr(self._downstream, method_name)(*a, **k)
  File "/home/pedro/wiki/1.0/page_parser.py", line 87, in endElement
    self.pageCallBack(self.currentPage, self.callbackArgs)
  File "graph.py", line 181, in _callback
    kgraph.set_links_to(page.title, target)
  File "graph.py", line 59, in set_links_to
    topic_dst = self._g.topics.get_or_create('name', topic_dst, name=topic_dst)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/element.py", line 607, in get_or_create
    vertex = self.index.get_unique(key, value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/neo4jserver/index.py", line 335, in get_unique
    resp = lookup(self.index_name,key,value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/neo4jserver/client.py", line 878, in lookup_vertex
    path = build_path(index_path, vertex_path, index_name, key, value)
  File "/usr/local/lib/python2.7/dist-packages/bulbs/utils.py", line 126, in build_path
    segments = [quote(str(segment), safe='') for segment in args if segment is not None]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 22: ordinal not in range(128)

The error occurs when I try to create a node named atp-toernooi van montréal/toronto.

Another update: With the updated bulbs library, I get a different error:

  File "/usr/local/lib/python2.7/dist-packages/bulbs/utils.py", line 129, in build_path
    segments = [quote(unicode(segment), safe='') for segment in args if segment is not None]
  File "/usr/lib/python2.7/urllib.py", line 1238, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe9'
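Both tracebacks end in the path-building step, where quote() is handed a value that is not plain ASCII. One common way to avoid both failures is to encode the value to UTF-8 bytes before percent-quoting it. The sketch below shows that idea only; quote_segment is a hypothetical helper, not part of bulbs, and whether bulbs itself uses this exact approach is not confirmed here:

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def quote_segment(segment):
    """Hypothetical helper (not a bulbs function): percent-quote one
    URL path segment, encoding text to UTF-8 first."""
    if isinstance(segment, bytes):
        # Already a byte string; quote it as-is.
        raw = segment
    else:
        # Turn numbers and text into unicode text, then into UTF-8 bytes.
        raw = u"{0}".format(segment).encode("utf-8")
    # safe='' also escapes '/', so the value stays a single path segment.
    return quote(raw, safe="")


print(quote_segment(u"atp-toernooi van montr\xe9al/toronto"))
```

Because the input is bytes by the time quote() sees it, neither the implicit ASCII encoding nor the unicode lookup inside quote() is involved.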

Any help?

Thanks!

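【Question comments】:

  • Please provide sample code so I can see what's going on.
  • @espeed I've added some code that causes the same error when creating nodes. Thanks.
  • Thanks. Please post the full error message so I can see where in the stack it occurs.
  • What does the encoding declaration in the XML document say? Example: <?xml version="1.0" encoding="utf-8"?> And what SAX code are you using to parse it?
  • The code I'm using is taken from here: github.com/gareth-lloyd/visualizing-events/blob/master/…. Storing the data in a SQL database works without problems... Thanks again!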
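On the XML encoding question raised in the comments: Python's SAX parser decodes the document according to its encoding declaration and passes text to the handler as unicode, which is consistent with the traceback failing on an encode rather than a decode. A minimal sketch, where TitleHandler and the sample document are assumptions for illustration and not the code from the linked repository:

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of <title> elements."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append(u"".join(self._chunks))
            self._in_title = False


# UTF-8 encoded sample input; \xc3\xa9 is the byte sequence for U+00E9.
xml_bytes = (b'<?xml version="1.0" encoding="utf-8"?>'
             b'<page><title>montr\xc3\xa9al</title></page>')

handler = TitleHandler()
xml.sax.parseString(xml_bytes, handler)
print(repr(handler.titles))
```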

Tags: python unicode encoding character-encoding neo4j


【Solution 1】:

Bulbs stores strings in Neo4j Server as unicode. Note that the String property type converts values to unicode (unicode strings are the default in Python 3).

See the Python Unicode HOWTO:

http://docs.python.org/2/howto/unicode.html#python-2-x-s-unicode-support
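The general practice described there is to decode byte strings to unicode at the point where data enters the program and to work with unicode from then on. A minimal sketch of that idea, assuming the incoming bytes are UTF-8; to_text is a hypothetical helper, not a Bulbs or MySQLdb function:

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return unicode text for a byte string,
    and pass through values that are already text."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# UTF-8 bytes for the string from the question, "!Xóõ".
raw = b"!X\xc3\xb3\xc3\xb5"
name = to_text(raw)
print(repr(name))
```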

First, verify that your MySQL server supports UTF-8:

mysql> show character set like 'utf%';

Also, note my changes and comments in the code below...

from bulbs.model import Node
from bulbs.neo4jserver import Graph
from bulbs.property import String
import MySQLdb
import sys


class Topic(Node):
    element_type = 'node'           # by convention name this 'topic'
    name = String(nullable=False)


g = Graph()
g.add_proxy('topics', Topic)

# Make sure use_unicode is set to True so MySQLdb returns unicode strings
con = MySQLdb.connect(host='127.0.0.1', user='root', db='wiki_new', use_unicode=True, charset='utf8')
cur = con.cursor()
cur.execute('SELECT page_title FROM page')
while True:
    row = cur.fetchone()  
    if not row:
        break

    sys.stdout.write(row[0] + '\n')

    # Use Bulbs' get_or_create method to simplify your code
    # The index key is the string 'name', not a variable called name
    nds = g.topics.get_or_create('name', row[0], name=row[0])

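【Comments】:

  • Thanks for the reply. I didn't try that solution because my data source has changed to an XML file, but I still get the same error. Please see the original post. Thanks for the tips!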