【Question title】: TypeError: unorderable types: int() < str()
【Posted】: 2019-09-16 08:26:52
【Question】:

The error occurs when I apply the 5W1H extractor (an open-source library on Git) to my JSON news dataset.

The error appears when the evaluate_location file tries to run:

raw_locations.sort(key=lambda x: x[1], reverse=True)

The console then reports:

TypeError: unorderable types: int() < str()

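My question: does this mean something is wrong with the format of my dataset? If so, shouldn't the extractor treat all the news data as one long plain string when it works on this corpus? I am eager to find a way to fix this.

Here is one item from the JSON news data: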

{
"title": "Football: Van Dijk, Ronaldo and Messi shortlisted for FIFA award",
"body": "ROME: Liverpool centre-back Virgil van Dijk is on the shortlist to add FIFA's best player award to his UEFA Men's Player of the Year honour.The Dutch international denied Cristiano Ronaldo and Lionel Messi for the European title last week and the same trio are in the running for the FIFA accolade to be announced in Milan on September 23.    Van Dijk starred in Liverpool's triumphant Champions League campaign.England full-back Lucy Bronze won UEFA's women's award and is on FIFA's shortlist with the United States' World Cup-winning duo Megan Rapinoe and Alex Morgan.Manchester City boss Pep Guardiola is up against Liverpool's Jurgen Klopp and Mauricio Pochettino of Tottenham for best men's coach.Phil Neville, who led England's women to a World Cup semi-final, is up for the women's coach award with the USA's Jill Ellis and Sarina Wiegman who guided European champions the Netherlands to the World Cup final.    FIFA Best shortlistsMen's player:Cristiano Ronaldo (Juventus/Portugal), Lionel Messi (Barcelona/Argentina), Virgil van Dijk  player:Lucy Bronze (Lyon/England), Alex Morgan (Orlando Pride/USA), Megan Rapinoe (Reign FC/USA)Men's coach:Pep Guardiola (Manchester City), Jurgen Klopp (Liverpool), Mauricio Pochettino (Tottenham)Women's coach:Jill Ellis (USA), Phil Neville (England), Sarina Wiegman (Netherlands)Women's goalkeeper:Christiane Endler (Paris St-Germain/Chile), Hedvig Lindahl (Wolfsburg/Sweden), Sari van Veenendaal (Atletico Madrid/Netherlands)Men's goalkeeper:Alisson (Liverpool/Brazil), Ederson (Manchester City/Brazil), Marc-Andre ter Stegen (Barcelona/Germany)Puskas award (for best goal):Lionel Messi (Barcelona v Real Betis), Juan Quintero (River Plate v Racing Club), Daniel Zsori (Debrecen v Ferencvaros)",
"published_at": "2019-09-02"
}

Code:

import json
import logging
import re

from Giveme5W1H.extractor.document import Document
from Giveme5W1H.extractor.extractor import MasterExtractor

if __name__ == '__main__':
    # logger setup
    log = logging.getLogger('GiveMe5W')
    log.setLevel(logging.DEBUG)
    sh = logging.StreamHandler()
    sh.setLevel(logging.DEBUG)
    log.addHandler(sh)

    # load the news items; the with-block closes the file afterwards
    with open("./Labeled.json", "r", encoding="utf-8") as json_file:
        data = json.load(json_file)

    # giveme5w setup - with defaults
    extractor = MasterExtractor()

    for i in range(0, 1000):
        title = str(data[i]["title"])
        body = str(data[i]["body"])
        published_at = str(data[i]["published_at"])

        # use the first line of the body as the lead paragraph
        head = re.search("^[^\n]*", body).group(0)

        doc1 = Document(title, head, body, published_at)
        doc = extractor.parse(doc1)

Instead of returning the extracted time and location results, it gave me this error:

Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractor.py", line 20, in run
    extractor.process(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/abs_extractor.py", line 41, in process
    self._evaluate_candidates(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 75, in _evaluate_candidates
    locations = self._evaluate_locations(document)
  File "/usr/local/lib/python3.5/dist-packages/Giveme5W1H/extractor/extractors/environment_extractor.py", line 224, in _evaluate_locations
    raw_locations.sort(key=lambda x: x[1], reverse=True)
TypeError: unorderable types: int() < str()

【Comments】:

  • Try changing it to doc1 = Document.from_text(text, date_publish). Does it throw the same error?
  • @MjZac I'm afraid not; the method requires all four input elements.

Tags: python python-3.x


【Solution 1】:

raw_locations is built in the same file, at line 219:

raw_locations.append([parts, location.raw['place_id'], location.point, bb, area, 0, 0, candidate, 0])

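So the sort call orders the locations by their place_id, which is element 1 of each row. Python 3 cannot compare an int with a str using <, and that is exactly what the TypeError reports. Please check whether place_id appears as both strings and numbers in your dataset. If it does, you need to convert all entries to a single type.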
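
The failure and one possible workaround can be sketched in isolation. This is only an illustration: the rows below are made-up stand-ins shaped like raw_locations, the helper names sort_fails and sort_by_place_id are hypothetical, and coercing to int assumes every place_id is numeric, which the thread does not confirm.

```python
# Hypothetical rows mimicking raw_locations: index 1 holds the place_id.
# The values are invented for illustration, not taken from the library.
raw_locations = [
    [["Rome"], 12345, None],
    [["Milan"], "67890", None],
    [["Liverpool"], 222, None],
]


def sort_fails(rows):
    """Return True if sorting by place_id raises TypeError (mixed int/str)."""
    try:
        sorted(rows, key=lambda x: x[1], reverse=True)
    except TypeError:
        return True
    return False


def sort_by_place_id(rows):
    """Sort by place_id after coercing each id to int (assumes numeric ids)."""
    return sorted(rows, key=lambda x: int(x[1]), reverse=True)


mixed_fails = sort_fails(raw_locations)
ordered = sort_by_place_id(raw_locations)
print(mixed_fails)
print([row[1] for row in ordered])
```

Coercing inside the key function leaves the stored values untouched; if the ids turn out not to be numeric, key=lambda x: str(x[1]) would give a consistent (lexicographic) order instead.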

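【Comments】:

  • Thanks. What should place_id contain? And does the raw data need to be converted before the extraction method is applied to it?
  • I don't know what type it should be. Look at your data and try to find out. My guess is a number, since the field is called place_id...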
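
One way to "look at your data" is to count which Python types actually occur at the sort position. The sketch below is an assumption-laden illustration: place_id_types is a hypothetical helper, and the sample rows are invented rather than taken from the extractor.

```python
from collections import Counter


def place_id_types(rows, index=1):
    """Count how many rows carry each Python type at the given index."""
    return Counter(type(row[index]).__name__ for row in rows)


# Invented sample rows; in practice these would be the rows being sorted.
sample = [[["Rome"], 12345], [["Milan"], "67890"], [["Liverpool"], 222]]
counts = place_id_types(sample)
print(counts)
```

If more than one type name shows up in the result, the sort key is mixing types and the entries need to be normalized before sorting.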