【Question title】: How to extract pattern tuples from a list of tuples?
【Posted】: 2016-12-13 22:20:53
【Question】:

I have the following list:

data = [('Mr', 'PROPN'), ('.', 'PUNCT'), ('William', 'PROPN'), ('Henry', 'PROPN'), ('Gates', 'PROPN'), (',', 'PUNCT'), ('III', 'NUM'), ('is', 'VERB'), ('Founder', 'PROPN'), ('and', 'CONJ'), ('Technology', 'PROPN'), ('Advisor', 'NOUN'), ('Director', 'NOUN'), ('of', 'ADP'), ('Microsoft', 'PROPN'), ('Corporation', 'PROPN'), ('a', 'DET'), ('cofounder', 'NOUN'), ('served', 'VERB'), ('as', 'ADP'), ('Chairman', 'PROPN'), ('from', 'ADP'), ('our', 'PRON'), ('incorporation', 'NOUN'), ('in', 'ADP'), ('1981', 'NUM'), ('until', 'ADP'), ('2014', 'NUM'), ('He', 'PRON'), ('currently', 'ADV'), ('acts', 'VERB'), ('Technical', 'ADJ'), ('to', 'ADP'), ('Nadella', 'NUM'), ('on', 'ADP'), ('key', 'ADJ'), ('development', 'NOUN'), ('projects', 'NOUN'), ('retired', 'VERB'), ('an', 'DET'), ('employee', 'NOUN'), ('2008', 'NUM'), ('Chief', 'NOUN'), ('Software', 'PROPN'), ('Architect', 'PROPN'), ('2000', 'NUM'), ('2006', 'NUM'), ('when', 'ADV'), ('he', 'PRON'), ('announced', 'VERB'), ('his', 'PRON'), ('two', 'NUM'), ('-', 'PUNCT'), ('year', 'NOUN'), ('plan', 'NOUN'), ('transition', 'VERB'), ('out', 'ADP'), ('day', 'NOUN'), ('full', 'ADJ'), ('time', 'NOUN'), ('role', 'NOUN'), ('Executive', 'PROPN'), ('Officer', 'PROPN'), ('resigned', 'VERB'), ('assumed', 'VERB'), ('the', 'DET'), ('position', 'NOUN'), ('As', 'ADP'), ('co', 'PROPN'), ('chair', 'NOUN'), ('Bill', 'NOUN'), ('&', 'CONJ'), ('Melinda', 'PROPN'), ('Foundation', 'PROPN'), ('shapes', 'NOUN'), ('approves', 'VERB'), ('grant', 'NOUN'), ('making', 'VERB'), ('strategies', 'NOUN'), ('advocates', 'NOUN'), ('for', 'ADP'), ('foundation’s', 'NUM'), ('issues', 'NOUN'), ('helps', 'VERB'), ('set', 'VERB'), ('overall', 'ADJ'), ('direction', 'NOUN'), ('organization', 'NOUN'), ('founder', 'NOUN'), ('’', 'NUM'), ('foresight', 'NOUN'), ('vision', 'NOUN'), ('personal', 'ADJ'), ('computing', 'NOUN'), ('have', 'AUX'), ('been', 'VERB'), ('central', 'ADJ'), ('success', 'NOUN'), ('software', 'NOUN'), ('industry', 'NOUN'), ('has', 'VERB'), ('unparalleled', 'ADJ'), 
('knowledge', 'NOUN'), ('Company’s', 'NUM'), ('history', 'NOUN'), ('technologies', 'NOUN'), ('Company', 'NOUN'), ('its', 'PRON'), ('grew', 'VERB'), ('fledgling', 'ADJ'), ('business', 'NOUN'), ('into', 'ADP'), ('world’s', 'NUM'), ('leading', 'VERB'), ('company', 'NOUN'), ('process', 'NOUN'), ('creating', 'VERB'), ('one', 'NUM'), ('most', 'ADV'), ('prolific', 'ADJ'), ('sources', 'NOUN'), ('innovation', 'NOUN'), ('powerful', 'ADJ'), ('brands', 'NOUN'), ('through', 'ADP'), ('motion', 'NOUN'), ('technological', 'ADJ'), ('strategic', 'ADJ'), ('programs', 'NOUN'), ('that', 'DET'), ('are', 'VERB'), ('core', 'NOUN'), ('part', 'NOUN'), ('continues', 'VERB'), ('provide', 'VERB'), ('technical', 'ADJ'), ('input', 'NOUN'), ('evolution', 'NOUN'), ('productivity', 'NOUN'), ('platform', 'NOUN'), ('mobile', 'NOUN'), ('first', 'ADJ'), ('cloud', 'NOUN'), ('world', 'NOUN'), ('His', 'PRON'), ('work', 'NOUN'), ('overseeing', 'VERB'), ('provides', 'VERB'), ('global', 'ADJ'), ('insights', 'NOUN'), ('relevant', 'ADJ'), ('current', 'ADJ'), ('future', 'ADJ'), ('opportunities', 'NOUN'), ('keen', 'ADJ'), ('appreciation', 'NOUN'), ('stakeholder', 'ADJ'), ('interests', 'NOUN')]

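I would like to extract three-tuple patterns based on the second element of each tuple. For example, suppose I want to extract every case where a tuple whose first element is 'of' sits between a tuple whose second element is 'NOUN' and a tuple whose second element is 'PROPN':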

[('Director', 'NOUN'), ('of', 'ADP'), ('Microsoft', 'PROPN')]

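So my question is: how can I extract patterns like the one above without using regular expressions? The reason I don't want to use regex is that I will start extracting tuples in many more different ways. For example, a tuple whose first value is 'world’s', followed by a tuple tagged 'VERB' and a tuple tagged 'NOUN':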

[('world’s', 'NUM'), ('leading', 'VERB'), ('company', 'NOUN')]

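【Comments】:

  • Why not use regex?
  • Because sometimes writing the regex just makes this pattern-extraction task harder @ElliotRoberts
  • What should happen if there is more than one match?
  • @ElliotRoberts, thanks for the help; return them in a list.

Tags: python python-3.x parsing data-structures pattern-matching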


【Solution 1】:

A relatively fast, though perhaps unnecessarily compact, solution:

from itertools import chain

# Generator of three-tuples matching requirements:
# If `data` is large enough that temp `list`s are a problem, might be worth
# using itertools.islice instead of shallow copy slices
# or using enumerate with lookaround indexing
matchtups = (((wd0, tp0), (wd1, tp1), (wd2, tp2))
             for (wd0, tp0), (wd1, tp1), (wd2, tp2) in zip(data, data[1:], data[2:])
             if wd1 == 'of' and tp0 == 'NOUN' and tp2 == 'PROPN')

# Flatten out the three-tuple structure:
results = list(chain.from_iterable(matchtups))
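
The comments in the code above mention `itertools.islice` as an alternative to the shallow-copy slices. A minimal sketch of that variant follows; the helper name `triples` and the short `sample` list are illustrative assumptions, not part of the original answer.

```python
from itertools import islice

def triples(seq):
    """Yield consecutive (a, b, c) windows over `seq` without copying it."""
    return zip(seq, islice(seq, 1, None), islice(seq, 2, None))

# Small stand-in for the `data` list from the question
sample = [('Advisor', 'NOUN'), ('Director', 'NOUN'), ('of', 'ADP'),
          ('Microsoft', 'PROPN'), ('Corporation', 'PROPN')]

# Keep windows whose middle word is 'of', flanked by a NOUN and a PROPN
matches = [window for window in triples(sample)
           if window[1][0] == 'of'
           and window[0][1] == 'NOUN'
           and window[2][1] == 'PROPN']
print(matches)
```

Each `islice` still steps past the skipped leading elements, but no temporary list is built, which is the point of the suggestion for large inputs.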

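【Discussion】:

  • Thanks, this is the most helpful. What about using something like a lexer or a parser? ... Which approach do you think is best for this task?
  • Also, what if there are more patterns like [('Director', 'NOUN'), ('of', 'ADP'), ('Microsoft', 'PROPN')]? Is this approach robust enough to catch them too?
  • @johndoe: I mean, if there are two matching groups, both show up in one flat list, one match after another (the generator expression produces discrete three-tuples, but I unpacked them to match the output your question asked for).
  • @johndoe: I don't know what you're asking. It's too vague to answer.
  • Yes, just skip the second step with chain and you have a generator that yields tuples of tuples. You can make it yield lists of tuples by changing what it produces from ((wd0, tp0), (wd1, tp1), (wd2, tp2)) to [(wd0, tp0), (wd1, tp1), (wd2, tp2)], and you can get a list of them by changing the outermost parentheses to square brackets (making it a listcomp, not a genexpr). I only made it a genexpr to avoid the intermediate list, but if you don't need the chain step, it can be a listcomp with no loss.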
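
The listcomp variant described in the last comment can be sketched as follows. The `sample` list is made-up illustrative data with two matches (it is not taken from the question), and the name `grouped` is likewise an assumption.

```python
# Made-up sample containing two NOUN / 'of' / PROPN runs
sample = [('Director', 'NOUN'), ('of', 'ADP'), ('Microsoft', 'PROPN'),
          ('head', 'NOUN'), ('of', 'ADP'), ('Research', 'PROPN')]

# Square brackets outside: a list comprehension instead of a generator.
# Square brackets inside: each match is its own list of three tuples.
grouped = [[(wd0, tp0), (wd1, tp1), (wd2, tp2)]
           for (wd0, tp0), (wd1, tp1), (wd2, tp2)
           in zip(sample, sample[1:], sample[2:])
           if wd1 == 'of' and tp0 == 'NOUN' and tp2 == 'PROPN']

for match in grouped:
    print(match)
```

Because no `chain` step flattens the result, each match stays separate, which is what the asker wanted when several matches exist.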
【Solution 2】:

Something like this?

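thing_list = []
# Start at 1 and stop one short of the end so both neighbours always exist
# (i - 1 would otherwise wrap around to data[-1], and i + 1 could raise IndexError).
for i in range(1, len(data) - 1):
    if data[i][0] == "of":
        if (data[i-1][1] == "NOUN") and (data[i+1][1] == "PROPN"):
            thing_list.append(data[i-1:i+2])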

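【Discussion】:

  • I got: [[('Director', 'NOUN'), ('of', 'ADP')]], which is wrong.
  • Oops, I mixed up some indices. I think it's fixed now.
【Solution 3】:

I took a stab at this just because it looked fun. If you're willing to put up with a soup of maps, lambdas and filters, this seems to work: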

matches = map(
    lambda _: (data[_ - 1], data[_], data[_ + 1]),
    filter(
        lambda _: data[_ - 1][1] == "NOUN" and data[_ + 1][1] == "PROPN",
        map(
            lambda _: _[0],
            filter(
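                # Skip an 'of' at either end of the list, where a neighbour is missing
                lambda _: _[1][0] == "of" and 0 < _[0] < len(data) - 1,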
                enumerate(data)
            )
        )
    )
)

【Discussion】:

  • If you need a lambda to use map/filter, don't use map/filter. It's slower, less readable, less obvious and usually less concise than the equivalent listcomps or genexprs.
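
For comparison, an equivalent list comprehension might look like the sketch below. The `sample` list is illustrative (it deliberately starts with 'of' to exercise the boundary), and the index range is chosen so that both neighbours always exist.

```python
# Illustrative sample; the leading 'of' has no left neighbour and must be skipped
sample = [('of', 'ADP'), ('Director', 'NOUN'), ('of', 'ADP'),
          ('Microsoft', 'PROPN')]

# Visit only the indices that have a neighbour on each side
matches = [(sample[i - 1], sample[i], sample[i + 1])
           for i in range(1, len(sample) - 1)
           if sample[i][0] == 'of'
           and sample[i - 1][1] == 'NOUN'
           and sample[i + 1][1] == 'PROPN']
print(matches)
```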
【Solution 4】:

You can loop over it:

[(x,y) for x,y in data if ('NOUN' == y) or ('PROPN' in y)]

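I've written the `if` condition above in two ways (an equality test with `==` and a membership test with `in`) so you can choose; note that `in` on a string is a substring test, so `'PROPN' in y` would also match any longer tag that contains `PROPN`. You can also convert the data to a pandas DataFrame, which gives you a more powerful query syntax and helps with more complex queries.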

import pandas as pd
data = pd.DataFrame(data, columns=['word','type'])
data[(data.type=='NOUN') | (data.type=='PROPN')]
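
The filter above selects individual rows rather than three-row patterns. One way to express the NOUN / 'of' / PROPN pattern in pandas is to compare each row with its neighbours via `Series.shift`. This is only a sketch under assumptions: the names `sample`, `df`, `mask` and `triples` are illustrative, and the DataFrame is assumed to have the default integer index so that labels equal positions.

```python
import pandas as pd

# Small stand-in for the `data` list from the question
sample = [('Advisor', 'NOUN'), ('Director', 'NOUN'), ('of', 'ADP'),
          ('Microsoft', 'PROPN'), ('Corporation', 'PROPN')]
df = pd.DataFrame(sample, columns=['word', 'type'])

# shift(1) lines each row up with the previous row's tag, shift(-1) with the next one
mask = ((df['word'] == 'of')
        & (df['type'].shift(1) == 'NOUN')
        & (df['type'].shift(-1) == 'PROPN'))

# For each matching centre row, collect the row before, the row itself and the row after
triples = [list(df.iloc[i - 1:i + 2].itertuples(index=False, name=None))
           for i in mask[mask].index]
print(triples)
```

Rows at either end of the frame cannot match, because the shifted columns hold missing values there and the comparisons evaluate to False.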

Addendum in response to the comments:

You can also find out things about your data, e.g.

data.groupby(data.type).count()

       word
type
ADJ      20
ADP      12
ADV       3
AUX       1
CONJ      2
DET       4
NOUN     56
NUM      13
PRON      6
PROPN    16
PUNCT     3
VERB     22

You can convert the result back to plain Python data types once processing is done.

list(data[(data.type=='NOUN') | (data.type=='PROPN')].word)

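【Discussion】:

  • I didn't know this could be done with pandas. Could you provide more examples of how to use pandas for this kind of task? Thanks a lot.
【Solution 5】: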
def trioPattern( trioCols, trioElements):
    """trioCols = (Use element 0 or 1 of the first pair, 0 or 1 of the second pair, 0 or 1 of the third pair)
       trioElements = (Phrase of the first element, Phrase of the second element, Phrase of the third element)"""

    data = [('Mr', 'PROPN'), ('.', 'PUNCT'), ('William', 'PROPN'), ('Henry', 'PROPN'), ('Gates', 'PROPN'), (',', 'PUNCT'), ('III', 'NUM'), ('is', 'VERB'), ('Founder', 'PROPN'), ('and', 'CONJ'), ('Technology', 'PROPN'), ('Advisor', 'NOUN'), ('Director', 'NOUN'), ('of', 'ADP'), ('Microsoft', 'PROPN'), ('Corporation', 'PROPN'), ('a', 'DET'), ('cofounder', 'NOUN'), ('served', 'VERB'), ('as', 'ADP'), ('Chairman', 'PROPN'), ('from', 'ADP'), ('our', 'PRON'), ('incorporation', 'NOUN'), ('in', 'ADP'), ('1981', 'NUM'), ('until', 'ADP'), ('2014', 'NUM'), ('He', 'PRON'), ('currently', 'ADV'), ('acts', 'VERB'), ('Technical', 'ADJ'), ('to', 'ADP'), ('Nadella', 'NUM'), ('on', 'ADP'), ('key', 'ADJ'), ('development', 'NOUN'), ('projects', 'NOUN'), ('retired', 'VERB'), ('an', 'DET'), ('employee', 'NOUN'), ('2008', 'NUM'), ('Chief', 'NOUN'), ('Software', 'PROPN'), ('Architect', 'PROPN'), ('2000', 'NUM'), ('2006', 'NUM'), ('when', 'ADV'), ('he', 'PRON'), ('announced', 'VERB'), ('his', 'PRON'), ('two', 'NUM'), ('-', 'PUNCT'), ('year', 'NOUN'), ('plan', 'NOUN'), ('transition', 'VERB'), ('out', 'ADP'), ('day', 'NOUN'), ('full', 'ADJ'), ('time', 'NOUN'), ('role', 'NOUN'), ('Executive', 'PROPN'), ('Officer', 'PROPN'), ('resigned', 'VERB'), ('assumed', 'VERB'), ('the', 'DET'), ('position', 'NOUN'), ('As', 'ADP'), ('co', 'PROPN'), ('chair', 'NOUN'), ('Bill', 'NOUN'), ('&', 'CONJ'), ('Melinda', 'PROPN'), ('Foundation', 'PROPN'), ('shapes', 'NOUN'), ('approves', 'VERB'), ('grant', 'NOUN'), ('making', 'VERB'), ('strategies', 'NOUN'), ('advocates', 'NOUN'), ('for', 'ADP'), ('foundation’s', 'NUM'), ('issues', 'NOUN'), ('helps', 'VERB'), ('set', 'VERB'), ('overall', 'ADJ'), ('direction', 'NOUN'), ('organization', 'NOUN'), ('founder', 'NOUN'), ('’', 'NUM'), ('foresight', 'NOUN'), ('vision', 'NOUN'), ('personal', 'ADJ'), ('computing', 'NOUN'), ('have', 'AUX'), ('been', 'VERB'), ('central', 'ADJ'), ('success', 'NOUN'), ('software', 'NOUN'), ('industry', 'NOUN'), ('has', 'VERB'), ('unparalleled', 
'ADJ'), ('knowledge', 'NOUN'), ('Company’s', 'NUM'), ('history', 'NOUN'), ('technologies', 'NOUN'), ('Company', 'NOUN'), ('its', 'PRON'), ('grew', 'VERB'), ('fledgling', 'ADJ'), ('business', 'NOUN'), ('into', 'ADP'), ('world’s', 'NUM'), ('leading', 'VERB'), ('company', 'NOUN'), ('process', 'NOUN'), ('creating', 'VERB'), ('one', 'NUM'), ('most', 'ADV'), ('prolific', 'ADJ'), ('sources', 'NOUN'), ('innovation', 'NOUN'), ('powerful', 'ADJ'), ('brands', 'NOUN'), ('through', 'ADP'), ('motion', 'NOUN'), ('technological', 'ADJ'), ('strategic', 'ADJ'), ('programs', 'NOUN'), ('that', 'DET'), ('are', 'VERB'), ('core', 'NOUN'), ('part', 'NOUN'), ('continues', 'VERB'), ('provide', 'VERB'), ('technical', 'ADJ'), ('input', 'NOUN'), ('evolution', 'NOUN'), ('productivity', 'NOUN'), ('platform', 'NOUN'), ('mobile', 'NOUN'), ('first', 'ADJ'), ('cloud', 'NOUN'), ('world', 'NOUN'), ('His', 'PRON'), ('work', 'NOUN'), ('overseeing', 'VERB'), ('provides', 'VERB'), ('global', 'ADJ'), ('insights', 'NOUN'), ('relevant', 'ADJ'), ('current', 'ADJ'), ('future', 'ADJ'), ('opportunities', 'NOUN'), ('keen', 'ADJ'), ('appreciation', 'NOUN'), ('stakeholder', 'ADJ'), ('interests', 'NOUN')]

    #Elements of the triple pattern
    ColE1, ColE2, ColE3 = trioCols
    trios = dict([( (data[e][ColE1], data[e+1][ColE2], data[e+2][ColE3]), (data[e], data[e+1], data[e+2])) for e in range(0, len(data)-2)])

    #Triple pattern phrases
    E1, E2, E3 = trioElements
    # dict.has_key() was removed in Python 3; use the `in` operator instead
    if (E1, E2, E3) in trios:
        return trios[(E1, E2, E3)]
    else:
        return "Not found"

Examples:

trioPattern( (1,0,1), ("NOUN", "of", "PROPN") )

(('Director', 'NOUN'), ('of', 'ADP'), ('Microsoft', 'PROPN'))

trioPattern((0,1,1), ("world’s", "VERB", "NOUN"))

(('world’s', 'NUM'), ('leading', 'VERB'), ('company', 'NOUN'))
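
Because `trioPattern` stores the triples in a dict keyed by the pattern, a later triple with the same key overwrites an earlier one, so at most one match per pattern is returned. Since the asker wanted all matches in a list, a variant along these lines might be closer to the request. The function name `find_triples` and the made-up `sample` list are assumptions for illustration only.

```python
def find_triples(data, cols, wanted):
    """Return every consecutive triple whose selected fields equal `wanted`.

    cols   -- for each of the three positions, 0 to compare the word, 1 the tag
    wanted -- the three values to match, in order
    """
    c1, c2, c3 = cols
    target = tuple(wanted)
    return [(a, b, c)
            for a, b, c in zip(data, data[1:], data[2:])
            if (a[c1], b[c2], c[c3]) == target]

# Made-up sample containing two NOUN / 'of' / PROPN runs
sample = [('Director', 'NOUN'), ('of', 'ADP'), ('Microsoft', 'PROPN'),
          ('head', 'NOUN'), ('of', 'ADP'), ('Research', 'PROPN')]

print(find_triples(sample, (1, 0, 1), ("NOUN", "of", "PROPN")))
print(find_triples(sample, (0, 1, 1), ("missing", "VERB", "NOUN")))
```

Taking `data` as a parameter instead of embedding it in the function keeps the matcher reusable, and an empty list stands in for the "Not found" string when nothing matches.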

【Discussion】:
