【Question title】: Separate first, middle and last names (Python)
【Posted】: 2011-01-13 08:42:57
【Question】:

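I have a list of several hundred members that I want to split into first, middle and last names, but some members have a prefix (denoted by "P"). All possible combinations: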

First Middle Last
P First Middle Last
First P Middle Last
P First p Middle Last

How can I separate the First (with P, if present), Middle (with P, if present) and Last names in Python? This is what I came up with, but it doesn't quite work.

import csv
inPath = "input.txt"
outPath = "output.txt"

newlist = []

file = open(inPath, 'rU')
if file:
    for line in file:
        member = line.split()
        newlist.append(member)
    file.close()
else:
    print "Error Opening File."

file = open(outPath, 'wb')
if file:
    for i in range(len(newlist)):
        print i, newlist[i][0] # Should get the First Name with Prefix
        print i, newlist[i][1] # Should get the Middle Name with Prefix
        print i, newlist[i][-1]
    file.close()
else:
    print "Error Opening File."

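What I want is:

  1. Get the first and middle names together with their prefixes (if any)
  2. Output each part (first, middle, last) to a separate txt file or, preferably, a single CSV file.

Thanks a lot for your help.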

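【Question comments】:

  • It is not clear from the examples what a "prefix" is; for example, how do you tell whether "A B C D" is ("A B", "C", "D") or ("A", "B C", "D")? Please give a more complete example and explain more specifically what a "prefix" is.
  • If the prefixes are one letter long and no name is one letter long, you could test len() to pick them out and group them with their respective names. Just an idea.
  • There are only three prefixes: "M", "Shk" and "BS"

Tags: python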


【Solution 1】:

How about this complete, tested script:

import sys

def process(file):
    for line in file:
        arr = line.split()
        if not arr:
            continue
        last = arr.pop()
        n = len(arr)
        if n == 4:
            first, middle = ' '.join(arr[:2]), ' '.join(arr[2:])
        elif n == 3:
            if arr[0] in ('M', 'Shk', 'BS'):
                first, middle = ' '.join(arr[:2]), arr[-1]
            else:
                first, middle = arr[0], ' '.join(arr[1:])
        elif n == 2:
            first, middle = arr
        else:
            continue
        print 'First: %r' % first
        print 'Middle: %r' % middle
        print 'Last: %r' % last

if __name__ == '__main__':
    process(sys.stdin)

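If you run this on Linux, type the sample lines and then press Ctrl+D to signal end of input. On Windows, use Ctrl+Z instead of Ctrl+D. You can, of course, also pipe in a file.

The following input file: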

First Middle Last
M First Middle Last
First Shk Middle Last
BS First M Middle Last

gives this output:

First: 'First'
Middle: 'Middle'
Last: 'Last'
First: 'M First'
Middle: 'Middle'
Last: 'Last'
First: 'First'
Middle: 'Shk Middle'
Last: 'Last'
First: 'BS First'
Middle: 'M Middle'
Last: 'Last'
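
The question also asked for CSV output. Here is a minimal sketch of one way to get it, assuming Python 3 and the three prefixes named in the question comments ("M", "Shk", "BS"). The names split_name and write_csv are illustrative and not part of the script above. It reads tokens left to right and attaches a prefix to the token that follows it:

```python
import csv
import io

PREFIXES = {"M", "Shk", "BS"}  # assumption: taken from the asker's comment


def split_name(line):
    """Return (first, middle, last), or None if the line does not fit."""
    words = line.split()
    parts = []
    i = 0
    # Read first and middle; each may be preceded by one prefix token.
    while len(parts) < 2 and i < len(words):
        if words[i] in PREFIXES and i + 1 < len(words):
            parts.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            parts.append(words[i])
            i += 1
    # Exactly one token must remain for the last name.
    if len(parts) != 2 or i != len(words) - 1:
        return None
    return parts[0], parts[1], words[i]


def write_csv(lines, out):
    """Write one CSV row per parseable line to the file-like object out."""
    writer = csv.writer(out)
    for line in lines:
        row = split_name(line)
        if row is not None:
            writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_csv(["First Shk Middle Last", "BS First M Middle Last"], buf)
    print(buf.getvalue())
```

To write to a real file instead of a StringIO, open it with open("output.csv", "w", newline="") as the csv module documentation recommends. Lines that do not match any of the four shapes are silently skipped here; raising an error instead may suit your data better.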

【Comments】:

  • Awesome! Works like a charm! :D
【Solution 2】:

Here it is, in an object-oriented way:

class Name(object):
    def __init__(self, fullname):
        self.full = fullname
        s = self.full.split()

        try:
            self.first = " ".join(s[:2]) if len(s[0]) == 1 else s[0]
            s = s[len(self.first.split()):]

            self.middle = " ".join(s[:2]) if len(s[0]) == 1 else s[0]
            s = s[len(self.middle.split()):]

            self.last = " ".join(s[:2]) if len(s[0]) == 1 else s[0]
        finally:
            pass

names = [
    "First Middle Last",
    "P First Middle Last",
    "First P Middle Last",
    "P First p Middle Last",
]

for fullname in names:
    name = Name(fullname)
    print (name.first, name.middle, name.last)

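【Comments】:

  • What here needs a class? By the way, the prefixes are not just one-character strings, although that was not made very clear in the question..
  • A class? Well, maybe aesthetics, readability, code reuse. Here's one from the Zen of Python: Namespaces are one honking great idea -- let's do more of those! ;) The prefixes, and the expression in the if statement, can always be adjusted, and yes, it was not clear in the Q.
  • I mean, why use a class when all you need is a function?
【Solution 3】:

If "M", "Shk" and "BS" are not valid first/last names, i.e. you don't care about their exact position, you can filter them out with a one-liner: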

first, middle, last = filter(lambda x: x not in ('M','Shk','BS'), yourNameHere.split())

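Here, of course, yourNameHere is the string containing the name you want to parse.

Warning: this code assumes you always have a middle name, as in the examples you gave above. If not, you have to take the whole list and count its elements to know whether there is a middle name.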
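
The counting step mentioned in the warning could look roughly like this. It is only a sketch, assuming Python 3 and assuming that two remaining tokens mean "first and last name, no middle name"; strip_prefixes is a hypothetical helper name:

```python
PREFIXES = ("M", "Shk", "BS")


def strip_prefixes(name):
    """Drop prefix tokens and return (first, middle, last); middle may be ''."""
    words = [w for w in name.split() if w not in PREFIXES]
    if len(words) == 3:
        return tuple(words)
    if len(words) == 2:
        # Assumption: two remaining tokens are the first and last name.
        return words[0], "", words[1]
    raise ValueError("cannot split %r into first/middle/last" % name)
```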

Edit: if you do care about the prefix positions:

first, middle, last = map(
    lambda x: x[1],
    filter(
        lambda (i,x): i not in (0, 2) or x not in ('M','Shk','BS'),
        enumerate(yourNameHere.split())))
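
The lambda (i,x) form relies on tuple parameter unpacking, which Python 3 removed. The same filter can be sketched as a list comprehension, assuming Python 3; drop_positional_prefixes is an illustrative name. Like the snippet above, it only drops a prefix found at index 0 or 2, so an input such as "First Shk Middle Last" (prefix at index 1) still yields four tokens and cannot be unpacked into three names:

```python
PREFIXES = ("M", "Shk", "BS")


def drop_positional_prefixes(name):
    """Keep every token except a prefix sitting at index 0 or 2."""
    return [
        word
        for i, word in enumerate(name.split())
        if i not in (0, 2) or word not in PREFIXES
    ]
```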

【Comments】:

  • Or [x[1] for x in filter( ... )]. I'm not sure which performs better, but the second way avoids creating a function..
【Solution 4】:
names = [('A', 'John', 'Paul', 'Smith'),
('Matthew', 'M', 'Phil', 'Bond'),
('A', 'Morris', 'O', 'Reil', 'M', 'Big')]

def getItem():
    for name in names:
        for (pos,item) in enumerate(name):
            yield item

itembase = getItem()

for i in enumerate(names):
    element = itembase.next()
    if len(element) == 1: firstName = element+" "+itembase.next()
    else: firstName = element
    element = itembase.next()
    if len(element) == 1: mName = element+" "+itembase.next()
    else: mName = element
    element = itembase.next()
    if len(element) == 1: lastName = element+" "+itembase.next()
    else: lastName = element

    print "First Name: "+firstName
    print "Middle Name: "+mName
    print "Last Name: "+lastName
    print "--"

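This seems to work. Replace the len(element) == 1 condition (I didn't know you only needed to check for three prefixes, so I wrote it for any single letter) with a condition that looks for the three prefixes you have.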

**Output**
First Name: A John
Middle Name: Paul
Last Name: Smith

First Name: Matthew
Middle Name: M Phil
Last Name: Bond

First Name: A Morris
Middle Name: O Reil
Last Name: M Big
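
With the three prefixes from the question comments, the replacement condition suggested above might be sketched like this. It assumes Python 3 (where itembase.next() becomes next(itembase)); take_part and split_tokens are illustrative names:

```python
PREFIXES = {"M", "Shk", "BS"}  # assumption: the three prefixes from the comments


def take_part(tokens):
    """Consume one name part from the iterator, plus its prefix if present."""
    element = next(tokens)
    if element in PREFIXES:
        return element + " " + next(tokens)
    return element


def split_tokens(name):
    """Split a tuple of tokens into (first, middle, last)."""
    tokens = iter(name)
    return take_part(tokens), take_part(tokens), take_part(tokens)
```

A name with too few tokens, or a last name that is itself one of the prefixes, raises StopIteration in this sketch.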

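【Comments】:

  • It doesn't seem to work for this: First Middle Last | M First Middle Last | First Shk Middle Last | Shk First M Middle Last
  • I already stated that you have to replace len(element) == 1 with the condition you need. I can't do all the work for you; this is just an example. Others have provided better ones; we are all here to learn.
【Solution 5】: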
import csv

class CsvWriter(object):
    """
    Wraps csv.writer in a partial file-API compatibility layer
    """
    def __init__(self, fname, mode='w', *args, **kwargs):
        super(CsvWriter, self).__init__()
        self.f = open(fname, mode)
        self.writer = csv.writer(self.f, *args, **kwargs)

    def write(self, *args):
        """
        Writes a row of data to the csv file

        Can be called as
          .write()         puts a blank row
          .write(2)        puts a single cell
          .write([1,2,3])  puts 3 cells
          .write(1,2,3)    puts 3 cells
        """
        if len(args)==1 and hasattr(args[0], ('__iter__')):
            # single argument, and it's a sequence - let it be the row data
            rowdata = args[0]
        else:
            rowdata = args

        self.writer.writerow(rowdata)

    def close(self):
        self.writer = None
        self.f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

class NameSplitter(object):
    def __init__(self, pre=None):
        super(NameSplitter, self).__init__()

        # list of accepted prefixes
        if pre is None:
            self.pre = set(['m','shk','bs'])
        else:
            self.pre = set([s.lower() for s in pre])

        # is-a-prefix word tester
        self.isPre = lambda x,p=self.pre: x.lower() in p

        # join the given words with spaces (str.join takes one iterable)
        jn = lambda *args: ' '.join(args)

        # signature-based dispatch table
        self.match = {}
        self.match[(3,())]    = lambda w,j=jn: (w[0],         w[1],         w[2])
        self.match[(4,(0,))]  = lambda w,j=jn: (j(w[0],w[1]), w[2],         w[3])
        self.match[(4,(1,))]  = lambda w,j=jn: (w[0],         j(w[1],w[2]), w[3])
        self.match[(5,(0,2))] = lambda w,j=jn: (j(w[0],w[1]), j(w[2],w[3]), w[4])

    def __call__(self, nameStr):
        words = nameStr.split()

        # build hashable signature
        pres  = tuple(n for n,word in enumerate(words) if self.isPre(word))
        sig   = (len(words), pres)

        try:
            do = self.match[sig]
            return do(words)
        except KeyError:
            return None

def process(inf, outf, fn):
    for line in inf:
        res = fn(line)
        if res is not None:
            outf.write(res)

def main():
    infname = "input.txt"
    outfname = "output.csv"

    with open(infname,'rU') as inf:
        with CsvWriter(outfname) as outf:
            process(inf, outf, NameSplitter())

if __name__=="__main__":
    main()

【Comments】:

【Solution 6】:

A complete script:

import sys

# Fold the words right to left: a prefix is joined to the text after it
# with a space, any other word with a comma.
def f(a, b):
    if b in ('M', 'Shk', 'BS'):
        return '%s %s' % (b, a)
    else:
        return '%s,%s' % (b, a)

for line in sys.stdin:
    sys.stdout.write(reduce(f, reversed(line.split(' '))))

Input:

First Middle Last
M First Middle Last
First Shk Middle Last
BS First M Middle Last

CSV output:

First,Middle,Last
M First,Middle,Last
First,Shk Middle,Last
BS First,M Middle,Last
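
In Python 3, reduce is no longer a builtin and has to be imported from functools. A sketch of the same right-to-left fold under that assumption; join_right and to_csv_row are illustrative names, and split() without arguments is used so that the trailing newline is dropped:

```python
from functools import reduce

PREFIXES = ("M", "Shk", "BS")


def join_right(acc, token):
    """Prepend token to acc: a space after a prefix, a comma otherwise."""
    sep = " " if token in PREFIXES else ","
    return token + sep + acc


def to_csv_row(line):
    """Fold the tokens right to left into one comma-separated string."""
    return reduce(join_right, reversed(line.split()))
```

This does no CSV quoting, and a blank line raises TypeError because reduce is given an empty sequence with no initial value.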
    

【Comments】:

【Solution 7】:

Here is another solution (obtained by modifying the source code from the question):

import csv
inPath = "input.txt"
outPath = "output.txt"

newlist = []

file = open(inPath, 'rU')
if file:
    for line in file:
        member = line.split()
        newlist.append(member)
    file.close()
else:
    print "Error Opening File."

file = open(outPath, 'wb')
if file:
    for fullName in newlist:
        prefix = ""
        for name in fullName:
            # remember a prefix and print it together with the next word
            if name == "P" or name == "p":
                prefix = name + " "
                continue
            print prefix+name
            prefix = ""
        print
    file.close()
else:
    print "Error Opening File."

【Comments】:

  • It would help me to know why this post was voted "not useful", so that I can avoid such posts in the future. I tried to answer with minimal changes to the source code given in the question.
【Solution 8】:

I would use regular expressions, which are designed for exactly this purpose. The solution is easy to maintain and understand.

Worth a try. http://docs.python.org/library/re.html

import re
from operator import truth

# patterns; you_string below is the name to parse
#                       First     Middle    Last
first = re.compile(r"^([\w]+) +([\w]+) +([\w]+)$")
#                        P            First     Middle    Last
second = re.compile(r"^(M|Shk|BS) +([\w]+) +([\w]+) +([\w]+)$")
#                       First     P            Middle    Last
third = re.compile(r"^([\w]+) +(M|Shk|BS) +([\w]+) +([\w]+)$")
#                       P            First     P            Middle    Last
forth = re.compile(r"^(M|Shk|BS) +([\w]+) +(M|Shk|BS) +([\w]+) +([\w]+)$")

if truth(first.search(you_string)):
    parsed = first.search(you_string)
    print parsed.group(1), parsed.group(2), parsed.group(3)
elif truth(second.search(you_string)):
    parsed = second.search(you_string)
    # groups: prefix, first, middle, last
    print parsed.group(1) + " " + parsed.group(2), parsed.group(3), parsed.group(4)
elif truth(third.search(you_string)):
    parsed = third.search(you_string)
    # groups: first, prefix, middle, last
    print parsed.group(1), parsed.group(2) + " " + parsed.group(3), parsed.group(4)
elif truth(forth.search(you_string)):
    parsed = forth.search(you_string)
    # groups: prefix, first, prefix, middle, last
    print parsed.group(1) + " " + parsed.group(2), parsed.group(3) + " " + parsed.group(4), parsed.group(5)
else:
    print "no match at all"

Because the patterns are precompiled, it will run faster.
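
The four patterns can also be folded into one by making each prefix an optional group. A minimal sketch, assuming Python 3 and the prefixes "M", "Shk" and "BS"; NAME_RE and parse_name are illustrative names, not part of the answer above:

```python
import re

# Optional prefix, first name, optional prefix, middle name, last name.
NAME_RE = re.compile(
    r"^\s*(?:(M|Shk|BS)\s+)?(\w+)\s+(?:(M|Shk|BS)\s+)?(\w+)\s+(\w+)\s*$"
)


def parse_name(text):
    """Return (first, middle, last) with prefixes attached, or None."""
    match = NAME_RE.match(text)
    if match is None:
        return None
    p1, first, p2, middle, last = match.groups()
    if p1:
        first = p1 + " " + first
    if p2:
        middle = p2 + " " + middle
    return first, middle, last
```

A prefix group that did not take part in the match comes back as None from match.groups(), which is why a plain truth test is enough to decide whether to attach it.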

【Comments】:
