Using the split function in Python: answers

【Question title】: Using the split function in Python
【Posted】: 2023-12-01 10:59:02
【Question description】:

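I am working with the csv module and writing a simple program that takes the names of several authors listed in a file and formats them like this: john.doe

So far I am getting the result I want, but I am having trouble getting the code to exclude titles such as "Mr.". I have been thinking about using the split function, but I am not sure whether it would help.

Any suggestions? Thanks in advance!

Here is my code so far: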

import csv

books = csv.reader(open("books.csv","rU"))

for row in books:
    print '.'.join([item.lower() for item in [row[index] for index in (1, 0)]])

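【Comments on the question】:

Take a look at the filter() function: docs.python.org/library/functions.html#filter
If you can come up with a way to do what you want with split(), then it is a perfectly good tool for the job. It would be easier to answer this if you showed us your code and stated exactly what you need.
Could you be specific about what you have and what you want? (A few examples would be welcome.)
row[index] for index in (1, 0) can be written as: row[1::-1]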
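
The two suggestions in the comments above, split() and the row[1::-1] slice, can be combined. The following is only a minimal sketch in Python 3, not the asker's code: it assumes each row is laid out as [last_name, first_name], and the TITLES set and the format_author name are hypothetical choices made for illustration.

```python
# Hypothetical set of titles to drop; extend it to match the real data.
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def format_author(row):
    """Turn a [last_name, first_name] row (assumed layout) into 'first.last'."""
    parts = []
    # row[1::-1] yields [row[1], row[0]]: first name, then last name.
    for field in row[1::-1]:
        # split() with no arguments splits on any run of whitespace.
        words = field.lower().split()
        # Drop a leading title such as "mr" or "mr." if one is present.
        if words and words[0].rstrip(".") in TITLES:
            words = words[1:]
        # Multi-word names are glued together without a separator.
        parts.append("".join(words))
    return ".".join(parts)


print(format_author(["Doe", "Mr. John"]))  # john.doe
print(format_author(["Doe", "John"]))      # john.doe
```

Because only the first word of each field is compared against the set, a name that merely contains a title as a substring is left untouched.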

Tags: python csv

【Solution 1】:

It depends on how messy the strings are. In the worst case, this regex-based solution should do the job:

import re
x=re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
x.sub("", text)

(I use re.compile() here because, for some reason, re.sub in Python 2.6 does not accept a flags= keyword argument.)
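
To show how the pattern could plug into the asker's formatting step, here is a minimal Python 3 sketch. The helper names strip_title and format_author are hypothetical, and the [last_name, first_name] row layout is an assumption based on the question's code.

```python
import re

# Same pattern as in the answer; compiled once so the flags apply everywhere.
TITLE_RE = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)


def strip_title(text):
    """Remove one leading title and the dots/whitespace that follow it."""
    return TITLE_RE.sub("", text)


def format_author(row):
    """Build 'first.last' from a [last_name, first_name] row (assumed layout)."""
    first, last = (strip_title(item).strip().lower() for item in row[1::-1])
    return ".".join((first, last))


print(format_author(["Doe", "Mr. John"]))  # john.doe
```

The pattern requires at least one dot or whitespace character after the title, so a name such as "Mrinal" is not truncated.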

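Update: I wrote some code to test it. I could not find a way to automate checking the results, but it looks like it works well. Here is the test code: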

import re
x=re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
names = ["".join([a,b,c,d]) for a in ['', ' ', '   ', '..', 'X'] for b in ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms'] for c in ['', '.', '. ', ' '] for d in ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']]
print "\n".join([" => ".join((n,x.sub('',n))) for n in names])
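
One way the result check could be automated is to pair each input with the output expected by hand and assert on it. This is a sketch in Python 3 rather than the answerer's method; the cases list is made up for illustration and is far smaller than the generated list above.

```python
import re

x = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)

# (input, expected output) pairs, written by hand for illustration.
cases = [
    ("Mr. John", "John"),
    ("mrs Jane Doe", "Jane Doe"),
    ("  Miss. Ann", "Ann"),
    ("MS.Smith", "Smith"),
    ("Mrinal", "Mrinal"),          # no dot or space after "Mr": unchanged
    ("X Mr. John", "X Mr. John"),  # title is not at the start: unchanged
]

for text, expected in cases:
    actual = x.sub("", text)
    assert actual == expected, "%r -> %r, expected %r" % (text, actual, expected)

print("all %d cases passed" % len(cases))
```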

【Discussion】:

Actually, the test code fits on a single line..:print "\n".join([" => ".join((n,re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE).sub('',n))) for n in ["".join([a,b,c,d]) for a in ['', ' ', ' ', '..', 'X'] for b in ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms'] for c in ['', '.', '. ', ' '] for d in ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']]])

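【Solution 2】:

Depending on the complexity of your data and the scope of your needs, you may be able to get away with something simple, such as using replace() to strip the titles from the rows of the csv as you iterate over them.

Something like: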

titles = ["Mr.", "Mrs.", "Ms", "Dr"] #and so on

for line in lines:
    line_data = line
    for title in titles:
        line_data = line_data.replace(title,"")
    #your code for processing the line

This may not be the most efficient approach, but it may be good enough for your needs.
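
One caveat worth seeing in code: replace() removes a title wherever it occurs, not only at the start of the name. The Python 3 sketch below contrasts the blanket replace() with a prefix-only variant. strip_leading_title is a hypothetical helper, not part of the answer, and the final strip() in the first function is an addition to keep the outputs comparable.

```python
titles = ["Mr.", "Mrs.", "Ms", "Dr"]  # list taken from the snippet above


def strip_titles_replace(name):
    """Blanket replace() as in the snippet above, plus a final strip()."""
    for title in titles:
        name = name.replace(title, "")
    return name.strip()


def strip_leading_title(name):
    """Hypothetical variant: drop a title only when it starts the name."""
    name = name.strip()
    for title in titles:
        # Require a space after the title so "Drew" is not read as "Dr".
        if name.startswith(title + " "):
            return name[len(title):].strip()
    return name


print(strip_titles_replace("Dr Drew"))  # ew
print(strip_leading_title("Dr Drew"))   # Drew
print(strip_leading_title("Mr. John"))  # John
```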

Here is how this could work with the code you posted (I am guessing that Mr./Mrs. is part of column 1, the first name):

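import csv

titles = ["Mr.", "Mrs.", "Ms", "Dr"] #and so on

books = csv.reader(open("books.csv","rU"))

for row in books:
    first_name = row[1]
    last_name = row[0]
    for title in titles:
        first_name = first_name.replace(title,"")
    # strip the space left behind by the removed title, then join as first.last
    print '.'.join((first_name.strip().lower(), last_name.lower()))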
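
For readers on Python 3, the same idea might look like the sketch below. It is an illustration under stated assumptions, not the answerer's code: the sample data is made up and stands in for books.csv, the column order (last name, then first name) follows the guess above, and format_rows is a hypothetical helper name.

```python
import csv
import io

titles = ["Mr.", "Mrs.", "Ms", "Dr"]

# Made-up sample standing in for books.csv (last name, first name).
sample = io.StringIO("Doe,Mr. John\nSmith,Mrs. Jane\nBrown,Alice\n")


def format_rows(fileobj):
    """Return a 'first.last' string for each CSV row in fileobj."""
    formatted = []
    for row in csv.reader(fileobj):
        first_name, last_name = row[1], row[0]
        for title in titles:
            first_name = first_name.replace(title, "")
        # strip() removes the space left behind by the removed title.
        formatted.append(
            ".".join((first_name.strip().lower(), last_name.strip().lower()))
        )
    return formatted


for name in format_rows(sample):
    print(name)
```

With a real file, open("books.csv", newline="") would take the place of the StringIO object; the "rU" mode used in the Python 2 code is no longer accepted by recent Python 3 releases.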

【Discussion】: