【Question title】: String manipulations using Python Pandas
【Posted】: 2015-05-31 07:11:10
【Question description】:

I have some name and ethnicity data, for example:

John Wick    English
Black Widow  French

I then did some manipulation so that the names look like this:

John Wick  -> john#wick??????????????????????????????????
Black Widow -> black#widow????????????????????????????????

Next, I use a for loop to create multiple variables, each holding a 3-character substring of the padded name.

I also tried to use re.findall to count the number of letters in each name.

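I have two questions: 1) Is the for loop efficient? Even if it works as is, could I replace it with better code? 2) I can't get the code that tries to count the number of letters to work. Any suggestions?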

import pandas as pd
from pandas import DataFrame
import re

# Get csv file into data frame
data = pd.read_csv(r"C:\Users\KubiK\Desktop\OddNames_sampleData.csv")  # raw string keeps the backslashes literal
frame = DataFrame(data)
frame.columns = ["name", "ethnicity"]
name = frame.name
ethnicity = frame.ethnicity

# Remove missing ethnicity data cases
index_missEthnic = frame.ethnicity.isnull()
index_missName = frame.name.isnull()
frame2 = frame.loc[~index_missEthnic, :]
frame3 = frame2.loc[~index_missName, :]

# Make all letters into lowercase
frame3.loc[:, "name"] = frame3["name"].str.lower()
frame3.loc[:, "ethnicity"] = frame3["ethnicity"].str.lower()

# Remove all non-alphabetical characters in Name
frame3.loc[:, "name"] = frame3["name"].str.replace(r'[^a-zA-Z\s\-]', '') # Retain space and hyphen

# Replace empty space as "#"
frame3.loc[:, "name"] = frame3["name"].str.replace('[\s]', '#')

# Find the longest name in the dataset
##frame3["name_length"] = frame3["name"].str.len()
##nameLength = frame3.name_length
##print nameLength.max() # Longest name has !!!40 characters!!! including spaces and hyphens

# Add "?" to fill spaces up to 43 characters
frame3["name_filled"] = frame3["name"].str.pad(side="right", width=43, fillchar="?")

# Split into three-character strings
for i in range(1, 41):
    substr = "substr" + str(i)
    frame3[substr] = frame3["name_filled"].str[i-1:i+2]

# Count number of characters
frame3["name_len"] = len(re.findall('[a-zA-Z]', name))

# Test outputs
print frame3

【Comments】:

    Tags: string python-2.7 pandas


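    【Solution 1】:

    1) Regarding the loop, I can't think of a better way than what you are already doing.

    2) re.findall expects a single string, but name is a whole Series, so apply it to each element instead. Try frame3["name_len"] = frame3["name"].map(lambda x : len(re.findall('[a-zA-Z]', x)))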
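A minimal sketch of point 2, assuming a tiny hand-made frame in place of the asker's CSV (the sample rows and the `name_len_alt` column are illustrative, not from the original post). It shows the element-wise `map` call next to `Series.str.count`, which may be a more direct way to count regex matches per row:

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the asker's cleaned frame3["name"] column
frame3 = pd.DataFrame({"name": ["john#wick", "black#widow"]})

# Element-wise approach from the answer: re.findall runs on one string at a time
frame3["name_len"] = frame3["name"].map(lambda x: len(re.findall("[a-zA-Z]", x)))

# Possible alternative (an assumption, not part of the answer):
# Series.str.count counts regex matches in each element
frame3["name_len_alt"] = frame3["name"].str.count("[a-zA-Z]")

print(frame3)
```

Both columns should agree; the `#` separator is not counted because it falls outside the `[a-zA-Z]` character class.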

    【Comments】:

    • @KubiK888 Nice. It is worth getting familiar with pandas' map() and apply(); they are very powerful.
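To illustrate the map() and apply() remark, here is a rough sketch that builds the three-character substrings from the question with those two methods. The sample rows, the pad width of 13, the `trigrams` helper, and the `wide` frame are all assumptions made for the example; this is not claimed to be faster than the asker's loop, only a different way to express it.

```python
import pandas as pd

# Hypothetical sample data; "name_filled" mimics the padded column from the question
frame3 = pd.DataFrame({"name": ["john#wick", "black#widow"]})
frame3["name_filled"] = frame3["name"].str.pad(width=13, side="right", fillchar="?")

def trigrams(s):
    """Return every 3-character window of s, moving one character at a time."""
    return [s[i:i + 3] for i in range(len(s) - 2)]

# Series.map: one call per element, storing a list of substrings in each cell
frame3["trigrams"] = frame3["name_filled"].map(trigrams)

# Series.apply with a function that returns a Series expands into one column per window
wide = frame3["name_filled"].apply(lambda s: pd.Series(trigrams(s)))
wide.columns = ["substr%d" % (i + 1) for i in wide.columns]

print(wide)
```

Keeping the substrings in a separate `wide` frame and joining it back with `pd.concat([frame3, wide], axis=1)` is one option if the columns are needed alongside the original data.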