【Posted】:2022-07-06 18:22:10
【Question】:
I'm trying to create new columns in pandas from substrings of another column.
import pandas as pd
import re
df = pd.DataFrame({'title':['Apartment 2 roomns, 40 m²', 'House 7 rooms, 183 m²', 'House 4 rooms, 93 m²', 'Apartment 12 rooms, 275 m²']})
I'm trying to use a regular expression with capture groups:
df['Name'] = df.title.str.extract(r'(^[a-zA-Z]+)', expand=True)
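This gave me a good result. But I also need a column with the number of rooms (without the word "rooms") and another column with the size, without the "m²". I tried: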
df['Rooms'] = df.title.str.replace(r'(^[0-9]+)\s(rooms)', r'\1') #to capture only the first group, which is the number
df['Size'] = df.title.str.replace(r'(^[0-9]+)\s(m²)', r'\1') #to capture only the first group, which is the number
My output:
        Name                       Rooms                        Size
0  Apartment   Apartment 2 roomns, 40 m²   Apartment 2 roomns, 40 m²
1      House       House 7 rooms, 183 m²       House 7 rooms, 183 m²
2      House        House 4 rooms, 93 m²        House 4 rooms, 93 m²
3  Apartment  Apartment 12 rooms, 275 m²  Apartment 12 rooms, 275 m²
Desired output:
        Name  Rooms  Size
0  Apartment      2    40
1      House      7   183
2      House      4    93
3  Apartment     12   275
【Comments】:
- I see a typo in the word "roomns", so I think you need to account for that typo.
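
One possible approach, offered only as a sketch and not as an accepted answer: the two str.replace patterns in the question start with ^, so they can only match digits at the very beginning of the string, and every title begins with a word. Nothing matches and the titles come back unchanged. The example below uses str.extract with named groups to pull out all three values in one pass. The pattern and the variable names (pattern, parts) are illustrative assumptions. The \w+ after the room count is there so that the misspelled "roomns" is accepted as well as "rooms".

```python
import pandas as pd

df = pd.DataFrame({'title': ['Apartment 2 roomns, 40 m²',
                             'House 7 rooms, 183 m²',
                             'House 4 rooms, 93 m²',
                             'Apartment 12 rooms, 275 m²']})

# Named groups become the column names of the frame returned by str.extract.
# \w+ matches any word after the room count, so 'roomns' and 'rooms' both pass.
pattern = r'^(?P<Name>[A-Za-z]+)\s+(?P<Rooms>\d+)\s+\w+,\s*(?P<Size>\d+)\s*m²'

parts = df['title'].str.extract(pattern)

# The extracted values are strings; convert the numeric columns to integers.
parts[['Rooms', 'Size']] = parts[['Rooms', 'Size']].astype(int)

print(parts)
```

If the original title column should be kept alongside the new ones, something like df.join(parts) would attach them. Rows whose title does not fit the pattern would come back as NaN, and the astype(int) step would then fail, so messier data would need extra handling.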