【Posted】: 2021-08-31 18:28:42
【Question】:
I have this input:
["3 years 8 months", "10 months", "1 year 10 months", "9 months", " 1 month ", "1 year", "3 years"]
I want this output:
[3.8, 0.10, 1.10, 0.09, 0.01, 1, 3]
【Question Discussion】:
Tags: python pandas replace find re
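You can use str.split:
def to_num(s):
    # s is a list of tokens such as ['3', 'years', '8', 'months']
    c = {'year':1, 'years':1, 'month':0.01, 'months':0.01}
    # tokens alternate number, unit: multiply each number by its unit's weight
    return sum(int(s[i])*c[s[i+1]] for i in range(0, len(s), 2))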
vals = ["3 years 8 months", "10 months", "1 year 10 months", "9 months", "1 month", "1 year", "3 years"]
result = [to_num(i.split()) for i in vals]
Output:
[3.08, 0.1, 1.1, 0.09, 0.01, 1, 3]
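The str.split approach above assumes every string alternates a whole number and a known unit. A minimal sketch of the same idea with explicit checks follows; the names `UNIT_WEIGHTS` and `to_num_checked` are hypothetical and not part of the original answer, and raising `ValueError` on bad input is an assumption about what the caller wants.

```python
# Weight of each unit: years count as 1, months as hundredths (8 months -> 0.08).
UNIT_WEIGHTS = {"year": 1, "years": 1, "month": 0.01, "months": 0.01}


def to_num_checked(text):
    """Convert e.g. '3 years 8 months' using the years + months/100 encoding."""
    tokens = text.split()  # split() with no argument also drops surrounding whitespace
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected number/unit pairs, got {text!r}")
    total = 0
    # Pair each number (even positions) with the unit that follows it (odd positions).
    for count, unit in zip(tokens[0::2], tokens[1::2]):
        if unit not in UNIT_WEIGHTS:
            raise ValueError(f"unknown unit {unit!r} in {text!r}")
        total += int(count) * UNIT_WEIGHTS[unit]
    return total


vals = ["3 years 8 months", "10 months", " 1 month ", "1 year"]
result = [to_num_checked(v) for v in vals]
print(result)
```

Because the months are stored as hundredths, the result is a compact label rather than a fractional number of years: 10 months becomes 0.1, not 0.83.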
【Discussion】:
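Method 1: using re.search
import re

data = ["3 years 8 months", "10 months", "1 year 10 months", "9 months", " 1 month ", "1 year", "3 years"]

def date_to_number(x):
    year, month = 0, 0
    # \d+ requires at least one digit, so float() never receives an empty string
    year_match = re.search(r'(\d+)\s+year', x)
    if year_match:
        year = float(year_match.group(1))
    month_match = re.search(r'(\d+)\s+month', x)
    if month_match:
        # months are encoded as hundredths: 8 months -> 0.08
        month = float(month_match.group(1))/100
    return year+month

numbers = [date_to_number(i) for i in data]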
Output:
print(numbers)
[3.08, 0.1, 1.1, 0.09, 0.01, 1.0, 3.0]
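The two re.search calls in Method 1 can also be folded into one pattern with optional named groups. This is only a sketch under the assumption that years always come before months; `DURATION_RE` and `duration_to_number` are hypothetical names, and rejecting unmatched input with `ValueError` is a design choice not taken from the original answer.

```python
import re

# Both parts are optional; named groups hold the digits for each unit.
DURATION_RE = re.compile(
    r"(?:(?P<years>\d+)\s+years?)?\s*(?:(?P<months>\d+)\s+months?)?"
)


def duration_to_number(text):
    """Return years + months/100 for strings such as '1 year 10 months'."""
    # fullmatch rejects trailing junk such as '3 weeks' instead of silently returning 0.
    match = DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    # A group that did not take part in the match is None, so fall back to 0.
    years = int(match.group("years") or 0)
    months = int(match.group("months") or 0)
    return years + months / 100


data = ["3 years 8 months", "10 months", " 1 month ", "3 years"]
print([duration_to_number(d) for d in data])
```

Unlike Method 1, this version returns an int when there are no months only if you skip the division; as written, the division by 100 always yields a float.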
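Method 2: using extract() in Pandas
If your data is stored in a DataFrame, you can try this:
import pandas as pd

df = pd.DataFrame(data, columns=['date'])
# expand=False returns a Series; rows without a match are NaN and are filled with 0
df['date_to_number'] = (df['date'].str.extract(r'(\d+)\s+year', expand=False).fillna(0).astype('int')
    + df['date'].str.extract(r'(\d+)\s+month', expand=False).fillna(0).astype('int').divide(100))
Output:
print(df)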
               date  date_to_number
0  3 years 8 months            3.08
1         10 months            0.10
2  1 year 10 months            1.10
3          9 months            0.09
4           1 month            0.01
5            1 year            1.00
6           3 years            3.00
【Discussion】: