There are several problems in your script:
- Using
df['Price'] = df['Price'].apply(lambda x: '{:.2f}'.format(x)) converts the Price column to strings, so no numeric operation will work on that dtype. If you only want to set the displayed float precision, use pd.set_option('display.float_format', lambda x: '%.2f' % x) right after the import pandas line.
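A quick illustration of the dtype difference (the data here is made up for the demo):

```python
import pandas as pd

df = pd.DataFrame({'Price': [989.34, 45444.57]})

# apply + str.format returns strings, so the column dtype becomes object
# and arithmetic on it will fail or misbehave
formatted = df['Price'].apply(lambda x: '{:.2f}'.format(x))
print(formatted.dtype)  # object

# set_option only changes how floats are *printed*; the dtype stays float64
pd.set_option('display.float_format', lambda x: '%.2f' % x)
print(df['Price'].dtype)  # float64
print(df)
```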
- If I understand you correctly, you want to scale the Price values within each
Customer to the 0-1 range. If that is the case, you cannot do it in one line like this:
df.groupby('Customer').Price.transform(lambda
x: scaler.fit_transform(x.values.reshape(-1, 1)).ravel())
because each per-group call to fit_transform overwrites the previous group's min/max parameters on the single scaler object. So at the end, when you call scaler.inverse_transform(), you only have the parameters of the last group.
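If you never need to invert the scaling later, a plain per-group min-max in pandas avoids the shared-scaler problem entirely, because it builds no state to overwrite (a minimal sketch with made-up data, not a replacement for the scaler-based solution below):

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['A', 'A', 'B', 'B'],
    'Price': [10.0, 30.0, 1.0, 5.0],
})

# Pure pandas min-max scaling per group: no scaler object is fitted,
# so nothing gets overwritten -- but nothing is saved for an inverse either.
df['Price_scaled'] = df.groupby('Customer')['Price'].transform(
    lambda x: (x - x.min()) / (x.max() - x.min())
)
print(df)
```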
Solution:
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.7f' % x)
from sklearn.preprocessing import MinMaxScaler

scalers_dict = {}  # Prepare a place to hold one scaler per Customer
df = pd.read_csv('stack_data.csv', parse_dates=['Date'])
df['Price_scaled'] = None
df['Price_inverse_scaled'] = None

# Loop over Customers, scale Price, and save each scaler for further use
for customer in pd.unique(df['Customer']):
    scalers_dict[customer] = MinMaxScaler()
    scaled_price = scalers_dict[customer].fit_transform(df[df['Customer']==customer].Price.values.reshape(-1, 1))
    df.loc[df['Customer']==customer, 'Price_scaled'] = scaled_price

# Loop over Customers and invert the scaled values
for customer in pd.unique(df['Customer']):
    inverse_scale = scalers_dict[customer].inverse_transform(df[df['Customer']==customer].Price_scaled.values.reshape(-1, 1))
    df.loc[df['Customer']==customer, 'Price_inverse_scaled'] = inverse_scale
Output (I set the precision to .7f because the max values are so large that, at two decimals, most of the scaled values would display as 0.00):
Date Customer Price Price_scaled Price_inverse_scaled
0 2019-01-06 A 142404534.1300000 1.0000000 142404534.1300000
1 2019-01-07 A 989.3400000 0.0000065 989.3400000
2 2019-01-08 A 45444.5700000 0.0003187 45444.5700000
3 2019-01-09 A 574343.1000000 0.0040327 574343.1000000
4 2019-01-10 A 23232.3400000 0.0001627 23232.3400000
5 2019-01-01 A 923423.0000000 0.0064840 923423.0000000
6 2019-01-02 A 332.0000000 0.0000019 332.0000000
7 2019-01-03 A 2342323.2400000 0.0164479 2342323.2400000
8 2019-01-04 A 232.0000000 0.0000012 232.0000000
9 2019-01-05 A 65.7000000 0.0000000 65.7000000
10 2019-01-06 B 875.4600000 0.0000000 875.4600000
11 2019-01-07 B 142466027340.0299988 1.0000000 142466027340.0299988
12 2019-01-08 B 25.1700000 0.0000000 25.1700000
13 2019-01-09 B 1.0100000 0.0000000 1.0100000
14 2019-01-10 B 1.0000000 0.0000000 1.0000000
15 2019-01-10 B 57.6100000 0.0000000 57.6100000
16 2019-01-06 B 232232.7800000 0.0000016 232232.7800000
17 2019-01-07 B 15.2000000 0.0000000 15.2000000
18 2019-01-08 B 44.5600000 0.0000000 44.5600000
19 2019-01-09 B 2323254.4500000 0.0000163 2323254.4500000
20 2019-01-10 B 395.4500000 0.0000000 395.4500000
21 2019-01-10 B 23423454.9200000 0.0001644 23423454.9200000
22 2019-01-06 C 34.1200000 0.0000000 34.1200000
23 2019-01-07 C 89.3400000 0.1787171 89.3400000
24 2019-01-08 C 44.5700000 0.0338210 44.5700000
25 2019-01-09 C 343.1000000 1.0000000 343.1000000
26 2019-01-10 C 232.3400000 0.6415302 232.3400000
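Because the fitted scalers are kept in scalers_dict, you can also apply a customer's scaler to new, unseen prices later on (a sketch with made-up numbers, not the data above):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scalers_dict = {}

# Fit a scaler for customer 'A' on some example historical prices
scalers_dict['A'] = MinMaxScaler()
scalers_dict['A'].fit(np.array([65.7, 989.34, 142404534.13]).reshape(-1, 1))

# Scale new prices for 'A' with the *same* min/max parameters,
# then invert to confirm we get the original values back
new_prices = np.array([500.0, 1000.0]).reshape(-1, 1)
scaled = scalers_dict['A'].transform(new_prices)
roundtrip = scalers_dict['A'].inverse_transform(scaled)
print(roundtrip.ravel())
```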
Edit:
Multi-column version
import pandas as pd
pd.set_option('display.float_format', lambda x: '%.7f' % x)
from sklearn.preprocessing import MinMaxScaler

scalers_dict = {}  # Prepare a place to hold one scaler per Customer
df = pd.read_csv('stack_data.csv', parse_dates=['Date'])
df['Price2'] = range(df.shape[0])
df['Price3'] = df['Price2'] * 2
df = df.join(pd.DataFrame(0, df.index, ['Price1_scaled', 'Price2_scaled', 'Price3_scaled', 'Price1_inverse_scaled', 'Price2_inverse_scaled', 'Price3_inverse_scaled']))

# Loop over Customers, scale the Price columns, and save each scaler for further use
for customer in pd.unique(df['Customer']):
    scalers_dict[customer] = MinMaxScaler()
    scaled_price = scalers_dict[customer].fit_transform(df.loc[df['Customer']==customer, 'Price':'Price3'])
    df.loc[df['Customer']==customer, 'Price1_scaled':'Price3_scaled'] = scaled_price

# Loop over Customers and invert the scaled values
for customer in pd.unique(df['Customer']):
    inverse_scale = scalers_dict[customer].inverse_transform(df.loc[df['Customer']==customer, 'Price1_scaled':'Price3_scaled'])
    df.loc[df['Customer']==customer, 'Price1_inverse_scaled':'Price3_inverse_scaled'] = inverse_scale
Output:
Date Customer Price Price2 Price3 Price1_scaled Price2_scaled Price3_scaled Price1_inverse_scaled Price2_inverse_scaled Price3_inverse_scaled
0 2019-01-06 A 142404534.1300000 0 0 1.0000000 0.0000000 0.0000000 142404534.1300000 0.0000000 0.0000000
1 2019-01-07 A 989.3400000 1 2 0.0000065 0.1111111 0.1111111 989.3400000 1.0000000 2.0000000
2 2019-01-08 A 45444.5700000 2 4 0.0003187 0.2222222 0.2222222 45444.5700000 2.0000000 4.0000000
3 2019-01-09 A 574343.1000000 3 6 0.0040327 0.3333333 0.3333333 574343.1000000 3.0000000 6.0000000
4 2019-01-10 A 23232.3400000 4 8 0.0001627 0.4444444 0.4444444 23232.3400000 4.0000000 8.0000000
5 2019-01-01 A 923423.0000000 5 10 0.0064840 0.5555556 0.5555556 923423.0000000 5.0000000 10.0000000
6 2019-01-02 A 332.0000000 6 12 0.0000019 0.6666667 0.6666667 332.0000000 6.0000000 12.0000000
7 2019-01-03 A 2342323.2400000 7 14 0.0164479 0.7777778 0.7777778 2342323.2400000 7.0000000 14.0000000
8 2019-01-04 A 232.0000000 8 16 0.0000012 0.8888889 0.8888889 232.0000000 8.0000000 16.0000000
9 2019-01-05 A 65.7000000 9 18 0.0000000 1.0000000 1.0000000 65.7000000 9.0000000 18.0000000
10 2019-01-06 B 875.4600000 10 20 0.0000000 0.0000000 0.0000000 875.4600000 10.0000000 20.0000000
11 2019-01-07 B 142466027340.0299988 11 22 1.0000000 0.0909091 0.0909091 142466027340.0299988 11.0000000 22.0000000
12 2019-01-08 B 25.1700000 12 24 0.0000000 0.1818182 0.1818182 25.1700000 12.0000000 24.0000000
13 2019-01-09 B 1.0100000 13 26 0.0000000 0.2727273 0.2727273 1.0100000 13.0000000 26.0000000
14 2019-01-10 B 1.0000000 14 28 0.0000000 0.3636364 0.3636364 1.0000000 14.0000000 28.0000000
15 2019-01-10 B 57.6100000 15 30 0.0000000 0.4545455 0.4545455 57.6100000 15.0000000 30.0000000
16 2019-01-06 B 232232.7800000 16 32 0.0000016 0.5454545 0.5454545 232232.7800000 16.0000000 32.0000000
17 2019-01-07 B 15.2000000 17 34 0.0000000 0.6363636 0.6363636 15.2000000 17.0000000 34.0000000
18 2019-01-08 B 44.5600000 18 36 0.0000000 0.7272727 0.7272727 44.5600000 18.0000000 36.0000000
19 2019-01-09 B 2323254.4500000 19 38 0.0000163 0.8181818 0.8181818 2323254.4500000 19.0000000 38.0000000
20 2019-01-10 B 395.4500000 20 40 0.0000000 0.9090909 0.9090909 395.4500000 20.0000000 40.0000000
21 2019-01-10 B 23423454.9200000 21 42 0.0001644 1.0000000 1.0000000 23423454.9200000 21.0000000 42.0000000
22 2019-01-06 C 34.1200000 22 44 0.0000000 0.0000000 0.0000000 34.1200000 22.0000000 44.0000000
23 2019-01-07 C 89.3400000 23 46 0.1787171 0.2500000 0.2500000 89.3400000 23.0000000 46.0000000
24 2019-01-08 C 44.5700000 24 48 0.0338210 0.5000000 0.5000000 44.5700000 24.0000000 48.0000000
25 2019-01-09 C 343.1000000 25 50 1.0000000 0.7500000 0.7500000 343.1000000 25.0000000 50.0000000
26 2019-01-10 C 232.3400000 26 52 0.6415302 1.0000000 1.0000000 232.3400000 26.0000000 52.0000000