Here are two examples, one using LabelEncoder and one using OneHotEncoder.
LE:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

# creating initial dataframe
bridge_types = ('Arch','Beam','Truss','Cantilever','Tied Arch','Suspension','Cable')
bridge_df = pd.DataFrame(bridge_types, columns=['Bridge_Types'])

# creating instance of labelencoder
labelencoder = LabelEncoder()

# assigning numerical values and storing them in another column
bridge_df['Bridge_Types_Cat'] = labelencoder.fit_transform(bridge_df['Bridge_Types'])
bridge_df
Result:
   Bridge_Types  Bridge_Types_Cat
0          Arch                 0
1          Beam                 1
2         Truss                 6
3    Cantilever                 3
4     Tied Arch                 5
5    Suspension                 4
6         Cable                 2
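The codes above follow the alphabetical order of the labels, which is why Truss gets 6 and Cable gets 2. As a minimal sketch (rebuilding the same dataframe so the snippet runs on its own; the names `code_to_label` and `decoded` are introduced here for illustration), the fitted encoder can show that mapping and reverse it:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

bridge_types = ('Arch', 'Beam', 'Truss', 'Cantilever', 'Tied Arch', 'Suspension', 'Cable')
bridge_df = pd.DataFrame(bridge_types, columns=['Bridge_Types'])

labelencoder = LabelEncoder()
codes = labelencoder.fit_transform(bridge_df['Bridge_Types'])

# classes_ holds the sorted unique labels; position i is the label for code i
code_to_label = dict(enumerate(labelencoder.classes_))

# inverse_transform maps the integer codes back to the original strings
decoded = labelencoder.inverse_transform(codes)

print(code_to_label)
print(list(decoded))
```

Keep in mind that the integers only identify the labels; their order carries no meaning about the bridges themselves.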
OHE:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# creating instance of one-hot-encoder
enc = OneHotEncoder(handle_unknown='ignore')

# passing bridge-types-cat column (label encoded values of bridge_types)
enc_df = pd.DataFrame(enc.fit_transform(bridge_df[['Bridge_Types_Cat']]).toarray())

# merge with main df bridge_df on the index (the default key for join)
bridge_df = bridge_df.join(enc_df)
bridge_df
Result:
   Bridge_Types  Bridge_Types_Cat    0    1    2    3    4    5    6
0          Arch                 0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
1          Beam                 1  0.0  1.0  0.0  0.0  0.0  0.0  0.0
2         Truss                 6  0.0  0.0  0.0  0.0  0.0  0.0  1.0
3    Cantilever                 3  0.0  0.0  0.0  1.0  0.0  0.0  0.0
4     Tied Arch                 5  0.0  0.0  0.0  0.0  0.0  1.0  0.0
5    Suspension                 4  0.0  0.0  0.0  0.0  1.0  0.0  0.0
6         Cable                 2  0.0  0.0  1.0  0.0  0.0  0.0  0.0
Note: make sure you do this only for columns that are actual labels, and not for something like, say, an ID or a date, or anything along those lines.
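One way to follow that advice is to list the label columns explicitly and encode only those. The sketch below uses a made-up dataframe; the column names `Bridge_ID`, `Built` and `Bridge_Type` and all values are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: an ID column, a date column and one real label column
df = pd.DataFrame({
    'Bridge_ID': [101, 102, 103],
    'Built': pd.to_datetime(['2001-01-01', '2002-02-02', '2003-03-03']),
    'Bridge_Type': ['Arch', 'Suspension', 'Suspension'],
})

# Only the columns named here are encoded; the ID and date columns stay as they are
label_columns = ['Bridge_Type']

enc = OneHotEncoder(handle_unknown='ignore')
encoded = enc.fit_transform(df[label_columns]).toarray()

enc_df = pd.DataFrame(encoded, index=df.index)
result = df.join(enc_df)
print(result)
```

Encoding `Bridge_ID` or `Built` the same way would create one column per distinct value, which adds width without adding information.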