[Posted]: 2018-03-15 12:22:46
[Question]:
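I'd like to specify my training and test sets explicitly from the terminal when I run my .ipynb file there, instead of hard-coding them in the code. So far, this is what I'm doing: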
import os
import pandas as pd
# FOR TRAINING DATA
# LIST ALL FILES PRESENT IN THE FOLDER PATH
path = "C:/Users/****/****/Latest_Datasets/base_out"
files = os.listdir(path)
# READ ALL DATA FROM THE FOLDER PATH AND COLLECT IT INTO ONE DATAFRAME
frames = []
for f in files:
    # os.listdir returns bare file names, so join each one with the folder path;
    # the 'Sheet1' argument is dropped: sheet names belong to pd.read_excel, not pd.read_csv
    data = pd.read_csv(os.path.join(path, f), delimiter='\t',
                       usecols=['details', 'amount', 'category'], encoding='utf-8')
    frames.append(data)
# DataFrame.append was removed in pandas 2.0; concatenate the list once instead
df = pd.concat(frames)
df.reset_index(level=0, inplace=True)
df['index1'] = df.index
df = df[['index1', 'amount', 'details', 'category']]
# FOR TEST DATA
test_data = pd.read_csv('testfile.csv', delimiter='\t',
                        usecols=['xn_details', 'xn_amount', 'category'], encoding='utf-8')
x_train, y_train = (df.details, df.category)
# the test file's text column is loaded as xn_details, not details
x_test, y_test = (test_data.xn_details, test_data.category)
# After this I apply my model and get my classifications for the test details
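I want to pass the training data and test data as arguments on the terminal instead of specifying them in the script. How can I do this? Thanks in advance.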
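One possible approach: convert the notebook into a plain .py script (for example with jupyter nbconvert --to script) so it can accept command-line arguments, then read the two paths with the standard-library argparse module. This is a minimal sketch, assuming a training folder and a single test file as in the code above; the option names --train-dir and --test-file and the function name parse_paths are hypothetical choices, not part of the original script.

```python
import argparse


def parse_paths(argv=None):
    """Return (train_dir, test_file) parsed from the command line.

    With argv=None, argparse reads sys.argv[1:]; passing a list is handy for testing.
    """
    parser = argparse.ArgumentParser(
        description="Train on a folder of tab-separated files, then classify a test file.")
    # Hypothetical option names; argparse stores --train-dir as args.train_dir.
    parser.add_argument("--train-dir", required=True,
                        help="folder containing the tab-separated training files")
    parser.add_argument("--test-file", required=True,
                        help="tab-separated test file")
    # A missing required option makes argparse print an error and exit with status 2.
    args = parser.parse_args(argv)
    return args.train_dir, args.test_file
```

In the converted script you would call train_dir, test_file = parse_paths(), use train_dir in place of the hard-coded path and test_file in place of 'testfile.csv', and run something like python my_script.py --train-dir C:/path/to/base_out --test-file testfile.csv (my_script.py is a placeholder name). argparse also generates a --help message from the help strings automatically.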
[Comments]:
- Give the file names on the command line and read them with sys.argv.
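A minimal sketch of the sys.argv route suggested in the comment: sys.argv[0] is the script name, and the following entries are the arguments in the order they were typed. It assumes the training folder comes first and the test file second; the function name paths_from_argv is hypothetical.

```python
import sys


def paths_from_argv(argv):
    """Return (train_dir, test_file) from an argv-style list.

    argv[0] is the script name; the two paths are expected right after it.
    """
    if len(argv) != 3:
        # sys.exit with a string prints it to stderr and exits with status 1.
        script = argv[0] if argv else "script.py"
        sys.exit("usage: python {} TRAIN_DIR TEST_FILE".format(script))
    return argv[1], argv[2]
```

In the script you would write train_dir, test_file = paths_from_argv(sys.argv) and run it as python my_script.py C:/path/to/base_out testfile.csv (my_script.py is a placeholder). Plain sys.argv is enough for two fixed positional paths; for optional or named arguments, the standard-library argparse module handles parsing, error messages and --help text for you.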
Tags: python machine-learning terminal ipython jupyter-notebook