Splitting a dataframe (csv): answers

【Question title】: Splitting a dataframe (csv)
【Posted】: 2019-06-30 21:39:00
【Question】:

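How do I randomly split a dataframe (csv) in a 4:1 ratio and store the two parts in two different variables? For example, if the dataframe has 10 rows numbered 1 to 10, I want 8 rows in variable "a" and the remaining 2 rows in variable "b".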
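A minimal sketch of one common way to do this with pandas, assuming the csv has already been loaded. Here a small stand-in dataframe replaces `pd.read_csv(...)`, and the column name `row` is hypothetical. `DataFrame.sample(frac=0.8)` draws a random 80% of the rows, and dropping those index labels from the original leaves the other 20%:

```python
import pandas as pd

# Stand-in for pd.read_csv(...): ten rows numbered 1 to 10, as in the question's example
# (the column name "row" is hypothetical)
df = pd.DataFrame({"row": range(1, 11)})

# Randomly pick 80% of the rows (the "4" in 4:1); random_state makes the draw reproducible
a = df.sample(frac=0.8, random_state=0)

# The remaining 20% are the rows whose index labels were not sampled
b = df.drop(a.index)

print(len(a), len(b))  # 8 2
```

Omit `random_state` to get a different split on each run. The `drop` step assumes the dataframe's index labels are unique, which holds for the default index that `read_csv` creates.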

【Discussion】:

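Generally, questions asked without an example of the code the asker has tried get closed as "off-topic" (and so don't get answered). Next time, please include your code.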

Tags: python-3.x

【Solution 1】:

I have never done this randomly, but the basic approach is:

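1) Import pandas
2) Read in your csv
3) Drop null/empty values (to avoid problems with them)
4) Create a new dataframe to put the split values into
5) Give the new columns names
6) Split the values and combine them (using apply/combine/lambda)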

Code example:

# import the pandas module
import pandas as pd

# read in the csv file
data = pd.read_csv("https://mydata.csv")

# drop rows that contain null values
data.dropna(inplace=True)

# split one string column on its first space into a new dataframe with two columns
new = data["ColumnName"].str.split(" ", n=1, expand=True)  # this 'split' code splits one column into two

# assign the first part to a new column "A"
data["A"] = new[0]  # the 8 concatenated values will go here

# assign the second part to a separate column "B"
data["B"] = new[1]  # the last two [combined] values go here

# Other/different code is required for concatenating column values; see the linked SO question.

# display the dataframe
print(data)

Hope this helps.

【Discussion】:

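In machine learning, `from sklearn.model_selection import train_test_split; xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size = 0.2, random_state = 0)` is the code used for supervised learning algorithms. What is the code for the unsupervised case? Basically, there is no 'y' in this case. Hope you get my question.
@RahulRamaswamy I understand your question. I think you can adapt the code by combining a) the code in the linked question and b) your existing knowledge and logic. Just trying to help!
Honestly, I couldn't get my exact question across here.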
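On the follow-up about `train_test_split` without a `y`: scikit-learn's `train_test_split` accepts any number of arrays, so for unsupervised data you can pass only the features and get back two pieces instead of four. A minimal sketch, with a hypothetical 10×2 NumPy feature matrix standing in for `x`:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical unlabeled feature matrix: 10 samples, 2 features, no target y
x = np.arange(20).reshape(10, 2)

# With a single array, train_test_split returns just two pieces: train and test
x_train, x_test = train_test_split(x, test_size=0.2, random_state=0)

print(x_train.shape, x_test.shape)  # (8, 2) (2, 2)
```

Passing a pandas DataFrame instead of a NumPy array works the same way and returns two DataFrames, which also covers the original 4:1 row split.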