Pandas Different Sampling Size: Answers

【Question Title】: Pandas Different Sampling Size
【Posted】: 2021-05-03 01:34:59
【Question】:

Can someone help me with pandas?

I have a dataset with n = 50 rows. How can I draw random samples from it when each sample needs a different size?

For example:

a = from the 50, I need to select 25
b = from the 50, I need to select 5
c = from the 50, I need to select 10
d = from the 50, I need to select 2
e = from the 50, I need to select 8

I used:

a = df.sample(25)

If I then create b = df.sample(5), how can I make sure it does not contain any of the same rows as a, c, d, or e?

【Question Discussion】:

Tags: python pandas numpy random sampling

【Solution 1】:

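Try shuffling the DataFrame with sample(frac=1), then pulling consecutive slices out of it with iloc. Because every row appears exactly once in the shuffled frame, slices that do not overlap by position cannot share any rows: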

import pandas as pd
import numpy as np

# Create DataFrame With Dummy Data
src_df = pd.DataFrame(np.linspace(150, 500, 50), columns=['Value'])

# Randomly Shuffle Data
shuffled = src_df.sample(frac=1)

# Number of Rows Per Group
rows_to_grab = [25, 5, 10, 2, 8]

# Make Sure Row Groups Add up to Total Number of Rows
assert sum(rows_to_grab) == len(shuffled)

dfs = []
start_index = 0
for rows in rows_to_grab:
    # Append Shuffled Rows to dfs
    dfs.append(shuffled.iloc[start_index: start_index + rows])
    start_index += rows

# Print Out
for header, df in zip("abcde", dfs):
    print(f'       {header}       ')
    print(df)
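The assert in the loop requires the group sizes to add up to exactly len(df). If you only need some of the rows (say 25 + 5 + 10 out of 50), a minimal sketch of the same shuffle-then-slice idea is shown here. The helper name disjoint_samples and its seed parameter are hypothetical, not part of pandas; it shuffles row positions with NumPy's Generator.permutation and cuts them with np.split at the running totals of the sizes.

```python
import numpy as np
import pandas as pd


def disjoint_samples(df, sizes, seed=None):
    """Hypothetical helper: return non-overlapping random groups of the given sizes.

    sum(sizes) only has to be <= len(df); rows beyond that are left out.
    """
    if sum(sizes) > len(df):
        raise ValueError("requested more rows than the DataFrame has")
    rng = np.random.default_rng(seed)
    # Shuffle row *positions*, so duplicate index labels do not matter
    order = rng.permutation(len(df))
    # Cut points between groups, e.g. [25, 30] for sizes [25, 5, 10]
    cuts = np.cumsum(sizes)[:-1]
    # Keep only as many shuffled positions as requested, then split them
    pieces = np.split(order[: sum(sizes)], cuts)
    return [df.iloc[p] for p in pieces]


# Usage: three disjoint groups drawn from 50 rows, 10 rows left unused
src_df = pd.DataFrame(np.linspace(150, 500, 50), columns=['Value'])
a, b, c = disjoint_samples(src_df, [25, 5, 10], seed=42)
```

Working with positions rather than index labels means the helper also behaves correctly when the DataFrame index has repeated labels.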

You can also assign the groups manually:

a = shuffled.iloc[:25]
b = shuffled.iloc[25:30]
c = shuffled.iloc[30:40]
d = shuffled.iloc[40:42]
e = shuffled.iloc[42:50]
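To convince yourself that the manual slices really are disjoint, you can check the index labels directly. This is a small verification sketch that rebuilds the same dummy data; it assumes the index labels are unique, as they are for the default RangeIndex used here.

```python
import numpy as np
import pandas as pd

# Same dummy data and shuffle as in the answer
src_df = pd.DataFrame(np.linspace(150, 500, 50), columns=['Value'])
shuffled = src_df.sample(frac=1)

a = shuffled.iloc[:25]
b = shuffled.iloc[25:30]
c = shuffled.iloc[30:40]
d = shuffled.iloc[40:42]
e = shuffled.iloc[42:50]

combined = pd.concat([a, b, c, d, e])
# No index label appears in more than one group
assert combined.index.is_unique
# Together the groups cover every original row exactly once
assert len(combined) == len(src_df)
assert set(combined.index) == set(src_df.index)
```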

Example output for e (your rows will differ, since the shuffle is random on every run):

       e       
         Value
13  242.857143
48  492.857143
28  350.000000
25  328.571429
39  428.571429
17  271.428571
41  442.857143
18  278.571429
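An alternative that stays closer to the df.sample(n) calls in the question is to remove each sample's rows before drawing the next one. This is a different technique from the shuffle-and-slice answer, sketched under the assumption that the index labels are unique (drop works by label). The dict names sizes, samples and remaining are illustrative, and the seeded np.random.RandomState is only there to make runs reproducible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Value": np.linspace(150, 500, 50)})
rng = np.random.RandomState(0)  # seeded only for reproducible runs

sizes = {"a": 25, "b": 5, "c": 10, "d": 2, "e": 8}
samples = {}
remaining = df
for name, n in sizes.items():
    # Draw n rows without replacement from the rows not used so far
    picked = remaining.sample(n=n, random_state=rng)
    samples[name] = picked
    # Drop the picked labels so later groups cannot draw them again
    remaining = remaining.drop(picked.index)

a, b, c, d, e = (samples[k] for k in "abcde")
```

Here the sizes may also add up to less than len(df); whatever is left in remaining after the loop is simply unused.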

【Discussion】: