【Question title】: pandas: comparing two data frames of different lengths and splitting certain rows in half
【Posted】: 2020-03-05 02:50:05
【Question】:

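I'm learning how pandas works, and I'm struggling to manipulate and compare pandas data frames.

I have three data frames containing only the information I need: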

subjectDF:
   Subject ID              Subject  Year  Teaching Hours PW Facility Requirement
0       Mat13                Maths    13                  5                    N
1      FMat13  Further Mathematics    13                  5                    N
2       Eco13            Economics    13                  5                    N
3       Geo13            Geography    13                  5                    N
4       His13              History    13                  4                    N
5   EngLang13     English Language    13                  4                    N
6    EngLit13   English Literature    13                  4                    N
7       Ger13               German    13                  4                    N
8       Fre13               French    13                  4                    N
9       Spa13              Spanish    13                  4                    N
10      Bus13             Business    13                  4                    N
11     Film13         Film Studies    13                  4                    N
12      Psy13           Psychology    13                  5                    N
13      Lat13                Latin    13                  4                    N
14      Gre13                Greek    13                  4                    N
15      Cla13            Classical    13                  4                    N
16     Phil13           Philosophy    13                  4                    N

studentDF:
      Subject                                         Student ID  Student Number
0       Art13                                          [S8, S19]               2
1       Bio13  [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3...              17
2       Bus13                                    [S10, S30, S47]               3
3       Che13  [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,...              20
4       Cla13                                     [S9, S33, S35]               3
5       Com13  [S2, S3, S10, S14, S16, S19, S31, S45, S192, S...              10
6       Eco13  [S6, S15, S17, S20, S23, S30, S31, S36, S41, S...              13
7   EngLang13                           [S9, S11, S21, S22, S47]               5
8    EngLit13                       [S5, S9, S22, S28, S32, S37]               6
9      FMat13                     [S7, S14, S27, S38, S45, S192]               6
10     Film13                                               [S8]               1
11      Fre13                     [S5, S15, S18, S29, S37, S193]               6
12      Geo13  [S6, S11, S20, S23, S32, S34, S36, S41, S42, S43]              10
13      Ger13                                   [S17, S43, S195]               3
14      Gre13                                         [S33, S40]               2
15      His13            [S5, S11, S21, S22, S32, S35, S37, S41]               8
16      Lat13                                         [S33, S35]               2
17      Mat13  [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S...              34
18     Phil13              [S15, S16, S21, S40, S42, S193, S194]               7
19      Phy13  [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2...              12
20      Psy13                                          [S8, S46]               2
21      Spa13                                    [S18, S36, S47]               3

classroomDF:
  Classroom ID Facility  Capacity
0            C8     None        25
1            C9     None        30
2           C10     None        12
3           C11     None        10
4           C12     None        10
5           C13     None        10
6           C14     None        20
7           C15     None        15
8           C16     None        15
9           C17     None        22
10          C22     None         5
11          C23     None         5

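I'm trying to compare 'Subject ID' in subjectDF with 'Subject' in studentDF, and delete any row of studentDF whose 'Subject' is not listed in 'Subject ID'. For example, since Bio13 in 'Subject' is not listed in 'Subject ID', I want Bio13 removed from studentDF.

So the expected output would be exactly the same as studentDF, but without the rows whose subject is not in 'Subject ID'.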
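A minimal sketch of this filter using boolean indexing with `Series.isin`. The frames are hypothetical cut-down copies of the tables above (only a few rows, Student ID lists shortened), just so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical cut-down copies of the question's frames: only a few rows,
# Student ID lists shortened, Student Number kept as shown in the question.
subjectDF = pd.DataFrame({"Subject ID": ["Mat13", "FMat13", "Eco13", "Bus13", "Cla13"]})
studentDF = pd.DataFrame({
    "Subject": ["Art13", "Bio13", "Bus13", "Cla13", "Eco13"],
    "Student ID": [["S8", "S19"], ["S1", "S4"], ["S10", "S30", "S47"],
                   ["S9", "S33", "S35"], ["S6", "S15"]],
    "Student Number": [2, 17, 3, 3, 13],
})

# True where the row's Subject appears anywhere in subjectDF["Subject ID"].
# isin() tests membership value by value, so the two frames may differ in length.
mask = studentDF["Subject"].isin(subjectDF["Subject ID"])

# Boolean indexing keeps only the matching rows (original index labels are preserved).
filtered = studentDF[mask]
print(filtered)
```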

studentDF:
      Subject                                         Student ID  Student Number
0       Art13                                          [S8, S19]               2
1       Bus13                                    [S10, S30, S47]               3

I've tried many different approaches, but most of the time I get the following error:

ValueError: Can only compare identically-labeled Series objects
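This error typically comes from comparing the two columns element-wise with `==`. pandas aligns two Series on their index labels before comparing them, and subjectDF (17 rows) and studentDF (22 rows) have different indexes, so the comparison is refused. A membership test such as `isin` does no alignment, which is why it works here. A small sketch with hypothetical short columns:

```python
import pandas as pd

# Hypothetical stand-ins: columns of different lengths, hence different RangeIndex labels.
subject_ids = pd.Series(["Mat13", "Eco13", "Bus13"])
subjects = pd.Series(["Art13", "Bio13", "Bus13", "Eco13"])

error_message = None
try:
    subjects == subject_ids  # element-wise comparison requires identically-labeled Series
except ValueError as err:
    error_message = str(err)
print(error_message)

# isin() checks each value of `subjects` against the whole of `subject_ids`.
in_subjects = subjects.isin(subject_ids)
print(in_subjects.tolist())
```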

I'm not sure whether I should ask this next part here; I'll post it now, and if that's a problem I'll move it to a separate question.

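Once studentDF has been modified, I want to compare 'Student Number' in studentDF with 'Capacity' in classroomDF, and if 'Student Number' > 'Capacity', split the students and the subject in half. For example, Mat13 has 34 students, which exceeds the largest capacity in classroomDF (30, for C9). So I'd like to modify studentDF again as follows:

studentDF: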
        Subject                                         Student ID  Student Number
16       ....
17      Mat13_1  [S1, S2, S3, S4, S6, S7, S10, S12, S13, S14, S...              17
18      Mat13_2  [S15, S16, S...                                                17
         ....
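One way to sketch this, assuming 'Capacity' means the largest room in classroomDF and that an oversized subject is split into exactly two groups named `<Subject>_1` and `<Subject>_2`, as in the example. The helper `split_oversized` and the generated IDs `S1` to `S34` are hypothetical, since the real Mat13 list is truncated in the question.

```python
import pandas as pd


def split_oversized(student_df: pd.DataFrame, max_capacity: int) -> pd.DataFrame:
    """Hypothetical helper: split each subject whose 'Student Number' exceeds
    max_capacity into two halves named '<Subject>_1' and '<Subject>_2';
    all other rows are copied unchanged."""
    rows = []
    for _, row in student_df.iterrows():
        ids = list(row["Student ID"])
        if row["Student Number"] > max_capacity:
            half = (len(ids) + 1) // 2  # first group takes the extra student if the count is odd
            for part, group in enumerate((ids[:half], ids[half:]), start=1):
                rows.append({
                    "Subject": f"{row['Subject']}_{part}",
                    "Student ID": group,
                    "Student Number": len(group),
                })
        else:
            rows.append({
                "Subject": row["Subject"],
                "Student ID": ids,
                "Student Number": row["Student Number"],
            })
    return pd.DataFrame(rows, columns=["Subject", "Student ID", "Student Number"])


# Hypothetical data: Mat13's real ID list is truncated, so S1..S34 stand in for it.
studentDF = pd.DataFrame({
    "Subject": ["Lat13", "Mat13"],
    "Student ID": [["S33", "S35"], [f"S{i}" for i in range(1, 35)]],
    "Student Number": [2, 34],
})
classroomDF = pd.DataFrame({"Classroom ID": ["C8", "C9", "C10"], "Capacity": [25, 30, 12]})

result = split_oversized(studentDF, classroomDF["Capacity"].max())
print(result[["Subject", "Student Number"]])
```

Splitting into exactly two groups mirrors the example; a subject with more than twice the largest capacity would still yield oversized halves, so that case would need more than two groups.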

Any help solving this would be greatly appreciated!

【Question comments】:

    Tags: python pandas dataframe string-comparison


    【Answer 1】:

    IIUC, this is what you're looking for:

    studentDF[~studentDF['Subject'].isin(subjectDF['Subject ID'])]
    

    Output (the Student ID column looks truncated here because of my Jupyter notebook display settings):

        Subject Student ID                                          Student Number
    0   Art13   [S8, S19]                                           2
    1   Bio13   [S1, S4, S12, S13, S18, S24, S25, S28, S29, S3...   17
    3   Che13   [S1, S2, S3, S4, S12, S13, S14, S24, S25, S26,...   20
    5   Com13   [S2, S3, S10, S14, S16, S19, S31, S45, S192, S...   10
    19  Phy13   [S1, S7, S26, S27, S38, S44, S48, S49, S50, S2...   12
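Because of the `~`, this expression selects the rows whose Subject is *not* in `subjectDF['Subject ID']` (Art13, Bio13, Che13, Com13 and Phy13 in the output above), i.e. the rows the question wants removed. A sketch that reuses the same expression to drop those rows from studentDF by index label, on hypothetical cut-down frames:

```python
import pandas as pd

# Hypothetical cut-down frames (Student ID lists omitted for brevity).
subjectDF = pd.DataFrame({"Subject ID": ["Mat13", "Bus13", "Psy13"]})
studentDF = pd.DataFrame({
    "Subject": ["Art13", "Bio13", "Bus13", "Mat13", "Psy13"],
    "Student Number": [2, 17, 3, 34, 2],
})

# Rows NOT listed in subjectDF, selected with the same ~isin expression.
not_listed = studentDF[~studentDF["Subject"].isin(subjectDF["Subject ID"])]

# Drop those rows by their index labels, then renumber the remaining rows from 0.
studentDF = studentDF.drop(not_listed.index).reset_index(drop=True)
print(studentDF)
```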
    
