Plot one histogram (separate) for each variable in the column: answers

【Question title】: Plot one histogram (separate) for each variable in the column
【Posted】: 2017-07-08 16:31:05
【Question description】:

I want to plot one (separate) histogram for each variable in a column. The data is imported from a CSV file (sample.csv) and looks like this:

ip_addr_player_id,  event_name, level, points_earned, stars_earned, moves
118.93.180.241, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,   2 
118.93.180.241, Puzzle Complete, Botany Lab Puzzle 2, 1000, 2,   2 
118.93.180.241, Puzzle Complete, Botany Lab Puzzle 3, 1000, 2,   2 
203.166.252.219, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,  2     
54.166.252.324, Puzzle Complete, Botany Lab Puzzle 5, 1000, 2,  2

Given that each ip_addr_player_id is unique, I want to plot histograms of points_earned, stars_earned and moves (for each ip_addr_player_id).

I tried the following, based on an example I found online:

 library(readr)
 dataIn <- read.csv("sample.csv")
 #View(dataIn)
 library(ggplot2)
 plot <- ggplot(dataIn, aes(level, points_earned, fill=points_earned))+ 
              geom_histogram() + facet_wrap(~ip_addr_player_id)
 plot

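But this code gives me no output.

【Question discussion】:

For your reference, you can read how to make a reproducible example in r to provide a minimal data set in your question, which makes it easier for other users to answer. Note that this is not an answer to your question, but a link that explains how to ask it.

Tags: r ggplot2 histogram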

【Solution 1】:

    dataIn = read.table(text="
    ip_addr_player_id,  event_name, level, points_earned, stars_earned, moves
    118.93.180.241, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,   2 
    118.93.180.241, Puzzle Complete, Botany Lab Puzzle 2, 800, 2,   2 
    118.93.180.241, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,   2 
    203.166.252.219, Puzzle Complete, Botany Lab Puzzle 1, 1000, 2,  2     
    54.166.252.324, Puzzle Complete, Botany Lab Puzzle 5, 1000, 2,  2
    ", header = TRUE, sep = ",", strip.white = TRUE)
    dataIn



    # get the unique players
    players <- unique(dataIn$ip_addr_player_id)
    players
    library(data.table)
    # loop over the players
    for (i in players) {
      # select the rows for this ip_addr_player_id
      index <- which(dataIn$ip_addr_player_id == i)

      # get the data frame rows for the corresponding index
      p1 <- dataIn[index, ]

      # convert to a data.table
      DT <- data.table(p1)
      # total points_earned per level (the result column is named V1)
      dt1 <- DT[, sum(points_earned), by = level]
      # save each plot to its own file, named after the player
      png(filename = sprintf("%s.png", i))
      # use the ip as the title of the graph
      barplot(dt1$V1, names.arg = dt1$level, main = i)
      # do the same for the other variables
      dev.off()
    }

Review a partial result online
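
The loop above draws a bar plot of summed points per level. If an actual histogram of each variable per player is wanted, a minimal sketch in base R could look like the one below. The toy data values, the vars vector, and the use of pdf() and tempdir() are assumptions for illustration and are not part of the original answer.

```r
# Toy data with the same column names as the question (values are made up)
dataIn <- data.frame(
  ip_addr_player_id = c("118.93.180.241", "118.93.180.241", "118.93.180.241",
                        "203.166.252.219", "203.166.252.219"),
  level = c("Botany Lab Puzzle 1", "Botany Lab Puzzle 2", "Botany Lab Puzzle 3",
            "Botany Lab Puzzle 1", "Botany Lab Puzzle 2"),
  points_earned = c(1000, 800, 600, 1000, 400),
  stars_earned = c(2, 1, 3, 2, 1),
  moves = c(2, 5, 4, 2, 7),
  stringsAsFactors = FALSE
)

# Numeric variables to plot (assumed from the question text)
vars <- c("points_earned", "stars_earned", "moves")

files <- character(0)
for (player in unique(dataIn$ip_addr_player_id)) {
  # All rows that belong to this player
  rows <- dataIn[dataIn$ip_addr_player_id == player, ]
  for (v in vars) {
    # One file per player and variable, written to a temporary directory
    f <- file.path(tempdir(), sprintf("%s_%s.pdf", player, v))
    pdf(file = f)
    hist(rows[[v]], main = paste(player, v), xlab = v)
    dev.off()
    files <- c(files, f)
  }
}
```

With 899 players this would write 899 x 3 files, so it may be more practical to send everything to one multi-page device instead.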

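【Discussion】:

Hey! Thanks, @M.Hassan. This works perfectly. I do have one question. I have 899 unique player IDs, but I don't understand why this loop only considers the first player ID and runs for it. #select rows for uniq ip_addr_player_id index=which(dataIn$ip_addr_player_id ==i) This only selects the rows for i=1.
index is a vector that selects all rows matching the ip_addr_player_id (the subset of the data for that IP). For example, for i = "118.93.180.241", index = (1 2 3), so three rows are selected, not one. Try printing index and look at the result.
Here you can find a partial result as a proof of concept: rextester.com/IYYD19716
What I mean is that once it has finished everything for i=1, the loop should move on to i=2, index should select the rows for i=2, and so on up to i=899. So it should draw 899 bar plots. But that is not happening.
If you are working in RStudio, you will only see the last plot. I will modify my code to save the plots to separate files. Let me know whether you find 899 images.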
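
To illustrate the two points in this exchange, here is a small sketch. It shows that which() returns every matching row for a player, and that writing all plots to a single multi-page PDF keeps one page per player instead of leaving only the last plot visible. The ids vector, the pages counter, and the file name all_players.pdf are made up for this example.

```r
# Player id column only, mirroring the five sample rows from the question
ids <- c("118.93.180.241", "118.93.180.241", "118.93.180.241",
         "203.166.252.219", "54.166.252.324")

# which() returns all matching row numbers, not just the first one
index <- which(ids == "118.93.180.241")
print(index)  # [1] 1 2 3

players <- unique(ids)

# Number of rows per player, named by player id
rows_per_player <- vapply(players, function(i) length(which(ids == i)), integer(1))

# A pdf() device starts a new page for each high-level plot,
# so every iteration of the loop is kept
out_file <- file.path(tempdir(), "all_players.pdf")
pdf(file = out_file)
pages <- 0
for (i in players) {
  barplot(rows_per_player[[i]], names.arg = "rows", main = i)
  pages <- pages + 1
}
dev.off()
```

Checking pages (or counting the files in the output directory when one file per player is written) is a quick way to confirm that the loop ran for every player.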

【Solution 2】:

For example, you can use a loop:

# your_dataframe and column_of_your_level are placeholders: substitute your own
# data frame and the name (or index) of its level column

# hist() only accepts numeric input, so loop over the numeric columns only
vector_of_levels_you_want = which(sapply(your_dataframe, is.numeric))

subset_level_1 = your_dataframe[which(your_dataframe[,column_of_your_level] == "Botany Lab Puzzle 1"),]
subset_level_2 = your_dataframe[which(your_dataframe[,column_of_your_level] == "Botany Lab Puzzle 2"),]
subset_level_3 = your_dataframe[which(your_dataframe[,column_of_your_level] == "Botany Lab Puzzle 3"),]

for (col in vector_of_levels_you_want) {
    hist(subset_level_1[,col])
    hist(subset_level_2[,col])
    hist(subset_level_3[,col])
}
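
The hard-coded subsets above only cover three levels. As a possible generalisation, split() can build one subset per level automatically. This is only a sketch under assumptions: the toy data frame, the numeric_cols helper, and the output file name are hypothetical and not part of the original answer.

```r
# Toy data frame (values are made up); two rows per level
your_dataframe <- data.frame(
  level = c("Botany Lab Puzzle 1", "Botany Lab Puzzle 1",
            "Botany Lab Puzzle 2", "Botany Lab Puzzle 2",
            "Botany Lab Puzzle 3", "Botany Lab Puzzle 3"),
  points_earned = c(1000, 900, 800, 700, 600, 500),
  moves = c(2, 3, 5, 4, 6, 7),
  stringsAsFactors = FALSE
)

# Names of the numeric columns, since hist() needs numeric input
numeric_cols <- names(your_dataframe)[sapply(your_dataframe, is.numeric)]

# One data frame per distinct level
by_level <- split(your_dataframe, your_dataframe$level)

# Draw one histogram per level and numeric column into a multi-page PDF
pdf(file = file.path(tempdir(), "hist_by_level.pdf"))
plots <- 0
for (lvl in names(by_level)) {
  for (col in numeric_cols) {
    hist(by_level[[lvl]][[col]], main = paste(lvl, col), xlab = col)
    plots <- plots + 1
  }
}
dev.off()
```

The same pattern works per player by splitting on ip_addr_player_id instead of level.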

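【Discussion】:

Oops!! Sorry, my fault. I forgot to add that each ip_addr_player can have multiple rows with different levels (Botany Lab Puzzle 1 or Botany Lab Puzzle 2 or Botany Lab Puzzle 3 and so on).
I edited the code to account for that. It is most likely not the best approach, but I think it should work. Also, don't worry too much about the loop: it only becomes a problem when you have hundreds of thousands of columns, and in that case R isn't a good fit anyway, IMO.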