How to add a percentage to "Unknown" in gtsummary

【Question title】: How to add percentage to "unknown" in gtsummary
【Posted】: 2025-11-25 11:50:02
【Question description】:

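I have a continuous variable with a large number of unknown values. My advisor has asked me to put the percentage of unknowns next to the count, in the same column. This reprex mimics what I am trying to do.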

library(tidyverse)
library(gtsummary)

trial %>% # included with the gtsummary package
  select(trt, age, grade) %>%
  tbl_summary()

I am trying to have the percentage of unknowns listed next to the number of unknowns, ideally in parentheses. It would look like 11 (5.5%).
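For reference, the figure being asked for is the number of missing values divided by the number of rows. A quick check on the bundled `trial` data, sketched with base R (the names `n_missing` and `pct_missing` are illustrative only):

```r
library(gtsummary)  # provides the `trial` example data

n_missing   <- sum(is.na(trial$age))         # count of unknown ages
pct_missing <- 100 * mean(is.na(trial$age))  # share of unknown ages, in percent

# format the two numbers as "n (p%)"
sprintf("%d (%.1f%%)", n_missing, pct_missing)
```

This only reproduces the number by hand; it does not place it in the table.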

Some people have asked how the missing data appear in my dataset, so here is a reprex:

library(gtsummary)
library(tidyverse)
#> Warning: package 'tibble' was built under R version 4.0.3
#> Warning: package 'readr' was built under R version 4.0.3

df <-
  tibble::tribble(
    ~age, ~sex,     ~race,              ~weight,
    70,   "male",   "white",            50,
    57,   "female", "african-american", 87,
    64,   "male",   "white",            NA,
    46,   "male",   "white",            49,
    87,   "male",   "hispanic",         51
  )

df %>%
  select(age,sex,race,weight) %>%
  tbl_summary(type = list(age ~ "continuous", weight ~ "continuous"), missing="ifany")

【Comments】:

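I'm not sure there are any missing values in the sample data you provided, so it isn't very useful for testing. Maybe you want tbl_summary(missing="ifany")? Otherwise, how exactly are these "unknowns" coded in your data?
According to the table, age is unknown for 11 of the subjects. I am assuming that the average is available for 189 subjects and that 11 subjects have missing values, but I could be wrong?
Ah, OK. missing="ifany" is the default. If you have "unknown" values, you should code them as NA so that R knows they are missing. It's not clear what your actual data look like, so I'm not sure where the problem is.
@MrFlick I updated the original post with a reprex.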
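The recoding advice in the comments above can be sketched as follows. This is a hypothetical example: it assumes the unknowns were typed in as the text "unknown" in a column that is otherwise numeric, and the data frame `raw` and its values are made up for illustration.

```r
library(dplyr)

# hypothetical raw data: weight was read in as text because of the "unknown" entry
raw <- tibble::tibble(weight = c("50", "87", "unknown", "49", "51"))

clean <- raw %>%
  # turn the "unknown" marker into NA, then convert the column to numeric
  mutate(weight = as.numeric(na_if(weight, "unknown")))

# number of values R now treats as missing
sum(is.na(clean$weight))
```

Once the column is numeric with real NA values, tbl_summary() can count them as missing.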

Tags: r gtsummary

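【Solution 1】:

There are a few ways to report the missing rate. I'll illustrate some below, and you can choose the one that suits you best.

Categorical variables: I recommend making the missing values an explicit factor level before passing the data frame to tbl_summary(). The NA values will then no longer be missing and will be counted like any other level of the variable.
Continuous variables: use the statistic= argument to report the missing rate.
All variables: use add_n() to report the missing rate.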

library(gtsummary)

trial %>%      
  select(age, response, trt) %>%
  # make NA an explicit factor level with `forcats::fct_explicit_na()`
  # (deprecated since forcats 1.0.0 in favor of `fct_na_value_to_level()`)
  dplyr::mutate(response = factor(response) %>% forcats::fct_explicit_na()) %>%
  tbl_summary(
    by = trt,
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c("{N_nonmiss}/{N_obs} {p_nonmiss}%",
                                     "{median} ({p25}, {p75})")
  ) %>%
  add_n(statistic = "{n} / {N}")

Edit: adding another example following the original poster's comments.

library(gtsummary)

trial %>%      
  select(age, response, trt) %>%
  # make NA an explicit factor level with `forcats::fct_explicit_na()`
  dplyr::mutate(response = factor(response) %>% forcats::fct_explicit_na(na_level = "Unknown")) %>%
  tbl_summary(
    by = trt,
    type = all_continuous() ~ "continuous2",
    missing = "no",
    statistic = all_continuous() ~ c("{median} ({p25}, {p75})",
                                     "{N_miss} ({p_miss}%)")
  ) %>%
  # updating the Unknown label in the `.$table_body`
  modify_table_body(
    dplyr::mutate,
    label = ifelse(label == "N missing (% missing)",
                   "Unknown",
                   label)
  )
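The relabeling step above matches on the exact text of the generated row label, and that text may differ between gtsummary versions (capitalization, for instance). A minimal sketch for checking what the labels look like on your installation before writing the `ifelse()` condition; the object name `tbl` is illustrative only:

```r
library(gtsummary)
library(dplyr)

tbl <- trial %>%
  select(age, trt) %>%
  tbl_summary(
    by = trt,
    type = all_continuous() ~ "continuous2",
    missing = "no",
    statistic = all_continuous() ~ c("{median} ({p25}, {p75})",
                                     "{N_miss} ({p_miss}%)")
  )

# row labels as gtsummary generated them; copy the exact text from this output
unique(tbl$table_body$label)
```

If the printed label is not exactly "N missing (% missing)", use whatever text appears here in the comparison instead.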

【Discussion】:

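The variable I have is continuous (like age in the reprex), so turning it into a factor won't work; otherwise this would be perfect.
Using your code, I can mimic what I want below, except that I would like "N missing (% not missing)%" to read "Unknown": library(gtsummary) trial %>% select(age) %>% tbl_summary(missing = "no", type = all_continuous() ~ "continuous2", statistic = all_continuous() ~ c("{N_miss} ({p_nonmiss})%", "{median} ({p25}, {p75})") )
I added another example that I think is exactly what you are looking for.
That works great, but is there a way to keep it from being added to the other continuous variables, specifically those that don't have any missing values?
With the statistic argument, you can specify the statistics to present for each variable.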
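The last comment can be sketched as follows: pass `statistic=` a list with one formula per variable, so that only the variable with missing values gets the extra row. This is an illustration rather than the answerer's own code; it assumes the `trial` data, where `age` has missing values and `ttdeath` has none, and the object name `tbl_per_var` is illustrative only.

```r
library(gtsummary)
library(dplyr)

tbl_per_var <- trial %>%
  select(age, ttdeath, trt) %>%
  tbl_summary(
    by = trt,
    type = all_continuous() ~ "continuous2",
    missing = "no",
    statistic = list(
      # age has missing values: show the median row plus a missing-count row
      age ~ c("{median} ({p25}, {p75})", "{N_miss} ({p_miss}%)"),
      # ttdeath has no missing values: show the median row only
      ttdeath ~ "{median} ({p25}, {p75})"
    )
  )

tbl_per_var
```

The missing-count row can then be relabeled to "Unknown" with modify_table_body(), as in the answer's second example.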