【Question title】: How to get variable names and labels from sas7bdat into a data.frame
【Posted】: 2020-12-16 03:45:45
【Question】:

I am reading a set of SAS datasets into R. Is there code I can use to put the variable names and variable labels into a data.frame, something like a codebook?

I read the data with the haven package:

haven::read_sas

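I would like to know whether it stores the variable labels somewhere. If so, can I extract them?

The data in R looks like this:

I want to build a data.frame that looks like this:

Error message: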

<error/purrr_error_bad_element_vector>
Result 6 must be a single string, not NULL of length 0
Backtrace:
     x
  1. +-base::debug(list_of_labels <- lapply(datasets, label_lookup_map))
  2. +-base::lapply(datasets, label_lookup_map)
  3. | \-global::FUN(X[[i]], ...)
  4. |   \-tibble::tibble(col_name = df %>% names(), labels = df %>% map_chr(attr_getter("label")))
  5. |     \-tibble:::tibble_quos(xs[!is_null], .rows, .name_repair)
  6. |       \-rlang::eval_tidy(xs[[j]], mask)
  7. +-df %>% map_chr(attr_getter("label"))
  8. | +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  9. | \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 10. |   \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
 11. |     \-`_fseq`(`_lhs`)
 12. |       \-magrittr::freduce(value, `_function_list`)
 13. |         +-base::withVisible(function_list[[k]](value))
 14. |         \-function_list[[k]](value)
 15. |           \-purrr::map_chr(., attr_getter("label"))
 16. \-purrr:::stop_bad_element_vector(...)
 17.   \-purrr:::stop_bad_vector(...)
 18.     \-purrr:::stop_bad_type(...)

It looks like the error is caused by data like the following:

The sample data can be created with:

df<- structure(list(VISITNUM = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 14, 14, 14, 14), EXDOSE = c(36, 109, 182, 182, 
182, 182, 182, 55, 36, 55, 36, 55, 109, 182, 109, 182, 2600, 
2600, 2600, 2600), EXDOSU = c("mg", "mg", "mg", "mg", "mg", "mg", 
"mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", "mg", 
"mg", "mg", "mg")), label = "EX                              ", row.names = c(NA, 
20L), class = "data.frame")
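
In the sample above, the label attribute sits on the data frame as a whole, while the individual columns are plain vectors with no label attribute. A minimal sketch of how one might check both levels in base R follows; the three-row `df` is a shortened stand-in for the sample data, and `dataset_label` and `column_labels` are names made up for this illustration.

```r
# Shortened stand-in for the sample data (assumption: 3 rows instead of 20)
df <- data.frame(
  VISITNUM = c(4, 4, 14),
  EXDOSE = c(36, 109, 2600),
  EXDOSU = c("mg", "mg", "mg")
)
attr(df, "label") <- "EX      "  # dataset-level label, padded as in the sample

# Label stored on the data frame itself, with the trailing padding removed
dataset_label <- trimws(attr(df, "label", exact = TRUE))

# Label stored on each column; NULL means that column has no label
column_labels <- lapply(df, function(col) attr(col, "label", exact = TRUE))

print(dataset_label)
print(column_labels)
```

If every entry of `column_labels` is NULL, a per-column extraction such as `map_chr(attr_getter("label"))` has nothing to return for that data frame.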

【Question discussion】:

    Tags: r


    【Solution 1】:

    You may find this question helpful: Extract the labels attribute from "labeled" tibble columns from a haven import from Stata

    Here is an example:

    library(haven)
    library(tidyverse)
    
    airline <- read_sas("http://www.principlesofeconometrics.com/sas/airline.sas7bdat")
    
    label_lookup_map <- tibble(
      col_name = airline %>% names(),
      labels = airline %>% map_chr(attr_getter("label"))
    )
    
    print(label_lookup_map)
    # # A tibble: 6 x 2
    # col_name labels         
    # <chr>    <chr>          
    # 1 YEAR   year           
    # 2 Y      level of output
    # 3 W      wage rate      
    # 4 R      interest rate  
    # 5 L      labor input    
    # 6 K      capital input
    

    Edit: based on the comments, here is how to get the labels for several data.frames held in a list, where some of the data.frames have no labels.

    library(haven)
    library(tidyverse)
    
    airline <- read_sas("http://www.principlesofeconometrics.com/sas/airline.sas7bdat")
    cola <- read_sas("http://www.principlesofeconometrics.com/sas/cola.sas7bdat")
    data(iris)
    
    list_of_tbl <- list(airline, cola, iris)
    
    get_labels <- attr_getter("label")
    
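    # TRUE if at least one column of df carries a "label" attribute
    has_labels <- function(df) {
      !all(sapply(lapply(df, get_labels), is.null))
    }
    
    # Build a col_name/labels lookup tibble for one data.frame
    label_lookup_map <- function(df) {
      # Default for data.frames without any labels (e.g. iris)
      df_labels <- NA
      if (has_labels(df)) {
        # Note: map_chr() errors if only some of the columns have a label
        df_labels <- df %>% map_chr(get_labels)
      }
    
      tibble(
        col_name = df %>% names(),
        labels = df_labels
      )
    }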
    
    list_of_labels <- lapply(list_of_tbl, label_lookup_map)
    
    print(list_of_labels)
    # [[1]]
    # # A tibble: 6 x 2
    #   col_name labels         
    #   <chr>    <chr>          
    # 1 YEAR     year           
    # 2 Y        level of output
    # 3 W        wage rate      
    # 4 R        interest rate  
    # 5 L        labor input    
    # 6 K        capital input  
    
    # [[2]]
    # # A tibble: 5 x 2
    #   col_name labels                                   
    #   <chr>    <chr>                                    
    # 1 ID       customer id                              
    # 2 CHOICE   = 1 if brand chosen                      
    # 3 PRICE    price of 2 liter soda                    
    # 4 FEATURE  = 1 featured item at the time of purchase
    # 5 DISPLAY  = 1 if displayed at time of purchase     
    
    # [[3]]
    # # A tibble: 5 x 2
    #   col_name     labels
    #   <chr>        <lgl> 
    # 1 Sepal.Length NA    
    # 2 Sepal.Width  NA    
    # 3 Petal.Length NA    
    # 4 Petal.Width  NA    
    # 5 Species      NA 
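
When only some columns of a data frame carry a label, a per-column fallback avoids the "must be a single string, not NULL" failure. The following is a minimal base R sketch under that assumption, not the answer's original method; `label_lookup_safe` and the `toy` data are hypothetical names introduced for illustration.

```r
# Hypothetical helper: one row per column, NA where a column has no label
label_lookup_safe <- function(df) {
  labels <- vapply(
    df,
    function(col) {
      lab <- attr(col, "label", exact = TRUE)
      # A missing attribute comes back as NULL; map it to NA_character_
      if (is.null(lab)) NA_character_ else as.character(lab)[1]
    },
    character(1)
  )
  data.frame(
    col_name = names(df),
    label = unname(labels),
    stringsAsFactors = FALSE
  )
}

# Toy data: only the first column carries a label
toy <- data.frame(VISITNUM = c(4, 14), EXDOSU = c("mg", "mg"))
attr(toy$VISITNUM, "label") <- "Visit number"

lookup <- label_lookup_safe(toy)
print(lookup)
```

Because the fallback is applied column by column, the helper can be passed to `lapply()` over a list of data frames regardless of how completely each one is labelled.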
    

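    【Discussion】:

    • Thank you very much. What if airline is a list containing several data.frames? How do I get down to the data.frame level and then extract col_name and the labels?
    • Maybe you could write a function and use lapply.
    • Could you give an example?
    • Suppose airline contains 3 data.frames: df1, df2, df3. How do I assign names to the lookup maps? The best I can think of is label_lookup_map <- function(x) {tibble( col_name = x %>% names(), labels = x %>% map_chr(attr_getter("label")) )} lapply(airline, label_lookup_map)
    • Thank you very much for the updated answer. When I run the code on my data, I get Error: Result 6 must be a single string, not NULL of length 0. Do you know what this means? How can I fix it, and how can I find out what is causing it?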
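
The message in the last comment suggests that the sixth column returned NULL, meaning it has no label attribute, while `map_chr()` expects exactly one string per column. One way to locate such columns across a named list of data frames is sketched below in base R; `df1`, `df2`, `datasets`, and `unlabelled` are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical stand-ins for the imported datasets
df1 <- data.frame(a = 1:2, b = c("x", "y"))
attr(df1$a, "label") <- "first variable"
attr(df1$b, "label") <- "second variable"

df2 <- data.frame(c = 1:2, d = 3:4)
attr(df2$c, "label") <- "third variable"  # d is left unlabelled on purpose

# Naming the list elements keeps the names on the lapply() result
datasets <- list(df1 = df1, df2 = df2)

# For each data frame, the names of the columns whose "label" attribute is NULL
unlabelled <- lapply(datasets, function(df) {
  is_missing <- vapply(
    df,
    function(col) is.null(attr(col, "label", exact = TRUE)),
    logical(1)
  )
  names(df)[is_missing]
})

print(unlabelled)
```

Since `lapply()` preserves the names of its input, building the list with names is enough to keep each result tied to the data frame it came from.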