Python中的多元线性混合效应模型答案

【问题标题】：Multivariate linear mixed effects model in PythonPython中的多元线性混合效应模型
【发布时间】：2018-07-26 14:50:03
【问题描述】：

我正在玩这个用于单变量线性混合效果建模的代码。数据集表示：

学生作为s
教师作为 d
部门作为部门
服务即服务

在 R 的 lme4 包（Bates et al., 2015）的语法中，实现的模型可以概括为：

y ~ 1 + (1|students) + (1|instructor) + (1|dept) + service

其中 1 表示截距项，(1|x) 表示 x 的随机效应，x 表示固定效应。

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import edward as ed
    import pandas as pd
    import tensorflow as tf
    import matplotlib.pyplot as plt

    from edward.models import Normal
    from observations import insteval

    data = pd.DataFrame(data, columns=metadata['columns'])
    train = data.sample(frac=0.8)
    test = data.drop(train.index)
    train.head()

    s_train = train['s'].values
    d_train = train['dcodes'].values
    dept_train = train['deptcodes'].values
    y_train = train['y'].values
    service_train = train['service'].values
    n_obs_train = train.shape[0]

    s_test = test['s'].values
    d_test = test['dcodes'].values
    dept_test = test['deptcodes'].values
    y_test = test['y'].values
    service_test = test['service'].values
    n_obs_test = test.shape[0]
    n_s = max(s_train) + 1  # number of students
    n_d = max(d_train) + 1  # number of instructors
    n_dept = max(dept_train) + 1  # number of departments
    n_obs = train.shape[0]  # number of observations

    # Set up placeholders for the data inputs.
    s_ph = tf.placeholder(tf.int32, [None])
    d_ph = tf.placeholder(tf.int32, [None])
    dept_ph = tf.placeholder(tf.int32, [None])
    service_ph = tf.placeholder(tf.float32, [None])

    # Set up fixed effects.
    mu = tf.get_variable("mu", [])
    service = tf.get_variable("service", [])

    sigma_s = tf.sqrt(tf.exp(tf.get_variable("sigma_s", [])))
    sigma_d = tf.sqrt(tf.exp(tf.get_variable("sigma_d", [])))
    sigma_dept = tf.sqrt(tf.exp(tf.get_variable("sigma_dept", [])))

    # Set up random effects.
    eta_s = Normal(loc=tf.zeros(n_s), scale=sigma_s * tf.ones(n_s))
    eta_d = Normal(loc=tf.zeros(n_d), scale=sigma_d * tf.ones(n_d))
    eta_dept = Normal(loc=tf.zeros(n_dept), scale=sigma_dept * tf.ones(n_dept))

    yhat = (tf.gather(eta_s, s_ph) +
            tf.gather(eta_d, d_ph) +
            tf.gather(eta_dept, dept_ph) +
            mu + service * service_ph)
    y = Normal(loc=yhat, scale=tf.ones(n_obs))

    #Inference

    q_eta_s = Normal(
        loc=tf.get_variable("q_eta_s/loc", [n_s]),
        scale=tf.nn.softplus(tf.get_variable("q_eta_s/scale", [n_s])))
    q_eta_d = Normal(
        loc=tf.get_variable("q_eta_d/loc", [n_d]),
        scale=tf.nn.softplus(tf.get_variable("q_eta_d/scale", [n_d])))
    q_eta_dept = Normal(
        loc=tf.get_variable("q_eta_dept/loc", [n_dept]),
        scale=tf.nn.softplus(tf.get_variable("q_eta_dept/scale", [n_dept])))

    latent_vars = {
        eta_s: q_eta_s,
        eta_d: q_eta_d,
        eta_dept: q_eta_dept}
    data = {
        y: y_train,
        s_ph: s_train,
        d_ph: d_train,
        dept_ph: dept_train,
        service_ph: service_train}
    inference = ed.KLqp(latent_vars, data)

这适用于线性混合效应建模的单变量情况。我正在尝试将这种方法扩展到多变量案例。任何想法都非常受欢迎。

【问题讨论】：

标签： python statistics analysis

【解决方案1】：

有多种方法可以在 Python 中进行线性混合效果模型。看起来您已经适应了 Tensorflow approach，但如果这不是硬性要求，那么还有其他几个可能更方便的选项。

您可以使用 LMER 的Statsmodels implementation，它方便地包含在 Python 中，但语法与 R 的 LMER 中的传统公式表达式有点不同。看起来您正在使用 python 将数据拆分为训练集和测试集，因此您还可以编写一个循环来调用
您还可以在本地计算机上安装 R 和 rpy2，并从 Python 环境中调用 LMER 包。这使您可以保持熟悉在 R 中工作，但允许您在 Python 中执行其他所有操作。您所要做的就是在 Jupyter Notebooks 的单元块中使用 rmagic %%R 或（%R 表示内联）在 Python 和 R 之间传递变量和模型。如果您要传递训练/测试数据，后者会很有用您在 Python 中拆分为 R 以运行 lmer 并在循环中检索参数。
最后，另一种选择是使用Pymer4，它是 rpy2 的包装器，允许您在 R 中直接调用 LMER，而无需处理 rmagic。

我wrote a tutorial 介绍了如何将 LMER 与这些方法一起使用，这些方法也适用于 Google Colab 等云设置。这些方法都允许您运行多元方法，就像您要求在 R 中使用 LMER 但从 Python 环境中一样。

【讨论】：