How to get the union of two date vectors in Stata? Answers

【Question title】: How to get the union of two date vectors in Stata?
【Posted】: 2015-04-10 15:09:31
【Question】:

I have the following data:

id  test1   test1_date    test2 test2_date
1   2   Jun 23, 2014 21:29  26  Jun 20, 2014 06:27
1   2   Jun 24, 2014 01:44  25  Jun 21, 2014 02:53 
1   2   Jun 24, 2014 06:20  25  Jun 22, 2014 07:38
2   2   Jun 25, 2014 22:15  30  Jun 26, 2014 11:08
2   0   Jun 26, 2014 02:35  25  Jun 27, 2014 20:09
2   2   Jun 26, 2014 06:49  25  Jun 30, 2014 14:47

This is the so-called wide format. I want to convert it to long format, like this:

id  date               test value
1   Jun 20, 2014 06:27  2   26
1   Jun 21, 2014 02:53  2   25
1   Jun 22, 2014 07:38  2   25
1   Jun 23, 2014 21:29  1   2
1   Jun 24, 2014 01:44  1   2
1   Jun 24, 2014 06:20  1   2
2   Jun 25, 2014 22:15  1   2
2   Jun 26, 2014 02:35  1   0
2   Jun 26, 2014 06:49  1   2
2   Jun 26, 2014 11:08  2   30
2   Jun 27, 2014 20:09  2   25
2   Jun 30, 2014 14:47  2   25

I tried the reshape command:

reshape test1 test2, i(id)

However, that creates a vector of missing values. Another attempt was:

reshape long test1 test2 , i(id test1_date test2_date)

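【Comments】:

Having tried reshape and failed, I believe that creating two datasets and using append would be the easiest approach to this simple task. I might feel differently if there were many date/time pairs.
Date-times shown in a datetime display format are very awkward for other members to experiment with. Your dataset should be presented in a form that is easy to import.
Mata has vectors. By "vectors" here you presumably mean variables.
How does this relate to stackoverflow.com/questions/29536315/… or stackoverflow.com/questions/28839986/…? You seem to be asking the same question again and again. Please link them explicitly.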
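
The append route suggested in the first comment is not spelled out in the thread. A minimal sketch of what it might look like follows, assuming the date strings are first converted to numeric datetimes with clock(). The names s1, s2, date, value and the tempfile name second are illustrative choices, not something taken from the original posts.

```stata
* Sketch only: rebuild the example data, then stack the two test/date pairs
clear
input id test1 str18 s1 test2 str18 s2
1 2 "Jun 23, 2014 21:29" 26 "Jun 20, 2014 06:27"
1 2 "Jun 24, 2014 01:44" 25 "Jun 21, 2014 02:53"
1 2 "Jun 24, 2014 06:20" 25 "Jun 22, 2014 07:38"
2 2 "Jun 25, 2014 22:15" 30 "Jun 26, 2014 11:08"
2 0 "Jun 26, 2014 02:35" 25 "Jun 27, 2014 20:09"
2 2 "Jun 26, 2014 06:49" 25 "Jun 30, 2014 14:47"
end

* datetimes must be stored as double to keep full precision
gen double test1_date = clock(s1, "MDY hm")
gen double test2_date = clock(s2, "MDY hm")
format test1_date test2_date %tc
drop s1 s2

* first dataset: the test2 pair, saved to a temporary file
preserve
keep id test2 test2_date
rename (test2 test2_date) (value date)
gen byte test = 2
tempfile second
save `second'
restore

* second dataset: the test1 pair, kept in memory
keep id test1 test1_date
rename (test1 test1_date) (value date)
gen byte test = 1

* stack the two pieces and order them by id and date
append using `second'
sort id date
list id date test value, sepby(id)
```

Because each piece carries its own test indicator before being stacked, no extra identifier variable is needed with this approach.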

Tags: stata reshape

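【Solution 1】:

This is personal taste, but I recommend not using the word "format" here. It is already overloaded (display format, file format). I suggest simply "shape".

The problem is soluble, but it needs two small tricks, and one misunderstanding needs correcting:

Stata wants another identifier variable, because the basic idea is that a reshape should be reversible. So one must be created, even if it is arbitrary.
The variable names would benefit from some work.
The date variables cannot be used to identify groups of observations, because their values are all distinct rather than repeated.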
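
To make the first point concrete, here is a small sketch, under the assumption that the test values alone are enough to illustrate it (the date columns are left out for brevity). It checks whether id identifies observations uniquely, then adds an arbitrary counter j so that id and j together do.

```stata
* Sketch only: the question's test values without the date columns
clear
input id test1 test2
1 2 26
1 2 25
1 2 25
2 2 30
2 0 25
2 2 25
end

* id is repeated within the data, so isid exits with an error;
* capture traps it and leaves a nonzero return code in _rc
capture isid id
display "return code from isid id: " _rc

* an arbitrary counter within each id
bysort id : gen j = _n

* id and j together now identify every observation, so this runs silently
isid id j
```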

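There is general advice in this FAQ (in addition to the help file and the manual entry).

Some of the syntax below is only what is needed to set up your example as reproducible code. A good question would do that for us!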

. clear

. input id  test1 str18  Stest1_date    test2 str18 Stest2_date

            id      test1         Stest1_date      test2         Stest2_date
  1. 1   2   "Jun 23, 2014 21:29"  26  "Jun 20, 2014 06:27"
  2. 1   2   "Jun 24, 2014 01:44"  25  "Jun 21, 2014 02:53" 
  3. 1   2   "Jun 24, 2014 06:20"  25  "Jun 22, 2014 07:38"
  4. 2   2   "Jun 25, 2014 22:15"  30  "Jun 26, 2014 11:08"
  5. 2   0   "Jun 26, 2014 02:35"  25  "Jun 27, 2014 20:09"
  6. 2   2   "Jun 26, 2014 06:49"  25  "Jun 30, 2014 14:47"
  7. end 

. 
. gen double test1_date = clock(Stest1_date, "MDY hm")

. gen double test2_date = clock(Stest2_date, "MDY hm")

. drop S*

. format t*date %tc

. l, sepby(id)

     +--------------------------------------------------------------+
     | id   test1   test2           test1_date           test2_date |
     |--------------------------------------------------------------|
  1. |  1       2      26   23jun2014 21:29:00   20jun2014 06:27:00 |
  2. |  1       2      25   24jun2014 01:44:00   21jun2014 02:53:00 |
  3. |  1       2      25   24jun2014 06:20:00   22jun2014 07:38:00 |
     |--------------------------------------------------------------|
  4. |  2       2      30   25jun2014 22:15:00   26jun2014 11:08:00 |
  5. |  2       0      25   26jun2014 02:35:00   27jun2014 20:09:00 |
  6. |  2       2      25   26jun2014 06:49:00   30jun2014 14:47:00 |
     +--------------------------------------------------------------+

. 
. bysort id : gen j = _n

. rename (test1_date test2_date) (date1 date2)

. reshape long test date, i(id j)
(note: j = 1 2)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      12
Number of variables                   6   ->       5
j variable (2 values)                     ->   _j
xij variables:
                            test1 test2   ->   test
                            date1 date2   ->   date
-----------------------------------------------------------------------------

. l, sepby(id)

     +-----------------------------------------+
     | id   j   _j   test                 date |
     |-----------------------------------------|
  1. |  1   1    1      2   23jun2014 21:29:00 |
  2. |  1   1    2     26   20jun2014 06:27:00 |
  3. |  1   2    1      2   24jun2014 01:44:00 |
  4. |  1   2    2     25   21jun2014 02:53:00 |
  5. |  1   3    1      2   24jun2014 06:20:00 |
  6. |  1   3    2     25   22jun2014 07:38:00 |
     |-----------------------------------------|
  7. |  2   1    1      2   25jun2014 22:15:00 |
  8. |  2   1    2     30   26jun2014 11:08:00 |
  9. |  2   2    1      0   26jun2014 02:35:00 |
 10. |  2   2    2     25   27jun2014 20:09:00 |
 11. |  2   3    1      2   26jun2014 06:49:00 |
 12. |  2   3    2     25   30jun2014 14:47:00 |
     +-----------------------------------------+
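
The listing above holds the same information as the layout requested in the question, but with different names and in a different order. A possible tidy-up is sketched below on a four-row extract of the reshaped data, typed in directly so that the block runs on its own. The string variable sdate is a hypothetical helper, and the names test and value are taken from the question's desired output.

```stata
* Sketch only: a four-row extract of the reshaped data shown above
clear
input id j _j test str18 sdate
1 1 1 2  "23jun2014 21:29"
1 1 2 26 "20jun2014 06:27"
2 1 1 2  "25jun2014 22:15"
2 1 2 30 "26jun2014 11:08"
end
gen double date = clock(sdate, "DMY hm")
format date %tc
drop sdate

* free the name test first, then let _j take it over
rename test value
rename _j test

* the arbitrary counter is no longer needed once the data are long
drop j

order id date test value
sort id date
list, sepby(id)
```

Dropping j makes the reshape harder to reverse, so it may be worth keeping if a return to the wide layout is ever likely.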

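【Comments】:

For readers starting out with reshape, note that stubnames can also take a more "complex" form. In this example we could use reshape long test test@_date, i(id j) directly, instead of using rename.
@RobertoFerrer Quite right, and I have adjusted the answer accordingly. For me that syntax (although perfectly reasonable) is too arbitrary to remember without looking it up each time, so in practice I always prefer to adjust the variable names directly, using syntax I find memorable. But you are right to imply that this is personal taste rather than absolute necessity.
I found it quite cryptic at first. Then I started thinking of it as a placeholder for j. It has more or less stuck since then.
Good tip. I guess we all avoid syntax we don't like when there is a choice.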
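
The stub form from the first comment in this thread can be tried without any renaming. The sketch below uses a four-row extract of the example data. It assumes, as that comment states, that reshape accepts the stubs test and test@_date together, in which case the long datetime variable would be named test_date. The names s1 and s2 are hypothetical helpers.

```stata
* Sketch only: four rows of the example data, original variable names kept
clear
input id test1 str18 s1 test2 str18 s2
1 2 "Jun 23, 2014 21:29" 26 "Jun 20, 2014 06:27"
1 2 "Jun 24, 2014 01:44" 25 "Jun 21, 2014 02:53"
2 2 "Jun 25, 2014 22:15" 30 "Jun 26, 2014 11:08"
2 0 "Jun 26, 2014 02:35" 25 "Jun 27, 2014 20:09"
end
gen double test1_date = clock(s1, "MDY hm")
gen double test2_date = clock(s2, "MDY hm")
format test1_date test2_date %tc
drop s1 s2

* arbitrary identifier within id, as in the answer
bysort id : gen j = _n

* @ marks where the 1 or 2 sits inside the variable name
reshape long test test@_date, i(id j)

list, sepby(id)
```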