【Posted】: 2021-03-30 09:55:03
【Question】:
I have the following character string:
cal = "\n \n21/01/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n21/01/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n03/02/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n17/02/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n11/03/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n11/03/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n24/03/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n25/03/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n \n22/04/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n22/04/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n12/05/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n10/06/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in the Netherlands\n \n \n10/06/2021\n\n \nPress conference following the Governing Council meeting of the ECB in the Netherlands\n \n \n23/06/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n24/06/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n \n22/07/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n22/07/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n09/09/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n09/09/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n22/09/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n23/09/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n 
\n06/10/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n28/10/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n28/10/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n10/11/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n01/12/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n02/12/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n \n16/12/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n16/12/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n"
cal = gsub("\n", " ", cal)
As you can see, the text contains both dates and event descriptions. I would like to turn it into two columns: "Date" and "Event".
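One possible way to get there from the flattened string (after the `gsub()` call) is a single regular expression that captures a `dd/mm/yyyy` date and then, lazily, everything up to the next date or the end of the text. This is a minimal sketch assuming the `stringr` package (`str_squish()`, `str_match_all()`) and a shortened copy of `cal` holding only the first three entries; the names `flat`, `m` and `events` are illustrative.

```r
library(stringr)

# Shortened copy of `cal` (first three entries only), for illustration
cal <- "\n \n21/01/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n21/01/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n03/02/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n"

# Replace newlines with spaces (as in the question), then trim and collapse runs of whitespace
flat <- str_squish(gsub("\n", " ", cal, fixed = TRUE))

# Group 1: a dd/mm/yyyy date; group 2: the shortest text up to the next " dd/mm/yyyy" or the end
m <- str_match_all(flat, "(\\d{2}/\\d{2}/\\d{4}) (.+?)(?= \\d{2}/\\d{2}/\\d{4}|$)")[[1]]

# Column 1 of the match matrix is the whole match; columns 2 and 3 are the capture groups
events <- data.frame(Date = m[, 2], Event = m[, 3], stringsAsFactors = FALSE)
events
```

The lookahead keeps the next date out of the current match, so it is still available as the start of the following row. This assumes an event description never itself contains a space followed by a `dd/mm/yyyy` date.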
This would be the result (only the first few rows are shown, for simplicity):
Date Event
21/01/2021 Governing Council of the ECB: monetary policy meeting in Frankfurt
21/01/2021 Press conference following the Governing Council meeting of the ECB...
03/02/2021 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
17/02/2021 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
11/03/2021 Governing Council of the ECB: monetary policy meeting in Frankfurt
...
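Because the original string separates dates and events with newlines, another option is to skip the `gsub()` step and split on `"\n"` directly. This base-R sketch assumes each date line is immediately followed by its event line (which holds for the string above once blank and space-only lines are dropped); it again uses a shortened copy of `cal`, and the variable names are illustrative.

```r
# Shortened copy of `cal` (first three entries only), for illustration
cal <- "\n \n21/01/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n21/01/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n03/02/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n"

# Split on the original newlines and trim surrounding spaces from each piece
lines <- trimws(strsplit(cal, "\n", fixed = TRUE)[[1]])
lines <- lines[lines != ""]  # drop the blank / space-only pieces

# Positions of the lines that are exactly a dd/mm/yyyy date
date_idx <- which(grepl("^\\d{2}/\\d{2}/\\d{4}$", lines))

# Assumption: the event text is the line right after each date
events <- data.frame(Date = lines[date_idx], Event = lines[date_idx + 1],
                     stringsAsFactors = FALSE)
events
```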
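I have tried many functions that reshape a corpus into sentences, as well as functions that extract dates, but I could not get it to work. For example: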
library(anytime)
library(stringr)   # str_extract_all()
library(magrittr)  # %>%
anydate(str_extract_all(cal, "[[:alnum:]]+[ /]*\\d{2}[ /]*\\d{4}")[[1]]) %>% as.data.frame()
# it gives me back a lot of NAs, I don't know why
[1] NA NA "2021-03-02" NA "2021-11-03" "2021-11-03" NA
[8] NA NA NA "2021-12-05" "2021-10-06" "2021-10-06" NA
[15] NA NA NA "2021-09-09" "2021-09-09" NA NA
[22] "2021-06-10" NA NA "2021-10-11" "2021-01-12" "2021-02-12" NA
[29] NA
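The output above is consistent with `anydate()` reading these slash-separated strings as month/day/year: `03/02/2021` comes back as `2021-03-02`, `11/03/2021` as `2021-11-03`, and every date whose first field is greater than 12 (such as `21/01/2021`) becomes `NA`. A minimal sketch of parsing them as day/month/year instead, using base R's `as.Date()` with an explicit format (the sample vector is taken from the dates in the question):

```r
# The calendar dates are day/month/year, so state the format explicitly
dates <- c("21/01/2021", "03/02/2021", "11/03/2021")
parsed <- as.Date(dates, format = "%d/%m/%Y")
parsed
# [1] "2021-01-21" "2021-02-03" "2021-03-11"
```

The same `as.Date(..., format = "%d/%m/%Y")` call can be applied to a character column of dates once the text has been split into Date/Event pairs.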
Can anyone help me?
Thanks!
【Discussion】: