【发布时间】:2017-11-28 20:38:52
【问题描述】:
我对正则表达式非常陌生,并且在 payload.hours 中有一些 JSON 数据:
这些代表不同地点一周中每一天的营业时间。将这些数据提取到同一数据框的新列中的最干净的方法是什么?我希望将每天的营业时间提取到单独的列中。
在此处附加数据 - :
structure(list(payload.name = c("Windsor Castle", "Penrhyn Castle",
"Barcaldine Castle", "University of Birmingham", "Kensington Gardens",
"Bristol Zoo Gardens", "Alton Towers Resort", "The Royal Shakespeare Company",
"University of Surrey", "Hampton Court Palace", "BFI Imax", "Usher Gallery",
"The Cambridge Chop House", "The Balmoral Hotel", "Scottish National Portrait Gallery",
"Old Trafford Stadium", "Exeter Cathedral", "Excel London", "Cavern Club"
), payload.hours_display = c("Mon-Sat 11:30 AM-11:00 PM; Sun 12:00 PM-11:00 PM",
"Open Daily 11:00 AM-5:00 PM", "Open Daily 9:00 AM-10:00 PM",
"Mon-Fri 8:00 AM-7:00 PM; Sat-Sun 10:00 AM-3:00 PM", "Open Daily 5:00 AM-11:59 PM",
"Open Daily 9:00 AM-5:00 PM", "Open Daily 10:30 AM-5:00 PM",
"Mon-Sat 11:30 AM-11:00 PM; Sun 11:30 AM-4:30 PM", "Mon-Fri 8:30 AM-11:00 PM; Sat-Sun 10:00 AM-11:00 PM",
"Mon-Sat 12:00 PM-11:00 PM; Sun 12:00 PM-10:30 PM", "Mon 1:30 PM-11:00 PM; Tue 10:30 AM-11:00 PM; Wed-Fri 1:30 PM-11:00 PM; Sat-Sun 11:00 AM-11:00 PM",
"Open Daily 10:00 AM-4:00 PM", "Mon-Thu 12:00 PM-10:30 PM; Fri-Sat 12:00 PM-11:00 PM; Sun 12:00 PM-9:30 PM",
"Open Daily 12:00 AM-12:00 AM", "Mon-Wed 10:00 AM-5:00 PM; Thu 10:00 AM-7:00 PM; Fri-Sun 10:00 AM-5:00 PM",
"Open Daily 9:30 AM-5:00 PM", "Mon-Sat 9:30 AM-4:45 PM; Sun 11:30 AM-3:30 PM",
"Open Daily 5:30 AM-10:00 PM", "Mon 12:00 AM-12:30 AM, 11:00 AM-8:00 PM; Tue-Wed 11:00 AM-8:00 PM; Thu 11:00 AM-11:59 PM; Fri 12:00 AM-2:00 AM, 11:00 AM-11:59 PM; Sat-Sun 12:00 AM-2:30 AM, 11:00 AM-11:59 PM"
), payload.hours = c("{\"monday\":[[\"11:30\",\"23:00\"]],\"tuesday\":[[\"11:30\",\"23:00\"]],\"wednesday\":[[\"11:30\",\"23:00\"]],\"thursday\":[[\"11:30\",\"23:00\"]],\"friday\":[[\"11:30\",\"23:00\"]],\"saturday\":[[\"11:30\",\"23:00\"]],\"sunday\":[[\"12:00\",\"23:00\"]]}",
"{\"monday\":[[\"11:00\",\"17:00\"]],\"tuesday\":[[\"11:00\",\"17:00\"]],\"wednesday\":[[\"11:00\",\"17:00\"]],\"thursday\":[[\"11:00\",\"17:00\"]],\"friday\":[[\"11:00\",\"17:00\"]],\"saturday\":[[\"11:00\",\"17:00\"]],\"sunday\":[[\"11:00\",\"17:00\"]]}",
"{\"monday\":[[\"9:00\",\"22:00\"]],\"tuesday\":[[\"9:00\",\"22:00\"]],\"wednesday\":[[\"9:00\",\"22:00\"]],\"thursday\":[[\"9:00\",\"22:00\"]],\"friday\":[[\"9:00\",\"22:00\"]],\"saturday\":[[\"9:00\",\"22:00\"]],\"sunday\":[[\"9:00\",\"22:00\"]]}",
"{\"monday\":[[\"8:00\",\"19:00\"]],\"tuesday\":[[\"8:00\",\"19:00\"]],\"wednesday\":[[\"8:00\",\"19:00\"]],\"thursday\":[[\"8:00\",\"19:00\"]],\"friday\":[[\"8:00\",\"19:00\"]],\"saturday\":[[\"10:00\",\"15:00\"]],\"sunday\":[[\"10:00\",\"15:00\"]]}",
"{\"monday\":[[\"5:00\",\"23:59\"]],\"tuesday\":[[\"5:00\",\"23:59\"]],\"wednesday\":[[\"5:00\",\"23:59\"]],\"thursday\":[[\"5:00\",\"23:59\"]],\"friday\":[[\"5:00\",\"23:59\"]],\"saturday\":[[\"5:00\",\"23:59\"]],\"sunday\":[[\"5:00\",\"23:59\"]]}",
"{\"monday\":[[\"9:00\",\"17:00\"]],\"tuesday\":[[\"9:00\",\"17:00\"]],\"wednesday\":[[\"9:00\",\"17:00\"]],\"thursday\":[[\"9:00\",\"17:00\"]],\"friday\":[[\"9:00\",\"17:00\"]],\"saturday\":[[\"9:00\",\"17:00\"]],\"sunday\":[[\"9:00\",\"17:00\"]]}",
"{\"monday\":[[\"10:30\",\"17:00\"]],\"tuesday\":[[\"10:30\",\"17:00\"]],\"wednesday\":[[\"10:30\",\"17:00\"]],\"thursday\":[[\"10:30\",\"17:00\"]],\"friday\":[[\"10:30\",\"17:00\"]],\"saturday\":[[\"10:30\",\"17:00\"]],\"sunday\":[[\"10:30\",\"17:00\"]]}",
"{\"monday\":[[\"11:30\",\"23:00\"]],\"tuesday\":[[\"11:30\",\"23:00\"]],\"wednesday\":[[\"11:30\",\"23:00\"]],\"thursday\":[[\"11:30\",\"23:00\"]],\"friday\":[[\"11:30\",\"23:00\"]],\"saturday\":[[\"11:30\",\"23:00\"]],\"sunday\":[[\"11:30\",\"16:30\"]]}",
"{\"monday\":[[\"8:30\",\"23:00\"]],\"tuesday\":[[\"8:30\",\"23:00\"]],\"wednesday\":[[\"8:30\",\"23:00\"]],\"thursday\":[[\"8:30\",\"23:00\"]],\"friday\":[[\"8:30\",\"23:00\"]],\"saturday\":[[\"10:00\",\"23:00\"]],\"sunday\":[[\"10:00\",\"23:00\"]]}",
"{\"monday\":[[\"12:00\",\"23:00\"]],\"tuesday\":[[\"12:00\",\"23:00\"]],\"wednesday\":[[\"12:00\",\"23:00\"]],\"thursday\":[[\"12:00\",\"23:00\"]],\"friday\":[[\"12:00\",\"23:00\"]],\"saturday\":[[\"12:00\",\"23:00\"]],\"sunday\":[[\"12:00\",\"22:30\"]]}",
"{\"monday\":[[\"13:30\",\"23:00\"]],\"tuesday\":[[\"10:30\",\"23:00\"]],\"wednesday\":[[\"13:30\",\"23:00\"]],\"thursday\":[[\"13:30\",\"23:00\"]],\"friday\":[[\"13:30\",\"23:00\"]],\"saturday\":[[\"11:00\",\"23:00\"]],\"sunday\":[[\"11:00\",\"23:00\"]]}",
"{\"monday\":[[\"10:00\",\"16:00\"]],\"tuesday\":[[\"10:00\",\"16:00\"]],\"wednesday\":[[\"10:00\",\"16:00\"]],\"thursday\":[[\"10:00\",\"16:00\"]],\"friday\":[[\"10:00\",\"16:00\"]],\"saturday\":[[\"10:00\",\"16:00\"]],\"sunday\":[[\"10:00\",\"16:00\"]]}",
"{\"monday\":[[\"12:00\",\"22:30\"]],\"tuesday\":[[\"12:00\",\"22:30\"]],\"wednesday\":[[\"12:00\",\"22:30\"]],\"thursday\":[[\"12:00\",\"22:30\"]],\"friday\":[[\"12:00\",\"23:00\"]],\"saturday\":[[\"12:00\",\"23:00\"]],\"sunday\":[[\"12:00\",\"21:30\"]]}",
"{\"monday\":[[\"00:00\",\"24:00\"]],\"tuesday\":[[\"00:00\",\"24:00\"]],\"wednesday\":[[\"00:00\",\"24:00\"]],\"thursday\":[[\"00:00\",\"24:00\"]],\"friday\":[[\"00:00\",\"24:00\"]],\"saturday\":[[\"00:00\",\"24:00\"]],\"sunday\":[[\"00:00\",\"24:00\"]]}",
"{\"monday\":[[\"10:00\",\"17:00\"]],\"tuesday\":[[\"10:00\",\"17:00\"]],\"wednesday\":[[\"10:00\",\"17:00\"]],\"thursday\":[[\"10:00\",\"19:00\"]],\"friday\":[[\"10:00\",\"17:00\"]],\"saturday\":[[\"10:00\",\"17:00\"]],\"sunday\":[[\"10:00\",\"17:00\"]]}",
"{\"monday\":[[\"9:30\",\"17:00\"]],\"tuesday\":[[\"9:30\",\"17:00\"]],\"wednesday\":[[\"9:30\",\"17:00\"]],\"thursday\":[[\"9:30\",\"17:00\"]],\"friday\":[[\"9:30\",\"17:00\"]],\"saturday\":[[\"9:30\",\"17:00\"]],\"sunday\":[[\"9:30\",\"17:00\"]]}",
"{\"monday\":[[\"9:30\",\"16:45\"]],\"tuesday\":[[\"9:30\",\"16:45\"]],\"wednesday\":[[\"9:30\",\"16:45\"]],\"thursday\":[[\"9:30\",\"16:45\"]],\"friday\":[[\"9:30\",\"16:45\"]],\"saturday\":[[\"9:30\",\"16:45\"]],\"sunday\":[[\"11:30\",\"15:30\"]]}",
"{\"monday\":[[\"5:30\",\"22:00\"]],\"tuesday\":[[\"5:30\",\"22:00\"]],\"wednesday\":[[\"5:30\",\"22:00\"]],\"thursday\":[[\"5:30\",\"22:00\"]],\"friday\":[[\"5:30\",\"22:00\"]],\"saturday\":[[\"5:30\",\"22:00\"]],\"sunday\":[[\"5:30\",\"22:00\"]]}",
"{\"monday\":[[\"00:00\",\"00:30\"],[\"11:00\",\"20:00\"]],\"tuesday\":[[\"11:00\",\"20:00\"]],\"wednesday\":[[\"11:00\",\"20:00\"]],\"thursday\":[[\"11:00\",\"23:59\"]],\"friday\":[[\"00:00\",\"2:00\"],[\"11:00\",\"23:59\"]],\"saturday\":[[\"00:00\",\"2:30\"],[\"11:00\",\"23:59\"]],\"sunday\":[[\"00:00\",\"2:30\"],[\"11:00\",\"23:59\"]]}"
)), .Names = c("payload.name", "payload.hours_display", "payload.hours"
), row.names = c(3L, 5L, 10L, 14L, 18L, 19L, 24L, 32L, 33L, 35L,
36L, 37L, 38L, 42L, 43L, 44L, 45L, 47L, 54L), class = "data.frame")
我尝试使用 plyr::ldply() 函数根据此处的答案 - How to convert a list consisting of vector of different lengths to a usable data frame in R? 将这个 JSON 展平,并且非常接近,但时间没有按正确的顺序排列(有些日子有多个营业时间)
最后,我想通过提取周日下午 12 点之前营业的企业来操作数据框。
【问题讨论】:
-
jsonlite::fromJSON(x) -
这适用于 x 的这个特定值,但是当我尝试将它应用于整个列(更新问题)时,我得到一个解析错误:尾随垃圾
-
$payload.hours中的内容不会立即data.frameable。您能否包括初始数据第一行的预期输出? (对于每个 JSON 有效负载,它可能不会是一个非常宽的单行,您最好使用tidyr包,最初处理嵌套数据,直到您可以巧妙地取消嵌套。) -
dplyr::bind_cols(xdf, jsonlite::stream_in(textConnection(paste0(xdf$payload.hours, collapse="\n"))))假设xdf是数据框名称。