【Question title】: How to loop - JSONP / JSON data using R
【Posted】: 2016-05-22 15:19:50
【Question description】:

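I thought I had parsed the data correctly using jsonlite and tidyjson. However, I noticed that only the first page of data is being parsed. Please advise how I can parse all the pages correctly. According to the JSON output there are more than 1,300 pages in total, so I think the data is available but is not being parsed correctly.

Note: I have used tidyjson, but I am also open to using jsonlite or any other library.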

library(dplyr)
library(tidyjson)
library(jsonlite)

 req <- httr::GET("http://svcs.ebay.com/services/search/FindingService/v1?OPERATION-NAME=findItemsByKeywords&SERVICE-VERSION=1.0.0&SECURITY-APPNAME=xxxxxx&GLOBAL-ID=EBAY-US&RESPONSE-DATA-FORMAT=JSON&callback=_cb_findItemsByKeywords&REST-PAYLOAD&keywords=harry%20potter&paginationInput.entriesPerPage=100")

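# httr is not attached above, so content() must be namespaced
txt <- httr::content(req, "text", encoding = "UTF-8")

# strip the JSONP callback prefix (matched literally because fixed = TRUE)
json <- sub("/**/_cb_findItemsByKeywords(", "", txt, fixed = TRUE)

# strip the closing parenthesis of the callback
json <- sub("\\)$", "", json)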

data1 <- json %>% as.tbl_json %>%
  enter_object("findItemsByKeywordsResponse") %>% gather_array %>%
  enter_object("searchResult") %>% gather_array %>%
  enter_object("item") %>% gather_array %>%
  spread_values(
    ITEMID = jstring("itemId"),
    TITLE = jstring("title")
  ) %>%
  select(ITEMID, TITLE) # select only what is needed

############################################################ 

*Note: "paginationOutput":[{"pageNumber":["1"],"entriesPerPage":["100"],"totalPages":["1393"],"totalEntries":["139269"]}]

* &_ipg=100&_pgn=1"
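
The two `sub()` calls above remove the JSONP callback wrapper by hand. As a minimal sketch of the same idea in a reusable form, the helper below strips an optional leading comment, the callback name, and the surrounding parentheses before handing the text to jsonlite. The name `strip_jsonp` is hypothetical, and the sample string is made up from the `paginationOutput` fragment quoted in the note, not taken from a live eBay response.

```r
library(jsonlite)

# Hypothetical helper: remove a JSONP wrapper such as `/**/cb({...})` or `cb({...});`
# and return the bare JSON text. Plain JSON is returned unchanged.
strip_jsonp <- function(txt) {
  # optional /* ... */ comment, then the callback name, then the opening "("
  txt <- sub("^\\s*(/\\*.*?\\*/)?\\s*[A-Za-z_$][A-Za-z0-9_$.]*\\s*\\(", "", txt, perl = TRUE)
  # the closing ")" with an optional trailing ";"
  sub("\\)\\s*;?\\s*$", "", txt, perl = TRUE)
}

# Made-up sample, shaped like the paginationOutput fragment in the note above
jsonp_txt <- '/**/_cb_findItemsByKeywords({"paginationOutput":[{"pageNumber":["1"],"entriesPerPage":["100"],"totalPages":["1393"],"totalEntries":["139269"]}]})'

# simplifyVector = FALSE keeps every JSON array as a plain R list
parsed <- fromJSON(strip_jsonp(jsonp_txt), simplifyVector = FALSE)
str(parsed)
```

Stripping the wrapper only makes the text parseable; it does not fetch any further pages, which is why only page 1 shows up in `data1`.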

【Question comments】:

    Tags: json r pagination dataframe jsonlite


    【Solution 1】:

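    No need for tidyjson. To use the code below you will need another function or set of calls that fetches the total number of pages (roughly 1,400; the response quoted in the question reports 1,393), but that should be fairly straightforward. Try to compartmentalize your operations a bit more, and use the full power of httr to parameterize things where you can: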

    library(dplyr)
    library(jsonlite)
    library(httr)
    library(purrr)
    
    get_pg <- function(i) {
    
      cat(".") # shows progress
    
      req <- httr::GET("http://svcs.ebay.com/services/search/FindingService/v1",
                       query=list(`OPERATION-NAME`="findItemsByKeywords",
                                  `SERVICE-VERSION`="1.0.0",
                                  `SECURITY-APPNAME`="xxxxxxxxxxxxxxxxxxx",
                                  `GLOBAL-ID`="EBAY-US",
                                  `RESPONSE-DATA-FORMAT`="JSON",
                                  `REST-PAYLOAD`="",
                                  `keywords`="harry potter",
                                  `paginationInput.pageNumber`=i,
                                  `paginationInput.entriesPerPage`=100))
    
      dat <- fromJSON(content(req, as="text", encoding="UTF-8"))
    
      map_df(dat$findItemsByKeywordsResponse$searchResult[[1]]$item, function(x) {
    
        # tibble() supersedes the deprecated data_frame(); dplyr re-exports it
        tibble(ITEMID=flatten_chr(x$itemId),
               TITLE=flatten_chr(x$title))
    
      })
    
    }
    
    # "10" will need to be the max page number. I wasn't about to 
    # make 1,400 requests to ebay. I'd probably break them up into 
    # sets of 30 or 50 and save off temporary data frames as rdata files
    # just so you don't get stuck in a situation where R crashes and you
    # have to get all the data again.
    
    srch_dat <- map_df(1:10, get_pg)
    
    srch_dat
    
    ## Source: local data frame [1,000 x 2]
    ## 
    ##          ITEMID                                                                            TITLE
    ##           (chr)                                                                            (chr)
    ## 1  371533364795                 Harry Potter: Complete 8-Film Collection (DVD, 2011, 8-Disc Set)
    ## 2  331128976689                   HOT New Harry Potter 14.5" Magical Wand Replica Cosplay In Box
    ## 3  131721213216                 Harry Potter: Complete 8-Film Collection (DVD, 2011, 8-Disc Set)
    ## 4  171430021529   New Harry Potter Hermione Granger Rotating Time Turner Necklace Gold Hourglass
    ## 5  261597812013            Harry Potter Time Turner+GOLD Deathly Hallows Charm Pendant necklace 
    ## 6  111883750466                 Harry Potter: Complete 8-Film Collection (DVD, 2011, 8-Disc Set)
    ## 7  251947403227                   HOT New Harry Potter 14.5" Magical Wand Replica Cosplay In Box
    ## 8  351113839731 Marauder's Map Hogwarts Wizarding World Harry Potter Warner Bros LIMITED **NEW**
    ## 9  171912724869 Harry Potter Time Turner Necklace Hermione Granger Rotating Spins Gold Hourglass
    ## 10 182024752232  Harry Potter : Complete 8-Film Collection (DVD, 2011, 8-Disc Set) Free Shipping
    ## ..          ...                                                                              ...
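
The batching idea from the code comment above (fetch pages in sets and save each set to disk) could look roughly like the sketch below. It is not part of the original answer: `page_batches`, `fetch_in_batches`, and `fake_pg` are hypothetical names, `saveRDS()` is used here in place of the rdata files the comment mentions, and `fake_pg` is an offline stand-in for `get_pg` so the example runs without calling eBay.

```r
# Hypothetical helper: split page numbers 1..total_pages into batches of `size`
page_batches <- function(total_pages, size = 50) {
  pages <- seq_len(total_pages)
  split(pages, ceiling(pages / size))
}

# Hypothetical driver: fetch each batch with `fetch_page` (for example get_pg)
# and checkpoint it to an .rds file, so a crash only costs the current batch.
fetch_in_batches <- function(total_pages, fetch_page, size = 50, dir = tempdir()) {
  batches <- page_batches(total_pages, size)
  for (b in names(batches)) {
    path <- file.path(dir, sprintf("batch_%s.rds", b))
    if (file.exists(path)) next  # batch already saved by an earlier run
    saveRDS(do.call(rbind, lapply(batches[[b]], fetch_page)), path)
  }
  # read every checkpoint back and stack the batches into one data frame
  files <- file.path(dir, sprintf("batch_%s.rds", names(batches)))
  do.call(rbind, lapply(files, readRDS))
}

# Offline stand-in for get_pg: one fake row per page
fake_pg <- function(i) {
  data.frame(ITEMID = as.character(i), TITLE = paste("page", i),
             stringsAsFactors = FALSE)
}

out_dir <- file.path(tempdir(), "ebay_batches")
dir.create(out_dir, showWarnings = FALSE)
res <- fetch_in_batches(7, fake_pg, size = 3, dir = out_dir)
nrow(res)
```

With the real function, the call would be along the lines of `fetch_in_batches(total_pages, get_pg, size = 50, dir = "some/folder")`. Rerunning after a crash skips the batches whose files already exist.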
    

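    【Comments】:

    • @hrbrmster Thanks a lot, that was really helpful. If I have to modify the script so that it runs automatically based on the total number of pages, could you tell me how to find the total number of pages, i.e. for srch_dat?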
    • dat$findItemsByKeywordsResponse$paginationOutput[[1]] will tell you: totalPages holds the page count, and totalEntries holds the total number of items.
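
A minimal sketch of reading the page count from `paginationOutput`, assuming the response has the shape quoted in the question. The JSON string below is a made-up miniature response, not live eBay output. It is parsed with `simplifyVector = FALSE`, so the indexing path differs slightly from the one in the comment above, which relies on `fromJSON()`'s default simplification. Note that `totalPages` is the page count, while `totalEntries` is the number of items.

```r
library(jsonlite)

# Made-up miniature response, shaped like the paginationOutput quoted in the question
txt <- '{"findItemsByKeywordsResponse":[{"paginationOutput":[{"pageNumber":["1"],"entriesPerPage":["100"],"totalPages":["1393"],"totalEntries":["139269"]}]}]}'

# simplifyVector = FALSE keeps every JSON array as a plain list,
# so each level is indexed explicitly with [[1]]
dat <- fromJSON(txt, simplifyVector = FALSE)
pg  <- dat$findItemsByKeywordsResponse[[1]]$paginationOutput[[1]]

# every value arrives as a one-element array of strings
total_pages   <- as.integer(pg$totalPages[[1]])
total_entries <- as.integer(pg$totalEntries[[1]])
per_page      <- as.integer(pg$entriesPerPage[[1]])

# sanity check: the page count should equal ceiling(entries / entries per page)
stopifnot(total_pages == ceiling(total_entries / per_page))

pages <- seq_len(total_pages)  # the page numbers to request
```

The resulting `pages` vector could then replace the hard-coded `1:10`, as in `map_df(pages, get_pg)`, ideally in batches rather than in one run.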