【发布时间】:2016-11-17 17:46:47
【问题描述】:
我在尝试将 API 流程自动化到 BigQuery 时遇到了问题。
问题是我需要将数据以换行符分隔的 JSON 格式输入我的 BigQuery 数据库,但我要提取的数据不这样做,所以我需要将其解析出来。
Here is a link to pastebin so you can get an idea of what the data looks like,还有,这里只是因为:
{"type":"user.list","users":[{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"test@gmail.com","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}},{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"test@mail.com","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}],"scroll_param":"24ba0fac-b8f9-46b2-944a-9bb523dcd1b1"}
两个问题是第一行:
{"type":"user.list","users":
以及底部的最后一块:
,"scroll_param":"24bd0rac-b2f9-46b2-944a-9zz543dcd1b1"}
如果消除这两个,您只剩下所需的必要数据,并且我知道需要什么过滤器来解析它以将其放入换行符分隔的格式。
You can see for yourself by playing around with this tool,但如果您只复制并粘贴从第一个左括号到最后一行右括号的所有内容,请将其设置为“压缩输出”并应用过滤器:
.[]
结果会像你在这里看到的一样,in a nice and neat newline delimited format like you see here.,这里也没有链接:
{"type":"user","id":"581c13632f25960e6e3dc89a","user_id":"ieo2e6dtsqhiyhtr","anonymous":false,"email":"test@gmail.com","name":"Joe Martinez","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"Houston","continent_code":"NA","country_name":"United States","latitude":29.7633,"longitude":-95.3633,"postal_code":"77002","region_name":"Texas","timezone":"America/Chicago","country_code":"USA"},"last_request_at":1478235114,"last_seen_ip":"66.87.120.30","created_at":1478234979,"remote_created_at":1478234944,"signed_up_at":1478234944,"updated_at":1478235145,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Linux; Android 6.0.1; SM-G920P Build/MMB29K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.68 Mobile Safari/537.36","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"1","memberType":"claimant"}}
{"type":"user","id":"581c22a19a1dc02c460541df","user_id":"1o3helrdv58cxm7jf","anonymous":false,"email":"test@mail.com","name":"Joe Coleman","pseudonym":null,"avatar":{"type":"avatar","image_url":null},"app_id":"b5vkxvop","companies":{"type":"company.list","companies":[]},"location_data":{"type":"location_data","city_name":"San Jose","continent_code":"NA","country_name":"United States","latitude":37.3394,"longitude":-121.895,"postal_code":"95141","region_name":"California","timezone":"America/Los_Angeles","country_code":"USA"},"last_request_at":1478239113,"last_seen_ip":"216.151.183.47","created_at":1478238881,"remote_created_at":1478238744,"signed_up_at":1478238744,"updated_at":1478239113,"session_count":1,"social_profiles":{"type":"social_profile.list","social_profiles":[]},"unsubscribed_from_emails":false,"user_agent_data":"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0","tags":{"type":"tag.list","tags":[]},"segments":{"type":"segment.list","segments":[{"type":"segment","id":"57d2ea275bfcebabd516d963"},{"type":"segment","id":"57d2ea265bfcebabd516d962"}]},"custom_attributes":{"claimCount":"2","memberType":"claimant"}}
所以我需要一个过滤器,我可以以与使用 .[] 相同的方式应用它,它会拉出第一个开括号之前的所有文本(正如我在上面突出显示的)以及关闭之前的所有文本括号结尾。
但这就是最后一个问题所在。虽然我需要最后一段文本,但我仍然需要被称为滚动参数的字母和数字字符串。这是因为为了在 API 中完全捕获我需要的所有数据,我需要不断使用它从命令行调用生成的新滚动参数,直到所有数据都进入。
初始调用如下所示:
$ curl -s https://api.program.io/users/scroll -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json'
但为了获取所有信息,我需要该滚动参数来进行单独调用,如下所示:
curl -s https://api.intercom.io/users/scroll?scroll_param=foo -u 'dG9rOmU5NGFjYTkwXzliNDFfNGIyMF9iYzA0XzU0NDg3MjE5ZWJkZDoxOjA=': -H 'Accept:application/json' >scroll.json
因此,虽然我需要删除包含参数的 blob 中的文本以便将其置于换行符分隔的格式中,但我仍然需要提取该参数的任何内容以循环回将继续运行的另一个脚本直到它是空的。
很想听听有关解决此问题的任何建议!
【问题讨论】:
-
请在问题本身的minimal reproducible example 中提供所有相关代码,而不是在第三方网站上。
-
真的吗?它真的有区别吗?你们真的要让我去格式化所有东西,当它方便地以易于消化的格式坐在那里时?
-
当那个站点出现故障时?这个问题的未来读者会怎么做?发挥他们的想象力来了解数据应该是什么?
-
好吧,我把它添加到我的问题中,以防万一已经存在 15 年的 siet 出现故障
-
我工作了 15 年,但我从未听说过 new-line-delimited-json。解析 JSON 并获取所需数据有什么问题?
标签: json parsing google-bigquery jq