[Posted]: 2022-01-01 19:39:37
[Question]:
I have a CSV file that was created from nested JSON. It has regular typed columns (e.g. int, string) as well as JSON columns created from the nested JSON:
attributes;business_id;categories;city;days_open;latitude;longitude;name;review_count;stars;state
{"AcceptsInsurance": False, "AgesAllowed": "allages", "Alcohol": "beer_and_wine", "Ambience": {"casual": True, "classy": False, "divey": False, "hipster": False, "intimate": False, "romantic": False, "touristy": False, "trendy": False, "upscale": False}, "BYOB": False, "BikeParking": True, "BusinessAcceptsBitcoin": False, "BusinessAcceptsCreditCards": True, "BusinessParking": {"garage": False, "lot": False, "street": True, "valet": False, "validated": False}, "ByAppointmentOnly": False, "Caters": True, "CoatCheck": False, "Corkage": False, "DogsAllowed": False, "DriveThru": False, "GoodForDancing": False, "GoodForKids": False, "GoodForMeal": {"breakfast": False, "brunch": False, "dessert": False, "dinner": False, "latenight": False, "lunch": False}, "HappyHour": True, "HasTV": True, "Music": None, "NoiseLevel": "average", "Open24Hours": False, "OutdoorSeating": True, "RestaurantsAttire": "casual", "RestaurantsCounterService": False, "RestaurantsDelivery": False, "RestaurantsGoodForGroups": True, "RestaurantsPriceRange": 2, "RestaurantsReservations": False, "RestaurantsTableService": True, "RestaurantsTakeOut": True, "Smoking": "no", "WheelchairAccessible": True, "WiFi": "free"};6iYb2HFDywm3zjuRg0shjw;["Gastropubs", "Food", "Beer Gardens", "Restaurants", "Bars", "American (Traditional)", "Beer Bar", "Nightlife", "Breweries"];Boulder;["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"];40.0175444;-105.2833481;Oskar Blues Taproom;86;4.0;CO
{"AcceptsInsurance": False, "AgesAllowed": "allages", "Alcohol": "beer_and_wine", "Ambience": {"casual": True, "classy": False, "divey": False, "hipster": False, "intimate": False, "romantic": False, "touristy": False, "trendy": False, "upscale": False}, "BYOB": False, "BikeParking": False, "BusinessAcceptsBitcoin": False, "BusinessAcceptsCreditCards": True, "BusinessParking": {"garage": True, "lot": False, "street": False, "valet": False, "validated": False}, "ByAppointmentOnly": False, "Caters": True, "CoatCheck": False, "Corkage": False, "DogsAllowed": False, "DriveThru": False, "GoodForDancing": False, "GoodForKids": True, "GoodForMeal": {"breakfast": True, "brunch": False, "dessert": False, "dinner": False, "latenight": False, "lunch": True}, "HappyHour": False, "HasTV": False, "Music": None, "NoiseLevel": "average", "Open24Hours": False, "OutdoorSeating": False, "RestaurantsAttire": "casual", "RestaurantsCounterService": False, "RestaurantsDelivery": False, "RestaurantsGoodForGroups": False, "RestaurantsPriceRange": 2, "RestaurantsReservations": False, "RestaurantsTableService": True, "RestaurantsTakeOut": True, "Smoking": "no", "WheelchairAccessible": False, "WiFi": "free"};tCbdrRPZA0oiIYSmHG3J0w;["Salad", "Soup", "Sandwiches", "Delis", "Restaurants", "Cafes", "Vegetarian"];Portland;["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"];45.5889058992;-122.5933307507;Flying Elephants at PDX;126;4.0;OR
{"AcceptsInsurance": False, "AgesAllowed": "allages", "Alcohol": "none", "Ambience": None, "BYOB": False, "BikeParking": False, "BusinessAcceptsBitcoin": False, "BusinessAcceptsCreditCards": True, "BusinessParking": {"garage": False, "lot": False, "street": True, "valet": False, "validated": False}, "ByAppointmentOnly": False, "Caters": False, "CoatCheck": False, "Corkage": False, "DogsAllowed": False, "DriveThru": False, "GoodForDancing": False, "GoodForKids": False, "GoodForMeal": None, "HappyHour": False, "HasTV": False, "Music": None, "NoiseLevel": "average", "Open24Hours": False, "OutdoorSeating": False, "RestaurantsAttire": "casual", "RestaurantsCounterService": False, "RestaurantsDelivery": False, "RestaurantsGoodForGroups": False, "RestaurantsPriceRange": 2, "RestaurantsReservations": True, "RestaurantsTableService": True, "RestaurantsTakeOut": False, "Smoking": "no", "WheelchairAccessible": False, "WiFi": "no"};bvN78flM8NLprQ1a1y5dRg;["Antiques", "Fashion", "Used", "Vintage & Consignment", "Shopping", "Furniture Stores", "Home & Garden"];Portland;["Thursday", "Friday", "Saturday", "Sunday"];45.5119069956;-122.6136928797;The Reclaimory;13;4.5;OR
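Is it possible to process this file with AWS Glue so that it can be used as input for AWS Athena / Hive (which Athena uses internally)? In particular, how do I specify the data type of the JSON columns? Do I have to do this manually? Is the JSON well-formed, or should it be reformatted?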
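On the last point: the `attributes` values in the sample use Python-style literals (`False`, `True`, `None`), which strict JSON parsers reject because JSON spells them `false`, `true`, and `null`. Below is a minimal sketch of one possible preprocessing step, assuming Python is available before the file is handed to Glue and that no field contains a `;`. The names `JSON_COLUMNS`, `normalize_row`, and `normalize_csv` are hypothetical helpers for illustration, not part of any AWS API.

```python
import ast
import csv
import io
import json

# Columns that hold nested values in the sample file (assumption based on the header row).
JSON_COLUMNS = ("attributes", "categories", "days_open")


def normalize_row(row):
    """Return a copy of `row` with the nested columns re-serialized as strict JSON."""
    fixed = dict(row)
    for col in JSON_COLUMNS:
        # ast.literal_eval safely parses Python literals (True/False/None, dicts, lists)
        # without executing arbitrary code; json.dumps then emits true/false/null.
        fixed[col] = json.dumps(ast.literal_eval(row[col]))
    return fixed


def normalize_csv(text):
    """Parse semicolon-delimited CSV text and normalize every row."""
    # QUOTE_NONE: the fields are unquoted but contain double quotes,
    # so quote characters must be treated as ordinary data.
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quoting=csv.QUOTE_NONE)
    return [normalize_row(row) for row in reader]


if __name__ == "__main__":
    sample = (
        "attributes;categories;days_open;name\n"
        '{"HasTV": True, "Music": None, "Ambience": {"casual": False}};'
        '["Food", "Bars"];["Monday"];Oskar Blues Taproom\n'
    )
    for row in normalize_csv(sample):
        print(row["attributes"])
```

Once the nested columns are strict JSON, one option is to declare them as plain string columns and parse them at query time, though whether that or an explicit struct type fits better depends on how the table will be queried.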
[Discussion]:
Tags: json csv hive aws-glue amazon-athena