【Posted】: 2015-09-17 20:53:12
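【Question】:
I have a CSV file with two header rows, and I want to remove them. How can I delete the first two lines of a CSV file in Hive or Pig? The first few lines of the file look like this: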
YEAR QUARTER MONTH DAY_OF_MONTH DAY_OF_WEEK FL_DATE UNIQUE_CARRIER AIRLINE_ID CARRIER TAIL_NUM FL_NUM ORIGIN ORIGIN_CITY_NAME ORIGIN_STATE_ABR ORIGIN_STATE_FIPS ORIGIN_STATE_NM ORIGIN_WAC DEST DEST_CITY_NAME DEST_STATE_ABR DEST_STATE_FIPS DEST_STATE_NM DEST_WAC CRS_DEP_TIME DEP_TIME DEP_DELAY DEP_DELAY_NEW DEP_DEL15 DEP_DELAY_GROUP DEP_TIME_BLK TAXI_OUT WHEELS_OFF WHEELS_ON TAXI_IN CRS_ARR_TIME ARR_TIME ARR_DELAY ARR_DELAY_NEW ARR_DEL15 ARR_DELAY_GROUP ARR_TIME_BLK CANCELLED CANCELLATION_CODE DIVERTED CRS_ELAPSED_TIME ACTUAL_ELAPSED_TIME AIR_TIME FLIGHTS DISTANCE DISTANCE_GROUP CARRIER_DELAY WEATHER_DELAY NAS_DELAY SECURITY_DELAY LATE_AIRCRAFT_DELAY
YEAR QUARTER MONTH DAY_OF_MONTH DAY_OF_WEEK FL_DATE UNIQUE_CARRIER AIRLINE_ID CARRIER TAIL_NUM FL_NUM ORIGIN ORIGIN_CITY_NAME ORIGIN_STATE_ABR ORIGIN_STATE_FIPS ORIGIN_STATE_NM ORIGIN_WAC DEST DEST_CITY_NAME DEST_STATE_ABR DEST_STATE_FIPS DEST_STATE_NM DEST_WAC CRS_DEP_TIME DEP_TIME DEP_DELAY DEP_DELAY_NEW DEP_DEL15 DEP_DELAY_GROUP DEP_TIME_BLK TAXI_OUT WHEELS_OFF WHEELS_ON TAXI_IN CRS_ARR_TIME ARR_TIME ARR_DELAY ARR_DELAY_NEW ARR_DEL15 ARR_DELAY_GROUP ARR_TIME_BLK CANCELLED CANCELLATION_CODE DIVERTED CRS_ELAPSED_TIME ACTUAL_ELAPSED_TIME AIR_TIME FLIGHTS DISTANCE DISTANCE_GROUP CARRIER_DELAY WEATHER_DELAY NAS_DELAY SECURITY_DELAY LATE_AIRCRAFT_DELAY
2015 1 1 1 4 2015-01-01 AA 19805 AA N787AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 6 California 91 900 855 -5 0 0 -1 0900-0959 17 912 1230 7 1230 1237 7 7 0 0 1200-1259 0 0 390 402 378 1 2475 10
2015 1 1 2 5 2015-01-02 AA 19805 AA N795AA 1 JFK New York NY NY 36 New York 22 LAX Los Angeles CA CA 6 California 91 900 850 -10 0 0 -1 0900-0959 15 905 1202 9 1230 1211 -19 0 0 -2 1200-1259 0 0 390 381 357 1 2475 10
【Comments】:
-
Can you simply drop every line that starts with "YEAR"?
-
Thank you very much for your answer... Could you provide the code? Would it be in Pig or in Hive?
-
stackoverflow.com/questions/17810537/… Search for "PIG delete rows from table".
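
The filter suggested in the comments (drop every line that starts with "YEAR") can be sketched in a few lines. This is plain Python, not Pig or Hive, and is meant only to illustrate the logic, for example as a preprocessing step before the file is loaded. The function name `drop_header_lines` and the shortened sample rows are assumptions made for illustration. It relies on the observation that every data row in the question begins with a numeric year, so no real record starts with the literal text "YEAR".

```python
def drop_header_lines(lines, header_prefix="YEAR"):
    """Return only the lines that do not start with header_prefix.

    Hypothetical helper: it assumes header rows are exactly the rows
    whose first field is the literal column name (here "YEAR").
    """
    return [line for line in lines if not line.startswith(header_prefix)]


# Shortened in-memory sample shaped like the file in the question:
# two identical header rows followed by data rows.
sample = [
    "YEAR QUARTER MONTH",
    "YEAR QUARTER MONTH",
    "2015 1 1",
    "2015 1 1",
]

cleaned = drop_header_lines(sample)
print(cleaned)
```

Filtering on the header text rather than on line position keeps working if the input is split across several files or processed in parallel, where "the first two lines" is no longer well defined.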
Tags: file hive apache-pig bigdata