【Question title】: XML Hive SerDe: extracting a timestamp (Hadoop)
【Posted】: 2025-11-22 12:15:01
【Question】:

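I am trying to extract a timestamp from XML using the XML SerDe in Hive. The external table is created on top of an HDFS directory. Currently the timestamp value shows up as null in my table.

I suspect the timestamp needs to be cast, but I'm not sure. The rest of the XML data works fine and shows up in Hive.

The input file is: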

<example>
<date>2017-02-09 22:03:58</date>
</example>

Hive create script:

create external table example (
date timestamp
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.date"="/example/date/text()"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'mypath'
TBLPROPERTIES (
"xmlinput.start"="<example>",
"xmlinput.end"="</example>"
);

【Comments】:

    Tags: xml hadoop hive timestamp hive-serde


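    【Solution 1】:

    It looks like only a fixed set of Java primitive types (plus string) is supported.
    Look at the getPrimitiveValue method in XmlUtils.java: timestamp has no case of its own, so it falls into the default branch, the IllegalStateException thrown there is swallowed by the catch block, and the method returns null.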

    /**
     * (c) Copyright IBM Corp. 2013. All rights reserved.
     *
     * Licensed under the Apache License, Version 2.0 (the "License").
     * You may not use this file except in compliance with the License.
     * You may obtain a copy of the License at
     *
     *    http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
    */
    
    package com.ibm.spss.hive.serde2.xml.processor;
    
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector.PrimitiveCategory;
    
    /**
     * The XML utilities
     */
    public class XmlUtils {
    
        /**
         * Private constructor
         */
        private XmlUtils() {
        }
    
        /**
         * Converts the string value to the java object for the given primitive category
         * 
         * @param value
         *            the value
         * @param primitiveCategory
         *            the primitive category
         * @return the java object
         */
        public static Object getPrimitiveValue(String value, PrimitiveCategory primitiveCategory) {
            if (value != null) {
                try {
                    switch (primitiveCategory) {
                        case BOOLEAN:
                            return Boolean.valueOf(value);
                        case BYTE:
                            return Byte.valueOf(value);
                        case DOUBLE:
                            return Double.valueOf(value);
                        case FLOAT:
                            return Float.valueOf(value);
                        case INT:
                            return Integer.valueOf(value);
                        case LONG:
                            return Long.valueOf(value);
                        case SHORT:
                            return Short.valueOf(value);
                        case STRING:
                            return value;
                        default:
                            throw new IllegalStateException(primitiveCategory.toString());
                    }
                } catch (Exception ignored) {
                }
            }
            return null;
        }
    
    }
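
To make the null result concrete, here is a minimal, self-contained sketch of the same control flow. It does not use the real Hive classes: `XmlSerdeNullDemo` and the `Category` enum are hypothetical stand-ins for `XmlUtils` and Hive's `PrimitiveCategory`, reduced to three members, so treat it as an illustration of the behaviour rather than the library's actual code path.

```java
import java.sql.Timestamp;

public class XmlSerdeNullDemo {

    // Hypothetical stand-in for Hive's PrimitiveCategory, reduced to three members.
    enum Category { INT, STRING, TIMESTAMP }

    // Mirrors the control flow of XmlUtils.getPrimitiveValue quoted above.
    static Object getPrimitiveValue(String value, Category category) {
        if (value != null) {
            try {
                switch (category) {
                    case INT:
                        return Integer.valueOf(value);
                    case STRING:
                        return value;
                    default:
                        // TIMESTAMP lands here because it has no case of its own.
                        throw new IllegalStateException(category.toString());
                }
            } catch (Exception ignored) {
                // The exception is swallowed, so the caller only ever sees null.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String raw = "2017-02-09 22:03:58";
        System.out.println(getPrimitiveValue(raw, Category.TIMESTAMP)); // prints "null"
        System.out.println(getPrimitiveValue(raw, Category.STRING));    // prints the raw string
        // The text itself is well formed: java.sql.Timestamp can parse it directly.
        System.out.println(Timestamp.valueOf(raw));
    }
}
```

The same value comes back intact when the category is a string. One workaround that follows from this, though the answer itself does not state it, would be to declare the column as `string` in the table definition and convert it in the query, for example with a `cast(... as timestamp)`.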
    

    【Comments】: