【Question title】: How to subtract a column of string based on other columns in Hive?
【Posted】: 2024-03-09 08:20:02
【Question】:

Using this table, I am trying to remove the parts of address that also appear in zip_code and city.

+----------------------------------------------+----------+------------+
| address                                      | zip_code | city       |
+----------------------------------------------+----------+------------+
| Oceans Group, 12 Pear Tree Road, Derby       | DE23 6PY | Derby      |
| 970 Stockport Road                           | M19 3NN  | Manchester |
| Cartridge World Guiseley                     |          | Edinburgh  |
| 33-41 Kelvin Avenue                          | G52 4LT  | Glasgow    |
| Cartridge World Haymarket, 54 Dalry Road, UK | EH5 1HX  | Edinburgh  |
| 50 Otley Road, Leeds, LS20 8AH, UK           | LS20 8AH |            |
+----------------------------------------------+----------+------------+

Something like:

SUBSTR('Oceans Group, 12 Pear Tree Road, Derby', 'DE23 6PY', 'Derby') returns 'Oceans Group, 12 Pear Tree Road, '
SUBSTR('50 Otley Road, Leeds, LS20 8AH, UK', 'LS20 8AH', '') returns '50 Otley Road, Leeds, , UK'

Hopefully this setup code saves you some time.

CREATE TABLE address_table(
      address    STRING
    , zip_code   STRING
    , city       STRING
);

INSERT INTO address_table VALUES ("Oceans Group, 12 Pear Tree Road, Derby", "DE23 6PY", "Derby");
INSERT INTO address_table VALUES ("970 Stockport Road", "M19 3NN", "Manchester");
INSERT INTO address_table VALUES ("Cartridge World Guiseley", "", "Edinburgh");
INSERT INTO address_table VALUES ("33-41 Kelvin Avenue", "G52 4LT", "Glasgow");
INSERT INTO address_table VALUES ("Cartridge World Haymarket, 54 Dalry Road, UK", "EH5 1HX", "Edinburgh");
INSERT INTO address_table VALUES ("50 Otley Road, Leeds, LS20 8AH, UK", "LS20 8AH", "");

【Question discussion】:

    Tags: sql string select hive sql-update


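    【Solution 1】:

    Older Hive releases have no plain string-replacement function (newer ones add replace()), but you can use regexp_replace() instead. Note that its second argument is interpreted as a regular expression: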

    select
        a.*,
        regexp_replace(a.address, a.zip_code, '') new_address
    from address_table a
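The query above strips only zip_code, while the question also asks to strip city. One way to cover both is to nest the calls. This is a minimal sketch, not a definitive implementation. It assumes that neither column is NULL (regexp_replace() returns NULL if any argument is NULL) and that Hive's default string-literal escaping applies, so that '\\Q' reaches the regex engine as \Q. The \Q...\E wrapping is an addition of mine: it makes the Java regex engine treat each value as literal text, which matters for a city such as "St. Helens".

```sql
-- Remove both zip_code and city from address.
-- Each value is wrapped in \Q...\E (Java regex literal quoting) so that
-- characters such as '.' or '(' are matched literally, not as regex syntax.
select
    a.*,
    regexp_replace(
        regexp_replace(a.address, concat('\\Q', a.zip_code, '\\E'), ''),
        concat('\\Q', a.city, '\\E'),
        ''
    ) as new_address
from address_table a;
```

For the first sample row this should give 'Oceans Group, 12 Pear Tree Road, ', matching the result the question asks for. An empty zip_code or city yields an empty pattern, which should leave the address unchanged.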
    

    If you want an update statement:

    update address_table
    set address = regexp_replace(address, zip_code, '')
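Hive's UPDATE only works on tables that support ACID transactions. For a plain, non-transactional table, a common workaround is to rewrite the table with INSERT OVERWRITE. The sketch below assumes address_table is such a table and that rewriting every row is acceptable.

```sql
-- Rewrite every row, replacing address with the cleaned value.
-- The select list must follow the table's column order:
-- address, zip_code, city.
insert overwrite table address_table
select
    regexp_replace(address, zip_code, '') as address,
    zip_code,
    city
from address_table;
```

This replaces the table's contents in one pass, so it may be worth running the select on its own first to check the result.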
    

    【Discussion】: