【Question title】: How to split a string using regex?
【Posted】: 2022-01-15 05:24:44
【Question】:

I am trying to split ad_content on every "_" character, but I don't know why I can't get past the 9th split field (splits[SAFE_OFFSET(8)] AS objective).

This is the query I am using:

SELECT
    ad_content,
    splits[SAFE_OFFSET(0)] AS country,
    splits[SAFE_OFFSET(1)] AS product,
    splits[SAFE_OFFSET(2)] AS budget,
    splits[SAFE_OFFSET(3)] AS source,
    splits[SAFE_OFFSET(4)] AS campaign,
    splits[SAFE_OFFSET(5)] AS audience,
    splits[SAFE_OFFSET(6)] AS route_type,
    splits[SAFE_OFFSET(7)] AS business,
    splits[SAFE_OFFSET(8)] AS objective,
    splits[SAFE_OFFSET(9)] AS format,
    splits[SAFE_OFFSET(10)] AS nnn,
    splits[SAFE_OFFSET(11)] AS date,
FROM (
  SELECT
    AD_CONTENT,
    SPLIT(REGEXP_REPLACE(
            AD_CONTENT,
            r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
            r'\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12'),
          '|') AS splits
  FROM ga_digital_marketing
)

For example, ad_content = us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906

This is the result of the query above:

column      value
ad_content  us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906
country     us
product     latam
budget      perf
source      facebook
campaign    black-friday
audience    bbdd-push
route_type  SCL-CCP
business    domestic
objective   conversion
format      us0
nnn         us1
date        us2

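As you can see above, the format column (splits[SAFE_OFFSET(9)] AS format) and the columns after it do not return the correct values.

I believe the problem is in r'\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12'), possibly because the 0 in |\10 is treated as a literal character rather than as part of the group number. That is why I get us0, us1 and us2.

Is there a way around this limitation?

Is there another way to split the ad_content example?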

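【Comments】:

  • Right, the \n backreference syntax usually supports only groups 1 to 9. Try $10$11$12
  • @WiktorStribiżew Thanks for your comment, but it did not work. The console shows: "Invalid REGEXP_REPLACE pattern: Rewrite schema error: '\' must be followed by a digit or '\'"

Tags: sql regex google-bigquery regexp-replace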


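【Solution 1】:

BigQuery's REGEXP_REPLACE only supports \1 through \9 in the replacement string, so \10 is read as \1 followed by a literal 0. That is the reason!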
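
A minimal sketch of that behaviour, using a shortened two-field input that is not from the question (the literal 'us_latam' and the alias rewritten are illustrative only). The pattern has just two capture groups, yet \10 is still accepted because it is read as \1 plus the character 0:

```sql
-- Only two capture groups exist; \10 is NOT "group 10" here.
-- It is parsed as \1 (the first group) followed by a literal '0'.
SELECT
  REGEXP_REPLACE('us_latam', r'([^_]+)_([^_]+)', r'\2|\10') AS rewritten;
-- rewritten: latam|us0
```

This matches the us0, us1, us2 values in the question: each of \10, \11, \12 expands to the first group (us) followed by one more digit.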

Is there a way around this limitation?

Use the approach below instead:

SELECT
    -- ad_content,
    splits[SAFE_OFFSET(0)] AS country,
    splits[SAFE_OFFSET(1)] AS product,
    splits[SAFE_OFFSET(2)] AS budget,
    splits[SAFE_OFFSET(3)] AS source,
    splits[SAFE_OFFSET(4)] AS campaign,
    splits[SAFE_OFFSET(5)] AS audience,
    splits[SAFE_OFFSET(6)] AS route_type,
    splits[SAFE_OFFSET(7)] AS business,
    splits[SAFE_OFFSET(8)] AS objective,
    splits[SAFE_OFFSET(9)] AS format,
    splits[SAFE_OFFSET(10)] AS nnn,
    splits[SAFE_OFFSET(11)] AS date,
FROM (
  SELECT
    AD_CONTENT,
    SPLIT(AD_CONTENT, '_') AS splits
  FROM ga_digital_marketing
)    

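If applied to the sample in your question, the output is:

column      value
country     us
product     latam
budget      perf
source      facebook
campaign    black-friday
audience    bbdd-push
route_type  SCL-CCP
business    domestic
objective   conversion
format      push
nnn         all
date        20210906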
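
If a regex-based split is still preferred, one possible alternative is REGEXP_EXTRACT_ALL, which returns every match as an array and so needs no numbered backreferences. The sketch below is an assumption layered on the answer, not part of it: the sample CTE is a hypothetical stand-in for ga_digital_marketing, and only the three columns that failed in the question are selected.

```sql
-- Hypothetical inline sample so the query runs without the real table.
WITH sample AS (
  SELECT 'us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906' AS ad_content
)
SELECT
  splits[SAFE_OFFSET(9)]  AS format,  -- push
  splits[SAFE_OFFSET(10)] AS nnn,     -- all
  splits[SAFE_OFFSET(11)] AS date     -- 20210906
FROM (
  SELECT
    -- Each run of non-underscore characters becomes one array element.
    REGEXP_EXTRACT_ALL(ad_content, r'[^_]+') AS splits
  FROM sample
);
```

One difference from SPLIT(ad_content, '_') is worth keeping in mind: the pattern [^_]+ skips empty fields, so a value containing two consecutive underscores would shift the later offsets, whereas SPLIT keeps an empty string in that position.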

【Comments】:

  • Glad it worked for you. Consider voting up the answer too :o)