Posted: 2022-01-15 05:24:44
Question:
I'm trying to split ad_content on every "_" character, but I can't figure out why I can't get past the 9th split word (`splits[SAFE_OFFSET(8)] AS objective`).
This is the query I'm using:
```sql
SELECT
  ad_content,
  splits[SAFE_OFFSET(0)] AS country,
  splits[SAFE_OFFSET(1)] AS product,
  splits[SAFE_OFFSET(2)] AS budget,
  splits[SAFE_OFFSET(3)] AS source,
  splits[SAFE_OFFSET(4)] AS campaign,
  splits[SAFE_OFFSET(5)] AS audience,
  splits[SAFE_OFFSET(6)] AS route_type,
  splits[SAFE_OFFSET(7)] AS business,
  splits[SAFE_OFFSET(8)] AS objective,
  splits[SAFE_OFFSET(9)] AS format,
  splits[SAFE_OFFSET(10)] AS nnn,
  splits[SAFE_OFFSET(11)] AS date
FROM (
  SELECT
    AD_CONTENT,
    -- Intended: capture each "_"-separated field as a group, rejoin the
    -- groups with "|", then SPLIT the result on "|".
    SPLIT(REGEXP_REPLACE(
      AD_CONTENT,
      r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
      r'\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12'),
    '|') AS splits
  FROM ga_digital_marketing
)
```
For example, ad_content = `us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906`
This is the result of the query above:
| ad_content | country | product | budget | source | campaign | audience | route_type | business | objective | format | nnn | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906 | us | latam | perf | black-friday | bbdd-push | SCL-CCP | domestic | conversion | us0 | us1 | us2 |
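As you can see above, the format column (`splits[SAFE_OFFSET(9)] AS format`) doesn't return the correct result.
I believe the problem is in `r'\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12'`: `\10` is probably not read as group 10 but as `\1` followed by a literal `0`. That's why I get `us0`, `us1` and `us2`.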
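A minimal way to check this hypothesis, assuming BigQuery's `REGEXP_REPLACE` follows the RE2 rewrite rules (a backslash in the replacement is followed by a single digit, so only `\0` through `\9` can be referenced). The two-group pattern and the `'us_latam'` literal are illustrative and not taken from the original query:

```sql
-- Illustrative only: a two-group pattern applied to a literal.
-- RE2 reads r'\10' as backreference \1 followed by the literal character '0'.
SELECT
  REGEXP_REPLACE('us_latam', r'([^_]+)_([^_]+)', r'\10') AS ten,       -- 'us0'
  REGEXP_REPLACE('us_latam', r'([^_]+)_([^_]+)', r'\2|\1') AS swapped  -- 'latam|us'
```

If `ten` comes back as `us0`, the `us0`/`us1`/`us2` values in the table above are explained by the same rule applied to `\10`, `\11` and `\12`.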
Is there a way around this limitation?
Is there another way to split an `ad_content` value like the example?
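One possible workaround, sketched under the assumption that every field is a non-empty run of characters other than `_`: skip the rewrite string entirely and let `REGEXP_EXTRACT_ALL` return each match as an array element, so no backreference numbers are needed. The inline `WITH` row is a hypothetical stand-in for the real `ga_digital_marketing` table:

```sql
-- Hypothetical sample row standing in for the real ga_digital_marketing table.
WITH ga_digital_marketing AS (
  SELECT 'us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906' AS ad_content
)
SELECT
  ad_content,
  splits[SAFE_OFFSET(9)] AS format,   -- 'push'
  splits[SAFE_OFFSET(10)] AS nnn,     -- 'all'
  splits[SAFE_OFFSET(11)] AS date     -- '20210906'
FROM (
  SELECT
    ad_content,
    -- Each run of non-underscore characters becomes one array element.
    REGEXP_EXTRACT_ALL(ad_content, r'[^_]+') AS splits
  FROM ga_digital_marketing
)
```

Note that this pattern silently drops empty fields (e.g. `a__b` yields two elements), which would shift the later columns if such values exist.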
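Since the delimiter is a single fixed character, a regex may not be needed at all: `SPLIT(ad_content, '_')` returns one array element per field, and `SAFE_OFFSET` yields `NULL` when a value has fewer fields. A sketch with the same column names as the query above, again using a hypothetical inline sample row in place of `ga_digital_marketing`:

```sql
-- Hypothetical sample row standing in for the real ga_digital_marketing table.
WITH ga_digital_marketing AS (
  SELECT 'us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906' AS ad_content
)
SELECT
  ad_content,
  splits[SAFE_OFFSET(0)] AS country,
  splits[SAFE_OFFSET(1)] AS product,
  splits[SAFE_OFFSET(2)] AS budget,
  splits[SAFE_OFFSET(3)] AS source,
  splits[SAFE_OFFSET(4)] AS campaign,
  splits[SAFE_OFFSET(5)] AS audience,
  splits[SAFE_OFFSET(6)] AS route_type,
  splits[SAFE_OFFSET(7)] AS business,
  splits[SAFE_OFFSET(8)] AS objective,
  splits[SAFE_OFFSET(9)] AS format,   -- 'push'
  splits[SAFE_OFFSET(10)] AS nnn,     -- 'all'
  splits[SAFE_OFFSET(11)] AS date     -- '20210906'
FROM (
  SELECT
    ad_content,
    -- Split on every underscore; no capture groups or backreferences involved.
    SPLIT(ad_content, '_') AS splits
  FROM ga_digital_marketing
)
```

Unlike `REGEXP_EXTRACT_ALL`, `SPLIT` keeps empty strings for consecutive underscores, so each field stays at a fixed offset.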
Comments:
- Right, the `\n` backreference syntax usually supports only groups 1 to 9. Try `$10`, `$11` and `$12`.
- @WiktorStribiżew Thanks for your comment, but it didn't work. The console shows: "Invalid REGEXP_REPLACE pattern: Rewrite schema error: '\' must be followed by a digit or '\'"
Tags: sql regex google-bigquery regexp-replace