Posted: 2017-09-18 13:46:36
Question:
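I have customers and leads coming from different sources, and I need to determine whether a customer is already registered as a lead.
I match on 12 fields: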
address1_clear
address2_clear
address_clear
contact_name_clear
email
invoice_mobile
invoice_phone
mobile
name_clear
phone
phone2
taxnum
(The _clear suffix means the value is lowercased, with whitespace and punctuation removed.)
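The question does not spell out the exact normalization rules, so the following is only a guess at what a `_clear` column might hold: a minimal Python sketch, assuming "punctuation" means any character in a Unicode P* category. The function name `clear` is hypothetical.

```python
import unicodedata
from typing import Optional


def clear(value: Optional[str]) -> Optional[str]:
    """Hypothetical `_clear` normalization: lowercase, then drop whitespace
    and every character whose Unicode category starts with "P" (punctuation)."""
    if value is None:
        return None
    return "".join(
        ch for ch in value.lower()
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


print(clear("  Main St. 12, Apt #4 "))
```

Under this rule, symbols such as `+` are not punctuation (they are category `Sm`), so `clear("+36 1 234")` keeps the plus sign and returns `"+361234"`.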
- Leads: 300,000 records
- Customers: 500,000 records
- customers_leads: 460,000 records
This is the query I use to do the matching:
SELECT l.id as lead_id, c.id as customer_id FROM lead l
INNER JOIN sync_settings s ON s.account_id = l.account_id
INNER JOIN customers c ON c.setting_id = s.id
LEFT JOIN customers_leads cl ON cl.customer_id = c.id AND cl.lead_id = l.id
WHERE cl.lead_id IS NULL AND
(
(l.phone IS NOT NULL AND l.phone != '' AND l.phone IN (c.phone, c.phone2, c.invoice_phone, c.invoice_mobile)) OR
(l.mobile IS NOT NULL AND l.mobile != '' AND l.mobile IN (c.phone, c.phone2, c.invoice_phone, c.invoice_mobile)) OR
(l.invoice_phone IS NOT NULL AND l.invoice_phone != '' AND l.invoice_phone IN (c.phone, c.phone2, c.invoice_phone, c.invoice_mobile)) OR
(l.invoice_mobile IS NOT NULL AND l.invoice_mobile != '' AND l.invoice_mobile IN (c.phone, c.phone2, c.invoice_phone, c.invoice_mobile)) OR
(l.email IS NOT NULL AND l.email != '' AND l.email = c.email) OR
(l.taxnum IS NOT NULL AND l.taxnum != '' AND l.taxnum = c.taxnum) OR
(l.contact_name_clear IS NOT NULL AND l.contact_name_clear != '' AND l.contact_name_clear = c.contact_name_clear) OR
(l.address1_clear IS NOT NULL AND l.address1_clear != '' AND l.address1_clear = c.address_clear) OR
(l.address2_clear IS NOT NULL AND l.address2_clear != '' AND l.address2_clear = c.address_clear) OR
(l.name_clear IS NOT NULL AND l.name_clear != '' AND l.name_clear IN (c.contact_name_clear, c.name_clear))
)
It is extremely heavy: the response time is about 4 minutes. Because of the ORs and the other conditions, indexes don't help much.
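A common workaround for the OR problem is to split the condition into one equi-join per field and combine the branches with UNION, so each branch is a plain `column = value` join that an index on that column can serve. A minimal sketch of the idea, using Python's built-in `sqlite3` in place of MySQL and a cut-down, hypothetical schema (three of the twelve fields, only two of the sixteen phone pairings, no `sync_settings` join). Whether MySQL actually uses the per-branch indexes on the real data would need checking with EXPLAIN.

```python
import sqlite3

# Simplified, hypothetical schema: a few match fields, no sync_settings.
SCHEMA = """
CREATE TABLE lead      (id INTEGER PRIMARY KEY, email TEXT, taxnum TEXT, phone TEXT);
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, taxnum TEXT,
                        phone TEXT, phone2 TEXT);
CREATE TABLE customers_leads (customer_id INTEGER, lead_id INTEGER,
                              PRIMARY KEY (customer_id, lead_id));
-- One index per matched customer column, so each branch is an indexed join.
CREATE INDEX ix_c_email  ON customers(email);
CREATE INDEX ix_c_taxnum ON customers(taxnum);
CREATE INDEX ix_c_phone  ON customers(phone);
CREATE INDEX ix_c_phone2 ON customers(phone2);
"""

# Each OR branch becomes its own equi-join; UNION drops duplicate pairs,
# so a lead matching one customer on several fields is reported once.
MATCH_SQL = """
SELECT lead_id, customer_id FROM (
    SELECT l.id AS lead_id, c.id AS customer_id
      FROM lead l JOIN customers c ON c.email = l.email WHERE l.email <> ''
    UNION
    SELECT l.id, c.id
      FROM lead l JOIN customers c ON c.taxnum = l.taxnum WHERE l.taxnum <> ''
    UNION
    SELECT l.id, c.id
      FROM lead l JOIN customers c ON c.phone = l.phone WHERE l.phone <> ''
    UNION
    SELECT l.id, c.id
      FROM lead l JOIN customers c ON c.phone2 = l.phone WHERE l.phone <> ''
) m
WHERE NOT EXISTS (
    SELECT 1 FROM customers_leads cl
     WHERE cl.customer_id = m.customer_id AND cl.lead_id = m.lead_id
)
ORDER BY lead_id, customer_id
"""


def find_new_matches(conn):
    return conn.execute(MATCH_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO lead VALUES (?, ?, ?, ?)", [
    (1, "a@x.com", "", "111"),
    (2, "", "T-9", ""),
    (3, "", "", ""),
])
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", [
    (10, "a@x.com", "", "", "111"),  # matches lead 1 on email and phone2
    (11, "", "T-9", "", ""),         # matches lead 2 on taxnum
    (12, "", "", "", ""),            # empty values must not match lead 3
])
conn.execute("INSERT INTO customers_leads VALUES (11, 2)")  # already linked
print(find_new_matches(conn))
```

The NOT EXISTS anti-join plays the role of the original `LEFT JOIN customers_leads ... IS NULL`, applied once to the deduplicated candidate pairs.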
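I'm wondering: is there a better way to do this? Perhaps using a NoSQL database to build one huge hash table, or some data-matching technique I haven't been able to find by googling?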
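The "huge hash table" idea does not necessarily need a NoSQL store: with about 500,000 customers and at most nine match columns each, an in-memory dictionary holds at most 4.5 million keys, and each lead then costs a handful of lookups instead of a comparison against every customer. A minimal sketch, assuming each record is a Python dict and that each customer's `account_id` has already been resolved through `sync_settings`; `build_index`, `match`, and the key-kind names are hypothetical. The pairings follow the OR list in the query (for example, `name_clear` is checked against both `contact_name_clear` and `name_clear`).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple

Record = Mapping[str, str]
Key = Tuple[str, str, str]  # (account_id, key kind, normalized value)

# Customer columns that feed each kind of key.
CUSTOMER_KEYS = {
    "phone": ("phone", "phone2", "invoice_phone", "invoice_mobile"),
    "email": ("email",),
    "taxnum": ("taxnum",),
    "contact": ("contact_name_clear",),
    "name": ("name_clear",),
    "address": ("address_clear",),
}

# Key kinds each lead column is looked up under (same pairings as the query).
LEAD_PROBES = {
    "phone": ("phone",),
    "mobile": ("phone",),
    "invoice_phone": ("phone",),
    "invoice_mobile": ("phone",),
    "email": ("email",),
    "taxnum": ("taxnum",),
    "contact_name_clear": ("contact",),
    "address1_clear": ("address",),
    "address2_clear": ("address",),
    "name_clear": ("contact", "name"),
}


def build_index(customers: Iterable[Record]) -> Dict[Key, Set[str]]:
    """Map every non-empty customer value to the ids that carry it."""
    index: Dict[Key, Set[str]] = defaultdict(set)
    for c in customers:
        for kind, columns in CUSTOMER_KEYS.items():
            for col in columns:
                value = c.get(col)
                if value:  # skips missing keys, None and ""
                    index[(c["account_id"], kind, value)].add(c["id"])
    return dict(index)


def match(leads: Iterable[Record], index: Mapping[Key, Set[str]],
          already_linked: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return sorted (lead_id, customer_id) pairs not yet in already_linked."""
    pairs: Set[Tuple[str, str]] = set()
    for lead in leads:
        for col, kinds in LEAD_PROBES.items():
            value = lead.get(col)
            if not value:
                continue
            for kind in kinds:
                for cid in index.get((lead["account_id"], kind, value), ()):
                    if (lead["id"], cid) not in already_linked:
                        pairs.add((lead["id"], cid))
    return sorted(pairs)


customers = [
    {"id": "c1", "account_id": "A", "phone2": "555", "name_clear": "acme"},
    {"id": "c2", "account_id": "B", "phone": "555"},  # other account
]
leads = [
    {"id": "l1", "account_id": "A", "mobile": "555", "email": ""},
    {"id": "l2", "account_id": "A", "name_clear": "acme"},
]
index = build_index(customers)
print(match(leads, index, already_linked={("l2", "c1")}))
```

Keying on `account_id` stands in for the `sync_settings` join, so a lead can only match customers of its own account.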
P.S. I know I could put the matching fields into separate tables, which would be faster, but I'd still like to know what my alternatives are.
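The separate-table idea from the P.S. can be sketched as one key table holding a row per normalized value, so every field comparison becomes the same indexed equality join. The table `match_keys`, its columns, and the kind labels are hypothetical; the sketch uses `sqlite3` for portability and leaves out the `customers_leads` exclusion.

```python
import sqlite3

# Hypothetical key table: one row per (record, comparable value).
# 'kind' groups columns that are compared with each other, e.g. all phones.
SCHEMA = """
CREATE TABLE match_keys (
    side       TEXT NOT NULL,     -- 'lead' or 'customer'
    entity_id  INTEGER NOT NULL,
    account_id INTEGER NOT NULL,
    kind       TEXT NOT NULL,     -- 'phone', 'email', ...
    value      TEXT NOT NULL      -- already normalized, never empty
);
CREATE INDEX ix_keys ON match_keys(side, account_id, kind, value);
"""

# A single self-join replaces the whole OR list; DISTINCT collapses
# leads that match the same customer on several keys.
MATCH_SQL = """
SELECT DISTINCT l.entity_id AS lead_id, c.entity_id AS customer_id
  FROM match_keys l
  JOIN match_keys c
    ON c.side = 'customer'
   AND c.account_id = l.account_id
   AND c.kind = l.kind
   AND c.value = l.value
 WHERE l.side = 'lead'
 ORDER BY lead_id, customer_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO match_keys VALUES (?, ?, ?, ?, ?)", [
    ("lead", 1, 7, "phone", "555"),          # lead 1's mobile
    ("lead", 1, 7, "email", "a@x.com"),
    ("customer", 10, 7, "phone", "555"),     # customer 10's phone2
    ("customer", 10, 7, "email", "a@x.com"),
    ("customer", 11, 8, "phone", "555"),     # other account: no match
])
matches = conn.execute(MATCH_SQL).fetchall()
print(matches)
```

A lead column that is compared against several customer columns, such as `name_clear`, can simply be stored once under each kind it should be matched on.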
Tags: mysql record-linkage nosql