How to match phone number prefixes? Answers

【Question title】: How to match phone number prefixes?
【Posted】: 2015-01-05 18:08:02
【Question description】:

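I am analyzing data that contains phone numbers which need to be matched to a country and a carrier. I received a mapping from phone number prefixes to country and destination (city/carrier) in the following form: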

Country, Destination, Country Code, Destination Code, Remarks
AAA, Some Mobile, 111, "12, 23, 34, 46",Some remarks
AAA, Some city A, 111, "55, 56, 57, 51", Some more remarks
BBB, Some city B, 222, "234, 345, 456", Other remarks

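The data here is dummy data, but the real data has the same form. The "Destination Code" column holds many values per row, so I want to convert this file into a form suitable for use in a database.

What I have in mind is converting it into a form like this: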

Country, Destination, Combined Code, Remarks
AAA, Some Mobile, 11112, Some remarks
AAA, Some Mobile, 11123, Some remarks
AAA, Some Mobile, 11134, Some remarks
AAA, Some Mobile, 11146, Some remarks
etc..

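This would let me build a simpler mapping table. What is the best way to handle this kind of data? How would I write code for this conversion in a bash shell script or in Python?

【Question discussion】:

Tags: python database shell fileparsing fileparse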

【Solution 1】:

>>> data = [['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks'],
... ['AAA', 'Some Mobile', '111', '12, 23, 34, 46','Some remarks'],
... ['AAA', 'Some city A', '111', '55, 56, 57, 51', 'Some more remarks'],
... ['BBB', 'Some city B', '222', '234, 345, 456', 'Other remarks']]
>>> 
>>> op=[data[0]]
>>> for i in data[1:]:
...    for j in i.pop(3).split(','):
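...       # use the column position from enumerate; i.index(k) picks the wrong column when a value repeats in the row
...       op.append([k+j.strip() if n==2 else k for n,k in enumerate(i)])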
... 

>>> for i in op:
...    print(i)
... 
['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks']
['AAA', 'Some Mobile', '11112', 'Some remarks']
['AAA', 'Some Mobile', '11123', 'Some remarks']
['AAA', 'Some Mobile', '11134', 'Some remarks']
['AAA', 'Some Mobile', '11146', 'Some remarks']
['AAA', 'Some city A', '11155', 'Some more remarks']
['AAA', 'Some city A', '11156', 'Some more remarks']
['AAA', 'Some city A', '11157', 'Some more remarks']
['AAA', 'Some city A', '11151', 'Some more remarks']
['BBB', 'Some city B', '222234', 'Other remarks']
['BBB', 'Some city B', '222345', 'Other remarks']
['BBB', 'Some city B', '222456', 'Other remarks']
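
The lists above are typed in by hand. As a minimal sketch, assuming the mapping is available as CSV text in the form shown in the question, the standard `csv` module can parse it directly. `skipinitialspace=True` matters here: without it, the quoted "Destination Code" field that follows `, ` would not be read as a single field. The name `expand_prefixes`, the `RAW` sample, and the output header are illustrative choices, not part of the original answer.

```python
import csv
import io

# Sample text copied from the question; a real file could be passed via open(...).read().
RAW = '''Country, Destination, Country Code, Destination Code, Remarks
AAA, Some Mobile, 111, "12, 23, 34, 46",Some remarks
AAA, Some city A, 111, "55, 56, 57, 51", Some more remarks
BBB, Some city B, 222, "234, 345, 456", Other remarks
'''

def expand_prefixes(text):
    """Return one row per destination code, with country and destination codes joined."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the original header row
    rows = [['Country', 'Destination', 'Combined Code', 'Remarks']]
    for record in reader:
        if not record:  # tolerate blank lines
            continue
        country, destination, country_code, dest_codes, remarks = record
        for code in dest_codes.split(','):
            rows.append([country, destination,
                         country_code + code.strip(), remarks.strip()])
    return rows

rows = expand_prefixes(RAW)
for row in rows:
    print(row)
```

The resulting rows could be written back out with `csv.writer` if a file is needed for a database import.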

Solution for the updated question:

>>> data = [['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks'],
...  ['AAA', 'Some Mobile', '111', '12, 23, 34, 46','Some remarks'],
...  ['AAA', 'Some city A', '111', '55, 56, 57, 51', 'Some more remarks'],
...  ['BBB', 'Some city B', '222', '234, 345, 456', 'Other remarks']]
>>>  
>>> op=[data[0]]
>>> for i in data[1:]:
...    for j in i.pop(3).split(','):
...       k=i[:]
...       k.insert(3,i[2]+j.strip())
...       op.append(k)
... 
>>> for i in op:
...    print(i)
... 
['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks']
['AAA', 'Some Mobile', '111', '11112', 'Some remarks']
['AAA', 'Some Mobile', '111', '11123', 'Some remarks']
['AAA', 'Some Mobile', '111', '11134', 'Some remarks']
['AAA', 'Some Mobile', '111', '11146', 'Some remarks']
['AAA', 'Some city A', '111', '11155', 'Some more remarks']
['AAA', 'Some city A', '111', '11156', 'Some more remarks']
['AAA', 'Some city A', '111', '11157', 'Some more remarks']
['AAA', 'Some city A', '111', '11151', 'Some more remarks']
['BBB', 'Some city B', '222', '222234', 'Other remarks']
['BBB', 'Some city B', '222', '222345', 'Other remarks']
['BBB', 'Some city B', '222', '222456', 'Other remarks']
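
Once every row carries a single combined code, matching a phone number comes down to finding the longest code that the number starts with. The following is only a sketch under assumptions, not part of the original answer: the helper names `build_lookup` and `match_prefix` are hypothetical, phone numbers are assumed to be plain digit strings that begin with the country code, and the sample numbers are made up. It uses a few rows in the five-column form printed above.

```python
# A few rows in the five-column form produced by the updated solution.
rows = [
    ['AAA', 'Some Mobile', '111', '11112', 'Some remarks'],
    ['AAA', 'Some city A', '111', '11155', 'Some more remarks'],
    ['BBB', 'Some city B', '222', '222234', 'Other remarks'],
]

def build_lookup(rows):
    """Map each combined code (column index 3) to its (country, destination) pair."""
    return {row[3]: (row[0], row[1]) for row in rows}

def match_prefix(number, lookup):
    """Return (country, destination) for the longest code that prefixes number, else None."""
    longest = max((len(code) for code in lookup), default=0)
    # Try the longest possible prefix first so more specific codes win.
    for length in range(min(longest, len(number)), 0, -1):
        hit = lookup.get(number[:length])
        if hit is not None:
            return hit
    return None

lookup = build_lookup(rows)
print(match_prefix('2222345550100', lookup))  # made-up number starting with 222234
print(match_prefix('9990000', lookup))        # no known prefix
```

Trying the longest prefix first is a design choice that matters when one code is itself a prefix of another, which real prefix tables may contain.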

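【Discussion】:

Thanks, this is really cool, and it works. Just one more requirement: how can I keep the country code in the third column as it is and show the "combined code" in the fourth column? Like ['BBB', 'Some city B', '222', '222456', 'Other remarks'].
@sfactor, I have updated my answer to match your requirement. I hope it solves the problem; if it does, please accept the answer.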