【发布时间】:2015-08-06 08:14:20
【问题描述】:
我正在用 perl 编写脚本。但被困在一个部分。以下是我的 csv 文件示例。
"MP","918120197922","20150806125001","prepaid","prepaid","3G","2G"
"GJ","919904303790","20150806125002","prepaid","prepaid","2G","3G"
"MH","919921990805","20150806125003","prepaid","prepaid","2G"
"MP","918120197922","20150806125004","prepaid","prepaid","2G"
"MUM","919904303790","20150806125005","prepaid","prepaid","2G","3G"
"MUM","918652624178","20150806125005","","prepaid","","2G","NEW"
"MP","918120197922","20150806125005","prepaid","prepaid","2G","3G"
现在我需要根据第 2 列(即手机号码)获取唯一记录,但只考虑第 3 列的最新值(即时间戳) 例如:手机号码“918120197922”。
"MP","918120197922","20150806125001","prepaid","prepaid","3G","2G"
"MP","918120197922","20150806125004","prepaid","prepaid","2G"
"MP","918120197922","20150806125005","prepaid","prepaid","2G","3G"
它应该选择第三条记录,因为它具有最新的时间戳值(20150806125005)。请帮忙。
附加信息: 很抱歉数据不一致..我现在已经纠正了。 是的,数据是有序的,这意味着最新的时间戳将出现在最新的行中。 我的文件大小超过 1 gb 的另一件事是有什么方法可以有效地做到这一点?在这种情况下,awk 的工作速度会比 perl 快。请帮忙?
【问题讨论】:
-
如果你做了任何事情来实现这一点。请告诉我们。那会很棒。
-
您的文件是否已订购?最新的时间戳总是低于最近的时间戳吗?
-
在您的示例数据中,我没有看到手机号码
918120197922的三个MP记录。您的措辞不一致,并且与提供的数据相矛盾。请提供一致的MCVE。