【Posted】: 2013-12-22 16:53:33
【Question】:
I have been puzzled ever since I saw ls list the following files in this strange order:
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
From a human point of view, "I" should come first, then "II", and so on.
So I created a file with the following contents:
$ cat 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The
If I sort it, I get this:
$ sort 1
Star Wars Episode II - Attack
Star Wars Episode III - Revenge
Star Wars Episode I - The
Star Wars Episode IV - A
Star Wars Episode VI - Return
Star Wars Episode V - The
However, if I remove the "-" and everything after it, the sort is correct:
$ cat 1
Star Wars Episode II
Star Wars Episode III
Star Wars Episode I
Star Wars Episode IV
Star Wars Episode VI
Star Wars Episode V
$ sort 1
Star Wars Episode I
Star Wars Episode II
Star Wars Episode III
Star Wars Episode IV
Star Wars Episode V
Star Wars Episode VI
So as soon as I add any character after the space, the sort order becomes unpredictable to me:
$ cat 1
Star Wars Episode II y
Star Wars Episode III x
Star Wars Episode I z
Star Wars Episode IV w
Star Wars Episode VI v
Star Wars Episode V u
$ sort 1
Star Wars Episode III x
Star Wars Episode II y
Star Wars Episode IV w
Star Wars Episode I z
Star Wars Episode VI v
Star Wars Episode V u
Any hints about this sorting behavior?
Update: sort: using ‘en_CA.UTF-8’ sorting rules
Update #2: as the comments below point out, this is caused by the locale.
ls | LANG=C sort
Star Wars Episode I - The Phantom Menace (1999) BDRip.mkv
Star Wars Episode II - Attack of the Clones (2002) BDRip.mkv
Star Wars Episode III - Revenge of the Sith (2005) BDRip.mkv
Star Wars Episode IV - A New Hope (1977) BDRip.mkv
Star Wars Episode V - The Empire Strikes Back (1980) BDRip.mkv
Star Wars Episode VI - Return of the Jedi (1983) BDRip.mkv
Why does a UTF8 locale make a difference? I checked ru_RU.UTF8 (sorts incorrectly) and ru_RU.KOI8-R (sorts correctly).
Update #3, about locales: http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
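To see why the C locale produces the expected order, it may help to remember that it compares raw byte values, and a space (0x20) is lower than any letter. The snippet below is a minimal Python sketch, assuming that Python's default string comparison (by code point) is a fair stand-in for `LANG=C sort` on these pure-ASCII names; it is not what sort runs internally.

```python
# Shortened titles from the question, in the order `ls` printed them.
titles = [
    "Star Wars Episode II - Attack",
    "Star Wars Episode III - Revenge",
    "Star Wars Episode I - The",
    "Star Wars Episode IV - A",
    "Star Wars Episode VI - Return",
    "Star Wars Episode V - The",
]

# Python compares str values code point by code point. For pure ASCII
# input this matches the byte-wise comparison assumed for the C locale.
c_order = sorted(titles)

for title in c_order:
    print(title)

# The space after "I" (code point 32) is lower than the second "I" of
# "II" (code point 73), so "Episode I - ..." sorts before "Episode II - ...".
print(ord(" "), ord("I"))
```

Note that this only happens to give the right Roman-numeral order for I through VI; byte order knows nothing about numerals, so a title with "IX" would land before "V".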
【Discussion】:
-
Prefixing it with LC_ALL=C works, so it must be related to the locale.
-
unix.com/showthread.php?t=156805 has a script for sorting files with Roman numerals
-
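Is "ii" a digraph that sorts before "i" in the ru_RU locale (when it is not treated as a Roman numeral)? A quick Google search shows that bugs have already been reported against the sort order of the ru_RU.UTF8 locale, so it is entirely possible that this is part of what you are seeing...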
-
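Please see my answer below and the updates to the original question. This is the default behavior of UTF8 locales, at least the ones I have used: they ignore spaces. My original question was not about the ru_RU.* locales but about *.UTF8 ones, en_CA.UTF8 in particular.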
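The "they ignore spaces" behavior described above can be approximated in a few lines. This is only a rough sketch, not glibc's actual collation algorithm: the helper `first_pass_key` is hypothetical, and it assumes that the locale's first comparison pass skips spaces and punctuation and ignores case, with the raw line used only to break ties.

```python
def first_pass_key(line):
    """Hypothetical approximation of a UTF-8 locale's first comparison pass.

    Assumption: spaces and punctuation are skipped and case is ignored;
    the raw line is used only as a tie-breaker.
    """
    letters_and_digits = "".join(ch for ch in line if ch.isalnum())
    return (letters_and_digits.casefold(), line)


# The third test file from the question, in its original order.
lines = [
    "Star Wars Episode II y",
    "Star Wars Episode III x",
    "Star Wars Episode I z",
    "Star Wars Episode IV w",
    "Star Wars Episode VI v",
    "Star Wars Episode V u",
]

locale_like = sorted(lines, key=first_pass_key)

for line in locale_like:
    print(line)
```

Under that assumption the lines compare as "...EpisodeIIIx", "...EpisodeIIy", "...EpisodeIVw", "...EpisodeIz" and so on, which reproduces the order that sort printed for this file in the question. The same key also reproduces the order reported for the " - Attack" variant of the file.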