【问题标题】:Explain combination function of python module itertools解释python模块itertools的组合功能
【发布时间】:2014-07-23 10:23:23
【问题描述】:

我经常在 Python 中使用itertools 模块,但如果我不知道其背后的逻辑,那感觉就像在作弊。

这是在顺序不重要时查找字符串组合的代码。

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

有人可以解释一下基本概念吗?尤其是第 14 行

【问题讨论】:

  • 第 14 行是一个 else 语句。你指的是哪一行?
  • 您说“这是我的代码”,但它是此处示例的直接副本:docs.python.org/2/library/itertools.html#itertools.combinations
  • @TomDalton 显然,如果他不知道它是如何工作的,那么这不是他的代码。他的意思是“这是这个问题所基于的代码”。

标签: python combinations itertools


【解决方案1】:
def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    # first you create a tuple of the original input which you can refer later with 
    # the corresponding indices
    n = len(pool)
    # get the length of the tuple
    if r > n:
        return
    # if the length of the desired permutation is higher than the length of the tuple 
    # it is not possible to create permutations so return without doing something

    indices = list(range(r))
    # create the first list of indices in normal order ( indices = [0,1,2,3,...,r])
    # up to the desired range r

    yield tuple(pool[i] for i in indices)
    # return the first permutation which is a tuple of the input with the original 
    # indices up to r tuple(tuple[0], tuple[1],....,tuple[r])

    while True:
        for i in reversed(range(r)):
            # i will go from r-1, r-2, r-3, ....,0

            if indices[i] != i + n - r:
                # if condition is true except for the case 
                # that at the position i in the tuple the last possible 
                # character appears then it is equal and proceed with the character 
                # before which means that this character is replaced by the next 
                # possible one

                # example: tuple='ABCDE' so n = 5, r=3 indices is [0,1,2] at start i=2
                # yield (A,B,C)
                # indices[i] is 2 and checks if 2 != 4 (2 +5-3) is true and break
                # increase indices[i]+1 and yield (A,B,D)
                # indices[i] is 3 and checks if 3 != 4 (2 +5-3) is true and break
                # increase indices[i]+1 and yield (A,B,E) 
                # indices[i] is 4 and checks if 4 != 4 (2 +5-3) is false so next loop 
                # iteration:  i = 1 indices[i] is 1 and checks if 4 != 3 (1 +5-3) 
                # is true and break .... and so on

                break
        else:
            # when the forloop completely finished then all possible character 
            # combinations are processed and the function ends
            return

        indices[i] += 1 # as written proceed with the next character which means the 
                        # index at i is increased
        for j in range(i+1, r): 
            indices[j] = indices[j-1] + 1 # all the following indexes are increased as 
                                          # well since we only want to at following 
                                          # characters and not at previous one or the
                                          # same which is index at indice[i]
        yield tuple(pool[i] for i in indices)
        # return the new tuple

【讨论】:

    【解决方案2】:
    def combinations(iterable, r):
        # first, we need to understand, this function is to record every possibility of indices
        # then return the elements with the indices
    
        pool = tuple(iterable)
    
        n = len(pool)
    
        if r > n:
            return
        indices = list(range(r))
    
        # yield the first permutation, 
        # cause in the "while" circle, we will start to change the indices by plus 1 consistently
        # for example: iterable is [1, 2, 3, 4, 5], and r = 3
        # this yield will return [1, 2, 3], but in the "while" loop, 
        # we will start to update last elements' index to 4, which will return [1, 2, 4]
        yield tuple(pool[i] for i in indices)
    
        while True:
    
            # in this for loop, we want to confirm whether indices[i] can be increased or not
            for i in reversed(range(r)):
    
                # after reversed, i will be r-1, r-2, r-3, ....,0
                # something we should know before we start the 'for' loop
                # the value of indices[r-1] should not greater than n-1
                # the value of indices[r-2] should not greater than n-2
                # and the maximum of indices[i] should be indices[r-1]
                # so the value of indices[r-1] should between r-1 and n-r + r-1, like this:
                #       r-1 <= indics[r-1] <= n-r + r-1
                # so, to r-2:
                #       r-2 <= indics[r-1] <= n-r + r-2
                # let's set i = r-1:
                #       i <= indices[i] <= n-r+i (n-1 is the maximum value)
                # since we will keep plusing the value of indices[i], let's ignore i <= indices[i]
                # and we just want to know if indices[i] can plus or not,
                # so indices[i] can be equal with n-r+i
                # then we got:
                #       indices[i] < i + n - r
                # the offical document give: indices[i] != i + n - r,
                # cause when indices[i] == i + n - r, it arrived the boundry, 
                # the "for" loop will get into indices[i-1], there will be no judgement for ">i+n-r"
                # so to use indices[i] != i + n - r is still a good way, 
                # but i prefer indices[i] < i + n - r, which is easier to understand for me.
                # so next question is "break" in here, 
                # it means the value of indices[i] doesn't reach the boundry still can plus one,
                # let break out to continue the iteration
                # when it hit the boundry, i will be r-2
                # So we can see the result:
                # 1, 2, 3
                # 1, 2, 4
                # 1, 2, 5
                # 1, 3, 4
                # always loop the last index, hit the boundry, check the last but one.
                if indices[i] < i + n - r:
                    break
            else:
                # loop finished, return
                return
    
            # first of all, plus one for current indices[i], 
            # that's why we yield the first permutation, before the while loop
            # and increase every indices[i] by 1
            indices[i] = indices[i] + 1
            # this for loop is increase every indices which is after indices[i].
            # cause, current index increased, and we need to confirm every one behind is orderd
            # for example: current we got i = 2, indices[i]+1 will be 3, 
            # so the next loop will start with [1, 3, 4], not [1, 3, 3]
            for j in range(i+1, r):
                indices[j] = indices[j-1] + 1
    
            yield tuple(pool[i] for i in indices)
    

    【讨论】:

      【解决方案3】:

      注意!代码将被分解,并且相对于它的每个部分都没有正确缩进,因此我建议您也查看问题本身/itertools 文档(相同代码)中的代码。

      问这个问题已经 7 年多了。哇。我自己对此很感兴趣,上面的解释虽然很有帮助,但对我来说并没有真正切中要害,所以这是我为自己做的总结。
      由于我终于设法理解它(或者至少我认为我理解),我认为发布这个解释的“版本”可能是有益的,以防有更多像我一样的人。那么让我们开始吧。

      def combinations(iterable, r):
          pool = tuple(iterable)
          n = len(pool) 
      

      在第一部分中,简单地创建一个可迭代的元组并获取可迭代的长度。这些在以后会有用。

      if r > n:
              return
          indices = list(range(r))
          yield tuple(pool[i] for i in indices)
      

      这也很简单——如果所需组合的长度大于我们的元素池,我们无法构造一个有效的组合(你不能从 4 个元素中组合 5 个元素),因此我们只是停止执行带有退货声明。我们还生成了第一个组合(迭代中的前 r 个元素)。

      下一部分稍微复杂一些,请仔细阅读。

      while True:
          for i in reversed(range(r)):
              if indices[i] != n - (r - i):
                  break
      """
      The job of the while loop is to increment the indices one after
      the other, and print out all the possible element combinations based
      off all the possible valid indice combinations.
      
      This for loop's job is to make sure we never over-increment any values.
      
      In order for us to not run into any errors, the incremention of
      the last element of the indice list must stop when it reaches one-less 
      than the length of our element list, otherwise we'll run into an index error 
      (trying to access an indice out of the list range).
      How do we do that?
                  
      The range function will give us values cascading down from r-1 to 0
      (r-1, r-2, r-3, ... , 0)
      So first and foremost, the (r-1)st indice must not be greater than (n-1)
      (remember, n is the length of our element pool), as that is the largest indice. 
      We can then write
      
      Indices[r - 1] < n - 1
      
      Moreover, because we'll stop incrementing the r-1st indice when we reach it's
      maximum value, we must also stop incrementing the (r-2)nd indice when we reach
      it's maximum value. What's the (r-2)nd indice maximum value?
      
      Since we'll also be incrementing the (r-1)st indice based on the 
      (r-2)nd indice, and because the maximum value of the (r-1)st 
      indice is (n-1), plus we want no duplicates, the maximum value the
      (r-2)nd indice can reach would be (n-2).
      This trend will continue. more generally:
                  
      Indices[r - k] < n - k
      
      Now, since r - k is an arbitrary index generated by the reversed range function, 
      namely (i), we can substitute:
      
      r - k = i -----> k = r - i
      Indices[r - k] < n - k -----> Indices[i] < n - (r - i)
                  
      That's our limit - for any indice i we generate, we must break the 
      increment if this inequality { Indices[i] < n - (r - i) } is no longer 
      true.
      (In the documentation it's written as (Indice[i] != i + n - r), and it 
      means the exact same thing. I simply find this version easier to visualize 
      and understand).
      """
      else:
          return
      """
      When our for loop runs out - which means we've gone through and 
      maximized each element in our indice list - we've gone through every 
      single combination, and we can exit the function.
      
      It's important to distinct that the else statement is not linked to 
      the if statement in this case, but rather to the for loop. It's a 
      for-else statement, meaning "If you've finished iterating through the 
      entire loop, execute the else statement".
      """
      

      如果我们确实设法跳出 for 循环,这意味着我们可以安全地增加索引以获得下一个组合(下面的第一行)。 下面的 for 循环确保每次我们从一个新索引开始, 我们将其他索引重置为可能的最小值, 以免错过任何组合。

      例如,如果我们不这样做,那么一旦我们到达一个点 我们必须继续前进,比如说我们有 (0, 1, 2, 3, 4) 和组合 索引是 (0, 1, 4),当我们继续并将 1 增加到 2 时,最后一个 索引将保持不变 - 4,我们将错过 (0, 2, 3),只有 将 (0, 2, 4) 注册为有效组合。相反,在我们递增之后 (1 -> 2),我们基于以下更新后面的索引:(4 -> 3),当我们 再次运行 while 循环,我们会将 3 增加到 4(请参阅 到上一节)。

      请注意,我们从不增加以前的索引,因为不创建 重复。

      最后,对于每次迭代,yield 语句都会生成与当前索引组合相对应的元素组合。

      indices[i] += 1
      for j in range(i+1, r):
          indices[j] = indices[j-1] + 1
      yield tuple(pool[i] for i in indices)
      

      正如文档所述,因为我们正在处理位置,所以唯一的组合是唯一的,基于元素在可迭代对象中的位置,而不是它们的值。

      【讨论】:

        猜你喜欢
        • 2020-05-05
        • 1970-01-01
        • 2015-09-08
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 2019-01-07
        相关资源
        最近更新 更多