【Question Title】: Change two characters into one symbol (Python)
【Posted】: 2018-06-19 16:22:05
【Question】:

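I'm currently working on a file compression task for school, and I can't work out what is happening in this piece of code (more specifically, what is not happening, and why).

In this part of the code, my goal, in non-coding terms, is to change two identical adjacent letters into one symbol so that they take up less memory: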

    for i, word in enumerate(file_contents):
        # file_contents = LIST of words in any given text file

        word_contents = (file_contents[i]).split()
        for ind, letter in enumerate(word_contents[:-1]):
            if word_contents[ind] == word_contents[ind+1]:
                word_contents[ind] = ''
                word_contents[ind+1] = '★'

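However, when I run the full code with a sample text file, it doesn't seem to do what I want. For example, the word "Sally" should become "Sa★y", but it stays unchanged. Can anyone help me get on the right track?

Edit: I left out a crucial detail. I want the compressed string to end up in the original file_contents list, in place of the word that contained the double letter, because the purpose of the full compression algorithm is to return a compressed version of the text in the input file.

【Question comments】:

    Tags: python python-3.x file compression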


    【Solution 1】:

    I suggest using a regex to match identical adjacent characters.

    Example

    import re
    
    txt = 'sally and bobby'
    print(re.sub(r"(.)\1", '*', txt))
    
    # sa*y and bo*y
    

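    The inner loop and the conditional check in your code are not needed. Inside the outer loop, use the following line instead, which applies the substitution to the word (a string) and writes the result back to the list:

    file_contents[i] = re.sub(r"(.)\1", '*', word)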
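
A minimal sketch of how this substitution might be applied across the whole list, assuming file_contents is a list of words as the question states. The sample words and the name DOUBLE are made up for illustration. Note that re.sub expects a string, so it is applied to each word rather than to a list.

```python
import re

# Assumption: file_contents is a list of words (sample data, not from the question).
file_contents = ["Sally", "and", "Bobby", "sat", "still"]

# (.) captures one character and \1 requires that same character again,
# so each adjacent identical pair is replaced by a single marker.
DOUBLE = re.compile(r"(.)\1")

for i, word in enumerate(file_contents):
    # Write the compressed word back into the original list.
    file_contents[i] = DOUBLE.sub("★", word)

print(file_contents)
# ['Sa★y', 'and', 'Bo★y', 'sat', 'sti★']
```

Applying the pattern word by word, rather than to the whole text at once, also keeps a run of two spaces from being collapsed into a marker.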
    

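    【Comments】:

      【Solution 2】:

      Your code has a few problems (I think).

      1) split produces a list, not a str, so when you write enumerate(word_contents[:-1]) it looks like you're assuming you have a string?!? For a single word with no whitespace, split() returns a one-element list holding the whole word, so word_contents[:-1] is empty and the inner loop never runs.

      Then!

      2) With these lines:

      if word_contents[ind] == word_contents[ind+1]:
          word_contents[ind] = ''
          word_contents[ind+1] = '★'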
      

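      you are operating on your list again. It seems clear that you want to operate on the string, or on a list of the characters in the word being processed. At best this code does nothing; at worst you corrupt the word_contents list.

      So when you make the modification, you are modifying the word_contents list, not the [:-1] slice you are actually iterating over. There are more problems, but I think this answers your question (I hope).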
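
To make the point about split concrete, here is a small sketch using the word "Sally" from the question. The variable names as_split and as_chars are made up for illustration.

```python
word = "Sally"

# split() with no arguments splits on whitespace, so a single word
# comes back as a one-element list containing the whole word.
as_split = word.split()
print(as_split)        # ['Sally']

# Slicing the last element off a one-element list leaves nothing,
# so a loop over this slice never executes its body.
print(as_split[:-1])   # []

# list() is what yields the individual characters.
as_chars = list(word)
print(as_chars)        # ['S', 'a', 'l', 'l', 'y']
```

This is why "Sally" stays unchanged: the comparison inside the inner loop is never reached.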

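      If you really want to understand what you did wrong, I suggest adding print statements to what you're doing. If you're looking for someone to do your homework for you, I guess someone else has already given you the answer.

      Here is an example of how to add logging to your function: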

      for i, word in enumerate(file_contents):
          # file_contents = LIST of words in any given text file

          word_contents = (file_contents[i]).split()
          # See what the word content list actually is
          print(word_contents)
          # See what your slice is actually returning
          print(word_contents[:-1])
          # Unless you have something modifying your list elsewhere, you probably
          # want to iterate over the whole words list and not just a slice of it.
          for ind, letter in enumerate(word_contents[:-1]):
              # See what your other test is testing
              print(word_contents[ind], word_contents[ind+1])
              # Here you probably actually want
              # word_contents[:-1][ind]
              # which is the list item you iterate over and then the actual string I suspect you get back
              if word_contents[ind] == word_contents[ind+1]:
                  word_contents[ind] = ''
                  word_contents[ind+1] = '★'
      

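      Update: Based on a follow-up question from the OP, I've put together a sample program with explanations. Note that this is not the optimal solution; it is mainly an exercise in flow control and in using basic data structures.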

      # define the initial data...
      file = "sally was a quick brown fox and jumped over the lazy dog which we'll call billy"
      file_contents = file.split()
      
      # Enumerate isn't needed in your example unless you intend to use the index later (example below)
      for list_index, word in enumerate(file_contents):
      
          # Changing something you iterate over is dangerous and sometimes confusing; in your
          # case you iterated over word_contents and then modified it. If you replace two
          # characters with one, you change the indexes and size of the structure, which can
          # make later changes invalid. So we'll create a new data structure to dump the results in.
          compressed_word = []
      
          # since we have a list of strings we'll just iterate over each string (or word) individually
          for character in word:
              # Check to see if there is any data in the intermediate structure yet if not there are no duplicate chars yet
              if compressed_word:
                  # if there are chars in new structure, test to see if we hit same character twice 
                  if character == compressed_word[-1]:
                      # looks like we did, replace it with your star
                      compressed_word[-1] = "*"
                      # continue skips the rest of this iteration of the loop
                      continue
              # if we haven't seen the character before or it is the first character just add it to the list
              compressed_word.append(character)
      
          # I guess this is one reason why you may want enumerate, to update the list with the new item?
          # join() is just converting the list back to a string
          file_contents[list_index] = "".join(compressed_word)
      
      # prints the new version of the original "file" string
      print(" ".join(file_contents))
      

      Output: "sa*y was a quick brown fox and jumped over the lazy dog which we'* ca* bi*y"
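
As a rough cross-check, the loop-based approach above and the regex from the other answer can be wrapped in functions and compared on the same sample sentence. This is only a sketch: the function names compress_loop and compress_regex and the marker parameter are made up for illustration.

```python
import re

def compress_loop(word, marker="*"):
    """Loop-based approach from the sample program, wrapped in a function."""
    compressed = []
    for character in word:
        # Same character as the last one kept: replace it with the marker.
        if compressed and character == compressed[-1]:
            compressed[-1] = marker
            continue
        compressed.append(character)
    return "".join(compressed)

def compress_regex(word, marker="*"):
    """Regex approach: replace each adjacent identical pair with the marker."""
    return re.sub(r"(.)\1", marker, word)

text = "sally was a quick brown fox and jumped over the lazy dog which we'll call billy"
words = text.split()

loop_result = [compress_loop(w) for w in words]
regex_result = [compress_regex(w) for w in words]

print(loop_result == regex_result)   # True
print(" ".join(loop_result))
```

The two agree on this sentence, but they can diverge when the input already contains the marker character: for "aa*", the loop version returns "*" while the regex version returns "**". Choosing a marker that cannot occur in the input, such as the ★ from the question, sidesteps that case.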

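      【Comments】:

      • I have a vague understanding of what I did wrong, but I'd really appreciate it if you could help a bit more and give me an example of how to get the original list file_contents to contain the symbols from word_contents, since I'm still a beginner at programming.
      • I tried declaring file_contents[ind] = word_contents[ind] directly after the code above (after commenting out the print statements), but that clearly isn't the right way. I think I know what to do; could you help me with the syntax?
      • Sure, I don't mind helping more. The problem I have is that I'm not sure what file_contents actually is. I assume you already have a list of words?
      • Yes, file_contents is the list of words from the input file.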