【问题标题】:Count consecutive numbers in Python在Python中计算连续数字
【发布时间】:2021-12-04 20:47:19
【问题描述】:

我不明白为什么我的方法不起作用。 我有一个文本文件,其中包含一长串 DNA(例如 ATTTGTTCAGCCA 可以看到更远的地方)。我的任务是计算文件中连续出现“AGATC”的次数。 我的想法是将文件中的每个 AGATC 替换为“1”,然后计算连续的 1。如果 x[h] 本身是 1 并且邻居是 1(第二个 if 循环),或者如果 x[h] 不是 1 但邻居是 1(第一个 if 循环),我会进行计数(这应该可以工作,因为我没有用这个算法计算最后一个)。但是,这不适用于所有文件,例如文件5.文件内容再往下。

我的 AGATC 输出是 28,但应该是 22。我的计数方法有误吗? 非常感谢!

from sys import argv
import csv

# make methode which replaces the argument with 1 and then counts consecutive 1s
def count(seq):
    #replace with 1 and store in x
    x = sequence.replace(seq, "1")
    # count consecutive 1s
    count = 0
    h = 0
    for i in x:
        while h < len(x) - 1:
            if x[h] != "1" and x[h+1] == "1":
                count += 1
            if x[h] == "1":
                if (x[h+1]) == "1":
                    count += 1
            h += 1

    return count

# call count method to count the AGATC, AATG and TATC sequences in 5.txt
list = [count("AGATC"), count("AATG"), count("TATC")]
print(list)

   

GCTAAATTTGTTCAGCCAGATGTAGGCTTACAAATCAAGCTGTCCGCTCGGCACGGCCTACACACGTCGTGTAACTACAACAGCTAGTTAATCTGGATATCACCATGACCGAATCATAGATTTCGCCTTAAGGAGCTTTACCATGGCTTGGGATCCAATACTAAGGGCTCGACCTAGGCGAATGAGTTTCAGGTTGGCAATCAGCAACGCTCGCCATCCGGACGACGGCTTACAGTTAGTAGCATAGTACGCGATTTTCGGGAAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATGTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCCGTCAACTCATTCACACCGCATCCTTTCCTGCCACTGTAACTAGTCGACTGGGGAACCTCATCATCCATACTCTCCCACATTATGCCTCCCAACCTTGTTAAGCGTGGCATGCTTGGGATTGCATTGATGCTTCTTGGAGAGGACGCTTTCGTTTTGGAGATTACAGGGATCCAATTTTATCATCGGTTCGACTCCCGTAACGACTTAGCAGTAAGGGTGCTAGTTCCTGGTTAGAATCTTAATAAATCACGTCGCTTGGAGCAAGACAAAGATCGTCGTAATGCCAAGTGCACGACCACCTTCAGACTTGCAGGACCCGTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTT TTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTTTTTTTCTCGATAGCTATGCGGTTCAATACAATCTTAACGCAATGCAGCGATGTGGTTTCGTACACTTAGCATAAAACCCCCCACATTAAATCGATGTACCCGCCCTCTTAGACGCCAATTTCAATGCCGAACCTCCGGCGGGTATCTCTGCACTAGGAGAAGTAGCACGTCGCTGTAGCGAACTCCTATCGTGAGATAATTTGTAGAGCTGCTCTTATAATACAATAGCTCAGATGGATTATTCCATGGACATCCCCGTGCGTTGTTTCGAGGATGGTAGGTGGAAATTTTGCCAGACCTCTAGTCTTAAACATGGTTGACGTTATAGGCGCTATCTCTTGCGTCTGGAAGTGTTAATCCGTGAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAACACGCAACTCTGGAGGAGGGCACTGCACTGCAAACTTGCGTAATATCCTTCACCCACACTTGCCTGGCCTCCTTGCTTAAAGCTCTGGCGATGCGATTTTTCGGCCCAGTAGCTGAATAGGTCATGAAATGGGCACCGAACTGGAAAGACCCATATATTCGATACTCACAACTTAATGATAGCGCGATTAAGAGCGACACCAAAAACCAAATTACGTTCACGAACCTTTGAGAGTCAAGGAGACTTAGACCGAATTGAATGATCACTGATGCGCCCGCTGATACTGAGCCTCACCATTAATCGCCGACCAATACGGCGTGTACCGGGCGCGGCCTTGCCGCATAACGATAGATAGATAGA TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATATCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTACACAGCCCCGTCCTCATTGCTAAGTGCACTGGCAACTGGACCTAAAGATTTTTCGAGTATGGCCCTCGAATCAAGCGCCCACCCAGAAACCTACGAGCCAGTAACCCCAGTAAACAAGCATTAGTGCTATATGCTTGCTGCCCACTAGGACCCTTATGGTTCATACCAGGGTGACGTGTCTTGCGGGCCAAGGATGAACCAGAAGCAAGATCCTTAGATGGACGACTGTCTCATTGCTTAAACTCCACATACCAAAGGGCGCGGTAAACGATAGTTTTAGGTAATGTTAGTCGGATGGTTGTCTGCAGCTACCAATACAGCCTGGCACCCAGGGTCTGAACAATAACGCGTGAGAGCAGCTCTCCCGCGTGTGGTGGATTTGCCGTCTATGAAATTGAGGCTCTTGCAACTATTCGCACTCGGAATGCCCTCATATCTGGTGCCTAGCGGCCTTTGCCCCGTGCCGGTAGGACTAAACTCTACGGATCGTTGACGGATCTCGATGTGGAAGATGGTTATGAAAGATAACAACGCGTGTGCTAATTGATTTAGACAAGTATTGCGGCAGTAAAAGATAATCGGCTGCAGAGTTACGAAAGACTTCCATGCATGGATTCCATTCCTTCTAGTATAGGACCCACTCTGAATACACGTCTTGCGGGCCGATCATCTCCACCGCTGCGGAAGAAAGCAATTAAGAATCTATGCTCATTAAGAGTGCGACTATAATGCGGATCTTACAGT GCTAATGATCAGGACGTCGTCCAAGCAGGCTGCATGCCGAATTTAGCTTACGTCAGGATCAGGCGTTATAGCCTGGGAATCGGACTATGAGGACGCCACGACCTCTGGGAGAAAGCTATATACATTGAGGATCGCGCCATCTTTATGAGACTCAAATGAATCTAGATAGGTAGCATTGCGGACTTGAGTTAGCACATCGGTATTGGAAGGTGAGGGTCCTGCCGCTCGTTCTATGTTCGGTTTATAGTATACAAATAGGTCATCCCGAACGTTGAAGTTAAACTCATGACACGTTGTCGTAATGAAACGGGCCTGTTATTAGGGATACAGACAAAAGGCACAAGCTGGCTTGCACATTAAGGCGCACTAGAGATCCTCACAACCGTTGCCCGCACGGAGGTCGTGTCTAACAGACAGTGAACCAGCCGTATTGGGGTGGATGACCTGAGCTTCTTGGGGCCTGTTGTACACCGCGTGTGGTTCAACTGGTACACATACTACGAATATTCGAAATCATTGTACTGTGCTCTTCGGTGCTACTGACTGTGAGCGAATGCATCCCAATCCCAAACAATGCTTGTGGTAGGAGAATTGAAACTCTCGAAGCCTGGCCCAATGTCATCTACTTTTAACATGTCGGGCCAGGAGTTACGGGCATTGCTTACTTACTTTGCCCCCTTACACCACAGCAGCGCGATTCTTGTTGTAGTAGATTTTATACGACTCGCGAATTAAATGGAACTTGTCTGTCCCATATCGATCGTGTCCATCGTAAGATGAGATTGTAGGAGCATTCGGAAGTCTATGCGGCCCAGGGACTACTACGTTAAATCTGGTCAGACGTGGTTTACAAGGCGTCCCGATCTTCTCAGAACATATGGGAAAGCACTACCGTTCCTTCACGCATACAGTTGTTCGTGCCGAACGAGTAAGCTTGCGACCAGCCCACCCGCTAGGGCTATGCAGCGGGTCATGGCTGGCGCCATACTGTGCGGACAAC CCACGCTCTGGCAGAAAGCGTCTTGTGTTTTGTAGTAGCTCCAACGGTTAGACCTTCGATATCTATTCAGAGCGCGAGCGACCACTATTAGACGGCATGTAAACAATGTGTATTTGTTCGGCCCAACCGGTATATGGGTAAGACCGCGAAGGGCCTGCGCGAATACCAGCGTCCAAAAATTCCTCACCCGAGATATGCGGTTAGTACCCCTTGGGTAACGGTCCGCTACGGGTAGCGACGCGAGCCGGCCGCATCGGTTGGAGCCGAGTTGTCGGGCAGGCGAGTAACGTGTGCAATTTGATGGGCCCAAGCCTCCGGCACTATCCACCTCATACATCGACAAAAGCACCAAATATGGGGAAAAGCTGAGCGTCGATATGTACATCTACCCAGGAACCGGCCCGAACATTAGGCGGACGTGAATTTCCGACCTAGGTTCGGCTACATTTCTACGATCCAAGCACACGTGAAGGAGGAGGGGTGTTCCGACCGTAAATGAACGAGGTGCGCAGTGACCCGATGGCGTTTAGCGGATAGCCTTCCTATGCCGGCCTATGCTGTATGGTAGTTGGTTGGTGCCTCCAGAGCCACTGCACCCAATCATAGGGTCTACAGCAGCGTACTTATAAAATTGTACGGGTGACCCATATCCATTACGGGTTGCGACCAGTATAGGAGAGTATAACTGCGTGAACTAATGCGTTATGACGCTTCAGAGTTTGCTCGGGCCCGAGTTCTAGGGCTATAATGTGTTAGGGCGCAAGTATGCCAAGCTAAGATGTGGCGTGCACACTAGGAGTTGTGTTCCTCTGCAAGCAGACACGAGCACTCTGGCAGTAGTTTGACCACACCCGGGTATCACTGCTACTCCATTTCGAACAAGCTATTGGAGCGGACAAAATATGCTACTCAAGAGCATTAGTTATAGGTCTACGAGACAGAAGCAGTTACTGAGTCTGAATATTCGATATAAGTAGGCATGGAGGCGGAGCAAAACAAC GTCTGCGATCAATCGTGTTGATGACGTATGGCGACTGGAAGGTAAGGACTATGGCCGGACGGAATGATTCATGTTCTGTTCAAAGCTATATTTCGAAGGGGTATATTAGCGGTCCTACACTTGGTTAGCACCCTCCCCCCTCTGGATCCTGCACTAATTCGAGCTGGCCTCCATCGGTATCAGTCCGGAAGCTCCACTCTCTATCGTAGTCCTAATCAACAGGGTGCCAGTTTGCTCACGTGGAAGTTTGAGGCCCTTTGTGCTCCATAGCCAATCACTAACCATGCACGCGCGACCCACTCTACGTCCAGATCGGCTATAATAGTTGCGCCCGGGACTGGCAGAGTAGACATGTAAGCTAGATAGAGCCCCGACATCGGCCAAGAGATCCTACGCTGCTTCCAGATAATGAGAGACATTCTAGCATTAGACATGCAAGTCGGCAGGGACTCCCCTTATCTAGTAATTTCGATGAATTGGTTTTTCGGCTAGCATCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGTCTAGACCATGCCGACCTCATCATAGAAGGAATGCTCTAAACTTAGAGTGCTACTAGGAAAACTATTAATCAATGATCGTCCTGCTTACATAGCTGGACGGCGAAAGTTCTTATACTGCGGAGGTTGCTGACGTAGAGTGCGCTGGGTACAGCGGATAAGTTGATCAGGGTGGGGATAGGGTGGCTCACCGTTTATACTCATATAGATTCCTGGCGTCGACGCTGTGACAGGGTCGAGATCGAGGGGGAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGATCAGCGGAGCGGAGGGAAAATTATCACCAGAGGGTAGGGGCTCGCGACATTCTATTCAATGCATTTCAAGCTACTTACGTATTTCGGCACAGTGA CTACTGCCTGCGCGGCAGCCGTAAGGTTTCCCGTCAATAGGTGGCACGTATCATTGATGAAAGTGTCAGCTAATCATTCAGGCCTTA

【问题讨论】:

  • sequence是如何初始化的?另外仅供参考,不要将 count 用于函数名和局部变量。
  • 也不要使用list作为变量名:) Shadows内置名称'list'

标签: python string count


【解决方案1】:

我可以为您提供另一个复杂度较小的解决方案:

yourString = "GCTAAATTTGTTCAGCCAGATGTAGGCTTACA....".replace("AGATC", "1")
count = 0
finalCount = 0
for character in yourString:
    if character == "1":
        count += 1
        if finalCount <= count:
            finalCount = count
    else:
        count = 0
print(finalCount)

这将在每次遇到1 时进行迭代。它还将添加到finalCount 中找到的最高计数值。最后,您将获得要搜索的最高序列

【讨论】:

  • 啊,现在我明白了我的错误,我非常专注于计算邻居,但我没有存储最高值。当我得到 1 时,我才数数。谢谢!
【解决方案2】:
  1. 如果 count 是函数名,不要使用 count 作为变量名
  2. 看起来您正在使用来自外部范围的变量序列。最好也将其作为参数传递。
  3. 我认为连续计数是指最长连续计数?在这种情况下,您需要添加一个变量来存储最长的连续计数,并在链结束时将您的计数器设置为 0。然后,当它继续时再次开始计数。

【讨论】:

  • 哦,我明白了,我得到了 28 个,因为字符串中总共有 28 个 AGATC。但只有22个连续。确实,我忘记存储最高值了!!非常感谢!这真的很有帮助!不敢相信我花了这么长时间哈哈。关于变量的命名你是对的,我以后应该考虑更多。
猜你喜欢
  • 2020-02-19
  • 1970-01-01
  • 1970-01-01
  • 2022-01-24
  • 2018-02-10
  • 1970-01-01
  • 2019-03-28
  • 1970-01-01
  • 2019-05-05
相关资源
最近更新 更多