【Question title】: Counting the number of repeated entries in a file
【Posted】: 2020-05-30 20:15:47
【Question】:

When I read a file, it gives me output like this:

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749

I want to count the total number of CW and CS entries in the file. The output should look like this:

3 #For CW 
3 #For CS

I tried the following code:

with open("file", 'r') as rf:
    v = rf.read().split('\n')

i = []
for e in v[1::47]:  # (only the names)
    r = (e[:12])
    s = (r[:2])
    q = sum(c != ' ' for c in s)
    print(q)

But it gives me this output:

2
2
2
2
2
2

I even tried importing Counter, but it gave me output like this:

C 1
W 1
C 1
W 1
C 1
S 1

Please suggest a way to get the expected output. Any help would be much appreciated.

【Comments】:

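  • Read the file line by line, and use a dictionary to keep the counts.
  • This and this have your answer.
  • @TimBiegeleisen I am reading the file with read().split(). When I tried readlines(), I did not get the lines (e.g. CW, CW) that I had obtained earlier with read().split(). Counting on the read().split() result gave me the same answer as mentioned above.
  • @Babydesta I did look at those questions before posting mine. Unfortunately, that code did not work for me.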
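
Both outputs shown in the question can be reproduced on a single line, which may help explain them. `sum(c != ' ' for c in s)` counts the non-space characters in the two-character prefix, so it is 2 for every line, and a `Counter` built from a string counts its individual characters. A minimal sketch, using one sample line from the question (the names `prefix`, `non_space`, `char_counts` and `label` are illustrative, not from the original code):

```python
from collections import Counter

line = "CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042"

prefix = line[:12][:2]                      # "CW", as in the question's loop
non_space = sum(c != ' ' for c in prefix)   # counts characters, not lines
char_counts = Counter(prefix)               # counts each character separately

label = line.split()[0]                     # the whole first field

print(non_space)          # 2
print(dict(char_counts))  # {'C': 1, 'W': 1}
print(label)              # CW
```

Counting occurrences of `label` across lines, rather than characters within one line, is what the answers below do.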

Tags: python python-3.x file dataframe counting


【Solution 1】:

Do use Counter:

from collections import Counter
with open("xyz.txt") as f:
    c = Counter(line.split()[0] for line in f)
    for k,n in c.items():
        print(k, n)

With the input file

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042 1
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076 1
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236 1
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213 1
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166 1
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749 1

this produces

CW 3
CS 3

【Comments】:

  • Thanks, this works. When I tried the Counter approach, I had not split on the newlines in between, which is why it gave me that answer. Thanks a lot.
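
One caveat with `line.split()[0]` is that it raises `IndexError` on a blank line, since `split()` returns an empty list there. The following is a sketch of a variant that skips blank lines and prints in the format requested in the question. The helper name `count_labels` is made up for illustration, and `io.StringIO` stands in for an open file so the example is self-contained:

```python
from collections import Counter
import io

def count_labels(lines):
    """Count the first whitespace-separated field of each non-blank line."""
    return Counter(line.split()[0] for line in lines if line.strip())

# io.StringIO stands in for an open file; the sample rows are abbreviated.
sample = io.StringIO(
    "CW  0.000000  0.003822\n"
    "CW  0.007136  0.019635\n"
    "\n"
    "CS  0.022060  0.288761\n"
)

for label, n in count_labels(sample).items():
    print(f"{n} #For {label}")
```

Because the function only iterates over its argument, it should accept a real file object from `open(...)` in the same way.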
【Solution 2】:

To count the total number of CW and CS entries in the file,

try this:

di = { }
with open("file", "r") as f:
    for l in f:
        l = l.strip()
        di[l] = di[l] + 1 if l in di else 1


for k, v in di.items():
    print("Line: %s and Count: %d" % (k, v))

Output:

Line: CW and Count: 3
Line: CS and Count: 3

【Comments】:

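  • Thanks for your comment. I tried it, and this is the output I got: Line: CW and Count: 1 Line: CW and Count: 1 Line: CW and Count: 1
  • See the output I added to my answer. Something is wrong with your file!
  • My file is a .DIST file; maybe that is why it cannot give me the answer.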
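
A likely explanation for the counts of 1 is that the dictionary in this answer is keyed on the entire stripped line. When every line also carries its own numbers, each line is a distinct key. The file extension probably plays no role. Below is a sketch of the same dictionary approach keyed on the first field only (the function name `count_first_field` and the abbreviated rows are illustrative assumptions):

```python
def count_first_field(lines):
    """Tally lines by their first whitespace-separated field."""
    counts = {}
    for line in lines:
        fields = line.split()
        if not fields:          # skip blank lines
            continue
        key = fields[0]
        counts[key] = counts.get(key, 0) + 1
    return counts

# Abbreviated rows in the shape shown in the question
rows = [
    "CW  0.000000  0.003822",
    "CW  0.007136  0.019635",
    "CW  0.015666  0.299244",
    "CS  0.022060  0.288761",
]

for k, v in count_first_field(rows).items():
    print("Line: %s and Count: %d" % (k, v))
```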
【Solution 3】:

You can try the code below.

>>> text = '''CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042
... CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076
... CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236
... CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213
... CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166
... CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749'''
>>> items = [line.split()[0] for line in text.splitlines()]
>>> val = set([line.split()[0] for line in text.splitlines()])
>>> for item in val:
...     print(f'{items.count(item)} #For {item}')
...
3 #For CW
3 #For CS

【Comments】:

  • AttributeError: '_io.TextIOWrapper' object has no attribute 'splitlines'
  • @Dustrokes Which Python version are you using? docs.python.org/3/library/…
  • I am using Python 3.7; I will update and then try your code.
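
The error above does not look version related: `splitlines` is a method of `str`, and the message indicates it was called on the file object itself. Reading the contents first with `read()` gives a string that has the method. A small sketch, where `io.StringIO` stands in for the object returned by `open(...)` and the data is abbreviated:

```python
import io

text = "CW  0.000000\nCW  0.007136\nCS  0.022060\n"
f = io.StringIO(text)   # stands in for open("file")

# File-like objects have no splitlines(); only the string they contain does.
has_method = hasattr(f, "splitlines")
print(has_method)

items = [line.split()[0] for line in f.read().splitlines()]
for item in sorted(set(items)):
    print(f"{items.count(item)} #For {item}")
```

Iterating over the file directly (`for line in f`) is an alternative that avoids loading the whole file into memory.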
【Solution 4】:

Python 3.8.1. I hope this helps. I tried to write a working example with explanations alongside the code, to show what is going on.

# Global variables
file = "lista.txt"
countDictionary = {}

# Logic: read the file
def ReadFile(fileName):
    # try is optional; it is used to catch errors and handle them
    # except is needed only because try is used
    try:
        # Open the file in read mode
        with open(fileName, mode="r") as f:
            # Read the first line
            line = f.readline()
            # For every line in this file
            while line:
                # Strip surrounding whitespace (e.g. \n, \r) and keep the first 2 characters
                # We will call it item (I assume that CW and CS are some data)
                item = line.strip()[:2]

                # Counting logic
                # A dictionary entry has 2 parts; I call them data and info
                # Data is the key (name/nickname/id) of the information
                # Info is the value (the information) for this data
                # First check whether data is new; if so, set info = integer 1
                if item not in countDictionary.keys():
                    countDictionary[item] = 1
                # If it is not new, update the count
                else:
                    info = countDictionary[item]    # get the current count
                    countDictionary[item] = info+1  # increase the count by one

                # Go to the next line by reading into line again
                # Without this, the loop would run forever on the first line
                line = f.readline()

        # This is optional too. The with block already closes the file automatically
        # But I like to be sure
        f.close()

    # In case the file does not exist
    except FileNotFoundError:
        print(f"ERROR >> File \"{fileName}\" does not exist!")

# Execute the function
ReadFile(file)

# Testing the dictionary count
for k,j in countDictionary.items():
    print(k, ">>", j)

Console output:

========================= RESTART: D:\Python\StackOverflow\help.py =========================
CW >> 3
CS >> 3
>>> 

File lista.txt

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042 1
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076 1
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236 1
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213 1
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166 1
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749 1

【Comments】:

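  • I did not get any output.
  • You should get some. Set file to the name or path of your file... After seeing the other answers: my example would only count identical lines, because you gave us only a simple CW and CS text as the lines... In this case you need to check only the first 2 characters (update: item = line.strip()[:2]). I will update my code.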
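
As a possible variation on this answer, the function could return its counts instead of filling a global dictionary, and could take the first field with `split()` rather than a fixed two-character slice, which would also cope with labels of other lengths. This is only a sketch: `read_counts` is a hypothetical name, and the temporary file exists purely to make the example runnable.

```python
import os
import tempfile
from collections import defaultdict

def read_counts(file_name):
    """Return {label: count}, or an empty dict if the file is missing."""
    counts = defaultdict(int)
    try:
        with open(file_name) as f:       # closed automatically on exit
            for line in f:               # iterate line by line
                fields = line.split()
                if fields:               # ignore blank lines
                    counts[fields[0]] += 1
    except FileNotFoundError:
        print(f'ERROR >> File "{file_name}" does not exist!')
    return dict(counts)

# Write a small temporary file to exercise the function
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("CW  0.000000 1\nCW  0.007136 1\nCS  0.022060 1\n")
path = tmp.name

result = read_counts(path)
os.remove(path)

for k, j in result.items():
    print(k, ">>", j)
```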