【Question Title】: Python: Split a file into multiple smaller files
【Posted】: 2015-06-01 20:05:54
【Question Description】:

Write a function named file_split(filename, number_of_files) that splits an input file into a number of output files. The files should be split as evenly as possible. When the number of lines divides evenly by the number of files to create, every output file gets the same number of lines (a 10-line file split into 2 files should give two output files of 5 lines each). When it does not divide evenly, the lengths of the output files must not differ by more than 1. For example, a 10-line file split into 3 files would produce output files of length 3, 3, and 4.

I have already written my code, but I don't know how to handle the case where the lengths would otherwise differ by more than 1, and I need help modifying my code to cover that part. (When the line count does not divide evenly, my code currently creates an extra file for the leftover lines.)

def get_line_counts(filename, number_of_files):
    try:
        my_file = open(filename, 'r')
    except IOError:
        print("File does not exist")
        return    
    input = my_file.read().split('\n')
    outputBase = 'lel'    
    total_lines = 0
    with open('myfile.txt') as infp:
        for line in infp:
            if line.strip():  
                total_lines +=1    
    base_size = total_lines // number_of_files    
    at = 1
    for lines in range(0, len(input), base_size):
        outputData = input[lines:lines+base_size]
        output = open(outputBase + str(at) + '.txt', 'w')
        output.write('\n'.join(outputData))
        output.close()
        at += 1
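
For reference, the 3/3/4 rule from the description can be worked out with divmod; a small illustration of the arithmetic only (the variable names are just for the example):

total_lines = 10
number_of_files = 3
base_size, remainder = divmod(total_lines, number_of_files)  # base_size = 3, remainder = 1
# the last `remainder` files each get one extra line -> sizes 3, 3, 4
sizes = [base_size + (1 if i >= number_of_files - remainder else 0)
         for i in range(number_of_files)]
print(sizes)  # [3, 3, 4]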

【Question Discussion】:

  • "Write a function..."? Are we your slaves, to be given orders like that?
  • @Stefan Pochmann Did you read the next paragraph?
  • @Stefan Pochmann That's how the question is worded in the text, it's not my phrasing, lol
  • Then why didn't you say so, or mark it as a quotation in some way? The way it is written, I find the style and etiquette poor, and certainly nothing to "lol" about.

Tags: python file split


【Solution 1】:

A simple round-robin loop does the job:

with open('myfile.txt') as infp:
    files = [open('%d.txt' % i, 'w') for i in range(number_of_files)]
    for i, line in enumerate(infp):
        files[i % number_of_files].write(line)
    for f in files:
        f.close()
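
Wrapped into the file_split(filename, number_of_files) signature the question asks for, the same round-robin idea might look like this (a sketch; the 0.txt, 1.txt, ... output names are kept from the answer above):

def file_split(filename, number_of_files):
    with open(filename) as infp:
        # One output file per chunk; line i goes to file i % number_of_files,
        # so the file lengths can never differ by more than one line.
        files = [open('%d.txt' % i, 'w') for i in range(number_of_files)]
        for i, line in enumerate(infp):
            files[i % number_of_files].write(line)
        for f in files:
            f.close()

Note that this interleaves the input lines across the output files rather than keeping consecutive lines together, which may or may not matter for the assignment.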

【Discussion】:

  • Thanks, that helped a lot
【Solution 2】:

Untested.

I would use the modulo operation:

res = len(lines) % number_of_files
for lines in range(0, len(input), base_size):
    if at == len(input)+res+1:
        outputData = input[lines:-1]
    else: 
        ...

That is, just dump the remaining lines into the last file.
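
A variant of the same modulo idea keeps each output file within one line of the others by spreading the remainder instead of dumping it all into the last file; a minimal, untested sketch with hypothetical part1.txt-style output names:

def file_split(filename, number_of_files):
    with open(filename) as infile:
        lines = infile.readlines()
    base_size, remainder = divmod(len(lines), number_of_files)
    start = 0
    for i in range(number_of_files):
        # The last `remainder` files each take one extra line,
        # e.g. 10 lines into 3 files -> sizes 3, 3, 4
        size = base_size + (1 if i >= number_of_files - remainder else 0)
        with open('part%d.txt' % (i + 1), 'w') as out:
            out.writelines(lines[start:start + size])
        start += size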

【Discussion】:

  • That's not what he wants, see zero323's comments
  • Thanks, not what I needed, but still helpful!
【Solution 3】:

from __future__ import print_function

import boto3
import shutil
import os
import os.path
import urllib
import json
import urllib2
import subprocess
import linecache
import sys

s3client = boto3.client('s3')
s3 = boto3.resource('s3')
def lambda_handler(event, context):
    try:
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            key = record['s3']['object']['key']
            print(key)
            p = key.rsplit('/',1)
            keyfile =p[1]
            print("S Object: " + keyfile + " is a FILE")
            inpfilename = keyfile
            ou = inpfilename.split('.',1)
            outfilename = ou[0]
            print("inpfilename :" + inpfilename)
            body = s3client.get_object(
                                        Bucket=bucket,
                                        Key=key
                                        )["Body"].read().split('\n')

            lines_per_file = 3  # Lines on each small file
            created_files = 0  # Counting how many small files have been created
            op_rec=''     # Stores lines not yet written on a small file
            lines_counter = 0  # Same as len(lines)
            for line in body:  # Go throught the whole big file
                op_rec = op_rec + '\n' + line
                lines_counter += 1
                if lines_counter == lines_per_file:
                    idx = lines_per_file * (created_files + 1)
                    body_contents = str(op_rec)
                    file_name = "%s_%s.txt" %(outfilename, idx)
                    target_file = "folder-name/" + file_name
                    print(target_file)
                    s3client.put_object(ACL='public-read',ServerSideEncryption='AES256', Bucket='bucket-name',Key= target_file, Body=body_contents )
                    op_rec =''  # Reset variables
                    lines_counter = 0
                    created_files += 1  # One more small file has been created
            # After for-loop has finished
            if lines_counter:  # There are still some lines not written on a file?
                idx = lines_per_file * (created_files + 1)
                body_contents = str(op_rec)
                file_name = "%s_%s.txt" %(outfilename, idx)
                target_file = "folder-name/" + file_name
                print(target_file)
                s3client.put_object(ACL='public-read',ServerSideEncryption='AES256', Bucket='bucket-name',Key= target_file, Body=body_contents )
                created_files += 1

            print ('%s small files (with %s lines each) were created.' % (created_files,lines_per_file))


    except Exception as e:
        print(e)

【Discussion】:

【Solution 4】:

Check this out: https://github.com/roshanok/SplitAndCompile

Usage: SplitAndCombine.py [-h] [-i INPUT] [-s] [-n CHUNK] [-m]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Provide the File that needs to be Split
  -s, --split           To Split the File
  -n CHUNK, --chunk CHUNK
                        [n]   : No. of files to be created
                        [n]kb : Split the file in nKB size
                        [n]b  : Split the file in nb size
                        [n]mb : Split the file in nmb size
                        [n]gb : Split the file in ngb size
  -m, --merge           Merge the Files

【Discussion】:
