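Answers: Remove Top Line of Text File with PowerShell

【Question title】: Remove Top Line of Text File with PowerShell
【Posted】: 2010-01-15 19:38:17
【Question】:

I am trying to remove the first line of about 5,000 text files before importing them.

I am still very new to PowerShell, so I am not sure what to search for or how to approach this. My current concept, in pseudocode: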

set-content file (get-content unless line contains amount)

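However, I can't seem to figure out how to do something like "contains".

【Question comments】:

Tags: powershell

【Solution 1】:

While I really admire the answer from @hoge, both for its very concise technique and for the wrapper function that generalizes it, and I encourage upvotes for it, I feel compelled to comment on the other two answers that use temp files (it grates on me like fingernails on a chalkboard!).

Assuming the file is not huge, you can force the pipeline to run in discrete sections, and thereby avoid the need for a temp file, with judicious use of parentheses: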

(Get-Content $file | Select-Object -Skip 1) | Set-Content $file

...or in short form:

(gc $file | select -Skip 1) | sc $file
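
Since the question is about thousands of files, here is a minimal sketch of the same parenthesized pipeline applied to every .txt file in a folder. The demo folder and file names are hypothetical and exist only to make the example self-contained; it also assumes each file is small enough to hold in memory.

```powershell
# Hypothetical demo folder and files; substitute your own directory.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'topline-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'a.txt') -Value 'amount,name', '1,foo', '2,bar'
Set-Content -Path (Join-Path $dir 'b.txt') -Value 'amount,name', '3,baz'

# The parentheses make Get-Content read the whole file (and close it)
# before Set-Content opens the same file for writing.
Get-ChildItem -Path $dir -Filter *.txt | ForEach-Object {
    $file = $_.FullName
    (Get-Content $file | Select-Object -Skip 1) | Set-Content $file
}

Get-Content (Join-Path $dir 'a.txt')
```

After the loop, a.txt should contain only the two data rows.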

【Comments】:

【Solution 2】:

It is not the most efficient approach in the world, but this should work:

get-content $file |
    select -Skip 1 |
    set-content "$file-temp"
move "$file-temp" $file -Force

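【Comments】:

When I try to run this, it seems to error out on -skip. Could that be because of a different version?
-Skip is new to Select-Object in PowerShell 2.0. Also, if the files are all ASCII, you might want to use set-content -enc ascii. If the encodings are mixed, it gets trickier, unless you don't care about the file encoding.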
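
To illustrate the encoding remark, here is a rough sketch that writes the trimmed content back with an explicit -Encoding ascii. The demo file name is hypothetical, and the sketch assumes the file really is pure ASCII; the default output encoding of Set-Content differs between PowerShell versions, which is why it can be worth stating it explicitly.

```powershell
# Hypothetical demo file in the temp folder.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'topline-ascii-demo.txt'
Set-Content -Path $file -Value 'header', 'row 1', 'row 2' -Encoding ascii

# Drop the first line and write the rest back as ASCII (no byte order mark).
(Get-Content $file | Select-Object -Skip 1) | Set-Content $file -Encoding ascii

Get-Content $file
```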

【Solution 3】:

Using variable notation, you can do it without a temporary file:

${C:\file.txt} = ${C:\file.txt} | select -skip 1

function Remove-Topline ( [string[]]$path, [int]$skip=1 ) {
  if ( -not (Test-Path $path -PathType Leaf) ) {
    throw "invalid filename"
  }

  ls $path |
    % { iex "`${$($_.fullname)} = `${$($_.fullname)} | select -skip $skip" }
}

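【Comments】:

【Solution 4】:

I just had to do the same task, and gc | select ... | sc took over 4 GB of RAM on my machine while reading a 1.6 GB file. It still had not finished at least 20 minutes after reading the whole file in (as reported by Read Bytes in Process Explorer), at which point I had to kill it.

My solution was to use a more .NET approach: StreamReader + StreamWriter. See this answer for a great discussion of the performance: In Powershell, what's the most efficient way to split a large text file by record type?

Below is my solution. Yes, it uses a temporary file, but in my case that did not matter (it was a huge file of SQL table creation and insert statements):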

PS> (measure-command{
    $i = 0
    $ins = New-Object System.IO.StreamReader "in/file/pa.th"
    $outs = New-Object System.IO.StreamWriter "out/file/pa.th"
    while( !$ins.EndOfStream ) {
        $line = $ins.ReadLine();
        if( $i -ne 0 ) {
            $outs.WriteLine($line);
        }
        $i = $i+1;
    }
    $outs.Close();
    $ins.Close();
}).TotalSeconds

It returned:

188.1224443

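【Comments】:

IIRC this is because the parentheses around gc|select mean the whole file is read into memory before it is piped onward. Otherwise the open stream causes set-content to fail. For large files I think your approach is probably best.
Thank you, @AASoft, for your great solution! I took the liberty of improving it slightly by dropping the comparison in each loop iteration, which sped up processing by about 25%. See my answer for details.

【Solution 5】:

Inspired by AASoft's answer, I went on to improve it a bit more:

1. Avoid the loop variable $i and the comparison with 0 in every loop iteration
2. Wrap the execution in a try..finally block so the files in use are always closed
3. Make the solution work for an arbitrary number of lines to remove from the beginning of the file
4. Use a variable $p to reference the current directory

These changes lead to the following code: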

$p = (Get-Location).Path

(Measure-Command {
    # Number of lines to skip
    $skip = 1
    $ins = New-Object System.IO.StreamReader ($p + "\test.log")
    $outs = New-Object System.IO.StreamWriter ($p + "\test-1.log")
    try {
        # Skip the first N lines, but allow for fewer than N, as well
        for( $s = 1; $s -le $skip -and !$ins.EndOfStream; $s++ ) {
            $null = $ins.ReadLine()  # discard the skipped line instead of emitting it
        }
        while( !$ins.EndOfStream ) {
            $outs.WriteLine( $ins.ReadLine() )
        }
    }
    finally {
        $outs.Close()
        $ins.Close()
    }
}).TotalSeconds

The first change brought the processing time for my 60 MB file down from 5.3 s to 4 s. The rest of the changes are more cosmetic.
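
For reuse, the streaming approach can be wrapped in a function that also replaces the original file. This is only a sketch: the function name Remove-TopLines, the ".tmp" suffix, and the demo file are my own assumptions, not part of the answers above. Note that StreamWriter writes UTF-8 without a byte order mark by default, so the file's encoding may change.

```powershell
# Hypothetical helper: stream a file to a temp copy without its first $Skip lines,
# then move the copy over the original.
function Remove-TopLines {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [int] $Skip = 1
    )
    $source = (Resolve-Path -LiteralPath $Path).ProviderPath
    $temp = "$source.tmp"   # assumed temp name next to the original
    $reader = New-Object System.IO.StreamReader $source
    try {
        $writer = New-Object System.IO.StreamWriter $temp
        try {
            # Skip up to $Skip lines; stop early if the file is shorter.
            for ($s = 0; $s -lt $Skip -and -not $reader.EndOfStream; $s++) {
                $null = $reader.ReadLine()
            }
            # Copy the remaining lines one at a time.
            while (-not $reader.EndOfStream) {
                $writer.WriteLine($reader.ReadLine())
            }
        }
        finally { $writer.Close() }
    }
    finally { $reader.Close() }
    Move-Item -LiteralPath $temp -Destination $source -Force
}

# Usage with a hypothetical demo file.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'topline-stream-demo.log'
Set-Content -Path $demo -Value 'line 1', 'line 2', 'line 3'
Remove-TopLines -Path $demo -Skip 2
Get-Content $demo
```

Only one line is held in memory at a time, which is the point of this approach for very large files.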

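【Comments】:

You may want to add -and !$ins.EndOfStream to the for loop condition to cover the case where the file has fewer lines than $skip.
Thanks for the heads-up! That makes sense :-)

【Solution 6】: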

$x = get-content $file
$x[1..$x.count] | set-content $file

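That's all there is to it. A long, boring explanation follows. Get-Content returns an array. We can "index into" array variables, as demonstrated in this and other Scripting Guys posts.

For example, if we define an array variable like this,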

$array = @("first item","second item","third item")

so that $array returns

first item
second item
third item

then we can "index into" that array to retrieve only its first element

$array[0]

or only its second

$array[1]

or a range of index values, from the second through the last.

$array[1..$array.count]
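
As a small worked example of the range indexing: because indexes are zero-based, the last valid index is Count - 1, so the sketch below uses that as the upper bound. It assumes the array has at least two elements, since 1..0 is a descending range in PowerShell. The variable names $lastIndex and $rest are mine.

```powershell
$array = @("first item", "second item", "third item")

# Indexes run from 0 to Count - 1, so the slice starts at 1 and ends at the last index.
$lastIndex = $array.Count - 1
$rest = $array[1..$lastIndex]

$rest   # second item, third item
```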

【Comments】:

【Solution 7】:

I just learned this from a website:

Get-ChildItem *.txt | ForEach-Object { (get-Content $_) | Where-Object {(1) -notcontains $_.ReadCount } | Set-Content -path $_ }

Or you can use aliases to shorten it, like this:

gci *.txt | % { (gc $_) | ? { (1) -notcontains $_.ReadCount } | sc -path $_ }
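
The filter above relies on the ReadCount property that Get-Content attaches to each line it emits; with the default settings it is the 1-based line number. A minimal sketch on a hypothetical demo file, using a simpler comparison than -notcontains:

```powershell
# Hypothetical demo file in the temp folder.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'topline-readcount-demo.txt'
Set-Content -Path $file -Value 'header', 'row 1', 'row 2'

# Keep every line whose 1-based line number is greater than 1.
$kept = Get-Content $file | Where-Object { $_.ReadCount -gt 1 }

$kept   # row 1, row 2
```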

【Comments】:

Thanks a lot for this solution. Could you point to the website you mentioned?

【Solution 8】:

skip did not work for me, so my workaround is

$LinesCount = $(get-content $file).Count
get-content $file |
    select -Last $($LinesCount-1) | 
    set-content "$file-temp"
move "$file-temp" $file -Force

【Comments】:

【Solution 9】:

Another way to remove the first line from a file, using the multiple assignment technique. See Link.

 $firstLine, $restOfDocument = Get-Content -Path $filename 
 $modifiedContent = $restOfDocument 
 $modifiedContent | Out-String | Set-Content $filename
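
To show what the multiple assignment does on its own, here is a tiny sketch with an in-memory array standing in for the file contents (the sample values are made up): the first variable receives the first element, and the last variable receives everything that is left.

```powershell
# Sample lines standing in for the output of Get-Content.
$lines = 'header', 'row 1', 'row 2'

# First element goes to $firstLine; the remaining elements go to $restOfDocument.
$firstLine, $restOfDocument = $lines

$firstLine              # header
$restOfDocument.Count   # 2
```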

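【Comments】:

【Solution 10】:

For smaller files, you could use this:

& C:\windows\system32\more +1 oldfile.csv > newfile.csv | out-null

...but it is not very effective at processing my 16 MB sample file. It does not seem to terminate and release the lock on newfile.csv.

【Comments】: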