【Question title】: PowerShell won't read header text in Word documents?
【Posted】: 2021-07-06 19:04:48
【Question】:

I need to check a large number of Word documents (doc and docx) for a specific piece of text, and found a great tutorial and script from the Scripting Guys:

https://blogs.technet.microsoft.com/heyscriptingguy/2012/08/01/find-all-word-documents-that-contain-a-specific-phrase/

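The script reads all documents in a directory and gives the following output:

  1. The number of mentions
  2. The total word count of all documents in which the specific text was found
  3. The directory of all files containing the specific text.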

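This is all I need, but their code does not seem to actually check the header of any document, which incidentally is where the specific text I'm looking for is located. Any tips and tricks for making the script read the header text would make me very happy.

Another solution might be to remove the formatting so that the header text becomes part of the rest of the document? Is that possible?

Edit: Forgot to include the script: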

[cmdletBinding()]
Param(
 $Path = "C:\Users\use\Desktop\"
) #end param

$matchCase = $false
$matchWholeWord = $true
$matchWildCards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
$application = New-Object -comobject word.application
$application.visible = $False
$docs = Get-childitem -path $Path -Recurse -Include *.docx
$findText = "specific text"
$i = 1
$totalwords = 0
$totaldocs = 0

Foreach ($doc in $docs)
{
 Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100) 
 $document = $application.documents.open($doc.FullName)
 $range = $document.content
 $null = $range.movestart()
 $wordFound = $range.find.execute($findText,$matchCase,
  $matchWholeWord,$matchWildCards,$matchSoundsLike,
  $matchAllWordForms,$forward,$wrap)
  if($wordFound) 
    { 
     $doc.fullname
     $document.Words.count
     $totaldocs ++
     $totalwords += $document.Words.count
    } #end if $wordFound
 $document.close()
 $i++
} #end foreach $doc
$application.quit()
"There are $totaldocs and $($totalwords.tostring('N')) words"

#clean up stuff
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($range) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($application) | Out-Null
Remove-Variable -Name application
[gc]::collect()
[gc]::WaitForPendingFinalizers()

Edit 2: My colleague came up with the idea of calling the section's header:

Foreach ($doc in $docs)
{
 Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100) 
 $document = $application.documents.open($doc.FullName)
 # Load first section of the document
 $section = $doc.sections.item(1);
 # Load header
 $header = $section.headers.Item(1);

 # Set the range to be searched to only Header
 $range = $header.content
 $null = $range.movestart()

 $wordFound = $range.find.execute($findText,$matchCase,
  $matchWholeWord,$matchWildCards,$matchSoundsLike,
  $matchAllWordForms,$forward,$wrap,$Format)
  if($wordFound) [script continues as above]

But this runs into the following errors:

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:27 char:31
+  $section = $doc.sections.item <<<< (1);
    + CategoryInfo          : InvalidOperation: (item:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:29 char:33
+  $header = $section.headers.Item <<<< (1);
    + CategoryInfo          : InvalidOperation: (Item:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:33 char:26
+  $null = $range.movestart <<<< ()
    + CategoryInfo          : InvalidOperation: (movestart:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:35 char:34
+  $wordFound = $range.find.execute <<<< ($findText,$matchCase,
    + CategoryInfo          : InvalidOperation: (execute:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

Is this the right approach, or is it a dead end?

【Comments】:

    Tags: powershell ms-word automation


    【Solution 1】:

    If you want the header text, you can try the following:

    $document.content.Sections.First.Headers.Item(1).range.text
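
A note on the question's Edit 2: `$doc` there is the file object returned by `Get-ChildItem`, which has no `Sections` property, so `$doc.sections` evaluates to null; the opened Word document is `$document`. The one-liner above also ends in `.Text`, which is a plain string, whereas `Find.Execute` needs the `Range` object itself. A minimal sketch of searching the headers of every section might look like the following. The function names `Get-HeaderRange` and `Test-HeaderText` are hypothetical helpers, not part of the original script, and the `Find.Execute` arguments simply mirror the ones the question's script passes.

```powershell
# Hypothetical helper: emit the Range of every existing header in every section.
# $Document is assumed to be the object returned by $application.Documents.Open().
function Get-HeaderRange {
    param($Document)

    foreach ($section in $Document.Sections) {
        # WdHeaderFooterIndex: 1 = primary, 2 = first page, 3 = even pages
        foreach ($index in 1..3) {
            $header = $section.Headers.Item($index)
            if ($header.Exists) {
                # Emit the Range object, not .Text, so that .Find stays available
                $header.Range
            }
        }
    }
}

# Hypothetical helper: $true if any header range contains $FindText.
function Test-HeaderText {
    param($Document, [string]$FindText)

    foreach ($range in (Get-HeaderRange -Document $Document)) {
        # Same positional arguments as the question's script:
        # FindText, MatchCase, MatchWholeWord, MatchWildcards, MatchSoundsLike,
        # MatchAllWordForms, Forward, Wrap
        $found = $range.Find.Execute($FindText, $false, $true, $false,
                                     $false, $false, $true, 1)
        if ($found) { return $true }
    }
    return $false
}
```

Inside the question's loop this could sit next to the body search, for example `$wordFound = Test-HeaderText -Document $document -FindText $findText`.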
    

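    【Comments】:

    • Hi Micky, thanks for the quick reply! I tried adding your code to my $range, but ran into the following error: Method invocation failed because [System.__ComObject] doesn't contain a method named 'movestart'. + $null = $range.movestart <<<< () Does Sections.First.Headers.Item(1) make it incompatible with $range? @Micky Balledelli
    • That has nothing to do with getting the header. Use $range.MoveStart(); it is case-sensitive ;)
    • Thanks Micky, you are of course right. Your solution works! I still run into a few errors here and there, but they have nothing to do with the header part.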
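    【Solution 2】:

    For anyone looking at this question in the future: something in my code above does not quite work. Regardless of the document's content, it seems to return a false positive and set $wordFound = 1, so every document found under $path gets listed.

    Editing the variables in Find.Execute does not seem to change the result of $wordFound. I believe the problem may lie in my $range, since that is the only place where I get errors when stepping through the code.

    The errors listed: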

    You cannot call a method on a null-valued expression.
    At C:\Users\user\Desktop\Powershell\count.ps1:24 char:58
    +  $range = $document.content.Structures.First.Headers.Item <<<< (1).range.Text
        + CategoryInfo          : InvalidOperation: (Item:String) [], RuntimeException
        + FullyQualifiedErrorId : InvokeMethodOnNull
    
    Exception calling "MoveStart" with "0" argument(s): "The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)"
    At C:\Users\user\Desktop\Powershell\count.ps1:25 char:26
    +  $null = $range.MoveStart <<<< ()
        + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
        + FullyQualifiedErrorId : ComMethodCOMException
    
    You cannot call a method on a null-valued expression.
    At C:\Users\user\Desktop\Powershell\count.ps1:26 char:34
    +  $wordFound = $range.Find.Execute <<<< ($findText,$matchCase,
        + CategoryInfo          : InvalidOperation: (Execute:String) [], RuntimeException
        + FullyQualifiedErrorId : InvokeMethodOnNull
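
One possible reading of the errors above, offered as a guess rather than a diagnosis: the first statement fails, so `$range` keeps whatever it held from an earlier iteration, and when `Find.Execute` then throws, the assignment to `$wordFound` never happens, so a stale true value survives and every file gets listed. Ending the chain with `.Text` would also make `$range` a string, which has no `Find` member. The sketch below uses a hypothetical helper, `Test-RangeText`, that starts from `$false` on every call and treats any failure as "not found"; a null range stands in for the failed lookup.

```powershell
# Demonstration: a method call that throws leaves the previous value in place.
$wordFound = $true      # stale result from an earlier document
$range = $null          # stands in for a failed header lookup
try {
    # Throws "You cannot call a method on a null-valued expression",
    # so the assignment is never carried out.
    $wordFound = $range.Find.Execute('text')
}
catch { }               # the try/catch only keeps the demo quiet
# $wordFound is still $true at this point

# Hypothetical helper: never reuse an old result; failures count as "not found".
function Test-RangeText {
    param($Range, [string]$FindText)

    $found = $false
    try {
        # Same positional arguments as the question's script
        $found = [bool]$Range.Find.Execute($FindText, $false, $true, $false,
                                           $false, $false, $true, 1)
    }
    catch {
        Write-Warning "Search failed: $($_.Exception.Message)"
    }
    return $found
}
```

In the loop that would read `$wordFound = Test-RangeText -Range $range -FindText $findText`, with `$range` assigned from `...Headers.Item(1).Range` rather than `.Range.Text`.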
    

    【Comments】:
