【发布时间】:2021-07-06 19:04:48
【问题描述】:
我需要检查大量 word 文档(doc 和 docx)以获取特定文本,并找到了 Scripting Guys 提供的很棒的教程和脚本;
脚本读取目录中的所有文档并给出以下输出;
- 提及次数
- 找到特定文本的所有文档中的总字数
- 包含特定文本的所有文件的目录。
这就是我所需要的,但是他们的代码似乎并没有真正检查任何文档的标题,顺便说一下,这是我要查找的特定文本所在的位置。让脚本读取标题文本的任何提示和技巧都会让我非常高兴。
另一种解决方案可能是删除格式,以便标题文本成为文档其余部分的一部分?这可能吗?
编辑:忘记链接脚本:
[cmdletBinding()]
Param(
$Path = "C:\Users\use\Desktop\"
) #end param
$matchCase = $false
$matchWholeWord = $true
$matchWildCards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
$application = New-Object -comobject word.application
$application.visible = $False
$docs = Get-childitem -path $Path -Recurse -Include *.docx
$findText = "specific text"
$i = 1
$totalwords = 0
$totaldocs = 0
Foreach ($doc in $docs)
{
Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100)
$document = $application.documents.open($doc.FullName)
$range = $document.content
$null = $range.movestart()
$wordFound = $range.find.execute($findText,$matchCase,
$matchWholeWord,$matchWildCards,$matchSoundsLike,
$matchAllWordForms,$forward,$wrap)
if($wordFound)
{
$doc.fullname
$document.Words.count
$totaldocs ++
$totalwords += $document.Words.count
} #end if $wordFound
$document.close()
$i++
} #end foreach $doc
$application.quit()
"There are $totaldocs and $($totalwords.tostring('N')) words"
#clean up stuff
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($range) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($application) | Out-Null
Remove-Variable -Name application
[gc]::collect()
[gc]::WaitForPendingFinalizers()
编辑 2:我的同事想到了调用节标题;
Foreach ($doc in $docs)
{
Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100)
$document = $application.documents.open($doc.FullName)
# Load first section of the document
$section = $doc.sections.item(1);
# Load header
$header = $section.headers.Item(1);
# Set the range to be searched to only Header
$range = $header.content
$null = $range.movestart()
$wordFound = $range.find.execute($findText,$matchCase,
$matchWholeWord,$matchWildCards,$matchSoundsLike,
$matchAllWordForms,$forward,$wrap,$Format)
if($wordFound) [script continues as above]
但这会遇到以下错误:
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:27 char:31
+ $section = $doc.sections.item <<<< (1);
+ CategoryInfo : InvalidOperation: (item:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:29 char:33
+ $header = $section.headers.Item <<<< (1);
+ CategoryInfo : InvalidOperation: (Item:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:33 char:26
+ $null = $range.movestart <<<< ()
+ CategoryInfo : InvalidOperation: (movestart:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:35 char:34
+ $wordFound = $range.find.execute <<<< ($findText,$matchCase,
+ CategoryInfo : InvalidOperation: (execute:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
这是正确的方法还是死路一条?
【问题讨论】:
标签: powershell ms-word automation