【Question title】: I cannot extract hyperlinks from a Word doc with PowerShell
【Posted】: 2021-04-26 13:11:05
【Question】:

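I am trying to walk a directory structure recursively to find Word documents and then extract their hyperlinks. When the code runs, the file names are listed but the Hyperlink column is empty: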

processing 2 docs

File Name                Hyperlink
---------                ---------
C:\temp\doc1.docx
C:\temp\doc1.docx
C:\temp\folder\doc2.docx
C:\temp\folder\doc2.docx

Nothing I have tried seems to work. I have tried:

  • "Hyperlink" = $_Address
  • "Hyperlink" = $thisDoc.Address
  • "Hyperlink" = $thisDoc.Hyperlink.Address

Clear-Host

$parentFolder = "C:\temp"

$ourDocs = Get-ChildItem -Recurse -LiteralPath $parentFolder -file -include *.doc*
"processing {0} docs" -f $ourDocs.Count


$word = New-Object -ComObject word.application

$word.Visible = $false
$word.ScreenUpdating = $false


$array = New-Object System.Collections.ArrayList

$ourDocs | ForEach-Object{

    $thisDoc = $word.Documents.Open($_.FullName)

    $thisDoc.Hyperlinks | ForEach-Object {

        $array.Add([pscustomobject]@{
        
            "File Name" = $thisDoc.FullName
            "Hyperlink" = $_Address}) | Out-null
        
    }
    $thisDoc.Close()
                
}

$Word.Quit()

$array

# cleanup com objects
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()

【Comments】:

    Tags: powershell ms-word hyperlink


    【Solution 1】:

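    The error is in how you reference the property you want. $_Address is parsed as a variable named _Address, which was never assigned, so it evaluates to $null. The Address property of the current pipeline item is $_.Address (or $PSItem.Address).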
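
To see the difference without starting Word, here is a minimal sketch that uses a plain object as a stand-in for a hyperlink. The example.com address and the variable names are made up for illustration; only the Address property name comes from the question.

```powershell
# Stand-in for one Word hyperlink; the real Hyperlink COM object also exposes .Address.
$links = [pscustomobject]@{ Address = 'https://example.com/' }

# $_Address is a variable named "_Address". It was never assigned, so it is $null.
$wrong = $links | ForEach-Object { $_Address }

# $_.Address reads the Address property of the current pipeline item.
$right = $links | ForEach-Object { $_.Address }

# $PSItem is another name for $_ (PowerShell 3.0 and later).
$alsoRight = $links | ForEach-Object { $PSItem.Address }

"wrong is null: {0}" -f ($null -eq $wrong)
"right:         {0}" -f $right
"also right:    {0}" -f $alsoRight
```

With Set-StrictMode -Version Latest in effect, reading the unassigned $_Address should raise an error instead of silently yielding $null, which can make this kind of typo easier to spot.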

    Try this refactor:

    Clear-Host
    
    $parentFolder = "D:\temp\Word"
    
    $ourDocs = Get-ChildItem -Recurse -LiteralPath $parentFolder -file -include '*.doc*'
    "processing {0} docs" -f $ourDocs.Count
    
    
    $word                = New-Object -ComObject word.application
    $word.Visible        = $false
    $word.ScreenUpdating = $false
    
    # This really is not needed for your posted use case.
    # $array = New-Object System.Collections.ArrayList
    
    $ourDocs | 
    ForEach-Object{
        $thisDoc = $word.Documents.Open($PSItem.FullName)
    
        @($thisDoc.Hyperlinks) | 
        ForEach-Object {
            [pscustomobject]@{
                FileName  = $thisDoc.FullName
                HyperLink = $PSitem.Address
            }
        }
        $thisDoc.Close()
    }
    
    $Word.Quit()
    
    
    # cleanup com objects
    [System.Runtime.Interopservices.Marshal]::ReleaseComObject($word) | Out-Null
    [System.GC]::Collect()
    [System.GC]::WaitForPendingFinalizers()
    
    # Results
    <#
    processing 4 docs
    
    FileName                     HyperLink                                        
    --------                     ---------                                        
    D:\temp\Word\WES - Copy.docx http://stackoverfow.com/                         
    D:\temp\Word\WES - Copy.docx https://superuser.com/questions/tagged/powershell
    #>
    

    Update, following your CSV comment and my reply to it:

    ...
    
    $ourDocs | 
    ForEach-Object{
        $thisDoc = $word.Documents.Open($PSItem.FullName)
    
        @($thisDoc.Hyperlinks) | 
        ForEach-Object {
            [pscustomobject]@{
                FileName  = $thisDoc.FullName
                HyperLink = $PSitem.Address
            }
        } | 
        Export-Csv -Path 'D:\Temp\WordHyperLinkReport.csv' -Append -NoTypeInformation
        $thisDoc.Close()
    }
    
    ...
    
    Import-Csv -Path 'D:\Temp\WordHyperLinkReport.csv'
    # Results
    <#
    FileName                     HyperLink                                        
    --------                     ---------                                        
    D:\temp\Word\WES - Copy.docx http://stackoverfow.com/                         
    D:\temp\Word\WES - Copy.docx https://superuser.com/questions/tagged/powershell
    #>
    

    【Comments】:

    • You have to use -Append and put the export inside the loop, not outside it.
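
The effect of -Append inside the loop can be tried without Word. The sketch below is only an illustration: the document names, the example.com links, and the report path in the temp folder are all hypothetical stand-ins.

```powershell
# Hypothetical report path in the temp folder; removed first so reruns start clean.
$report = Join-Path ([System.IO.Path]::GetTempPath()) 'HyperlinkReportDemo.csv'
Remove-Item -LiteralPath $report -ErrorAction SilentlyContinue

# Stand-in data: two "documents", each with its own list of link addresses.
$docs = @(
    @{ Name = 'doc1.docx'; Links = @('https://example.com/a', 'https://example.com/b') }
    @{ Name = 'doc2.docx'; Links = @('https://example.com/c') }
)

foreach ($doc in $docs) {
    $doc.Links |
        ForEach-Object {
            [pscustomobject]@{ FileName = $doc.Name; HyperLink = $_ }
        } |
        # -Append adds this document's rows to the rows already in the file.
        Export-Csv -LiteralPath $report -Append -NoTypeInformation
}

$rows = @(Import-Csv -LiteralPath $report)
"rows written: {0}" -f $rows.Count
```

Without -Append, each pass through the loop would overwrite the file, so only the last document's rows would remain. Because -Append also keeps rows from earlier runs, the sketch deletes the report before it starts.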