【Posted】: 2020-10-01 14:51:17
【Question】:
I ran into this problem on a Mac too, and just want to share my solution as a bash script; no additional applications are needed!
【Comments】:
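This script extracts every PDF file embedded in a Word document.
Put the script in the same folder as the word.docx file, make it executable (for example with chmod +x extract_docx_objects.sh), and run it, e.g.:
./extract_docx_objects.sh word.docx
The extracted files end up in the subfolder docx_zip/word/embeddings/.
Here is the code: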
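#!/bin/bash
docx=$1
echo "$docx"
# A .docx file is a zip archive: unpack a copy of it into docx_zip/
rm -rf docx_zip
mkdir -p docx_zip
cp "$docx" docx_zip/temp.zip
cd docx_zip/ || exit 1
unzip -q temp.zip
cd word/embeddings/ || exit 1
ls -la *.bin
for f in *.bin
do
    echo "processing $f..."
    fname=${f%.*}
    # xxd -b prints 6 bytes per row, so a row number converts to a byte offset.
    # Row holding the first %PDF header:
    start=$(xxd -b "$f" | grep -n '%PDF' | head -n 1 | cut -d: -f1)
    # Row holding the last %%EOF marker (a PDF can contain several):
    end=$(xxd -b "$f" | grep -n '%%EOF' | tail -n 1 | cut -d: -f1)
    if [ -z "$start" ] || [ -z "$end" ]; then
        echo "no PDF markers found in $f, skipping"
        continue
    fi
    start1=$(((start-1)*6))
    # Keep the whole row that holds %%EOF plus a few trailing bytes
    end1=$(((end-1)*6+5*2))
    # count is a length, not an end position, so subtract the start offset
    dd if="$f" of="$fname.pdf" bs=1 skip="$start1" count=$((end1-start1))
done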
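The script above locates the markers in 6-byte rows of xxd -b output, so the cut can begin a few bytes before %PDF. As a sketch of a byte-exact variant, the hypothetical function extract_pdf below (not part of the original script) reads the offsets from grep instead. It assumes a grep that supports -a, -b and -o together, as GNU grep does; whether the grep shipped with macOS behaves the same is an assumption to verify.

```shell
#!/bin/bash
# Sketch: copy the span from the first "%PDF" through the last "%%EOF".
# Assumption: grep supports -a (treat binary as text), -b (byte offset)
# and -o (print only the match) together, as GNU grep does.
extract_pdf() {
    local src=$1 dst=$2 start end
    # 0-based byte offset of the first "%PDF"
    start=$(LC_ALL=C grep -abo '%PDF' "$src" | head -n 1 | cut -d: -f1)
    # 0-based byte offset of the last "%%EOF"
    end=$(LC_ALL=C grep -abo '%%EOF' "$src" | tail -n 1 | cut -d: -f1)
    if [ -z "$start" ] || [ -z "$end" ] || [ "$end" -lt "$start" ]; then
        echo "no PDF found in $src" >&2
        return 1
    fi
    # tail -c +N counts from 1; head -c keeps everything through "%%EOF" (5 bytes)
    tail -c +"$((start + 1))" "$src" | head -c "$((end + 5 - start))" > "$dst"
}
```

Inside the loop it would stand in for the xxd and dd lines: extract_pdf "$f" "$fname.pdf".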
You may want to add a check for whether the docx_zip folder already exists before deleting it (my script does not do that).
Enjoy!
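One way to add that check, as a minimal sketch: the hypothetical helper prepare_workdir (the name is mine, it is not part of the script) refuses to touch a folder that already exists instead of deleting it.

```shell
#!/bin/bash
# Hypothetical helper: create the working folder only if nothing is in the way.
prepare_workdir() {
    local dir=$1
    if [ -e "$dir" ]; then
        # Leave existing data alone and let the caller decide what to do
        echo "'$dir' already exists; remove it or choose another folder" >&2
        return 1
    fi
    mkdir -p "$dir"
}
```

In the script it would stand in for the rm -rf and mkdir -p lines: prepare_workdir docx_zip || exit 1.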
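[Info]
If you need a VBA macro on Windows that does the same thing, here is my solution:
It is a partial solution in VBA that needs some preparation before it runs: the .docx must already be unzipped into a folder, which the macro asks you to select.
VBA macro: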
Sub export_PDFs()
    ' Extracts the PDFs embedded in the .bin files of an already unzipped .docx folder
    Dim Contents As String
    Dim PDF As String
    Dim hFile As Integer
    Dim i As Long, j As Long
    ' Each variable needs its own "As String"; otherwise it is declared as Variant
    Dim ExtractedZippedDocxFolder As String, BinFolderPath As String
    Dim FileNameIndex As String, FileNameBin As String, FileNamePDF As String
    Dim fileIndex As Integer
    Dim objFSO As Object, objFolder As Object, objFile As Object
    Dim dlgOpen As FileDialog
    Set dlgOpen = Application.FileDialog( _
        FileDialogType:=msoFileDialogFolderPicker)
    With dlgOpen
        .AllowMultiSelect = False
        .Title = "Select the unzipped docx folder to extract PDF file(s) from"
        .InitialFileName = "*.docx"
        ' Show returns 0 when the user cancels the dialog
        If .Show = 0 Then Exit Sub
    End With
    ExtractedZippedDocxFolder = dlgOpen.SelectedItems.Item(1)
    BinFolderPath = ExtractedZippedDocxFolder & "\word\embeddings"
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    If Not objFSO.FolderExists(BinFolderPath) Then
        MsgBox "Unable to find " & BinFolderPath
        Exit Sub
    End If
    Set objFolder = objFSO.GetFolder(BinFolderPath)
    fileIndex = 0
    For Each objFile In objFolder.Files
        If LCase$(Right$(objFile.Name, 4)) = ".bin" Then
            FileNameIndex = Left$(objFile.Name, Len(objFile.Name) - Len(".bin"))
            FileNameBin = BinFolderPath & "\" & FileNameIndex & ".bin"
            FileNamePDF = BinFolderPath & "\" & FileNameIndex & ".pdf"
            ' Read the whole .bin file into a string
            ' (one character per byte, assuming a single-byte ANSI code page)
            hFile = FreeFile
            Open FileNameBin For Binary Access Read As #hFile
            Contents = String(LOF(hFile), vbNullChar)
            Get #hFile, , Contents
            Close #hFile
            ' The PDF runs from the first "%PDF" through the last "%%EOF"
            i = InStr(1, Contents, "%PDF", vbBinaryCompare)
            j = InStrRev(Contents, "%%EOF", -1, vbBinaryCompare)
            If i > 0 And j > i Then
                PDF = Mid$(Contents, i, j + Len("%%EOF") - i)
                hFile = FreeFile
                Open FileNamePDF For Binary Access Write As #hFile
                Put #hFile, , PDF
                Close #hFile
                fileIndex = fileIndex + 1
            End If
        End If
    Next
    If fileIndex = 0 Then
        MsgBox "Unable to find any PDF in the .bin files of the given unzipped docx folder"
    Else
        MsgBox CStr(fileIndex) & " files were processed"
    End If
End Sub
【Comments】: