As part of Help Builder one of the things I do is export Help documents into Word format. This works pretty well as Word is for the most part smart enough to translate the HTML into a workable and almost nicely formatted document.
A simple way to do this is to open the document in Word and then copy the entire selection and paste it into another document that contains the proper format template to export to. It looks something like this (using Visual FoxPro 8 code):
llError = .F.
TRY
   *** CREATEOBJECT spelled out in full instead of the abbreviated CREATE()
   oWord = CREATEOBJECT("Word.Application")
   oWord.Visible = .F.
   DOEVENTS
CATCH
   MESSAGEBOX("Unable to load the Word COM object:" + CHR(13) + CHR(13) + ;
              MESSAGE())
   llError = .T.
ENDTRY

IF llError
   RETURN
ENDIF

*** Start by loading the HTML file
oDoc = oWord.Documents.Open(lcFile)
DOEVENTS

*** Select and copy the whole thing to the ClipBoard
oWord.Selection.WholeStory()
oWord.Selection.Copy()
oDoc.Close()

*** Copy the template file
COPY FILE (THISFORM.oHelp.cProjPath + ;
           "templates\msword\helpbuildertemplate.doc") TO ;
          ( FORCEEXT(lcFile,"doc") )

*** Open the copy of the template and paste the HTML content into it
oDoc = oWord.Documents.Open(FORCEEXT(lcFile,"doc"))
oWord.Selection.Paste()
oDoc.SaveAs(FORCEEXT(lcFile,"doc"))
This works great with one exception: all the images in the document are treated as external to the document, i.e. linked images that must be present on disk. What I really need, though, is images that are embedded in the document.
After much frustrating research through the woefully incomplete docs for Office Automation I found a way to embed ‘most’ images easily (I’ll come back to the ‘most’ part shortly, as this is the reason for this rant). The following is a VBA macro I created that I call from the VFP code:
' This macro replaces external image links with
' embedded images so the document is self-contained
' This macro must be run while the images are in place
Sub ReplaceImages()
    Dim oField As Field

    For Each oField In ActiveDocument.Fields
        If oField.Type = wdFieldIncludePicture Then
            oField.LinkFormat.SavePictureWithDocument = True
        End If
    Next
End Sub
It took me a while to find the SavePictureWithDocument option. It determines whether an image is embedded in the document or lives externally as a linked file.
Well, after some back and forth I found out that the above approach is simple, but causes major problems when dealing with a large document. On small documents it works well, but with larger documents (where large refers to the amount of image content) memory usage goes through the roof and Word locks up. So… back to the drawing board and some adjustments to code I originally used. It is more complex: it selects each linked image in turn and replaces it with a copy of the picture inserted as an embedded image:
'***************************************************************************
'*** ReplaceImages
'*****************
' This macro replaces external image links with
' embedded images so the document is self-contained
' This macro must be run while the images are in place
Sub ReplaceImages()
    Dim oField As Field
    Dim lcPath As String, lcText As String, lcCode As String
    Dim lnLoc1 As Long, lnLoc2 As Long

    ' Folder of the active document, including the trailing backslash
    lcPath = ActiveDocument.FullName
    lcPath = Left(lcPath, InStrRev(lcPath, "\"))

    For Each oField In ActiveDocument.Fields
        If oField.Type = wdFieldIncludePicture Then
            ' The image path sits between the first pair of quotes in the field code
            lcText = oField.Code.Text
            lnLoc1 = InStr(1, lcText, Chr(34))
            lnLoc2 = InStr(lnLoc1 + 1, lcText, Chr(34))
            lcCode = Mid(lcText, lnLoc1 + 1, lnLoc2 - 1 - lnLoc1)

            ' Normalize slashes to Windows path separators
            lcCode = Replace(lcCode, "/", "\")
            lcCode = Replace(lcCode, "\\", "\")

            ' Select the field so the inserted picture replaces it
            oField.Select

            ' Try the path as given first, then relative to the document folder
            If Not FileExists(lcCode) Then
                lcCode = lcPath & Replace(lcCode, "\\", "\")
            End If

            If FileExists(lcCode) Then
                Selection.InlineShapes.AddPicture FileName:=lcCode, LinkToFile:=False, SaveWithDocument:=True
            End If
        End If
    Next
End Sub

' Returns True if the file can be found on disk
Private Function FileExists(ByVal FileName As String) As Boolean
    Dim FileSize As Long

    On Error GoTo FileExists_Error
    FileSize = FileLen(FileName)   ' raises an error if the file is missing
    FileExists = True
    Exit Function

FileExists_Error:
    FileExists = False
End Function
This code works well even on a very large document of over 500 pages. It’s still not fast, but it doesn’t seem to hurt Word resources much, and because something is visibly happening on the screen the app doesn’t appear locked up.
Now to the MOST part. The FOR loop does not catch all images. There appears to be a bug in how the Fields collection is populated, as it does not include all images from the HTML. Specifically, if the image is marked up with any extra attributes, the image does not show up in the Fields collection. Something as simple as:
<img src="images/wwhelp.gif" align="right">
causes the image to not be included in the fields list, which bites big time. Removing the align attribute, or any other extra attribute (like HSPACE), causes the image to show up fine.
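To get a feel for how many images the Fields loop misses, it can help to count linked pictures in the other places Word might keep them. The following is only a diagnostic sketch and not part of the macros above: the name CountLinkedImages is made up for illustration, and the idea that the missing images might surface as linked InlineShapes or floating Shapes is an assumption I have not verified against these documents.

```vba
' Hypothetical diagnostic: counts linked pictures per collection.
' Results go to the Immediate window (Ctrl+G in the VBA editor).
Sub CountLinkedImages()
    Dim oField As Field
    Dim oInline As InlineShape
    Dim oShape As Shape
    Dim lnFields As Long, lnInline As Long, lnFloating As Long

    ' INCLUDEPICTURE fields: the ones the ReplaceImages loop sees
    For Each oField In ActiveDocument.Fields
        If oField.Type = wdFieldIncludePicture Then lnFields = lnFields + 1
    Next

    ' Linked pictures that sit inline with the text
    For Each oInline In ActiveDocument.InlineShapes
        If oInline.Type = wdInlineShapeLinkedPicture Then lnInline = lnInline + 1
    Next

    ' Linked pictures that float (text wraps around them)
    For Each oShape In ActiveDocument.Shapes
        If oShape.Type = msoLinkedPicture Then lnFloating = lnFloating + 1
    Next

    Debug.Print "INCLUDEPICTURE fields: " & lnFields
    Debug.Print "Linked inline shapes:  " & lnInline
    Debug.Print "Linked floating shapes: " & lnFloating
End Sub
```

If the floating count is non-zero for a document that contains images with align or HSPACE attributes, that would suggest where the missing images ended up.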
I have yet to figure out how to get at those images, but in the meantime I’ve been removing these attributes from the default generation code in Help Builder, which works for some of the automatically generated images such as icons for headers and class/data lists.
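One avenue that might be worth trying is to walk the InlineShapes and Shapes collections instead of Fields. This is an untested sketch under a big assumption: that images with extra attributes are stored as linked shapes (for example floating pictures) rather than as INCLUDEPICTURE fields. The macro name EmbedLinkedShapes is made up for illustration. It also relies on SavePictureWithDocument, so it may well run into the same memory problems on large documents as the first macro.

```vba
' Hypothetical workaround: embeds linked pictures found in the
' InlineShapes and Shapes collections rather than in Fields.
Sub EmbedLinkedShapes()
    Dim oInline As InlineShape
    Dim oShape As Shape

    ' Linked pictures that sit inline with the text
    For Each oInline In ActiveDocument.InlineShapes
        If oInline.Type = wdInlineShapeLinkedPicture Then
            oInline.LinkFormat.SavePictureWithDocument = True
        End If
    Next

    ' Linked pictures that float, which is how aligned images may be stored
    For Each oShape In ActiveDocument.Shapes
        If oShape.Type = msoLinkedPicture Then
            oShape.LinkFormat.SavePictureWithDocument = True
        End If
    Next
End Sub
```

Like ReplaceImages, this would have to run while the image files are still in place on disk.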
It always amazes me how things like this get through a testing process.