Rick Strahl's Weblog  


Image Problems when Importing HTML into Microsoft Word via Automation



 As part of Help Builder one of the things I do is export Help documents into Word format. This works pretty well as Word is for the most part smart enough to translate the HTML into a workable and almost nicely formatted document.

 

A simple way to do this is to open the document in Word and then copy the entire selection and paste it into another document that contains the proper format template to export to. It looks something like this (using Visual FoxPro 8 code):

 

llError = .F.
TRY
   oWord = CREATEOBJECT("word.application")
   oWord.VISIBLE = .F.

   DOEVENTS
CATCH
   MESSAGEBOX("Unable to load the Word COM object:" + CHR(13) + CHR(13) + ;
              MESSAGE())
   llError = .T.
ENDTRY
IF llError
   RETURN
ENDIF

*** Start by loading the HTML file
oDoc = oWord.Documents.OPEN(lcFile)

DOEVENTS

*** Select and copy the whole thing to the ClipBoard
oWord.SELECTION.WholeStory
oWord.SELECTION.COPY
oDoc.CLOSE()

*** Copy the template file
COPY FILE (THISFORM.oHelp.cProjPath + ;
           "templates\msword\helpbuildertemplate.doc") TO ;
          ( FORCEEXT(lcFile,"doc") )

oDoc = oWord.Documents.OPEN(FORCEEXT(lcFile,"doc"))
oWord.SELECTION.Paste()

oDoc.SAVEAS(FORCEEXT( lcFile, "doc" ))

 

This works great with one exception: all the images in the document are treated as external to it, i.e. linked images that must exist on disk. What I really need, though, is images that are embedded in the document.

 

After much shitty research through the woefully incomplete docs for Office Automation I found a way to embed ‘most’ images easily (I’ll come back to the ‘most’ part shortly as this is the reason for this rant). The following is a VBA macro I created that I call from the VFP code:

 

' This macro replaces external image links with
' embedded images so the document is self-contained
' This macro must be run while the images are in place
Sub ReplaceImages()

   For Each oField In ActiveDocument.Fields
      If oField.Type = wdFieldIncludePicture Then
          oField.LinkFormat.SavePictureWithDocument = True
      End If
   Next
End Sub

 

It took me a while to find the SavePictureWithDocument option. It determines whether images are embedded in the document or live externally as linked files.

 

Well, after some back and forth I found out that the above approach is simple, but causes major problems when dealing with a large document. When running the above code on small documents it works well, but with larger documents (where large refers to the images embedded) memory usage goes through the roof and Word locks up. So… back to the drawing board and some adjustments to code I originally used which is more complex but interactively removes the image and then pastes it back into the document as an embedded image:

 

'***************************************************************************
'*** ReplaceImages
'*****************
' This macro replaces external image links with
' embedded images so the document is self-contained
' This macro must be run while the images are in place
Sub ReplaceImages()

   lcPath = ActiveDocument.FullName
   lcPath = Left(lcPath, InStrRev(lcPath, "\"))

   For Each oField In ActiveDocument.Fields
      If oField.Type = wdFieldIncludePicture Then
         lcText = oField.Code
         lnLoc1 = InStr(1, lcText, Chr(34))
         lnLoc2 = InStr(lnLoc1 + 1, lcText, Chr(34))
         lcCode = Mid(lcText, lnLoc1 + 1, lnLoc2 - 1 - lnLoc1)

         lcCode = Replace(lcCode, "/", "\")
         lcCode = Replace(lcCode, "\\", "\")

         oField.Select

         If FileExists(lcCode) Then
            Selection.InlineShapes.AddPicture FileName:=lcCode, LinkToFile:=False, SaveWithDocument:=True
            GoTo DonePath
         End If

         lcCode = Replace(lcCode, "\\", "\")
         lcCode = lcPath + lcCode
         If FileExists(lcCode) Then
             Selection.InlineShapes.AddPicture FileName:=lcCode, LinkToFile:=False, SaveWithDocument:=True
         End If

DonePath:
      End If
   Next
End Sub


Private Function FileExists(ByVal FileName As String)
Dim FileSize As Long

On Error GoTo FileExists_Error
FileSize = FileLen(FileName)
FileExists = True
GoTo FileExists_Exit

FileExists_Error:
    FileExists = False

FileExists_Exit:
    On Error GoTo 0

End Function

 

This code works well even on a very large document of over 500 pages. It’s still not fast, but it doesn’t seem to hurt Word resources much and shows something happening on the screen that doesn’t make the app appear locked up.
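
For anyone trying to follow the path handling in the macro without Word at hand, here is a rough Python sketch of just the string logic: pull the first quoted value out of the field code, normalize the slashes, and build the two candidate paths the macro tries (as written, then relative to the document's folder). The function names and the sample field code are my own assumptions for illustration, and the sketch does not touch Word or the file system at all.

```python
def extract_field_path(field_code: str) -> str:
    """Return the text between the first pair of double quotes, with
    forward slashes and doubled backslashes turned into single backslashes."""
    start = field_code.find('"')
    end = field_code.find('"', start + 1)
    if start == -1 or end == -1:
        return ""  # no quoted file name in this field code
    path = field_code[start + 1:end]
    path = path.replace("/", "\\")      # HTML-style separators to Windows-style
    path = path.replace("\\\\", "\\")   # collapse doubled backslashes
    return path


def candidate_paths(field_code: str, doc_full_name: str) -> list:
    """Paths to try in order: as written, then relative to the document folder."""
    path = extract_field_path(field_code)
    if not path:
        return []
    # Keep everything up to and including the last backslash of the document path.
    doc_dir = doc_full_name[: doc_full_name.rfind("\\") + 1]
    return [path, doc_dir + path]


# Hypothetical field code, assumed to resemble what an imported <img> produces.
sample = ' INCLUDEPICTURE "images/wwhelp.gif" \\* MERGEFORMATINET '
for candidate in candidate_paths(sample, "C:\\docs\\help.doc"):
    print(candidate)
```

Like the macro, this collapses every doubled backslash, so a UNC path such as \\server\share would lose its leading pair; that is worth keeping in mind if the images live on a network share.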

 

Now to the MOST part. The FOR loop does not catch all images. The problem is that there appears to be a bug in the Fields collection parsing as it does not catch all images from the HTML. Specifically if the image is marked up with any extra attributes the image does not show up in the Fields collection. Something as simple as:

 

<img src="images/wwhelp.gif" align="right">

 

causes the image to not be included in the Fields list, which bites big time. Removing the align attribute, or any other extra attribute (like HSPACE), causes the image to show up fine.

 

I have yet to figure out how to get at those images, but in the meantime I’ve been removing these tags from the default generation code in Help Builder which works for some of the automatically generated images such as icons for headers and class/data lists.
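
Where the HTML is not generated by code you control, one possible workaround is to preprocess the file before handing it to Word and reduce every image tag to just its src. The following is a minimal Python sketch of that idea, not something Help Builder does: the function name is made up for illustration, and it uses regular expressions rather than a full HTML parser, so it assumes reasonably well-formed <img> tags.

```python
import re

# Matches a whole <img ...> tag, case-insensitively.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Captures the src value whether double-quoted, single-quoted, or unquoted.
# The lookbehind avoids matching attributes such as data-src.
SRC_ATTR = re.compile(
    r"""(?<![\w-])src\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def strip_img_attributes(html: str) -> str:
    """Rewrite every <img> tag so that only its src attribute remains."""

    def rewrite(match):
        src = SRC_ATTR.search(match.group(0))
        if src is None:
            return match.group(0)  # no src found: leave the tag untouched
        value = next(g for g in src.groups() if g is not None)
        return f'<img src="{value}">'

    return IMG_TAG.sub(rewrite, html)


print(strip_img_attributes('<img src="images/wwhelp.gif" align="right">'))
```

The cleaned HTML would then be saved to a temporary file and opened in Word in place of the original. Whether that gets every image into the Fields collection is something to verify against your own documents.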

 

It always amazes me how things like this get through a testing process.

 


The Voices of Reason


 

anonymouse
April 14, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Sounds like what I'd do - just parse the original html document up front and remove all but the src attributes from the <img/> tags, then feed it into your process.

Rick Strahl
April 14, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Good idea. OTOH, why in the hell is Word not smart enough to parse the images properly in the first place? Such a simple thing.

anonymouse
April 15, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Word's a temperamental app; some parts of the automation are amazing, others are just underdeveloped/strange, but it has to be said, the new Front Page 2003 advert has a funny error in it:

http://weblogs.mozillazine.org/djst/archives/004866.html

The casing tip was a fantastic one - thanks Rick!


Taras Strypko
April 22, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

First of aLL, thanks Rick for the useful info!

It was a lucky chance that i've found this post via google!

I'd like to share some thoughts regarding the subject.

I'm trying to convert .html to .doc via automation from outside of Word, so i'm free to choose my favorite JavaScript. Also, i wish to do this in the background. I wasn't lucky enough to get rid of Word's modal messages when just saving .html as WordDoc -- there were ugly modal messages complaining about .css in the original .html. Finally i've succeeded just using the famous Copy/Paste technology:

<script>
myWordApp.Documents.Open(myHTML)
myWordApp.Documents(myHTML).Select()
myWordApp.Selection.Copy()
myWordApp.Documents(myHTML).Close()
var newDoc=VMBVariables.myWordApp.Documents.Add()
myWordApp.Selection.Paste()
</script>

Then look how did i implemented Rick's approach:

<script>
var wdFieldIncludePicture=67
with(newDoc)
{ for(var i=1;i<=Fields.Count;i++)
with(Fields.Item(i))
{ if(Type==wdFieldIncludePicture)
{ Select()
with(VMBVariables.myWordApp.Selection.InlineShapes.AddPicture(LinkFormat.SourceFullName,false,true))
ScaleWidth=ScaleHeight=300
}
}
}
</script>

Now a little bit about above code:

I've removed all that fancy path calculation, because the Field object actually has a property containing the full path to an external image: LinkFormat.SourceFullName -- i'm fully relying on it. That's why i skip checking whether the file exists or not. Even if i had to, i'd do something like the following, simply to suppress a possible run-time error:

<script>
try{
VMBVariables.myWordApp.Selection.InlineShapes.AddPicture(myPictureFileName,false,true)
}
catch(e){
}
</script>

Finally, you may wonder why i'm scaling the resulting image by 300%? Don't know! That is still a question for me: why is Word shrinking my images by ca. 3 times? Even funnier is that in the Format Picture dialog Word assures me that its 3 times smaller images are 100% of the original! Then when i try to enlarge them, i see that they do not get distorted, i.e. they were actually DECREASED by Word! No need to tell that neither the Reset button nor InlineShape.Reset() helped!

That's aLL

bb!

Taras Strypko
April 22, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Aha, btw, Word may not recognize .html images regardless of any differences in <img> tag markup! My images are aLL marked up exactly the same, but still some of them are missing. Maybe it's because of the filenames or so? Amazing!

ashish sharma
May 26, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

thank you very much
i was facing the same problem while generating the word document with html through asp.net
thanks for your hint
thank you very much once again

M.J. Andrew
June 03, 2004

# Weird Symbols when using word

Hi There,
When I type up my work in Microsoft Word, I email it to myself so I can continue with it when I get home. I then download the work from my email, and when it opens in Microsoft Word the work still appears, but it now contains dots (.) between each word and a strange symbol that limits you and destroys the presentation, as well as following the cursor. Why does this happen? Could someone please help me out? Thanks very much!

nava
September 08, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

hi andrew
c to it tht in word
under tools->options->view tab .. the formatting marks are unchecked ...
if they r checked u will get wierd symbols representing
return, space n all

just c if this is the reason ?

nava
September 13, 2004

# re: immd. help plzzz... Image Problems when Importing HTML into Microsoft Word via Automation

i tried implementing it in vc++
embedding linked images in a doc.

the flow i followed.

identify the shapes, if linked picture (get its details like filefullname,top,left,width,height,zorder posn (the alignment details ) n delete the linked picture
now using these alignment properties n using add picture function am trying to embed the image

prb. am facing is : linked picture is not identified as msoLinkedPicture but is being identified as msoAutoShape

so am not able to embedd the image at the linked image's posn.

and

whtever value i give for top in the AddPicture argument it places the picture in the top=0 ... why ???

alec
November 01, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Just wanted to say, your doc helped point me in the right direction with a similar problem I was working on. The image thing also caused me some problems as some images in the document weren't in the Fields collection.

The following code (in perl) shows my solution to unlinking images. This works on my doc on Word 2002 but you may find it behaves badly on very large docs (as per the original solution).

$bit = $word->ActiveDocument->Shapes;

foreach $sec ( in $bit )
{
# $sec->ConvertToInlineShape;
$linkFormat = $sec->LinkFormat;
$linkFormat->{SavePictureWithDocument} = 1;
$linkFormat->BreakLink;
}

$bit = $word->ActiveDocument->Fields;

foreach $sec ( in $bit )
{
$linkFormat = $sec->LinkFormat;
$linkFormat->{SavePictureWithDocument} = 1;
$linkFormat->BreakLink;
}

PeterB
December 06, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Rick,
You're probably long past caring, but have you tried simply unlinking all the fields rather than iterating through each one? The following is a VB.NET snippet. This unlinks all fields, so it may not fit all situations, but it should help boost performance.

' Iterate through each section of the document
' and unlink fields.
For Each docSection As Word.Section In .Sections
docSection.Range.Fields.Update()
docSection.Range.Fields.Unlink()

' Do same for each header/footer in section.

Next docSection

Rick Strahl's WebLog
December 14, 2004

# Importing an HTML document into Word via COM Automation and dealing with Image Embedding (revisited)

This entry revisits a topic I talked about earlier and resolves some of the issues I ran into. Specifically, this entry deals with importing HTML into Word, forcing images to get embedded into the document, and dealing with Word's excessive memory usage (and frequent lockups) when looping through large collections of the document.

RonP
December 15, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Any idea why inserting a web-based image does NOT work in WORD VBA: ?

ActiveDocument.InlineShapes.AddPicture "http://www.lucenaturale.com/MercedWinter_150.jpg"">http://www.lucenaturale.com/MercedWinter_150.jpg", Savewithdocument:=True, LinktoFile:=False

I get a 5152 "not a valid file name" message. But, if I use that SAME filename directly in Word's "Insert Picture|From File" dialog, it works fine! Also, if I use VBA in Excel with the Excel technique (and the same filename), it works fine:


dim p as object
set p=ActiveSheet.Pictures.Insert("http://www.lucenaturale.com/MercedWinter_150.jpg"">http://www.lucenaturale.com/MercedWinter_150.jpg")

This one is driving me nuts. The Inlineshapes.Addpicture method works just fine for local images, even images on a local intranet referenced with a network path (//).

RonP
December 15, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

BTW, I see that the message-submission doctored things up a bit. The submission does not have "extra" filenames, or mangled quotes, etc. I'm using the correct format, which is:

.Addpicture Filename:="name" etc.


Rick Strahl
December 17, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

When in doubt, try the Macro recorder :-}... Is it possible that the parameter name is something other than filename, like Url or Link?

RonP
December 18, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Well, the macro recorder generates exactly the code that fails. The image is inserted, the code that **supposedly** works is generated, and if you then run the macro, it doesn't work! (same 5152 error). Very bizarre. There is some indication this bug might be peculiar to Word 2000. The Shapes object delivers a similar error. I may have to go the API route, but that seems silly...

Dolf
December 22, 2004

# re: Image Problems when Importing HTML into Microsoft Word via Automation

I have been reading on Microsoft's website, and to do what you want, try the following:

wdApp.Selection.Fields.Add Range:=Selection.Range, Text:="INCLUDEPICTURE ""<URL>"" \d", PreserveFormatting:=False

bfwkuipers@kuipstuik.nl
January 10, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

When you insert a picture manually Word connects to the web server. For a short moment you see a pop up box saying 'connecting to web server...'. When you insert a picture from a macro it does not. I think that is why it fails. I don't know how to solve it...

Anyone?

PeerB
January 11, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

<i>I have yet to figure out how to get at those images, but in the meantime I’ve been removing these tags from the default generation code in Help Builder which works for some of the automatically generated images such as icons for headers and class/data lists.</i>

In my tests, Word converts those images to shapes. You can loop through the shapes in the doc, look for those of type MsoShapeType.msoLinkedPicture, and then set each shape's LinkFormat.SavePictureWithDocument property to true.

Hope that helps.


PeterB
January 11, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

<i>When you insert a picture manually Word connects to the web server. For a short moment you see a pop up box saying 'connecting to web server...'. When you insert a picture from a macro it does not. I think that is why it fails. I don't know how to solve it... </i>

Call the Update method on the associated field (or all fields) before you unlink.

Rick Strahl
January 11, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Hi Peter, thanks for your help here and before. I was able to get this all working reliably with some of your tips. I wrote this up in another BLOG entry which shows the final reliable solution.

http://west-wind.com/weblog/posts/1178.aspx


PeterB
January 13, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Yep, Rick. I read your other blog entry after I posted my earlier comment. The code in that entry looks good. Thanks.


Help!
March 21, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

. But you know in Microsoft Word, when you copy/paste a pic or something..usually a pic comes up right..well instead I get the freaking code(insert picture.......merge format) ....

I'm getting real annoyed..I tried switching the code, but it only gives 'image unavailable' in this random thing. Argh hard to explain. been doing this for a couple months..but I have to print something and arh!!

thx

ps: how come I was in the dormwire thing a few minutes ago...odd..

SlashRD
May 18, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Folks, thanks a lot for your input on this issue!

BB, I tried your code and it works great. Except, like you, I'm having the image resize issue, and I've tried the scale thing, but not working. I'm using ASP - could you write the asp equivalent of your .NET code?

<script>
var wdFieldIncludePicture=67
with(newDoc)
{ for(var i=1;i<=Fields.Count;i++)
with(Fields.Item(i))
{ if(Type==wdFieldIncludePicture)
{ Select()
with(VMBVariables.myWordApp.Selection.InlineShapes.AddPicture(LinkFormat.SourceFullName,false,true))
ScaleWidth=ScaleHeight=300
}
}
}
</script>

Thanks!


Taras Strypko
June 03, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

2 SlashRD:

Oh, my friend, there is really nothing special about that code, so, i suppose, you can paste it in ASP directly. Okay, lemme try to spell it in VB:

<script>
...
With newDoc
For i As Integer = 1 To .Fields.Count
With .Fields.Item(i)
If .Type=wdFieldIncludePicture Then
.Select
With myWordApp.Selection.InlineShapes.AddPicture(.LinkFormat.SourceFullName, False, True)
.ScaleWidth=300
.ScaleHeight=300
End With
End If
End With
Next i
End With
</script>

Sorry for that confusing "VMBVariable.." -- it has nothing to the topic..

Kit
June 25, 2005

# re: Image Problems when Importing html into Microsoft Word via Automation

Hi,
I am not very good at expressing myself in computerese - so, here it goes:
Until last week, I was able to import images off the net (non-copyright images) without a problem. I could right click and copy directly into MS WORD, or do a Save Picture As on the Desktop, or Save Target As.
Now I can no longer do this. NO IMAGES at all. I cannot place an image on the desktop that is viewable in WORD or in Photoshop Elements 3. All I get is "The file listed will not be imported as a true file type."
Also, when I'm in word and hit Insert and do Insert Picture From File, I get: "An Error occured while importing this file."
I am in contact with Microsoft support but so far nothing's changed. What could have happened? I mean, did I click something in MS WORD I shouldn't have - or did something bizarre just happen with WORD - I don't know.
If you understand the problem and how to fix it, could you be so kind as to babytalk through whatever procedure I need to perform to get Images again? Ever so grateful,
Thanks,
Kit
segobibi@aol.com

Preeti
July 24, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Hi, i am facing the same problem of missing image while importing html into microsoft word.
I am trying the code u gave in asp. but it is giving error. can u please guide how exactly this code is to be utilized and embeded in the application so that image also appear in the doc.

please help me... i am confused.

thanx alot

Syed Afaq Ali
September 28, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Thank you very much for providing such a useful information. I had been facing the same problem and today was the final deadline for my task. But I have done it by your help. Thankyou very much once again.

Mark
September 30, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Try this for ASP, it worked for me...

Set WordApp = CreateObject("word.application")
Set WordDoc = WordApp.Documents.Add()

WordDoc.InlineShapes.AddPicture "c:\inetpub\wwwroot\test\logo.gif",False,True

ygl72
October 05, 2005

# re: Image Problems when Importing HTML into Microsoft Word via Automation

Hi,

I have an opposite problem to solve but still is uses the image that was copied from Microsoft Word so I hope somebody can help me out coz it's driving me nuts!

My program must allow the users to copy formatted-text and images from Microsoft Word and paste it into a web page. Actually,I'm using a 3rd party tool and it does all the formatting then it generates the HTML string of the web page.

The problem is when the image is pasted, the image file is saved to a directory like the following:

file:///C:\DOCUME~1\...\LOCALS~1\Temp\msoclip1\01\imageName.gif

In reality, I have to save the image into a special folder and not int then "DOCUME~1\..."
I was thinking of using the "CreateHTMLDocument" to load the HTML string so I can iterate through the elements (using javascript) but i found-out that W3C does not recommend this to use or it was not supported anymore (or something to that effect).

SO CAN SOMEBODY HELP ME OUT ON WHAT TO DO? THANKS A LOT!


Mike Curtis
August 20, 2009

# re: Image Problems when Importing HTML into Microsoft Word via Automation

As Dolf says above, a Microsoft site has example code for URL insertion.
It might not paste well here, but I tried this and it worked fine.
I used Word 97,
and I changed the Microsoft article's code at the PreserveFormatting bit as follows:

In my VBA, the statement spans 2 lines;
line 1 ends at the underscore:

Sub InsertIncludePictureField()
' NOTE: Replace <Internet Address> with a valid URL.
Selection.Fields.Add Range:=Selection.Range, Text:= _
"INCLUDEPICTURE ""http://www.lpga.com/content/photos/pp_Ammaccapane_Danielle_lg.jpg"" \d", PreserveFormatting:=False
End Sub

West Wind  © Rick Strahl, West Wind Technologies, 2005 - 2024