[Zope] HTML To either RTF or preferably DOC

Jeffrey Shell Jeffrey@digicool.com
Wed, 29 Sep 1999 19:17:15 -0400


> We would like to convert an HTML document to either a RTF or preferably
> Word DOC. The caveat is that we would like to do this from within Zope
> (including an external method). Anyone done this or ideas?

Word 2000 has a strange, new, funny, yet almost normal-for-Microsoft way
of working with (read: round-tripping) HTML.  They have a bunch
of XML at the top of the document embedded in an HTML comment (looking
almost like old style DTML actually) that contains a lot of the extra
Word 2K meta-information (like author and whatnot).  Images and other
related files are handled in a similar fashion using some XML/HTML
hybrid-ing.  Style sheets are used heavily.  The good thing about all
this (I suppose) is that you don't lose a lot of special Word
information when going to and from the web.

If you're using Word 2000, open a normal HTML file that's like the
document you want, save it in Word's HTML format, and see what
happens.  Then try to re-create that output with DTML to get dynamic
generation on the Zope side.  Re-structure the document to look the way
you want (Word styles and all that), do the same save, and see what's
different.  Finally, see if you can save it from Word as a .doc and
whether it keeps the style you want.
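
As a rough sketch of the generation side, here is a bit of Python that
wraps an HTML body in the kind of Word-flavored header described above:
an XML block inside an HTML comment carrying the author, plus a style
sheet.  Treat all of it as assumption: the namespace URIs and the
conditional-comment wrapper are written from memory of what Word 2000
emits, and the function name wrap_as_word_html, the template, and the
single style rule are made up for illustration.  Diff it against a file
saved from your own copy of Word before relying on it.

```python
from html import escape

# Hypothetical template modeled on the HTML that Word 2000 saves.
# Assumption: the namespaces and the "[if gte mso 9]" comment wrapper
# match what your version of Word writes; check a real saved file.
WORD_HTML_TEMPLATE = """<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>{title}</title>
<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>{author}</o:Author>
 </o:DocumentProperties>
</xml><![endif]-->
<style>
p.MsoNormal {{ margin: 0in; font-size: 12.0pt; font-family: "Times New Roman"; }}
</style>
</head>
<body>
{body}
</body>
</html>
"""


def wrap_as_word_html(body_html, title="Untitled", author="Unknown"):
    """Return body_html wrapped in a Word-style HTML document.

    title and author are escaped; body_html is inserted as-is, so it
    must already be valid HTML.
    """
    return WORD_HTML_TEMPLATE.format(
        title=escape(title),
        author=escape(author),
        body=body_html,
    )


if __name__ == "__main__":
    print(wrap_as_word_html('<p class="MsoNormal">Hello from Zope</p>',
                            title="Test", author="Somebody"))
```

A function like this could be called from an external method, or the
same template could be written directly in DTML with the title, author,
and body filled in by dtml-var tags.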

This is real hands-on, down-and-dirty-in-the-experimental-slime work
if you go my route.  These are just some suggestions based on
observations I've made during my three uses of Word in the past two months.

(Or is it two uses in three?) ;)