[Zope-CMF] portal_transformation notes

seb bacon seb@jamkit.com
Thu, 23 Jan 2003 12:18:41 +0000


Chris Withers wrote:
> seb bacon wrote:

>> Then you can convert a word document to html to structured text, etc. 
>> (That'll be a common use case, then ;-)
> 
> 
> How far have you got on this? :-)

Well, for current purposes, I just have to convert a few MS docs into 
text, and can't justify the extra time required to make it really 
generic; but I've been playing with different "pluggable" designs as I go.

The code I've written so far is basically some "use an external tool to 
produce output" stuff (which also works when the tool produces more than 
one bit of output e.g. html + images) with a fairly generic framework. 
But it's not a tool and it doesn't chain transformations together 
automatically, and the conversion logic is hardwired into the File type. 
I probably won't get a chance to make this tool either, but I have 
been thinking about it.

One thing I'm not clear on is how I would produce transformation chains 
automatically.  I've not really thought about it a lot, but here are 
some starting ideas.  A transformer plugin will register inputs and 
outputs using mime-types:

STXTransformer:
  _inputs  = {'text/x-structured-text':10,
              'text/plain'}
  _outputs = {'text/html':10,
              'text/plain':10,
              'text/x-structured-text':10}

PDFTransformer:
  _inputs = {'text/plain':7,
             'application/postscript':10,
             'text/html':6,
             'application/pdf':10}
  _outputs = {'application/pdf':10,
              'text/plain':9}

HTMLTransformer:
  _inputs = {'text/html':10,
             'text/plain':8,
             'application/pdf':5}
  _outputs = {'text/html':10,
              'text/plain':7}

Since different people may write different plugins, there could be 
several different routes for the tool to choose to convert html to a 
pdf.  In the above example, I could go:

  html -> HTMLTransformer -> plain -> PDFTransformer -> PDF

or:

  html -> PDFTransformer -> PDF

Furthermore, I could convert HTML to text using an STXTransformer, 
without ever using STX at all!
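To make the route-finding idea concrete, here is a minimal sketch in Python of how a tool might enumerate candidate chains. The transformer names and MIME types are taken from the listings above; the registry layout, the `find_chains` function, and the rules "never revisit a MIME type" and "never reuse a transformer within one chain" are assumptions made for illustration only, not part of any existing portal_transformation code. Weights are ignored here.

```python
# Hypothetical registry: names and MIME types come from the post,
# the data layout is an assumption for this sketch.
TRANSFORMERS = {
    "STXTransformer": {
        "inputs": ("text/x-structured-text", "text/plain"),
        "outputs": ("text/html", "text/plain", "text/x-structured-text"),
    },
    "PDFTransformer": {
        "inputs": ("text/plain", "application/postscript",
                   "text/html", "application/pdf"),
        "outputs": ("application/pdf", "text/plain"),
    },
    "HTMLTransformer": {
        "inputs": ("text/html", "text/plain", "application/pdf"),
        "outputs": ("text/html", "text/plain"),
    },
}


def find_chains(source, target, registry=TRANSFORMERS):
    """Return every chain of (transformer, output_type) steps
    that leads from the source MIME type to the target MIME type."""
    chains = []

    def walk(current, seen_types, used, path):
        for name, spec in registry.items():
            # Assumption: a transformer appears at most once per chain.
            if name in used or current not in spec["inputs"]:
                continue
            for out in spec["outputs"]:
                # Assumption: never return to a type already visited.
                if out in seen_types:
                    continue
                step = path + [(name, out)]
                if out == target:
                    chains.append(step)
                else:
                    walk(out, seen_types | {out}, used | {name}, step)

    walk(source, {source}, set(), [])
    return chains


for chain in find_chains("text/html", "application/pdf"):
    hops = ["%s -> %s" % step for step in chain]
    print(" -> ".join(["text/html"] + hops))
```

Under these assumptions the search yields the two routes described above: the direct PDFTransformer route and the one that goes through text/plain.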

The values in the dictionaries are weightings to allow you to chain 
together the most efficient set of transformers.
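The post does not say how the weightings would be combined, so the following is only one possible reading, sketched in Python. It assumes that higher is better, that 10 is the maximum, that a single step scores (input weight x output weight) / 100, and that a chain's score is the product of its steps. The `chain_score` function and the `WEIGHTS` layout are hypothetical; the numbers are copied from the PDFTransformer and HTMLTransformer listings above.

```python
from fractions import Fraction

# Weights copied from the post; the nested layout is an assumption.
WEIGHTS = {
    "PDFTransformer": {
        "inputs": {"text/plain": 7, "application/postscript": 10,
                   "text/html": 6, "application/pdf": 10},
        "outputs": {"application/pdf": 10, "text/plain": 9},
    },
    "HTMLTransformer": {
        "inputs": {"text/html": 10, "text/plain": 8, "application/pdf": 5},
        "outputs": {"text/html": 10, "text/plain": 7},
    },
}


def chain_score(source, chain, weights=WEIGHTS):
    """Score a chain of (transformer, output_type) steps.

    Assumed rule: each step contributes (input * output) / 100,
    and the contributions are multiplied along the chain.
    """
    score = Fraction(1)
    current = source
    for name, out in chain:
        spec = weights[name]
        score *= Fraction(spec["inputs"][current] * spec["outputs"][out], 100)
        current = out
    return score


# The two html -> pdf routes from the example above.
direct = [("PDFTransformer", "application/pdf")]
via_plain = [("HTMLTransformer", "text/plain"),
             ("PDFTransformer", "application/pdf")]

best = max([direct, via_plain], key=lambda c: chain_score("text/html", c))
print(best, chain_score("text/html", best))
```

With this particular scoring rule the direct route scores 3/5 and the route through text/plain scores 49/100, so the direct route would be picked. A different combination rule (a sum, or a per-step penalty) could rank them differently.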

Any thoughts, additions, problems?

seb