[Zope-dev] Re: restructuredtext

Michel Pelletier michel@digicool.com
Thu, 21 Jun 2001 08:13:49 -0700 (PDT)


I've cc:ed zope-dev in case anyone else is interested.

On Thu, 21 Jun 2001, David Goodger wrote:

> The last time I downloaded and studied the CVS branch was in November 2000.
> At the time, the code wasn't very inviting. I just downloaded the CVS branch
> again, using the instructions in
> http://dev.zope.org/Members/jim/StructuredTextWiki/NGReleases, and visited
> http://dev.zope.org//Members/jim/StructuredTextWiki. The RecentChanges page
> and the StructuredTextNT/CurrentStatus page both list several additions
> which don't seem to be reflected in STNG.txt or the StructuredText.py module
> docstring. Is this code up to date? If not, I would appreciate a pointer to
> the latest code, or a source .tgz by email.

The version in Zope CVS (the HEAD, not any branches) is the most current
code.  The branch was discontinued when we folded it into the head, so
try checking out the full HEAD and look there; you will probably
notice many changes.

> If you do have information on creating a language front-end parser for
> STXNG, please send it. And please add links to my projects to the
> appropriate Wiki pages if you have time.

(Karl: every time I say DOM, I mean "DOM-like").

Basically, Zope's current "classic" front-end is defined in
StructuredText/ST.py.  In there, you will see a function called
StructuredText that turns indented, newline-separated text into
StructuredTextParagraph DOM objects, which get inserted DOM tree-style
into a StructuredTextDocument DOM object.

I would suggest you start looking there, i.e., have your code produce
simple, homogeneous StructuredText paragraphs.  For example::

  >>> foo = """Title
  ...
  ...   One
  ...
  ...   Two *three*
  ...
  ...     o four
  ...
  ...     o **five**
  ...
  ... """

  >>> Basic(foo)
  StructuredTextDocument([
  StructuredTextParagraph(Title, [
   StructuredTextParagraph(  One, [
   ])
   StructuredTextParagraph(  Two *three*, [
     StructuredTextParagraph(    o four, [
     ])
     StructuredTextParagraph(    o **five**, [
     ])
   ])
  ]),
  ])
  >>>

The resultant object is a StructuredTextDocument DOM object with
StructuredTextParagraph DOM children.  After the very first step, you
can work with the content using the DOM interface::

  >>> Basic(foo).getNodeName()
  'StructuredTextDocument'
  >>>

Notice how the first pass *didn't* go through looking for markup,
*just* for structure.  We did this for simplicity, and because our
structure could be factored out of our markup.  I'm not sure whether
you can do that, but I suspect you can define more complex rules for
what a 'paragraph' element is.
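
To make the structure-only idea concrete, here is a minimal sketch in
modern Python.  It is not the ST.py code; the names Para, basic and
outline are hypothetical, and it assumes that blank lines separate
paragraphs and that a deeper indent means "child of the previous,
shallower paragraph".  Inline markup is deliberately left untouched.

```python
import re


class Para:
    """Hypothetical stand-in for a paragraph node: raw text plus children."""

    def __init__(self, text, indent):
        self.text = text
        self.indent = indent
        self.children = []


def basic(source):
    """Build a tree from indentation alone; markup is not inspected."""
    root = Para("", -1)
    stack = [root]
    # Paragraphs are separated by one or more blank lines.
    for chunk in re.split(r"\n\s*\n", source):
        if not chunk.strip():
            continue
        first_line = chunk.lstrip("\n").splitlines()[0]
        indent = len(first_line) - len(first_line.lstrip())
        node = Para(chunk.strip(), indent)
        # Pop until the top of the stack is less indented than this node.
        while stack[-1].indent >= indent:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node, depth=0):
    """Return the tree as a list of lines, indented by nesting depth."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + child.text)
        lines.extend(outline(child, depth + 1))
    return lines


text = "Title\n\n  One\n\n  Two *three*\n\n    o four\n\n    o **five**\n"
tree = basic(text)
print("\n".join(outline(tree)))
```

A front-end for a different markup language would replace the
"what is a paragraph, and who is its parent" rules inside basic() and
leave the rest of the pipeline alone.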

The next step is to "colorize" the simple, homogeneous DOM into a more
complex "Document" DOM object::

  >>> Document(Basic(foo))
  StructuredTextDocument([
  StructuredTextSection(StructuredTextSectionTitle(Title, [
  ]), [
   StructuredTextParagraph(  One, [
   ])
   StructuredTextSection(StructuredTextSectionTitle(['  Two ',
  StructuredTextEmphasis('three')], [
  ]), [
     StructuredTextBullet(four, [
     ])
     StructuredTextBullet(StructuredTextStrong('five'), [
     ])
   ])
  ]),
  ])
  >>>

Now you have a DOM object that fully expresses your textual language.
Obviously, you could turn this right back into STX.

The Document() factory accepts a "simple" STX DOM tree created by the
Basic() factory.  This factory goes through, using the DOM API,
looking for our special markup, and then colorizing that markup by
adding new, more specialized DOM objects (like StructuredTextBullet
and StructuredTextEmphasis).
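
As a rough illustration of that colorizing pass, the sketch below walks
a tree of plain paragraphs and returns a tree of more specialized
nodes.  It is an assumption-laden simplification, not the Document()
implementation: Node, inline and colorize are hypothetical names, a
paragraph with children is simply treated as a section title, and only
bullets, ``*emphasis*`` and ``**strong**`` are recognized.

```python
import re


class Node:
    """Hypothetical tree node; content is a str or a list of str/Node."""

    def __init__(self, kind, content, children=()):
        self.kind = kind
        self.content = content
        self.children = list(children)


BULLET = re.compile(r"^\s*[o*-]\s+")
STRONG_OR_EM = re.compile(r"\*\*(.+?)\*\*|\*(.+?)\*")


def inline(text):
    """Split text into plain strings and Strong/Emphasis nodes."""
    parts, pos = [], 0
    for m in STRONG_OR_EM.finditer(text):
        if m.start() > pos:
            parts.append(text[pos:m.start()])
        if m.group(1) is not None:
            parts.append(Node("Strong", m.group(1)))
        else:
            parts.append(Node("Emphasis", m.group(2)))
        pos = m.end()
    if pos < len(text):
        parts.append(text[pos:])
    return parts


def colorize(para):
    """Return a specialized copy of a plain paragraph tree."""
    kids = [colorize(child) for child in para.children]
    m = BULLET.match(para.content)
    if m:
        return Node("Bullet", inline(para.content[m.end():]), kids)
    if kids:
        # Simplification: any paragraph with sub-paragraphs is a section.
        return Node("Section", inline(para.content), kids)
    return Node("Paragraph", inline(para.content), kids)


plain = Node("Paragraph", "Title", [
    Node("Paragraph", "One"),
    Node("Paragraph", "Two *three*", [
        Node("Paragraph", "o four"),
        Node("Paragraph", "o **five**"),
    ]),
])
colored = colorize(plain)
print(colored.kind, [c.kind for c in colored.children])
```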

The final step is to feed the colorized DOM into an output generator.
This is a factory that accepts a DOM object and returns a string of
that object in a certain format.  As an example, STXNG comes with
HTML, MML (FrameMaker), and DocBook generators::

  >>> HTML(Document(Basic(foo)))
  '<html>\n<head>\n<title>Title</title>\n</head>\n<body>\n<h0>Title</h0>\n<p> One</p>\n<h1> Two <em>three</em></h1>\n\n<ul>\n<li>four</li>\n<li><strong>five</strong></li>\n\n</ul>\n</body>\n</html>\n'
  >>>
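
For a feel of what such a generator does, here is a small sketch of a
tree-to-HTML walker.  It is only an illustration under stated
assumptions: Node, render_inline, render_children and to_html are
hypothetical names, headings are numbered from 1 by nesting depth, and
children of bullets are ignored for brevity.  It does not reproduce the
exact output of the STXNG HTML generator shown above.

```python
from html import escape
from itertools import groupby


class Node:
    """Hypothetical tree node; content is a str or a list of str/Node."""

    def __init__(self, kind, content, children=()):
        self.kind = kind
        self.content = content
        self.children = list(children)


INLINE_TAGS = {"Emphasis": "em", "Strong": "strong"}


def render_inline(content):
    """Render a string, or a mixed list of strings and inline nodes."""
    if isinstance(content, str):
        return escape(content)
    out = []
    for part in content:
        if isinstance(part, str):
            out.append(escape(part))
        else:
            tag = INLINE_TAGS[part.kind]
            out.append("<%s>%s</%s>" % (tag, render_inline(part.content), tag))
    return "".join(out)


def render_children(children, level):
    """Render siblings, wrapping each run of bullets in one <ul>."""
    out = []
    for is_bullet, group in groupby(children, key=lambda c: c.kind == "Bullet"):
        html = "".join(to_html(child, level) for child in group)
        out.append("<ul>\n%s</ul>\n" % html if is_bullet else html)
    return "".join(out)


def to_html(node, level=1):
    """Return an HTML fragment for one node and its descendants."""
    body = render_inline(node.content)
    if node.kind == "Section":
        heading = "<h%d>%s</h%d>\n" % (level, body, level)
        return heading + render_children(node.children, level + 1)
    if node.kind == "Bullet":
        return "<li>%s</li>\n" % body  # nested bullet children are skipped
    return "<p>%s</p>\n" % body


doc = Node("Section", "Title", [
    Node("Paragraph", "One"),
    Node("Section", ["Two ", Node("Emphasis", "three")], [
        Node("Bullet", "four"),
        Node("Bullet", [Node("Strong", "five")]),
    ]),
])
result = to_html(doc)
print(result)
```

Because the generator only ever sees the colorized tree, the same
walker works no matter which front-end produced that tree.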

So what you would do is to create your own "frontend" that can turn
your reStructuredText into a simple DOM consisting solely of
'paragraphs', whatever that means to you.  In the case of STX,
indentation and newlines define paragraph structure.  In reSTX, you
may have different ways of marking up document structure.  I suspect
your parsing rules will be more complex, and that you've probably
already written that piece.

DocumentClass.py contains a class for each type of markup STX defines.
All of these classes subclass the StructuredTextParagraph DOM object.
These DOM objects will get created when the
DocumentClass.DocumentClass class encounters your markup as it parses
your Basic DOM.

The DocumentClass.DocumentClass class has doc_* methods that get
called on every paragraph node in your Basic DOM.  Each method has an
associated regular expression that is used to match occurrences of your
markup.  Some of them, like doc_table, are very complex, but others
like doc_emphasize are pretty simple.  You would write one subclass of
StructuredTextParagraph and one of these methods for each kind of your
markup.

So you can subclass DocumentClass to specialize and extend it to
recognize the markup in your simple DOM (and reuse any coincidental
features with StructuredText).  The Document class is very simple, but
you will need some understanding of regular expressions to customize
it.
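
The shape of that extension point can be sketched as follows.  This is
a toy, not DocumentClass itself: MiniDocument, ReSTLikeDocument, Inline,
text_types, color_text and doc_literal are names chosen for
illustration, and the real class's method signatures and return values
may differ.  The idea shown is one regular expression per doc_* method,
with a subclass adding a new method and registering it.

```python
import re


class Inline:
    """Hypothetical inline markup node."""

    def __init__(self, kind, text):
        self.kind, self.text = kind, text

    def __repr__(self):
        return "%s(%r)" % (self.kind, self.text)


class MiniDocument:
    # Order matters: on a tie, earlier recognizers win ('**' before '*').
    text_types = ["doc_strong", "doc_emphasize"]

    def doc_strong(self, s, expr=re.compile(r"\*\*(.+?)\*\*")):
        m = expr.search(s)
        return m and (Inline("Strong", m.group(1)), m.start(), m.end())

    def doc_emphasize(self, s, expr=re.compile(r"\*(.+?)\*")):
        m = expr.search(s)
        return m and (Inline("Emphasis", m.group(1)), m.start(), m.end())

    def color_text(self, s):
        """Return a list of plain strings and Inline nodes."""
        best = None
        for name in self.text_types:
            hit = getattr(self, name)(s)
            # Keep the earliest match; strict '<' preserves list order on ties.
            if hit and (best is None or hit[1] < best[1]):
                best = hit
        if best is None:
            return [s] if s else []
        node, start, end = best
        head = [s[:start]] if start else []
        return head + [node] + self.color_text(s[end:])


class ReSTLikeDocument(MiniDocument):
    """Adds one extra kind of markup and reuses the inherited ones."""

    text_types = ["doc_literal"] + MiniDocument.text_types

    def doc_literal(self, s, expr=re.compile(r"``(.+?)``")):
        m = expr.search(s)
        return m and (Inline("Literal", m.group(1)), m.start(), m.end())


parts = ReSTLikeDocument().color_text("Use ``x`` and *y*")
print(parts)
```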

I suspect that using STXNG, just for its DOM, will save you many, many
hours of time that you would need to re-create something as flexible
(the DOM is, after all, a very rich API).  Since STXNG comes in three
distinct pieces, you can throw away the first piece and replace it
with your own, specialize the second piece, and totally re-use the
third.

Hope that helps,

-Michel