[Zope] wget of a zope site

Paul Winkler pw_lists at slinkp.com
Sun Feb 6 15:31:56 EST 2005


On Sun, Feb 06, 2005 at 05:15:50PM +0100, Roger Oberholtzer wrote:
> That is not what I am doing. The site is currently a happy dynamic Zope
> site. It is just that the site owners want to move elsewhere and no
> longer want the Zope site. But they want the existing content to put in
> their new static boring site. Thus my use of wget.

It should "just work".  Having no knowledge of how your Zope
site is put together, I for one have no idea what could be wrong.

wget has a lot of options that are worth exploring.
For producing a locally browsable static copy of Zope and CMF
content, I eventually settled on this, which changes some file
extensions and rewrites links to point to the local version:

wget -nc -r -l8 -p -nH --no-parent --convert-links --html-extension
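
For anyone who finds the short flags hard to read, here is the same
set of options spelled out with wget's long names. This is only a
sketch: mirror_site is a made-up helper name, and the start URL is
whatever you pass in.

```shell
# Same options as the one-line command, spelled out with wget's long names.
# mirror_site is a hypothetical helper; pass it the start URL of the site.
#   --no-clobber          (-nc) skip files that were already downloaded
#   --recursive           (-r)  follow links
#   --level=8             (-l8) recurse at most 8 levels deep
#   --page-requisites     (-p)  also fetch images, CSS and the like
#   --no-host-directories (-nH) don't create a directory named after the host
#   --no-parent                 never climb above the start URL
#   --convert-links             rewrite links to point at the local copies
#   --html-extension            save HTML pages with an .html suffix
mirror_site() {
    wget --no-clobber --recursive --level=8 --page-requisites \
         --no-host-directories --no-parent \
         --convert-links --html-extension \
         "$1"
}
```

Newer wget releases document the last option as --adjust-extension.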

It's not perfect, as for a folder named "foo" you may end up with both
foo.html and foo/index_html.html, both having the same content.
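
If the duplicates bother you, something like the following could list
them after the download. It is a rough sketch: find_duplicate_pages is
a made-up name, and it only reports pairs that are byte-for-byte
identical, so it may miss pairs whose links were rewritten differently.

```shell
# List every NAME.html that has a byte-identical NAME/index_html.html.
# find_duplicate_pages is a hypothetical helper; $1 is the mirror directory
# (defaults to the current directory).
find_duplicate_pages() {
    find "${1:-.}" -type f -name 'index_html.html' | while IFS= read -r index; do
        dir=$(dirname "$index")   # e.g. ./foo
        flat="$dir.html"          # e.g. ./foo.html
        # cmp -s is silent and succeeds only if both files are identical
        if [ -f "$flat" ] && cmp -s "$flat" "$index"; then
            printf '%s\n' "$flat"
        fi
    done
}
```

It only prints the candidates; deciding which copy to keep (and fixing
any links that point at the other one) is left to you.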

It also helps if you don't have runaway URLs, i.e. relative links
in your navigation that lead wget to traverse the same object over
and over with URLs like
http://foo/bar/baz/baz/baz/baz/baz/baz/...
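
One way to spot such URLs, for example when scanning a wget log, is to
look for a path segment that is immediately repeated. A minimal sketch
follows; is_runaway_url is a made-up name, and a site that legitimately
has /baz/baz would be a false positive.

```shell
# Succeed (exit 0) if the URL contains the same path segment twice in a row,
# e.g. http://foo/bar/baz/baz. is_runaway_url is a hypothetical helper.
is_runaway_url() {
    printf '%s\n' "$1" | awk -F/ '{
        for (i = 2; i <= NF; i++)
            # appending "" forces a string comparison; empty fields
            # (from "//" or a trailing "/") are ignored
            if ($i != "" && ($i "") == ($(i - 1) "")) exit 0
        exit 1
    }'
}
```

Usage would be along the lines of:
is_runaway_url "$url" && echo "suspicious: $url"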

 
> Another interesting thing about using wget with the Zope site is what
> happens if you have a calendar à la Plone. The links to each year are
> followed on and on. And, as each year is at the same level in the
> hierarchy, the level limiting for wget has no effect. What happens is
> that wget can run forever, following the years in the calendar.

Maybe some work on robots.txt could help with this?
Don't know.
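
For what it's worth, wget does honor robots.txt during recursive
retrieval unless told otherwise (-e robots=off), so a sketch of the
idea might look like this. The /calendar path is purely a placeholder,
write_robots_txt is a made-up name, and this can only help if the
calendar links actually share a path prefix you can disallow. The
resulting file would also have to be served from the site root.

```shell
# Write a robots.txt that asks all crawlers to stay out of one subtree.
# "/calendar" is a placeholder path, not taken from any real site.
# write_robots_txt is a hypothetical helper; $1 is the output file.
write_robots_txt() {
    cat > "$1" <<'EOF'
User-agent: *
Disallow: /calendar
EOF
}
```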
 
-- 

Paul Winkler
http://www.slinkp.com

