[Zope] Folder with one million Documents?

Bjorn Stabell bjorn@exoweb.net
Mon, 28 Jan 2002 16:02:34 +0800


Joachim,

We've tried storing tens of thousands of objects in Zope+ZODB, and it
breaks down pretty badly; ZServer eats a lot of memory and the ZCatalog
is just butt slow.  We converted these sites to using Zope+MySQL with
good results, however.

The main pain in the neck is that ZServer eats a lot of memory for
bigger websites (>150MB), so we can really only run a couple of websites
on each server unless we give it many many GB of RAM (which is expensive
for dedicated hosts).

I recommend a SQL database if you have more than ten thousand objects
and you want to search them.  A filesystem is fine if you never need to
index them.
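A minimal sketch of that split, assuming Python's standard-library sqlite3 as a stand-in for MySQL: the files stay on disk and only the searchable metadata goes into SQL. The table layout and the function names (open_index, add_document, find_by_author) are hypothetical, chosen for illustration only.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Create a hypothetical metadata table; document bodies stay on the filesystem."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               doc_id  TEXT PRIMARY KEY,
               title   TEXT NOT NULL,
               author  TEXT,
               fs_path TEXT NOT NULL
           )"""
    )
    # Index the column we search on so lookups stay fast as the table grows.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_author ON documents(author)")
    return conn


def add_document(conn, doc_id, title, author, fs_path):
    """Record one document's metadata and where its file lives on disk."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO documents (doc_id, title, author, fs_path) VALUES (?, ?, ?, ?)",
            (doc_id, title, author, fs_path),
        )


def find_by_author(conn, author):
    """Return (doc_id, fs_path) pairs for one author, ordered by ID."""
    rows = conn.execute(
        "SELECT doc_id, fs_path FROM documents WHERE author = ? ORDER BY doc_id",
        (author,),
    )
    return rows.fetchall()
```

For example, after `add_document(conn, "1001", "Q1 report", "joe", "/srv/docs/0a/3f/1001")`, calling `find_by_author(conn, "joe")` returns the ID and file path, and the web server can then read the file straight from disk.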

I'm not sure if this is a gripe against OODBMS in general or just
Zope/ZODB, but it seems RDBMSes like MySQL have been much more optimized
for handling bigger data sets, in all aspects.  I actually think there
should be published guidelines saying when not to use ZODB, to keep
people from designing themselves into a corner with a solution that
doesn't scale.

Regards,
--
Bjorn


-----Original Message-----
From: Casey Duncan [mailto:casey_duncan@yahoo.com]
Posted At: Monday, January 28, 2002 14:21
Posted To: Zope List
Conversation: [Zope] Folder with one million Documents?
Subject: Re: [Zope] Folder with one million Documents?


--- Joachim Werner <joe@iuveno-net.de> wrote:
> Hi!
>
> Just my 2 eurocents:
>
> > I am developing a simple DMS. Up to now I use a python product with a
> > BTreeFolder which contains all the documents. Every document gets an
> > ID with DateTime().millis(). There will be up to 50 users working at
> > the same time. And in the end I will have up to 3 million documents.
> >
> > Is there a better class than BTreeFolder for such mass storage?
>
> If it is mainly large documents (like MS Office or PDF files) you are
> trying to manage, the fastest way of handling this is using the
> filesystem for storage and serving. You could do the cataloging in Zope
> and hold link objects to the actual files in a Zope tree (and yes, if it
> is MANY objects, BTrees will be a good idea). These links could also
> manage the metadata.

I thoroughly agree. Having developed a DMS myself, I'd put the cut-off
point (really more an engineering intuition than anything) at about
5000 documents; beyond that, it would be best to store them directly in
the file system.

Now, since the DMS I developed (DocumentLibrary) targeted fewer than
5000 documents, I took the simpler route of storing them in a
BTreeFolder.

To build an effective FS storage system, you will have to write code
that processes uploads and places them in a directory hierarchy.
Obviously, putting 3 million documents in one FS directory will fail
outright on most filesystems and will, at best, perform dismally. You'll
need to devise a way for the system to subdivide them amongst a shallow
hierarchy of dirs, something like Squid does with its cache directories.
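One way to sketch such a scheme, assuming a two-level layout in the spirit of Squid's default "16 256" cache_dir (16 first-level and 256 second-level directories): hash the document ID and use two bytes of the digest to pick the directories. Hashing is a swapped-in choice here, not how Squid itself numbers its files, and the names relative_path and store_upload are hypothetical. With 16 x 256 = 4096 leaf directories, 3,000,000 documents come to roughly 732 files per directory.

```python
import hashlib
import os
import tempfile

# Hypothetical layout parameters, modelled on Squid's default "16 256".
L1_DIRS = 16
L2_DIRS = 256


def relative_path(doc_id: str) -> str:
    """Map a document ID to a shallow path like '0a/3f/<doc_id>'."""
    # Refuse IDs that could escape the storage root.
    if not doc_id or doc_id in (".", "..") or "/" in doc_id or os.sep in doc_id:
        raise ValueError(f"unsafe document id: {doc_id!r}")
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    level1 = digest[0] % L1_DIRS
    level2 = digest[1] % L2_DIRS
    return os.path.join(f"{level1:02x}", f"{level2:02x}", doc_id)


def store_upload(root: str, doc_id: str, data: bytes) -> str:
    """Write uploaded bytes under root and return the absolute file path."""
    path = os.path.join(root, relative_path(doc_id))
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    # Write to a temp file in the same directory, then rename it into place,
    # so a reader never sees a half-written document.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the web server read it
        os.replace(tmp, path)
    except Exception:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return path
```

Because the path is derived from the ID alone, nothing needs to be stored to find a file again; the metadata catalog only has to keep the ID.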

For serving the files you could use Apache, but I might be tempted to
try something simpler, like micro_httpd, TUX, or some other lightweight
server.

I agree that serving static binaries is not ZServer's strong suit. I
guess that choice will depend on the frequency and size of downloads.

Another thought might be to store the files in the FS and proxy them
through Zope, like ExtFile does. Then put Squid in front of Zope to
cache them, so that Zope serves each file only on the first request.
Then you don't have to worry about what stuff is getting served from
where.
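For Squid to keep a copy, the response coming out of Zope has to be cacheable. A minimal sketch of the headers such a proxying method might set, assuming standard HTTP caching semantics; the function name cache_headers and the one-day default max-age are hypothetical, and this does not reproduce ExtFile's actual code.

```python
import mimetypes
import os
from email.utils import formatdate


def cache_headers(path: str, max_age: int = 86400) -> dict:
    """Build response headers that let a shared cache such as Squid store the file."""
    st = os.stat(path)
    content_type, _ = mimetypes.guess_type(path)
    return {
        "Content-Type": content_type or "application/octet-stream",
        "Content-Length": str(st.st_size),
        # HTTP dates must be in GMT (RFC 1123 format).
        "Last-Modified": formatdate(st.st_mtime, usegmt=True),
        # "public" permits shared caches to store the response; max-age bounds
        # how long they may serve it without asking Zope again.
        "Cache-Control": f"public, max-age={max_age}",
    }
```

A longer max-age takes more load off Zope but means a replaced document may be served stale from the cache until it expires.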

BTW: If you do set up any nifty FS storage solution, I would be
interested in seeing it for a future version of DocumentLibrary.

Good Luck!
-Casey




_______________________________________________
Zope maillist  -  Zope@zope.org
http://lists.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists -
 http://lists.zope.org/mailman/listinfo/zope-announce
 http://lists.zope.org/mailman/listinfo/zope-dev )