[Zope-dev] ZCatalog, import errors, and indexing errors

R. David Murray bitz@bitdance.com
Sun, 5 Mar 2000 13:05:44 -0500 (EST)


Well, I may have been foolhardy.  Based on various comments on this list
I did not think I would have any trouble importing a 60,000 record
database into Zope and ZCataloging it.  Having the records in Zope
as objects made a lot more sense for this project than putting them
into a backend RDBMS.

The first problem I ran into was during the import.  I wrote an
external method that read the data from a tab delimited file
and used the data to build ZClass instances.  I tried to create
them all in one folder in one transaction.  This failed miserably
at somewhere around 1500 records.  I got an error about a
'frexp' call being out of range, somewhere in ZCatalog's BTree
methods (my apologies for not having captured the exact error; maybe
I can reproduce it once I've finished building my new test system).

So I tried loading the records in batches, and at first that seemed
to work.  Then I got an out of memory error.  After noticing that
trying to view that directory in the management interface threw both
my browser and Zope into fits, I decided to try loading the records
into multiple folders.  This also seemed to work.  I used 1000
record batches.  But occasionally I would get the frexp error.  If
I tried the load several times, it would eventually complete without
error.  Loading other batches in between tries seemed to help, but
that may be an illusion.  My load method ended up with the occasional
short file, and I got this frexp error with batches as short as
300 records.  So I think the error has something to do with the
catalog machinery and not the batch size.
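For anyone who wants to picture the batch approach, here is a minimal sketch of it, not the actual external method. It reads tab-delimited rows, puts each batch in its own folder, and commits once per batch. The names make_folder, add_record and commit are hypothetical stand-ins for the real Zope calls, and the demo at the bottom uses plain dictionaries and lists instead of the ZODB.

```python
import csv
import io


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Final short batch (fewer than batch_size rows).
        yield batch


def load_in_batches(tsv_file, batch_size, make_folder, add_record, commit):
    """Load tab-delimited rows, using one folder and one commit per batch.

    make_folder, add_record and commit are hypothetical callbacks that
    stand in for whatever the real load method does.
    """
    reader = csv.reader(tsv_file, delimiter="\t")
    total = 0
    for number, batch in enumerate(iter_batches(reader, batch_size)):
        folder = make_folder("batch%04d" % number)
        for fields in batch:
            add_record(folder, fields)
        commit()  # keep each transaction to a single batch
        total += len(batch)
    return total


# Demo with in-memory stand-ins (assumptions, not Zope objects).
folders = {}
commits = []


def demo_make_folder(name):
    folders[name] = []
    return folders[name]


def demo_add_record(folder, fields):
    folder.append(fields)


def demo_commit():
    commits.append(len(commits) + 1)


sample = io.StringIO("".join("%d\tname%d\n" % (i, i) for i in range(2500)))
total_loaded = load_in_batches(
    sample, 1000, demo_make_folder, demo_add_record, demo_commit
)
```

With 2500 rows and a batch size of 1000, the demo ends up with three folders, the last one short, which mirrors the occasional short file mentioned above.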

So I finally got the database loaded, and everything seemed to be
working.  However, we have just discovered that certain keywords
do not appear to be yielding the expected results on searches.
The ZClass is catalog aware, and there are a few fields being indexed.
The one of concern is just called 'keywords'.  Despite its name it
is a text index, so that we can take advantage of the ability to
do 'AND'ed searches.
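To make clear what I mean by an 'AND'ed search, here is a toy model of a text index. It is only a sketch of the general idea (a word-to-records mapping with set intersection), not how ZCatalog implements its text index. The class name ToyTextIndex and its methods are made up for illustration.

```python
from collections import defaultdict


class ToyTextIndex:
    """A toy inverted index: each word maps to the set of record ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, doc_id, text):
        # Split on whitespace and fold case; a real index does much more.
        for word in text.lower().split():
            self._postings[word].add(doc_id)

    def search_all(self, query):
        """Return ids of records that contain every word in the query."""
        words = query.lower().split()
        if not words:
            return set()
        result = None
        for word in words:
            ids = self._postings.get(word, set())
            result = set(ids) if result is None else result & ids
        return result


idx = ToyTextIndex()
idx.index(1, "oil wells drilling")
idx.index(2, "oil fire")
idx.index(3, "water wells")

both = idx.search_all("oil wells")
either_wells = idx.search_all("wells")
no_match = idx.search_all("oil gas")
```

Here only record 1 contains both 'oil' and 'wells', so it is the only hit for the two-word query.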

One more piece of info that may or may not be important.  There are
actually two ZClasses: the one holding the database records, and another
class with a property of the same name (keywords).  Instances of this
second class get added by hand.

Now we find that when we enter certain keywords (the examples we have
found so far are 'well' and 'fire', which you probably don't need
to know) on one of these by-hand ZClass instances, they are *not* found
by a ZCatalog search on the keywords field index.  Other words
entered into the keywords field do cause the record to be found
('wells', for example).

Now, I'm guessing no one is going to have an answer for me.  What I'm
hoping for is some tips for how to go about debugging this.  Actually,
what I'm really hoping is that someone from DC will view this as
an important enough bug that they'll ask for a login on my system
to check it out <grin>.  I have a gut-level feeling that
the frexp error and the index failure are related, so I also may
try a total reload of the database, if someone can answer question
(2) below.

To summarize, this experience has raised several questions in my
mind:

1) what is the practical limit on the number of entries in a folder,
	and is there some way to get around this for instances where
	you want to use the ZODB as a database for a large number
	of records?
2) is there a practical limit on the number of changes that can be
	part of a transaction (e.g., *should* I be able to add
	60K objects in one transaction?)
3) what is the best way to do a massive data load?
4) how does one go about debugging ZCatalog?  (I've read the debugger
	doc posted here, and I'll see if that is enough to allow me to
	get started on this)
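On question (4), one check that seems worth trying is to compare the words in a record's keywords field against the words the index actually holds. The sketch below shows the idea on plain Python data only. It uses no ZCatalog API, and every function name in it is hypothetical. The dropped_words argument models an index that silently discards some words at indexing time. That is purely an assumption for the sake of the demo, one hypothesis among several, and I have not confirmed that anything like it is happening here.

```python
def split_words(text):
    """Split on whitespace and fold case (a stand-in for real splitting)."""
    return text.lower().split()


def build_vocabulary(texts, dropped_words=()):
    """Collect the words a toy index would hold for the given texts.

    dropped_words models an index that discards some words while
    indexing; this is an assumption for the demo, not a known fact
    about the catalog.
    """
    dropped = set(dropped_words)
    vocabulary = set()
    for text in texts:
        for word in split_words(text):
            if word not in dropped:
                vocabulary.add(word)
    return vocabulary


def missing_words(text, vocabulary):
    """Return words of text absent from vocabulary, in first-seen order."""
    missing = []
    for word in split_words(text):
        if word not in vocabulary and word not in missing:
            missing.append(word)
    return missing


keywords = "well fire wells"
vocab = build_vocabulary([keywords], dropped_words=["well", "fire"])
not_indexed = missing_words(keywords, vocab)
```

If a check like this turned up the same words every time on the real data, that would point at the indexing step rather than at the search.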

--RDM