[Zope3-Users] Best structure for a large schema

David Pratt fairwinds at eastlink.ca
Wed Apr 12 11:44:08 EDT 2006


I realize this question may be a bit off topic, but I am trying to
determine the best structure for storing a large schema where some
attributes are lists or dictionaries. There are about 70 attributes in
the schema, and I am trying to choose a structure that will not
have to hold a lot of empty space. I have a base schema of
approximately 30 attributes and others that subclass from it, the
largest being about 70.

I had originally thought of RDF at the outset and built a datastore with
rdflib on top of a relational database. I chose this option because
ZODB takes plenty of RAM as it grows. The problem here is the number of
accesses needed to gather a complete object. It is pretty efficient from a
storage perspective: if an object does not have particular
attributes, you are not storing them, and all items in the store are
unique, so I was not duplicating a single piece of data. But say you wanted to
present a page of 20 items or do a search. Gathering these up
takes a long time when you are hitting the disk so many times just to
assemble a single object, so query times were unacceptable, and loading RDF
from outside sources also took a very long time. The data store grows
into millions of records, so you had better have a pretty sweet RDB server
with lots of RAM as well.
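To make the trade-off concrete, here is a minimal sketch of a triple-style layout using Python's standard-library sqlite3 module (not rdflib; the table, column and function names are hypothetical). Only attributes that have values get a row, which is why storage stays lean, but every object has to be reassembled from its scattered rows, so a page of 20 items with ~30 attributes each touches roughly 600 rows.

```python
import sqlite3

# Hypothetical triple table: one row per (subject, predicate, object).
# Attributes an item does not have are simply absent, so nothing empty is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.execute("CREATE INDEX triples_s ON triples (s)")

def add_item(item_id, attrs):
    """Store only the attributes that actually have values."""
    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [(item_id, name, value) for name, value in attrs.items() if value is not None],
    )

def load_item(item_id):
    """Reassemble one object from its triples (assumes single-valued predicates)."""
    rows = conn.execute("SELECT p, o FROM triples WHERE s = ?", (item_id,))
    return dict(rows.fetchall())

def load_page(item_ids):
    """Assemble a page of items; also count the rows read to build it."""
    items, rows_touched = [], 0
    for item_id in item_ids:
        item = load_item(item_id)
        rows_touched += len(item)  # one row per stored attribute
        items.append(item)
    return items, rows_touched

add_item("item1", {"title": "First", "author": "David", "isbn": None})
add_item("item2", {"title": "Second", "year": "2006"})
page, touched = load_page(["item1", "item2"])
print(touched)  # 4 rows read to build 2 items
```

In a real RDF store, list- or dictionary-valued attributes typically become further nodes with their own triples, which multiplies the lookups per object.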

I had dismissed a relational database on its own since the data does not
lend itself to a row layout, and I may want to add to the schema at some point,
which could mean some pretty ugly business this way. But then I saw
the vertical example in the examples folder of SQLAlchemy, which can
create dynamic fields as necessary and potentially avoid
this kind of hassle.
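The "vertical" idea is essentially an entity-attribute-value layout. The sketch below shows the general shape with the standard-library sqlite3 module rather than SQLAlchemy's actual example; the table names and helpers are hypothetical. The point it illustrates is that a field the original schema never had needs no ALTER TABLE, and empty fields cost nothing.

```python
import sqlite3

# Hypothetical vertical (entity-attribute-value) layout: one row per entity,
# plus one row per populated field in a separate table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
    CREATE TABLE entity_field (
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        name      TEXT    NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity_id, name)
    );
""")

def save(kind, fields):
    """Insert an entity plus one row per non-empty field."""
    cur = conn.execute("INSERT INTO entity (kind) VALUES (?)", (kind,))
    entity_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO entity_field (entity_id, name, value) VALUES (?, ?, ?)",
        [(entity_id, k, v) for k, v in fields.items() if v is not None],
    )
    return entity_id

def load(entity_id):
    """Rebuild the entity's populated fields as a dict."""
    rows = conn.execute(
        "SELECT name, value FROM entity_field WHERE entity_id = ?", (entity_id,)
    )
    return dict(rows.fetchall())

a = save("base", {"title": "A", "summary": None})
# A field added long after the tables were created: no schema change needed.
b = save("extended", {"title": "B", "new_field": "added later"})
print(load(a), load(b))
```

The cost is the same as with the triple store: values come back as untyped text, and reading an object means reading one row per field.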

The ZODB provides the flexibility, and Generations could work well for
future schema updates, so this looks very good. But how efficient is it if 15% of
the attributes have data and 85% do not?
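One common way to keep sparse objects small in an object database is to put defaults on the class and assign only the attributes that actually have data. The sketch below demonstrates the effect with the standard pickle module and a plain class; it assumes that ZODB, which pickles a persistent object's instance state, behaves similarly (as I understand it, `persistent.Persistent` pickles the instance `__dict__` minus its `_p_`/`_v_` attributes). `Record` and its fields are hypothetical, and a real application would subclass `Persistent`.

```python
import pickle

class Record:
    """Hypothetical record. Class attributes serve as defaults, so an instance
    only carries (and pickles) the attributes that were explicitly set."""
    title = None
    author = None
    isbn = None
    keywords = ()  # immutable default; assign a fresh list per instance if needed

    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)

sparse = Record(title="Only a title")
dense = Record(title="Only a title", author=None, isbn=None, keywords=())

print(sorted(vars(sparse)))  # ['title']
print(len(pickle.dumps(sparse)) < len(pickle.dumps(dense)))  # True
print(sparse.author)  # None, read from the class default
```

Unset attributes then cost nothing per instance, and reading them simply falls back to the class default.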

I have also been experimenting with hybrid pickle/RDB storage, so that
the attributes that will receive the most attention are stored as fields
and the full record is stored as a pickle that is unpickled for views and
data entry.
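A minimal sketch of that hybrid layout, again with the standard-library sqlite3 module; the table, the choice of "hot" columns and the helper names are all hypothetical. Searches run against real indexed columns, and only the matching rows are unpickled into full records.

```python
import pickle
import sqlite3

# Hypothetical hybrid table: frequently queried attributes get real columns,
# while the complete record is kept as a pickle in a BLOB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, data BLOB)"
)
conn.execute("CREATE INDEX record_title ON record (title)")

INDEXED = ("title", "year")  # hypothetical choice of "hot" attributes

def store(record):
    """Copy the hot attributes into columns and pickle the whole dict."""
    values = [record.get(name) for name in INDEXED] + [pickle.dumps(record)]
    cur = conn.execute("INSERT INTO record (title, year, data) VALUES (?, ?, ?)", values)
    return cur.lastrowid

def search(title):
    """Filter on an indexed column, then unpickle only the matching records.
    Only unpickle data you wrote yourself: pickle is not safe for untrusted input."""
    rows = conn.execute("SELECT data FROM record WHERE title = ?", (title,))
    return [pickle.loads(blob) for (blob,) in rows]

store({"title": "Zope", "year": 2006, "keywords": ["zodb", "rdf"]})
store({"title": "Other", "notes": {"lang": "en"}})
print(search("Zope"))
```

The main design cost is keeping the columns and the pickle in sync: every update has to rewrite both, and changing which attributes are "hot" means migrating the table.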

In any case, I thought I'd ask, since I am concerned about both the efficiency
of storage and the speed of access. If RDF access were fast, it
would be great, but this has not been the case. I thought there may
be some other ideas on this, or someone could advise on the efficiency of
ZODB when, in some cases, users will be selective about which attributes
are important to them.

Regards,
David
