[Zope-dev] __record_schema__ of Brains (Was: Record.pyd)

Johan Carlsson [Torped] johanc@torped.se
Sun, 11 Aug 2002 12:20:22 +0200


At 21:28 2002-08-10 -0400, Casey Duncan said:
>On Saturday 10 August 2002 11:25 am, Johan Carlsson [Torped] wrote:
> > Now that I understand how the data tuples are copied to the brain
> > I'm not at all sure adding a filter when copying the tuple will optimize
> > things, because of the overhead in the filter process.
>
>This occurs lazily so the savings would be heavily dependent on the
>application. For most web apps presenting small batches of records, the
>savings in limiting columns returned would be pretty minimal.

But there must have been some thought behind implementing Record.pyd in C,
though of course I suppose Record.pyd was first used for ZSQL?

An easy filter would be to let __record_schema__ control which columns to
save. As it works today, __record_schema__ must point to a sequence
starting with 0, so I can't specify indexes into the tuple like this:

__record_schema__ = {'hey': 12, 'dude': 22}

Maybe this is "easy" to change in Record.pyd, or should I just implement it
in a special brain base class?
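
A rough sketch of the second option, in plain Python with no Zope imports.
SparseBrain and its constructor are hypothetical names for illustration and
are not part of Record.c; the only assumption is that the schema maps
attribute names to arbitrary positions in the data tuple:

```python
class SparseBrain:
    """Hypothetical brain base class: the schema maps names to arbitrary
    positions in the catalog's data tuple, not to a 0-based range."""

    # Assumed layout: attribute name -> index into the full data tuple.
    __record_schema__ = {'hey': 12, 'dude': 22}

    def __init__(self, data):
        # Copy only the columns named in the schema; the rest of the
        # tuple is never stored on the brain.
        for name, index in self.__record_schema__.items():
            setattr(self, name, data[index])


row = tuple(range(100, 130))  # stand-in for one catalog data tuple
brain = SparseBrain(row)
print(brain.hey, brain.dude)
```

Here the filtering happens once, at construction time, so its cost grows
with the number of schema entries rather than with the width of the tuple.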

After revisiting Record.c I realized that the tuple from the catalog's self.data
is stored in a Record either as a tuple (or as a C array, I suppose?) or as
attributes, depending on what you provide to the constructor.
I suppose copying data to a C array is much faster than creating
attributes on each brain, but if the array is large and the number of attributes
that need to be set is small, it might be the other way around.
I have no idea where they would break even.

Maybe I will just settle for having two different brain base classes and use
the one that suits the current need.
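
The break-even point could be measured rather than guessed. A minimal sketch
with timeit, assuming two hypothetical pure-Python stand-ins (TupleBrain and
AttrBrain are illustrative names, and neither reproduces the C Record, so the
numbers only hint at the trade-off):

```python
import timeit

SCHEMA = {'hey': 12, 'dude': 22}   # assumed: name -> index in the tuple
ROW = tuple(range(50))             # stand-in for a wide catalog tuple


class TupleBrain:
    """Keeps a reference to the whole tuple; resolves columns on access."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            index = SCHEMA[name]
        except KeyError:
            raise AttributeError(name) from None
        return self._data[index]


class AttrBrain:
    """Copies just the schema columns into instance attributes."""

    def __init__(self, data):
        for name, index in SCHEMA.items():
            setattr(self, name, data[index])


def build_and_read(cls):
    # Construct one brain and read every schema attribute once.
    brain = cls(ROW)
    return brain.hey, brain.dude


for cls in (TupleBrain, AttrBrain):
    seconds = timeit.timeit(lambda: build_and_read(cls), number=10000)
    print(cls.__name__, round(seconds, 4))
```

Varying the length of ROW and the number of entries in SCHEMA would show
where one variant overtakes the other on a given machine.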

>The general usage is to put a minimal set of columns in metadata, only enough
>to create a results page and load the objects in cases where either large,
>dynamic or otherwise arbitrary data elements are needed.

Yes, and that is somewhat restricting.
My current applications use several different catalogs to keep the
width of the meta_data down. The downside of this approach is
that I end up with a lot of catalogs, which means many times more
management work, e.g. I must reindex all catalogs instead
of just one.

My primary goals are:
1. Get a general ZCatalog that can be used for all ZCatalog requirements
(not only site searches).
2. Implement features that remove the need for an external RDBMS (for instance,
report generation is hard with ZCatalogs because of the lack of
grouping/statistics).
3. Make ZCatalogs easier to manage. For instance, the need to update
indexes and meta_data definitions every time you change your application's
data structure is annoying, especially at development time. Objects could
tell the ZCatalog which meta_data and indexes they want, removing the need
to add them manually. Of course you will need to clean up the ZCatalog
from time to time.


> > (The way that I "solved" the group/calc part of my "project", I don't think
> > it will lead to memory bloat. I'm going to implement a LazyGroupMap
> > which takes an extra parameter (a list of IISets). Each brain created
> > in the LazyMap will have methods for calculations directly on the self.data
> > in the Catalog. The data itself will not be stored.
> > There will most probably be a precalculate method that calculates all
> > variables that are applicable and caches the results.)
>
>Sounds like a pretty good solution. However, I would be hesitant in creating
>direct dependencies on the internal Catalog data structures if you can help
>it (sometimes you can't though).

I could "soften" the dependency by providing the catalog with an interface for
calculations, giving the brain a reference to the catalog itself, and
using the interface on that reference.
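
Roughly what I have in mind, as a sketch only. CalcCatalog, GroupBrain and
their method names are made up for illustration and are not existing ZCatalog
API; a plain list of record ids stands in for an IISet:

```python
class CalcCatalog:
    """Hypothetical stand-in for a catalog with a calculation interface."""

    def __init__(self, schema, data):
        self._schema = schema   # column name -> index in each data tuple
        self._data = data       # record id -> data tuple

    def column_values(self, name, rids):
        index = self._schema[name]
        return [self._data[rid][index] for rid in rids]

    def sum_column(self, name, rids):
        return sum(self.column_values(name, rids))

    def count(self, rids):
        return len(rids)


class GroupBrain:
    """Holds only record ids and a catalog reference, never the data."""

    def __init__(self, catalog, rids):
        self._catalog = catalog
        self._rids = rids

    def total(self, name):
        return self._catalog.sum_column(name, self._rids)

    def average(self, name):
        return self.total(name) / self._catalog.count(self._rids)


catalog = CalcCatalog(
    {'price': 0, 'qty': 1},
    {1: (10, 2), 2: (30, 4), 3: (20, 1)},
)
group = GroupBrain(catalog, [1, 2])
print(group.total('price'), group.average('price'))
```

The brain never touches self.data directly, so the catalog is free to change
its internal layout as long as the calculation methods keep working.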


> > One way to reduce memory consumption in wide Catalogs would be
> > to have LazyBrains (vertical laziness; there might be reasons
> > why that would be a bad idea, which I'm not aware of)
>
>That would pretty much require a rewrite of the Catalog as the data structures
>would need to be completely different. It would introduce significant
>database overhead since each metadata field would need to be loaded
>individually. I think that would negate whatever performance benefit metadata
>might have over simply loading the objects.

I'm not sure that it would be necessary to change the data structure; the brain
could use the same method as the LazyMap uses to load the data.
But a LazyBrain would need to save all applicable data at once to be efficient.
The difference would be that the brain will not fetch any data before the first
attribute is accessed. When the first one is accessed, all applicable data will
be copied to attributes according to __record_schema__.

This would probably not be more efficient for regular use of brains, but for
calculated group brains they wouldn't need to store the data at all if
they only used calculated fields.
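
A sketch of that lazy behaviour in plain Python. The class name, the
constructor arguments and the dict used as the catalog data are assumptions
for illustration, not how Record.c or the Catalog actually work:

```python
class LazyBrain:
    """Hypothetical brain that copies nothing until first attribute access."""

    # Assumed layout: attribute name -> index into the data tuple.
    __record_schema__ = {'title': 0, 'size': 2}

    def __init__(self, catalog_data, rid):
        self._catalog_data = catalog_data   # record id -> data tuple
        self._rid = rid

    def __getattr__(self, name):
        # Only reached when the attribute is not yet set on the instance.
        schema = type(self).__record_schema__
        if name not in schema:
            raise AttributeError(name)
        # First access to any schema attribute: copy them all at once.
        row = self._catalog_data[self._rid]
        for key, index in schema.items():
            setattr(self, key, row[index])
        return getattr(self, name)


data = {7: ('Report', 'unused column', 2048)}
brain = LazyBrain(data, 7)
loaded_before = 'title' in vars(brain)   # nothing copied yet
size = brain.size                        # triggers the one-time copy
loaded_after = 'title' in vars(brain)    # title came along with size
print(loaded_before, size, loaded_after)
```

A brain that only answers calculated fields would never trigger the copy.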


> > Another way would be to have multiple data attributes in the Catalog, like
> > tables, and to join the tuples from them with a "from table1, table2"
> > statement.
> > In this way it would be possible to control the width of the brains.
> > It would also be possible for the object indexing itself to tell the
> > catalog in which "tables" it should store meta data.
>
>Yes, this would be better. You could have different sets of metadata for each
>catalog record. You would select which one you wanted at query time.

Yeah I like it as well. It would also require a more SQL-like query interface.
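
Something along these lines, as a sketch. The table names, the join_tables
function and the dict-of-dicts storage are all invented for illustration;
nothing like this exists in the Catalog today:

```python
def join_tables(tables, names, rid):
    """Merge the metadata rows for one record id from the named tables."""
    merged = {}
    for name in names:
        # A record may be missing from a table it never asked to be in.
        merged.update(tables[name].get(rid, {}))
    return merged


# Assumed storage: table name -> record id -> metadata for that record.
tables = {
    'summary': {1: {'title': 'Report', 'author': 'johan'}},
    'stats': {1: {'size': 2048, 'hits': 17}},
}

narrow = join_tables(tables, ['summary'], 1)
wide = join_tables(tables, ['summary', 'stats'], 1)
print(sorted(narrow), sorted(wide))
```

The list of table names plays the role of the "from table1, table2" clause,
so the width of the resulting brain is chosen at query time.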

>
> > There have been some proposals (ObjectHub et al) which I read some
> > time ago. I didn't feel then that they were what I was looking for.
> > Please tell me if there have been any proposals or discussions regarding this.
>
>I don't think so. If you feel strongly about this, write up a proposal and
>provide some use cases for discussion.

Yes, but first an implementation :-)
I'm very XP in that respect. I find code easier to communicate than
specifications :-)
Or at least Python code; I don't find C code easier to communicate.

Cheers,
Johan Carlsson



-- 
Torped Strategi och Kommunikation AB
Johan Carlsson
johanc@torped.se

Mail:
Birkagatan 9
SE-113 36  Stockholm
Sweden

Visit:
Västmannagatan 67, Stockholm, Sweden

Phone +46-(0)8-32 31 23
Fax +46-(0)8-32 31 83
Mobil +46-(0)70-558 25 24
http://www.torped.se
http://www.easypublisher.com