[ZODB-Dev] Finding out whether an object is in the ZEO or ZODB cache.

Toby Dickenson tdickenson@geminidataloggers.com
Wed, 21 Aug 2002 14:25:20 +0100


On Wednesday 21 Aug 2002 12:48 pm, Arnar Lundesgaard wrote:

>  We have been trying to set up a ZEO cluster using Squid for load
> balancing with ICP as described in
>
>    http://www.zope.org/Members/htrd/icp/intro.
>
>
> Under the heading 'When does Zope say ICP_HIT', a few possible
> optimizations are listed. On that page Toby Dickenson
> mentions
>
> """
>  - If some methods require a large persistent object that will take a
>    long time to transfer from the ZEO server, you might say ICP_HIT if
>    that object is already in the ZEO cache.
>
>  - If some methods require an object that will take a long time to
>    unpickle, you might want to say ICP_HIT if that object is already in
>    the ZODB cache.
> """
>
> The question is: "How can we know if the object is in the ZEO or ZODB
> cache when all we have from the ICP request is the URL?"



> As far as we can tell the cache only stores an OID. However, the lower
> layers of ZODB are unfortunately still black magic to us.

Don't worry, you don't need to go there.

> We were
> hoping that there is a way to find this out without having to traverse
> the path and wake the object we want to test.

Yes, you definitely don't want to be doing URL/object traversal during ICP
processing. It will kill scalability, add significant latency, and reduce
ZODB cache performance.

I wrote the ICP support (and that HOWTO) just before rewriting the ZODB
caching system. At the time I was hoping that a cunning solution to this
problem would emerge. Unfortunately it didn't, so we are stuck using crude
hacks :-(


My current implementation hack looks like this.....

I have a module-level registry which contains the URLs of big objects
currently in memory. My big objects add their URL to this registry at the
point where they are accessed. You may prefer to store URL fragments rather
than whole URLs.
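A minimal sketch of such a registry, assuming plain Python as used by Zope itself. The names register, unregister and is_loaded are hypothetical, not part of Zope or ZODB. A lock is used because several publisher threads may touch the registry at once. Note that a plain set cannot tell when the same object has been loaded by more than one thread.

```python
import threading

# Hypothetical module-level registry of URL fragments for big objects
# that are currently loaded in memory in this Zope process.
_lock = threading.Lock()
_loaded = set()


def register(fragment):
    """Record that the big object at this URL fragment is in memory."""
    with _lock:
        _loaded.add(fragment)


def unregister(fragment):
    """Forget the fragment once the object has left memory."""
    with _lock:
        _loaded.discard(fragment)


def is_loaded(fragment):
    """Cheap membership test for the ICP handler; no ZODB access."""
    with _lock:
        return fragment in _loaded
```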

My ICP handler picks apart the URL by hand, using string methods or regular
expressions, and checks whether the appropriate URL fragments are in the
registry. It returns ICP_HIT if they are.
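The URL check might look something like this sketch. It assumes a hypothetical site layout where big objects are published under /bigdocs/<id>/..., and icp_decision is an illustrative function, not the hook Zope's ICP server actually calls; wiring it in is up to your own ICP handler. Returning None means "no opinion", so whatever ICP policy you normally use decides.

```python
import re
from urllib.parse import urlsplit

# Hypothetical convention: big objects live at /bigdocs/<id>/...
_BIG_OBJECT_PATH = re.compile(r"^/(bigdocs/[^/]+)")

ICP_HIT = "ICP_HIT"
ICP_MISS = "ICP_MISS"


def icp_decision(url, registry):
    """Decide an ICP reply from the URL alone, without ZODB traversal.

    registry is any container of URL fragments (e.g. a set) naming the
    big objects currently loaded in this process.
    """
    path = urlsplit(url).path
    match = _BIG_OBJECT_PATH.match(path)
    if match is None:
        # Not a big-object URL: no opinion, let the normal policy decide.
        return None
    return ICP_HIT if match.group(1) in registry else ICP_MISS
```

A string match like this is deliberately crude; it never wakes an object, which is the whole point.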

It is possible to confuse this scheme with carefully crafted, unusual URLs.
That doesn't matter: it's only an optimisation.

You also need to remove the URL fragments from the registry when the
object is deactivated. The best way to do this today is to create a class
whose __del__ performs the unregistration, and add an instance of this class
as a _v_ (volatile) attribute of the persistent object.
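A sketch of that __del__ trick, assuming CPython's reference counting (so __del__ runs as soon as the last reference goes away). UnregisterOnDel and on_access are hypothetical names, and BigObject is a plain class standing in for a Persistent subclass. _v_ attributes are never saved to the database and are discarded when ZODB deactivates the object, which drops the last reference to the helper and triggers the unregistration.

```python
class UnregisterOnDel:
    """Hypothetical helper: calls unregister(fragment) when freed."""

    def __init__(self, fragment, unregister):
        self.fragment = fragment
        self._unregister = unregister

    def __del__(self):
        # Runs when the owning object drops its _v_ attribute,
        # e.g. when ZODB turns the object back into a ghost.
        self._unregister(self.fragment)


class BigObject:
    """Stand-in for a persistent object (in Zope, subclass Persistent)."""

    def on_access(self, fragment, register, unregister):
        # Register only once per in-memory copy of this object.
        if getattr(self, "_v_registration", None) is None:
            register(fragment)
            self._v_registration = UnregisterOnDel(fragment, unregister)
```

For example, with loaded = set(), calling obj.on_access("bigdocs/report42", loaded.add, loaded.discard) adds the fragment, and obj.__dict__.clear() (a rough imitation of deactivation) removes it again.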

An extra complication is that a persistent object can be loaded into
memory more than once (if you have more than one publisher thread, each with
its own ZODB connection and object cache). You may want to use a reference
count in this registry.





> I hope this is the appropriate mailing list for this enquiry.

Sure. Feel free to cc me directly too.