[ZODB-Dev] Sharing (persisted) strings between threads

Wed Dec 8 09:29:43 EST 2010

On Wed, Dec 8, 2010 at 7:45 AM, Malthe Borch <mborch at gmail.com> wrote:
> On 8 December 2010 13:28, Jim Fulton <jim at zope.com> wrote:
>>> With 20 active threads, each having rendered the Plone 4 front page,
>>> this approach reduced the memory usage by 70 MB.
>>
>> Out of a total of what?
>
> In my case out of 430 MB non-shared for the process.
>
>> Note that if a process is CPU bound (as most dynamic Python apps
>> should be), then there is little or no benefit in having multiple
>> threads, due to the (damn) GIL.
>
> The case I'm thinking of is when one thread is being used in a write
> transaction, while another is doing a read.

I doubt that write transactions block enough to make a difference.

>
> If the database is bigger than the allowed memory usage, then I guess
> threads can also ensure that requests for in-memory objects can be
> served while some threads are blocked due to swapping and/or reading
> pickles from disk.

It's not the database size that matters, but the working set.  We have
an application that is somewhat pathological in that its working set
is much larger than the amount of memory it's given and yet we're
still substantially CPU bound.  Data can be loaded from a ZEO cache
pretty quickly.

As Hanno said, the recommendation for a single thread assumes that you
have multiple processors.

>
>> Except that you can't create weakrefs to strings or unicode.
>
> I see. Maybe another scheme could be devised.

Yeah, maybe.  For example, you could subclass string or unicode.  This
will add significant per-string overhead that could swamp the benefits
you hope to achieve.
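
To make the trade-off concrete, here is a minimal sketch, assuming a
recent CPython 3 (where the type is str rather than string/unicode).
SharedStr and can_weakref are hypothetical names used only for
illustration. It shows that a plain string rejects weak references, that
an instance of a subclass accepts them, and how the extra per-string
cost can be measured:

```python
import sys
import weakref


class SharedStr(str):
    """Hypothetical str subclass; its instances can be weakly referenced."""


def can_weakref(obj):
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


# Build the string at runtime so it is an ordinary heap-allocated str.
plain = "".join(["template ", "source"])
shared = SharedStr(plain)

plain_ok = can_weakref(plain)    # plain str has no weakref support
shared_ok = can_weakref(shared)  # subclass instances do

# Extra bytes per string for the subclass instance. The exact figure
# depends on the interpreter version and build.
overhead = sys.getsizeof(shared) - sys.getsizeof(plain)
print(plain_ok, shared_ok, overhead)
```

The overhead is paid once per string, so for many short strings it can
easily exceed what sharing saves.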

>
>> Also, while interning is fine for an experiment, it's wasteful for
>> strings that are rarely needed.
>
> How so? As far as I can see, interning is still subject to reference
> counting. The only real difference is that a hash table is maintained
> (fairly minimal memory use + probable computation of string hash).

The hash table retains a reference to the strings in it.  The
references aren't weak afaik.
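
For reference, a minimal sketch of what interning does to object
identity, assuming Python 3, where the call is sys.intern (in Python 2
it was the builtin intern). It only demonstrates the sharing effect. It
does not settle how long the table keeps a string alive, which depends
on the interpreter version:

```python
import sys

# Build two equal strings at runtime so the compiler cannot merge them.
a = "".join(["plone", "-", "front", "-", "page"])
b = "".join(["plone", "-", "front", "-", "page"])

distinct_before = a is not b  # equal contents, two separate objects

# Interning maps equal strings to a single canonical object.
a = sys.intern(a)
b = sys.intern(b)

shared_after = a is b  # both names now refer to the same object
print(distinct_before, shared_after)
```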

>
>> Sharing immutable data between threads is very appealing
>> intellectually. I've certainly thought about it a lot. In practice,
>> I doubt the benefit will be worth the extra overhead (let alone the
>> effort :).
>
> I think if the case can be made for threading, then it's worth
> pursuing.

Knock yourself out. :)

> Alternatively, applications might put all non-trivial
> strings into blobs, but I don't know if there's a non-trivial overhead
> with that approach.

What are you thinking of as applications with non-trivial strings?

The only one I can think of is template source.  That might be better
served by either storing the source compressed or even storing it in a separate
object that doesn't need to be in memory except when editing or compiling.
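
A minimal sketch of the compressed-storage idea, using only the standard
zlib module. CompressedSource is a hypothetical class, not part of ZODB
or any template engine, and the sample template text is made up. In a
real application the class would presumably be a persistent object:

```python
import zlib


class CompressedSource:
    """Hypothetical holder that keeps template source compressed."""

    def __init__(self, source):
        # Store only the compressed bytes, not the original text.
        self._data = zlib.compress(source.encode("utf-8"))

    @property
    def source(self):
        # Decompress on demand, e.g. when editing or compiling.
        return zlib.decompress(self._data).decode("utf-8")

    def stored_size(self):
        """Number of bytes actually held in memory for the source."""
        return len(self._data)


# Made-up, repetitive template text for illustration.
template = '<li tal:repeat="item items" tal:content="item">item</li>\n' * 200
holder = CompressedSource(template)

print(len(template.encode("utf-8")), holder.stored_size())
```

How much this saves depends on how compressible the source is, and
every access pays the decompression cost.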

Jim

-- 
Jim Fulton