[ZODB-Dev] Sharing (persisted) strings between threads

Dylan Jay djay at pretaweb.com
Wed Dec 8 16:26:25 EST 2010


Dylan Jay
Technical solution manager
PretaWeb 99552830

On 08/12/2010, at 11:28 PM, Jim Fulton <jim at zope.com> wrote:

> On Wed, Dec 8, 2010 at 5:06 AM, Malthe Borch <mborch at gmail.com> wrote:
>> Currently, when a thread loads a non-ghost into its object cache, it's
>> straight from being unpickled. That means that if two threads load the
>> exact same object, any (immutable) string contained in the object
>> state will be allocated in duplicate (or, in general, once per
>> active thread).
>>
>> If instead, all unpickled strings were made canonical via a weak
>> dictionary, there would be only one copy in memory, no matter the
>> thread count, e.g.:
>>
>>  string = weak_string_map.setdefault(string, string)
>>
>> If the returned string was a different (canonical) copy, the duplicate
>> would immediately be ready for garbage collection.
>>
>> This is a real win in memory savings. Using Plone, I experimented with
>> the approach by using the Python pickle implementation and interning
>> all byte strings (using ``intern``) directly in the unpickle routine
>> to the same effect:
>>
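>>    from marshal import loads as mloads
>>
>>    def load_binstring(self):
>>        # 4-byte length prefix, decoded as a marshal int
>>        length = mloads('i' + self.read(4))
>>        string = self.read(length)
>>        interned = intern(string)    # (sic)
>>        self.append(interned)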
>>
>> With 20 active threads, each having rendered the Plone 4 front page,
>> this approach reduced memory usage by 70 MB.
>
> Out of a total of what?
>
> Note that if a process is CPU bound (as most dynamic Python apps
> should be), then there is little or no benefit in having multiple
> threads, due to the (damn) GIL.

I was working with a high-volume site recently where 70% of requests
called a back-end API that could take up to 4 seconds. For better or
worse, this was in Zope. The best way to scale it was to increase
the number of threads, since the process was now IO bound. You do run
out of memory when you do this, so the proposed string sharing would
have been helpful. If a cache shared between processes were possible,
such as one using memcached, that would be even better :)
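
To illustrate why more threads help in the IO-bound case, here is a
rough sketch, not the actual site code. It assumes Python 3, and
slow_backend_call is a hypothetical stand-in that just sleeps, the way
a thread waits on a remote API; the delay and worker count are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_backend_call(delay=0.05):
    """Hypothetical stand-in for a blocking back-end API request."""
    time.sleep(delay)  # sleeping releases the GIL, like waiting on a socket
    return delay


def run_serial(n):
    """Make n calls one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        slow_backend_call()
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Make n calls across a pool of threads and return the elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda _: slow_backend_call(), range(n)))
    return time.perf_counter() - start


serial_seconds = run_serial(8)
threaded_seconds = run_threaded(8, workers=8)
```

The waits overlap in the threaded run, so it should finish well ahead
of the serial one. Each extra thread still carries its own object
cache, which is where the memory goes.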

>
> If your app only renders pages based on data read from a ZODB, and
> it's not CPU bound with a single thread, then your database config is
> probably wrong.
>
>> Note that unicode
>> strings aren't internable (but the alternative technique of using a
>> weak mapping should work fine).
>
> Except that you can't create weakrefs to strings or unicode.
>
> Also, while interning is fine for an experiment, it's wasteful for
> strings that are rarely needed.
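
A minimal sketch of both points, assuming Python 3 on CPython (the
code quoted earlier in the thread is Python 2). The names _pool,
canonical and supports_weakref are made up for illustration and are
not ZODB API. A plain dict can canonicalize equal strings, but it
holds strong references, so rarely needed strings stay alive just as
they would if interned; a weak mapping is not an option because plain
strings cannot be weakly referenced.

```python
import weakref

# Hypothetical process-wide pool. A plain dict holds strong references,
# so entries stay alive until they are removed explicitly.
_pool = {}


def canonical(s):
    """Return the pooled copy of s, adding s to the pool if it is new."""
    return _pool.setdefault(s, s)


def supports_weakref(obj):
    """Report whether a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


# Two equal strings built at run time, as an unpickler would produce them.
a = "".join(["front", "-", "page"])
b = "".join(["front", "-", "page"])

same_after = canonical(a) is canonical(b)    # both map to one pooled object
plain_str_weakrefable = supports_weakref(a)  # plain str rejects weakrefs
```

After the second call the duplicate b is no longer needed by the
caller, while the pooled copy lives for as long as the dict does.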
>
> Sharing immutable data between threads is very appealing
> intellectually. I've certainly thought about it a lot. In practice,
> I doubt the benefit will be worth the extra overhead (let alone the
> effort :).
>
> Jim
>
> --
> Jim Fulton

