[ZODB-Dev] ZODB for spambayes server-side filter?
Simone Piunno
pioppo at ferrara.linux.it
Mon Jan 12 09:44:43 EST 2004
Hello,
I'm working on a server-side spam filter based on spambayes.
After some prototyping with BDB4, I've started to look at ZODB.
I'm trying to understand whether this is a good idea.
The project design is the following:
- implemented as an SMTP proxy daemon
- we'll keep one or more server-wide Word Probability Databases (WPDs) shared
among users, but we'll also have per-user WPDs. Users will be able to choose
among them, or even disable the filter completely for their address.
- the filter will see all the traffic, but for each message it will pick the
right WPD based on the RCPT address (envelope recipient).
- training will be done mainly by forwarding messages as MIME attachments. If
the filter cannot be sure of the trainer's identity (e.g. because no
password-based authentication was used), it can send a request to the
trainer's address with a cookie in the subject, asking for a reply to confirm
the identity (much like Mailman does for subscriptions).
- all messages can be kept temporarily in a cache, so that training can be
performed on the pristine copy instead of the forwarded one. Old cache
entries will expire automatically.
- users can choose to receive all the traffic, simply tagged, or to block
spam and/or unsure messages. They will receive a daily report on blocked
email, so that by just skimming the From/Subject list in the report they can
decide whether a correction is needed. Blocked email can be released and/or
trained manually through the web, as long as this is done before the
automatic expiration timeout.
- configuration will be done through the web.
- accurate statistics will be kept per-user and server-wide.
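A minimal sketch of the per-recipient decision described above (which WPD to score against, then whether to tag or block), assuming a spambayes-style score in [0, 1]. The names UserPrefs, select_wpd and decide_action, and the 0.20/0.90 cutoffs, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Illustrative cutoffs on a spambayes-style score in [0, 1] (assumed values).
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


@dataclass
class UserPrefs:
    enabled: bool = True        # False: filtering disabled for this address
    wpd_name: str = "server"    # which WPD to score against (shared or per-user)
    block_spam: bool = False    # False: deliver spam, just tagged
    block_unsure: bool = False  # False: deliver unsures, just tagged


def _prefs_for(rcpt: str, prefs: Dict[str, UserPrefs]) -> UserPrefs:
    # Unknown recipients get the defaults: server-wide WPD, tag only.
    # Assumes addresses are compared case-insensitively.
    return prefs.get(rcpt.lower(), UserPrefs())


def select_wpd(rcpt: str, prefs: Dict[str, UserPrefs]) -> Optional[str]:
    """Name of the WPD to use for this envelope recipient, or None if disabled."""
    p = _prefs_for(rcpt, prefs)
    return p.wpd_name if p.enabled else None


def classify(score: float) -> str:
    if score >= SPAM_CUTOFF:
        return "spam"
    if score <= HAM_CUTOFF:
        return "ham"
    return "unsure"


def decide_action(rcpt: str, score: float, prefs: Dict[str, UserPrefs]) -> str:
    """Return "deliver", "block", or "tag:<verdict>" for one recipient."""
    p = _prefs_for(rcpt, prefs)
    if not p.enabled:
        return "deliver"
    verdict = classify(score)
    if (verdict == "spam" and p.block_spam) or (verdict == "unsure" and p.block_unsure):
        return "block"  # held for the daily report until it expires
    return "tag:" + verdict
```

Since the WPD is chosen per RCPT, a message with several envelope recipients may need to be scored once per distinct WPD.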
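The Mailman-style confirmation step for training could work roughly as follows. The cookie is an HMAC over the trainer's address, a pending-request id and a timestamp, so the server only has to remember the pending training requests, not the cookies. This is a sketch under assumptions: SECRET, MAX_AGE, make_cookie and check_cookie are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"replace-with-a-random-server-secret"  # hypothetical server-wide key
MAX_AGE = 3 * 24 * 3600                           # hypothetical: cookies valid 3 days


def _mac(address: str, request_id: str, ts: int) -> str:
    msg = f"{address.lower()}|{request_id}|{ts}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:20]


def make_cookie(address: str, request_id: str, now: int) -> str:
    """Cookie to put in the Subject of the confirmation request.

    request_id identifies the pending training request and must not contain '.'.
    """
    return f"{request_id}.{now}.{_mac(address, request_id, now)}"


def check_cookie(cookie: str, address: str, now: int) -> Optional[str]:
    """Return the pending request id if the reply from `address` is valid, else None."""
    try:
        request_id, ts_str, mac = cookie.split(".")
        ts = int(ts_str)
    except ValueError:
        return None  # malformed cookie
    expected = _mac(address, request_id, ts)
    # Compare as bytes so odd (non-ASCII) input cannot raise TypeError.
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return None  # forged, or the reply came from a different address
    if now - ts > MAX_AGE:
        return None  # expired
    return request_id
```

A request id could come from `secrets.token_hex(8)` (hex, so no dots). As with Mailman, a correct reply only proves that the sender can read mail at that address.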
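The expiring cache of pristine messages could be as simple as an insertion-ordered map keyed by Message-ID. This assumes that the forwarded copy still carries the original Message-ID (e.g. in the attached message/rfc822 part), so the lookup can find it. PristineCache is a hypothetical in-memory stand-in; a ZODB version might keep the same data in a BTree.

```python
from collections import OrderedDict
from typing import Optional


class PristineCache:
    """Raw copies of recent messages, so training can use the original, not the forward."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries = OrderedDict()  # msg_id -> (stored_at, raw bytes), oldest first

    def put(self, msg_id: str, raw: bytes, now: float) -> None:
        self._entries[msg_id] = (now, raw)
        self._entries.move_to_end(msg_id)  # a re-stored id becomes the newest entry

    def get(self, msg_id: str, now: float) -> Optional[bytes]:
        self.expire(now)
        entry = self._entries.get(msg_id)
        return entry[1] if entry is not None else None

    def expire(self, now: float) -> int:
        """Drop entries at least ttl old; return how many were removed."""
        removed = 0
        while self._entries:
            stored_at, _ = next(iter(self._entries.values()))
            if now - stored_at < self.ttl:
                break  # every later entry is newer than this one
            self._entries.popitem(last=False)
            removed += 1
        return removed
```

Expiring from the front of the map is only correct if `now` never goes backwards between put() calls. Storing every incoming message also makes the cache a per-message write path, unlike the mostly read-only WPDs.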
I believe plain BDB is too flat to persist such a complex data structure,
so I've started looking at ZODB. I'm fairly convinced that transactional
storage is required here, and that it will be mostly read-only: writes will
happen only for training, statistics updates and configuration changes.
In some benchmarks I measured a 5-10x performance increase.
My main question is: how do I avoid a collapse caused by colliding
transactions? As a first approximation, I think that on a transaction
collision I can safely abort the SMTP connection and wait for the sender to
retry, but how can I be sure that the retries won't pile up and bring the
database down?
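One way to keep retries from piling up is to bound them in two ways: a fixed number of attempts per message, with jittered exponential backoff between attempts, and a cap on how many write transactions may run at once. Anything over either limit gets a temporary SMTP failure (451), which moves the retry into the sending MTA's queue instead of onto the database. This is a sketch with hypothetical names: ConflictError here is a stand-in for ZODB.POSException.ConflictError, and run_with_retries, TemporaryFailure and the limit of 4 writers are illustrative.

```python
import random
import threading
import time


class ConflictError(Exception):
    """Stand-in for ZODB.POSException.ConflictError (so the sketch runs without ZODB)."""


class TemporaryFailure(Exception):
    """Tells the SMTP proxy to reply with a 4xx code so the client retries later."""

    def __init__(self, code=451, text="Requested action aborted: local error in processing"):
        super().__init__(f"{code} {text}")
        self.code = code
        self.text = text


# Hypothetical cap on concurrent write transactions (training, stats, config).
_write_slots = threading.BoundedSemaphore(4)


def run_with_retries(txn_fn, max_attempts=3, base_delay=0.05,
                     sleep=time.sleep, rng=random.random):
    """Run txn_fn (which begins, writes and commits) with bounded retries on conflict."""
    # Shed load instead of queueing: if all write slots are busy, defer to the MTA.
    if not _write_slots.acquire(blocking=False):
        raise TemporaryFailure()
    try:
        for attempt in range(max_attempts):
            try:
                return txn_fn()
            except ConflictError:
                if attempt + 1 < max_attempts:
                    # Exponential backoff with full jitter spreads competing writers apart.
                    sleep(base_delay * (2 ** attempt) * rng())
        # Every attempt conflicted: give up on this message for now.
        raise TemporaryFailure()
    finally:
        _write_slots.release()
```

txn_fn is expected to abort its ZODB transaction before the conflict propagates, so each attempt starts from a fresh view. Only transactions that actually write (training, stats, configuration) need this path. Pure scoring can run without committing.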
TIA
Simone
--
This signature intentionally left blank