[Zope] attribute used to index PDFs?

Andreas Jung lists at andreas-jung.com
Tue Dec 13 01:52:41 EST 2005



--On 12 December 2005 14:54:09 -0500 "Garth B." <garthb at gmail.com> wrote:

> - Digging further in this file, "mimetype" is only defined when
> extract_content() in content.py calls "icc.addBinary(...)".  This only
> happens when the indexed object provides a txng_get() hook (or I
> suppose if an adapter exists).

Exactly. That's the intended behavior.

> That whole block (around lines 81 -
> 93) never gets hit with my PDFs or Word docs during indexing.  When I
> index a large number of PDFs I will get a number of TypeErrors raised
> around line 110 when extract_content() notices that the data isn't a
> [unicode] string.
>

Most likely because your implementation does not provide the txng_get() 
hook. I *strongly* recommend providing an adapter for IIndexableContent. 
The original behavior of TXNG 2.X, which supplied binary content through 
an attribute or a method (the default behavior of almost all index 
implementations), is no longer supported in 3.X because it just sucks.
So either use txng_get() (which is deprecated in 3.X) or implement the 
IIndexableContent API. That's the way to go.
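
A rough sketch of the adapter idea, to show the shape only. Everything 
here is a stand-in: ContentCollector, PDFDocument, PDFIndexableAdapter, 
the indexableContent() method name and the addBinary() arguments are 
assumptions made for illustration, not the actual TXNG 3 classes or 
signatures. Check interfaces.py and content.py for the real ones.

```python
# Stand-in sketch: these classes only imitate the pattern discussed above.
# They are NOT the real TextIndexNG3 classes or signatures.


class ContentCollector:
    """Hypothetical stand-in for the collector that the indexer reads."""

    def __init__(self):
        self.entries = []

    def addBinary(self, field, data, mimetype):
        # Assumed signature; the real addBinary() arguments are not
        # shown in this thread.
        if not isinstance(data, bytes):
            raise TypeError("binary content must be bytes")
        self.entries.append((field, data, mimetype))


class PDFDocument:
    """Hypothetical content object holding raw PDF bytes."""

    def __init__(self, data):
        self.data = data


class PDFIndexableAdapter:
    """Adapter in the spirit of IIndexableContent (method name assumed)."""

    def __init__(self, context):
        self.context = context

    def indexableContent(self, fields):
        # Hand the raw bytes plus an explicit mimetype to the collector,
        # so a converter can be chosen instead of treating the data as text.
        icc = ContentCollector()
        for field in fields:
            icc.addBinary(field, self.context.data, "application/pdf")
        return icc


doc = PDFDocument(b"%PDF-1.4 ...")
icc = PDFIndexableAdapter(doc).indexableContent(["SearchableText"])
```

The point of the design is that the object (or its adapter) declares the 
mimetype itself, so the indexer never has to guess whether an attribute 
value is text or binary.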

-aj