From gari at eibar.org Fri Jun 7 04:15:26 2002
From: gari at eibar.org (Garikoitz Araolaza)
Date: Sun Aug 10 16:54:51 2008
Subject: [Zope-xml] Cataloging nodes
Message-ID: <5.0.2.1.0.20020607101136.00a3c7f0@212.46.120.81>

Hi,

How could I catalog nodes from ParsedXML objects so that I can perform
searches returning exactly the node I'm searching for? (Can it be done?)

I'm talking about 800 KB ParsedXML objects, and I need a nice search
interface...

Thanks

Gari
_________________________________________
Garikoitz Araolaza
gari@eibar.org

From Yves.Bastide at irisa.fr Tue Jun 25 12:10:30 2002
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sun Aug 10 16:54:51 2008
Subject: [Zope-xml] Creating several ParsedXML from a string
Message-ID: <3D1895F6.2090409@irisa.fr>

Hi,

Given one XML document, I need to create several ParsedXMLs. How to do
this?

I tried to hack by:
* creating a DOM tree
* for each subtree of interest,
  - [...].manage_addProduct['ParsedXML'].manage_addParsedXML(subtree id)
  - [...][node id].initFromDOMDocument(subtree)

This doesn't work, notwithstanding the fact that initFromDOMDocument is a
private method. The DOM tree is created with minidom.parseString; perhaps
this would be better using ExpatBuilder (but then, I've not found all the
classes and modules to allow abusively :)

What's a proper way to do this?

Thanks,

Yves

From Yves.Bastide at irisa.fr Thu Jun 27 03:52:27 2002
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sun Aug 10 16:54:51 2008
Subject: [Zope-xml] Re: Creating several ParsedXML from a string
References: <20020626160005.26320.99553.Mailman@mail.python.org> <200206261736.g5QHaJ302573@bkho16tzy31sl.bc.hsia.telus.net>
Message-ID: <3D1AC43B.9070204@irisa.fr>

John Maxwell wrote:
>>From: Yves Bastide
>>To: zope-xml@zope.org
>>Subject: [Zope-xml] Creating several ParsedXML from a string
>>
>>Hi,
>>
>>Given one XML document, I need to create several ParsedXMLs. How to
>>do this?
>
>
>
> Here's how I do this, using ParsedXML only:
>
>
> # First, bring in the XML and store it as a temporary object
> #
> self.manage_addProduct['ParsedXML'].manage_addParsedXML(
>     id='TempImport',
>     contentType="text/xml",
>     useNamespaces=0,
>     file=file )
>
> # Next, pull it apart and make new entries out of the
> # significant nodes 'foo', 'bar', and 'baz'
> #
> c = 0 # counter
> root = self.TempImport.documentElement
> for thisName in ['foo', 'bar', 'baz']:
>     for myNode in root.getElementsByTagName(thisName):
>
>         # serial number the new object IDs (NID is a counter):
>         newId = 'z' + string.zfill(str(self.NID + 1), 5) + '.xml'
>         myTitle = myNode.composeTitleMethod()
>
>         # create the object:
>         self.manage_addProduct['ParsedXML'].manage_addParsedXML(
>             id=newId,
>             title=myTitle,
>             useNamespaces=0,
>             contentType="text/xml",
>             file=str(myNode) )
>         self.NID = self.NID + 1
>         c = c + 1
>
> # and clean up after
> try:
>     self.manage_delObjects('TempImport')
> except:
>     return "couldn't delete the temp object"
>
> # report back
> return "Imported " + str(c) + " new entries."
>
>

Thanks. That's just what I tried to avoid, 2 conversions to DOM and a
stringification :-)
For now I've changed the external method which retrieves the XML to
return distinct entries; but when this starts to require more than basic
parsing, I'll come to your solution.

>
>
> -----------------------------
> - John Maxwell
> jmax@portal.ca
>

Regards,

Yves

From kra at monkey.org Thu Jun 27 13:58:29 2002
From: kra at monkey.org (Karl Anderson)
Date: Sun Aug 10 16:54:51 2008
Subject: [Zope-xml] Re: Creating several ParsedXML from a string
In-Reply-To: Yves Bastide's message of "Thu, 27 Jun 2002 09:52:27 +0200"
References: <20020626160005.26320.99553.Mailman@mail.python.org> <200206261736.g5QHaJ302573@bkho16tzy31sl.bc.hsia.telus.net> <3D1AC43B.9070204@irisa.fr>
Message-ID:

Yves Bastide writes:

> Thanks.
> That's just what I tried to avoid, 2 conversions to DOM and a
> stringification :-)

Create one large DOM instance with the source, and several empty
instances for the destinations. Use the DOM methods intended to
import nodes from other documents under the root node (import* IIRC).
If the implementation isn't wacked, you'll only be parsing once and
never stringifying.

Or use a sax parser to find the string locations of the start and end
nodes of the subtree of interest, and feed those substrings to the
parser. You have an extra sax parse, but only one if they're
disjoint.

Or, IIRC, the method to create the document on init was lenient & only
needs a node to start on - if your subtrees are disjoint, try feeding
them to that. But you might have to do some backflips.

--
Karl Anderson kra@monkey.org http://www.monkey.org/~kra/

From Yves.Bastide at irisa.fr Fri Jun 28 09:39:46 2002
From: Yves.Bastide at irisa.fr (Yves Bastide)
Date: Sun Aug 10 16:54:51 2008
Subject: [Zope-xml] Re: Creating several ParsedXML from a string
References: <20020626160005.26320.99553.Mailman@mail.python.org> <200206261736.g5QHaJ302573@bkho16tzy31sl.bc.hsia.telus.net> <3D1AC43B.9070204@irisa.fr>
Message-ID: <3D1C6722.8020408@irisa.fr>

Karl Anderson wrote:
> Yves Bastide writes:
>
>
>>Thanks. That's just what I tried to avoid, 2 conversions to DOM and a
>>stringification :-)
>
>
> Create one large DOM instance with the source, and several empty
> instances for the destinations. Use the DOM methods intended to
> import nodes from other documents under the root node (import* IIRC).
> If the implementation isn't wacked, you'll only be parsing once and
> never stringifying.
>
> Or use a sax parser to find the string locations of the start and end
> nodes of the subtree of interest, and feed those substrings to the
> parser. You have an extra sax parse, but only one if they're
> disjoint.
>
> Or, IIRC, the method to create the document on init was lenient & only
> needs a node to start on - if your subtrees are disjoint, try feeding
> them to that. But you might have to do some backflips.
>

Hmm. Yes, using importNode() feels like the best way. Something like

temp=parse(get-the-collection)
for doc-root in temp.get-doc-roots():
    id=doc.get-the-id()
    doc-xml=manage_addParsedXML(id)
    doc-xml.importNode(doc-root, ...)

(pseudo Zope code :)

I'll use this approach when the current one's no longer sufficient.
(FWIW, "the current one" is parsing with string.find(collection, '') /
string.find(collection, '') -- it works, so...)

Thanks!

Yves
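
For readers of this thread, here is a minimal sketch of the importNode() idea: parse once, then copy each subtree into its own empty document, with no stringification. It uses only the standard library's xml.dom.minidom, not ParsedXML, so it shows the DOM technique and not the Zope calls. The function name split_document, the tag name 'entry' and the sample XML are hypothetical, and the subtrees are assumed to be disjoint.

```python
from xml.dom.minidom import getDOMImplementation, parseString


def split_document(xml_text, tag_name):
    """Parse xml_text once; return one new Document per tag_name element.

    Hypothetical helper: assumes the matching elements are disjoint
    (getElementsByTagName is recursive, so nested matches would overlap).
    """
    source = parseString(xml_text)  # the only parse
    impl = getDOMImplementation()
    pieces = []
    for node in source.getElementsByTagName(tag_name):
        # An empty destination document (no root element yet).
        target = impl.createDocument(None, None, None)
        # Deep copy of the subtree, owned by the destination document.
        clone = target.importNode(node, True)
        # The imported element becomes the destination's documentElement.
        target.appendChild(clone)
        pieces.append(target)
    return pieces


# Hypothetical sample collection with two disjoint subtrees.
collection = (
    "<collection>"
    "<entry id='a'>one</entry>"
    "<entry id='b'>two</entry>"
    "</collection>"
)
docs = split_document(collection, "entry")
ids = [d.documentElement.getAttribute("id") for d in docs]
```

Whether ParsedXML's own DOM supports importNode() in the same way is not confirmed in this thread; the sketch only covers minidom.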