<?xml version="1.0"?>
<!-- $Id: /xmltwig/trunk/faq.xml 33 2008-04-30T08:03:41.004487Z mrodrigu $ -->
<faq>
<header>
<title>XML::Twig FAQ</title>
<version>1.12</version><date>2006-02-07</date>
<author>Michel Rodriguez</author>
</header>
<credits>
<p>FAQ created by Michel Rodriguez</p>
<p>Thanks to the numerous users of XML::Twig for their questions and suggestions, and to Walter Pienciak for letting me
mirror this FAQ on the IEEE website</p>
</credits>
<overview><p>This FAQ contains information on XML::Twig, a Perl module used to process XML documents.
Please direct all corrections and additions to <a href="mailto:[email protected]">[email protected]</a>. </p>
<p>This FAQ can be found on the Web at <a href="http://www.xmltwig.org/xmltwig/faq.html">
www.xmltwig.org/xmltwig/faq.html</a>.</p>
<p><a name="mlist"></a>Information in this FAQ is based mainly on questions asked on the Perl XML email list. To join, send an email to <a href="mailto:[email protected]">[email protected]</a> with the message:
<b>SUBSCRIBE Perl-XML</b>.</p>
<p>This FAQ was generated using a Perl script (using XML::Twig ;--) and an XML file. The script is at <a href="http://xmltwig.org/xmltwig/twig_faq">xmltwig.org/xmltwig/twig_faq</a>. The XML source is at
<a href="http://www.xmltwig.org/xmltwig/faq.xml">http://www.xmltwig.org/xmltwig/faq.xml</a>. To generate the XML::Twig FAQ, run <b>twig_faq faq.xml</b>, which prints the HTML to STDOUT.
</p>
</overview>
<q id="1">
<question>I know what a twig is but what is that XML thing anyway?</question>
<answer>OK, time for a quick list of XML links:
<ul><li><a href="http://www.w3.org/XML">The W3C XML page</a></li>
<li><a href="http://xml.coverpages.org/sgml-xml.html">The XML Cover Pages</a></li>
<li><a href="http://perl-xml.sourceforge.net/faq/">The Perl XML FAQ</a></li>
<li><a href="http://www.xml.com/pub/au/83">Kip Hampton's Perl and XML column</a></li>
</ul>
</answer>
</q>
<q id="2">
<question>Where can I get the latest version of XML::Twig?</question>
<answer>The latest stable version:
<ul><li><a href="http://www.cpan.org/modules/by-module/XML/MIROD/">CPAN</a></li>
<li><a href="http://www.xmltwig.org/xmltwig/">The Twig Homepage</a></li>
<li><a href="http://standards.ieee.org/resources/spasystem/twig/index.html">The Twig Homepage (mirror hosted by the IEEE)</a></li>
</ul>
The latest development version:
<ul><li><a href="http://www.xmltwig.org/xmltwig/">The Twig Homepage</a></li>
<li><a href="http://standards.ieee.org/resources/spasystem/twig/index.html">The Twig Homepage (mirror hosted by the IEEE)</a></li>
</ul>
</answer>
</q>
<q id="3">
<question>Where is the documentation?</question>
<answer><p>Development version:
<a href="http://www.xmltwig.org/xmltwig/twig_dev.html">html</a> /
<a href="http://www.xmltwig.org/xmltwig/twig_dev.txt">text</a></p>
<p>Stable version:
<a href="http://www.xmltwig.org/xmltwig/twig_stable.html">html</a> /
<a href="http://www.xmltwig.org/xmltwig/twig_stable.txt">text</a>
</p>
<p>You can also type <tt>perldoc XML::Twig</tt> once you have installed the module,
look at the <a href="http://www.xmltwig.org/xmltwig/quick_ref.html">XML::Twig Quick Reference</a>,
or go to <a href="http://www.xmltwig.org">xmltwig.org</a> for more information, including a
<a href="http://www.xmltwig.org/xmltwig/tutorial/index.html">tutorial</a>.</p>
</answer>
</q>
<q id="9">
<question>How is XML::Twig supported?</question>
<answer><p>Twig is supported through email <a href="mailto:[email protected]">[email protected]</a>
and through the <a href="#mlist">Perl-XML mailing list</a>.</p>
<p>You are encouraged to report bugs using RT at <a href="http://rt.cpan.org">rt.cpan.org</a>.</p>
<p>Please send the following configuration information when you describe a bug:</p>
<ul><li>OS</li>
<li>version of perl (<tt>perl -v</tt>),</li>
<li>version of <tt>expat</tt> (see below),</li>
<li>version of XML::Parser (<tt>perl -MXML::Parser -le'print $XML::Parser::VERSION'</tt>),</li>
<li>version of XML::Twig (<tt>perl -MXML::Twig -le'print $XML::Twig::VERSION'</tt>).</li>
</ul>
<p>Finding the version of <tt>expat</tt> that you are running can be a bit tricky, but it is
important information. Here is how you can get it:</p>
<p>First, if you are using a version of XML::Parser lower than 2.30, then you don't need to mention
<tt>expat</tt>'s version: XML::Parser comes with
its own version of <tt>expat</tt>. It is old though, so you might want to upgrade: first grab
<tt><a href="http://expat.sourceforge.net">expat</a></tt> and install it, then install
a recent version of XML::Parser.</p>
<p>If you are using XML::Parser 2.30 or above, run <tt>xmlwf -v</tt>. If you are lucky this will
give you the version of expat. If <tt>xmlwf</tt> exists but
does not like the <tt>-v</tt> option, then you are most likely running expat 1.95.2. If
<tt>xmlwf</tt> is not installed on your system (which can be the case if you did not install
<tt>expat</tt> yourself but use the one provided with your OS) then (on *nix) you can look for
libexpat.so in your library path (using for example <tt>slocate libexpat.so</tt>).
libexpat.so.1.0 is expat 1.95.2, libexpat.so.3.0 is expat 1.95.4 (in which case you should
upgrade, as expat 1.95.4 is not compatible with XML::Twig), and libexpat.so.4.0 is expat 1.95.5 or
1.95.6.</p>
<p>This information will help me a lot in figuring out what causes the problem.</p>
</answer>
</q>
<q id="4">
<question>What is XML::Twig used for anyway?</question>
<answer><p>I use XML::Twig for all sorts of XML processing: I use it to extract data from XML documents, to update documents from one DTD to another, to convert them to HTML and to extract/store/process data to and from various databases.</p></answer>
</q>
<q id="5">
<question>Why should I use XML::Twig?</question>
<answer><p>The main purpose of XML::Twig is to allow you to process XML documents that might be too big to fit in memory (with XML::DOM for example). If that is your case but you don't really like stream-oriented processing, then XML::Twig allows you to use a mixed stream/tree model, where you can process sub-documents as trees and then flush them to free the memory.</p>
<p>In addition it is designed to be easy to use, masking some of the most annoying quirks of XML and XML::Parser, such as whitespace management and encodings (see below).</p>
<p>The main drawback of XML::Twig is that it is not XML::DOM! It does not have a standard interface (feel free to add one ;--) nor does it interface with XML::SAX, although as of version 3.05 it does export SAX streams.</p>
<p>Using the <tt>twig_roots</tt> option also lets you process (using the tree interface) only the parts of the documents you are interested in, something that can speed up your scripts tremendously.</p>
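<p>Here is a minimal sketch of the mixed stream/tree model. It assumes a hypothetical <tt>big_file.xml</tt>
made of many <tt>record</tt> elements (both names are made up for the example): each record is handled as
a small tree, then flushed so that its memory can be reused.</p>
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;

# "record" and big_file.xml are hypothetical names, used for illustration only
my $t= XML::Twig->new( twig_handlers =>
         { record => sub { my( $t, $record)= @_;
                           # the complete record is available as a tree here
                           $record->set_att( processed => 'yes');
                           # output what has been parsed so far, then free it
                           $t->flush;
                         },
         },
       );
$t->parsefile( 'big_file.xml');
$t->flush;   # output the end of the document
]]></code>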
</answer>
</q>
<q id="23">
<question>
What are the alternatives to XML::Twig?
</question>
<answer>
<p>The <a href="http://perl-xml.sourceforge.net/faq/">Perl-XML FAQ</a> lists
quite a few other modules that can be used to process XML.</p>
<p>When deciding which module to choose for any slightly complex processing
of XML, I would advise you to also have a look at
<a href="http://search.cpan.org/dist/XML-LibXML">XML::LibXML</a>. Here is a
quick comparison of the 2 modules.</p>
<p>XML::LibXML, actually <a href="http://xmlsoft.org">libxml2</a>
on which it is based, sticks to the standards,
and implements a good number of them in a rather strict way: XML, XPath, DOM,
RelaxNG, I must be forgetting a couple (XInclude?). It is fast and rather
frugal memory-wise.</p>
<p>XML::Twig is older: when I started writing it XML::Parser/expat was the only
game in town. It implements XML and that's about it (plus a subset of XPath,
and you can use XML::Twig::XPath if you have XML::XPath installed for full
support). It is slower and requires more memory for a full tree than
XML::LibXML. On the plus side (yes, there is a plus side!) it lets you process
a big document in chunks, and thus lets you tackle documents that couldn't be
loaded in memory by XML::LibXML, and it offers a lot (and I mean a LOT!) of
higher-level methods, for everything, from adding structure to "low-level" XML,
to shortcuts for XHTML conversions and more. It also DWIMs quite a bit, getting
comments and non-significant whitespace out of the way but preserving them in
the output for example. As it does not stick to the DOM, it also usually leads
to shorter code than in XML::LibXML.</p>
<p>Beyond the pure features of the 2 modules, XML::LibXML seems to be preferred by
"XML-purists", while XML::Twig seems to be more used by Perl Hackers who have
to deal with XML. As you have noted, XML::Twig also comes with quite a lot of
docs, but I am sure if you ask for help about XML::LibXML here or on Perlmonks
you will get answers.</p>
<p>Note that it is actually quite hard for me to compare the 2 modules: on one hand
I know XML::Twig inside-out and I can get it to do pretty much anything I need
to (or I improve it ;--), while I have a very basic knowledge of XML::LibXML.
So feature-wise, I'd rather use XML::Twig ;--). On the other hand, I am
painfully aware of some of the deficiencies, potential bugs and plain ugly code
that lurk in XML::Twig, even though you are unlikely to be affected by them
(unless for example you need to change the DTD of a document programmatically),
while I haven't looked much into XML::LibXML so it still looks shiny and clean
to me.</p>
<p>That said, if you need to process a document that is too big to fit in memory
and XML::Twig is too slow for you, my reluctant advice would be to use "bare"
XML::Parser. It won't be as easy to use as XML::Twig: basically with XML::Twig
you trade some speed (depending on what you do from a factor 3 to... none)
for ease-of-use, but it will be easier IMHO than using SAX (albeit not
standard), and at this point a LOT faster (see the last test in
<a href="http://www.xmltwig.org/article/simple_benchmark/">simple benchmark</a>).</p>
</answer>
</q>
<q id="6">
<question>My XML documents/data are produced by tools that do not grok Unicode, will XML::Twig help me there?</question>
<answer><p>Yes, if you use the <tt>KeepEncoding</tt> option when you create a twig, all PCDATA (character data) will be returned as-is. Don't forget to use an encoding declaration in the XML declaration or in the twig creation though, or the parser will die on you. You can also process your document as UTF-8 internally and use the <tt>output_encoding</tt> option (XML::Twig version 3.05 and above) to convert the output to your favourite encoding.</p></answer>
</q>
<q id="7">
<question>What's that whitespace management thing?</question>
<answer><p>XML parsers are required by the standard to pass ALL data outside the markup to the calling application. Most of the time this is not desirable. By default XML::Twig discards those pesky \n (in fact XML::Twig discards all element contents that contain only whitespace). This can be changed at twig level, using the <tt>keep_spaces</tt> option.</p></answer>
</q>
<q id="8">
<question>What's the expansion factor from an XML document to a twig?</question>
<answer><p>If you load the entire document in a twig the expansion factor is about 13 (the 900K file used for the benchmark takes about 11M). Of course if you flush the document as you're parsing then it will be <b>much</b> less!</p></answer>
</q>
<q id="10">
<question>I have that huge XML document, but I only want to extract information from a couple of elements, can XML::Twig help me there?</question>
<answer><p>Oddly enough yes! Create the twig using the <tt>twig_roots</tt> option and the tree will be built only for those elements. <br/>Example:<code>
my $twig= XML::Twig->new( twig_roots => { info => \&process_info });
</code>
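<br/>A slightly fuller sketch of the same idea, assuming a hypothetical <tt>huge.xml</tt> file and a
<tt>process_info</tt> handler that just prints the text of each <tt>info</tt> element:
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;

# only the info elements are built as trees, the rest of the document is skipped
my $twig= XML::Twig->new( twig_roots => { info => \&process_info });
$twig->parsefile( 'huge.xml');   # hypothetical file name

sub process_info
  { my( $twig, $info)= @_;
    print $info->text, "\n";
    $twig->purge;                # free the memory used by this element
  }
]]></code>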
</p></answer>
</q>
<q id="11">
<question>I process lots of XML documents in batch and there seems to
be a memory leak in XML::Twig, any fix for that?</question>
<answer><p>Yes, since version 3.00, XML::Twig has a <tt>dispose</tt> method that completely releases a twig.
With earlier versions you can release it yourself by doing:
<code>
undef $t->{twig};
undef $t->{twig_root}->{twig};
undef $t->{twig_parser};
</code>
</p>
<p>The easiest method though, if you are using perl 5.6.0 or above, is to install the
<a href="http://search.cpan.org/search?dist=WeakRef">WeakRef</a> module, which fixes the memory leak.</p>
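<p>For batch processing with <tt>dispose</tt>, a minimal sketch could look like this. It assumes the
files to process are given on the command line, and the per-file processing (printing the name of the
root element) is only a placeholder:</p>
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;

foreach my $file (@ARGV)
  { my $t= XML::Twig->new;
    $t->parsefile( $file);
    # placeholder processing: report the name of the root element
    print "$file: root is ", $t->root->tag, "\n";
    $t->dispose;    # release the twig before moving on to the next file
  }
]]></code>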
</answer>
</q>
<q id="12"><question>How can I install XML::Twig on Windows?</question>
<answer><p>XML::Twig might be available as a ppm either from <a href="http://www.activestate.com">ActiveState</a>
or from another repository (see <a href="http://aspn.activestate.com//ASPN/Reference/Products/ActivePerl/faq/ActivePerl-faq2.html">Using PPM to install modules</a> for more information about ppm and for a list of repositories).</p>
<p>If it is not available, or if you want to use the development version, you can just uncompress the distribution file (<tt>XML-Twig-x.xx.tar.gz</tt>) and copy <tt>Twig.pm</tt> to the <tt>C:\Perl\site\lib\xml</tt> directory, alongside <tt>Parser.pm</tt>. Of course if you use <a href="http://cygwin.com">Cygwin</a> you can install the module with the usual <tt>perl Makefile.PL; make; make test; make install</tt> incantation. You might need to download <a href="http://download.microsoft.com/download/vc15/Patch/1.52/W95/EN-US/Nmake15.exe">nmake</a>.</p>
<p>Alternatively <a href="http://cpan.uwinnipeg.ca/module/XML::Twig">KobeSearch</a> lists PPMs for the module.</p> </answer>
</q>
<q id="17">
<question>
<p>I am having a problem installing MythTV on RedHat 9.0:</p>
<p>When I attempt to install XML::Twig from CPAN, it goes through its
install, but then states: <tt>Weak references are not implemented</tt></p>
</question>
<answer>You need to upgrade the <tt>Scalar::Util</tt> module from CPAN, then re-run the install
from scratch (doing the <tt>perl Makefile.PL; make; make test; make install</tt> dance, or
cleaning up the CPAN/CPANPLUS cache; I suspect you have to exit the shell and launch it again
for this to work).</answer>
</q>
<q id="16">
<question><p>I seem to be having a spot of trouble getting XML::Twig 3.08 to compile
and install on a SuSE 8.1/RedHat 8.0 system.</p>
<p>Here is the result of <tt>make test</tt>:</p>
<code><![CDATA[make test
[...]
t/test_entities...........
undefined entity at line 4, column 13, byte 77:
<!DOCTYPE doc SYSTEM "t/dummy.dtd">
<doc>
<elt1>toto &ent1;</elt1>
============^
<elt2>tata &ent2;</elt2>
<elt3>tutu &ent3;</elt3>
at /usr/lib/perl5/site_perl/5.8.0/i586-linux-thread-multi/XML/Parser.pm line 185
t/test_entities...........dubious
Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-6
Failed 6/6 tests, 0.00% okay
[...]
t/test_spaces.............dubious
Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 1-3
Failed 3/3 tests, 0.00% okay
t/test_twig_roots.........ok t/test_xpath_cond.........ok
Failed Test Stat Wstat Total Fail Failed List of Failed
-------------------------------------------------------------------------------
t/test_entities.t 255 65280 6 6 100.00% 1-6
t/test_spaces.t 255 65280 3 3 100.00% 1-3
Failed 2/18 test scripts, 88.89% okay. 9/400 subtests failed, 97.75% okay.
make: *** [test_dynamic] Error 29]]></code>
</question>
<answer><p>The problem is an incompatibility between XML::Twig and the
version of the libexpat library (1.95.4) that comes with RH 8.0 / SuSE 8.1.
If you upgrade to XML::Twig 3.08 or later and to the latest version of libexpat you should not
get the problem anymore.</p>
<p>You can get the latest version of libexpat on sourceforge: <a href="http://expat.sourceforge.net/">http://expat.sourceforge.net/</a></p></answer>
</q>
<q id="21">
<question><p>Setting $SIG{__DIE__} breaks parse()</p>
<p>The problem can be narrowed down to:</p>
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;
local $SIG{__DIE__} = sub {
my $msg = shift;
print STDERR "dying! $msg\n"; exit 1;
};
new XML::Twig()->parse('<a />');]]></code>
</question>
<answer>This is a bug in XML::Parser. Upgrading to XML::Parser 2.34 or above solves the problem.
See the <a href="http://rt.cpan.org/Ticket/Display.html?id=4501">bug report on RT</a>.
</answer>
</q>
<q id="20">
<question>It looks like I can only print a twig (or an element) to STDOUT, how do I
redirect the output to a file?</question>
<answer><p>You can pass a filehandle to <tt>print</tt>:</p>
<pre><tt> open( FH, ">output.xml") or die "cannot open output.xml: $!";
$twig->print( \*FH);</tt></pre>
</answer>
</q>
<q id="13"><question>For logging purposes I would like XML::Twig to report line/column number in the
original file</question>
<answer><p>Use <tt>start_tag_handlers</tt> to grab the line and column number through the parser object and
store them in private attributes (attributes whose name starts with a # are not output by XML::Twig):</p>
<code>#!/usr/bin/perl -w
use strict;
use XML::Twig;
my $t=XML::Twig->new( start_tag_handlers =>
                        { # called when the start tag for elt is parsed
                          # use '#ELT' or _all_ to call the handler for all elements
                          elt => sub { my( $t, $elt)= @_;
                                       $elt->set_att( '#line' => $t->current_line);
                                     },
                        },
                      twig_handlers =>
                        { # called when elt is completely parsed
                          elt => sub { my( $t, $elt)= @_;
                                       print "error in elt starting line ",
                                             $elt->att( '#line'), "\n"
                                         if( $elt->has_child( 'subelt[@error]'));
                                     },
                        },
                    );
$t->parsefile( "test_track_line_number.xml");
</code>
<p>will parse <tt>test_track_line_number.xml</tt> that looks like:</p>
<markup><![CDATA[<doc>
<elt>
<subelt>text 1</subelt>
<subelt>text 2</subelt>
<subelt>text 3</subelt>
</elt>
<elt>
<subelt>text 1</subelt>
<subelt error="yes">text 2</subelt>
<subelt>text 3</subelt>
</elt>
</doc>]]></markup>
<p>and will output: <tt>error in elt starting line 7</tt></p></answer>
</q>
<q id="14"><question>How do I include bits of (possibly not well-formed) HTML in an XML document and
use them to generate HTML?</question>
<answer><p>You can wrap the HTML in a CDATA section, which will prevent the parser from
looking into the data. Then use a twig_handler on CDATA to process those sections.
Use the <tt>set_asis</tt> method to get those sections to be output without
being "XML escaped" (XML::Twig 3.05 and above).</p>
<code><![CDATA[
#!/usr/bin/perl -w
use strict;
use XML::Twig;
my $t= XML::Twig->new( twig_handlers => { '#CDATA' => sub { $_->set_asis; } });
$t->parse( \*DATA);
$t->print;
__DATA__
<doc>
<elt>text</elt>
<!-- embedded HTML, note the un-closed <br> tag -->
<ehtml><![CDATA[hello<br>world]]]]><![CDATA[></ehtml>
</doc>]]>
</code>
<p>will output (comment stripped for conciseness):</p>
<markup><![CDATA[<doc><elt>text</elt><ehtml>hello<br>world</ehtml></doc>]]></markup>
<p>Note that the CDATA section will not protect you from encoding problems, so if the included text is likely to
be in a different encoding than the main document you will have to do some encoding conversion before including it.</p></answer>
</q>
<q id="15">
<question><p>In which order are handlers called?</p>
<p>I have this simple Perl script that parses an XML document. The XML document uses the following DTD:</p>
<markup><![CDATA[<!ELEMENT doc (title, elt+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT elt (#PCDATA|subelt)*>
<!ELEMENT subelt (#PCDATA)>]]></markup>
<p>I've noticed the following: although the element 'doc' is the root,
XML::Twig calls its handler last. All the elements 'title' and 'elt'
are processed in correct sequence. Why? The element 'doc' handler should be
called first and not last.</p>
<p>Is the element's handler called on the opening tag OR on the closing tag?</p>
</question>
<answer><p>Element handlers are called on the closing tag, as it is the only time
when the entire element has been parsed. The handler is
called as soon as the element has been completely parsed, which is when
its end tag has been parsed.</p>
<p>This means that handlers for inner elements are called before
those for outer elements: here the handler on 'doc' will be
called after the handlers on 'title' and 'elt'.</p>
<p>This example will show you in which order the handlers are called:</p>
<code><![CDATA[#!/usr/bin/perl -w -l
use strict;
use XML::Twig;
my $t= XML::Twig->new( twig_handlers => { '_all_' => sub { print "handler for ", $_->att( 'id'); } },
error_context => 1,
);
$t->parse( \*DATA);
__DATA__
<doc id="doc">
<title id="title">title</title>
<elt id="elt_1">
<subelt id="subelt_1">subelt</subelt>
<subelt id="subelt_2">subelt</subelt>
</elt>
<elt id="elt_2">element 2</elt>
</doc>]]></code>
</answer>
</q>
<q id="17">
<question>Any neat trick to increase the performance of XML::Twig?</question>
<answer><p>Tom Anderson from tomacorp released an interesting article:
<a href="http://tomacorp.com/perl/xml/saxvstwig.html">Performance Comparison
Between SAX XML::Filter::Dispatcher and XML::Twig</a>. He notes:</p>
<blockquote><i>I learned an interesting performance optimization when writing
the anonymous subs for XML::Twig. These subs should not uselessly return
a long string. Processing this string can increase processing time by 50%
in this example. This is why the start_tag_handlers return the value 1</i>
</blockquote>
<p>Using this trick led to a 4x speedup on my first attempt at speeding up Tom's example!</p>
<p>Thanks Tom!</p>
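<p>As an illustration only (this is not Tom's code), here is a sketch of a handler that ends with an
explicit short return value instead of returning whatever its last statement produced. The element name
<tt>elt</tt> and the inline document are made up for the example:</p>
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;

my $count= 0;
my $t= XML::Twig->new( twig_handlers =>
         { elt => sub { $count++;
                        $_[0]->purge;   # free the memory used so far
                        1;              # return a short true value, not a long string
                      },
         },
       );
$t->parse( '<doc><elt>a</elt><elt>b</elt></doc>');
print "$count elt elements\n";
]]></code>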
</answer>
</q>
<q id="19">
<question>I need to process XML documents. The problem is that there are several of them in the same stream, so the
parser dies after the first one, with a message telling me that there is junk after the
end of the document. Is there any way I could trick the parser into believing they are
all part of a single document?</question>
<answer><p>You can open the input file as a pipe, first <tt>echo</tt>-ing an open tag, then getting
the input from wherever you get it, then <tt>echo</tt>-ing a close tag:</p>
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;
# here we have a very simple generator, but it could be any process that
# generates a stream of XML documents
my $xml_generator= q{echo '<doc>doc1</doc><doc>doc2</doc>'};
my $wrap= 'docs';
# this is where it all happens:
# the pipe at the end of the "file name" means that the name is a
# shell command, that will be executed then piped to the filehandle
open( IN, qq{echo '<$wrap>'; $xml_generator; echo '</$wrap>' |})
  or die "error opening xml_generator: $!";
my $i=1;
my $t= XML::Twig->new( twig_handlers => {
           doc => sub { print "document $i: ", $_->sprint, "\n";
                        $_[0]->purge; # to get the memory back
                        $i++;
                      }
         },
       );
$t->parse( \*IN);
close IN or die "error during the execution of xml_generator: $!";
]]></code>
</answer>
</q>
<q id="20">
<question>How to stop processing the document when a certain condition is met?</question>
<answer>
<p>There are 2 ways to do this:</p>
<ul><li>use <a href="http://xmltwig.org/xmltwig/twig_dev.html#METHODS_XML_Twig_finish"><tt>$twig->finish</tt></a>,
which will still parse (quickly) the file but without doing any processing on it.</li>
<li>wrap the <tt>$twig->parse</tt> in an eval, and <tt>die</tt> when you find the element you are interested in:
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;
my $t= XML::Twig->new( twig_handlers =>{ e => sub { print $_->id, "\n"; die 0; }, });
eval { $t->parse( q{<doc>toto<e id="tata"/>tata<e id="titi"/></doc>});};
print "done\n";
]]></code></li></ul>
<p><b>update</b>: there is now a third method: the <a href="http://xmltwig.org/xmltwig/twig_dev.html#METHODS_XML_Twig_finish"><tt>$twig->finish_now</tt></a>
method is, as you might have guessed, a little more imperative than <tt>finish</tt>: while <tt>finish</tt> still finishes parsing the XML, and
dies if it isn't well-formed, <tt>finish_now</tt> just aborts the parsing and returns right away.</p>
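<p>A minimal sketch of the <tt>finish_now</tt> variant, reusing the inline document from the example
above (no <tt>eval</tt> is needed in this case):</p>
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;

my $t= XML::Twig->new( twig_handlers =>
         { e => sub { print $_->id, "\n";
                      $_[0]->finish_now;   # stop parsing right here
                    },
         },
       );
$t->parse( q{<doc>toto<e id="tata"/>tata<e id="titi"/></doc>});
print "done\n";
]]></code>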
</answer>
</q>
<q id="21">
<question><p>When I re-use a twig to parse another document within a handler, I get a mysterious
<tt>calling depth after parsing is finished...</tt> error. What does it mean?</p>
<p>My code:</p>
<code><![CDATA[ my $t=XML::Twig->new( twig_handlers => { include => \&include })
->parsefile( "main_file.xml");
sub include
{ my( $t, $include)= @_;
$t->parsefile( $include->att( 'src'));
};]]></code>
</question>
<answer><p>Indeed you cannot re-use the twig object to parse another document. Contrary to most other modules (XML::Parser, XML::LibXML...), the twig is both the parser _and_ the parsed document. You can re-use the object if you parse several documents sequentially, but you cannot re-use it within a parse. So in your case you have to create a new XML::Twig object.</p>
<p>The reason for this is simple: incompetence. Mine. I wasn't very familiar with OO when I started writing the module, back in 1998, and I completely missed the object factory construct. Sorry.</p>
<p>Note that in version 3.22 and up the error message is hopefully more explicit:
<tt>cannot reuse a twig that is already parsing</tt>.</p>
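<p>A sketch of the fix, based on the code in the question: the handler creates its own twig for the
included file. What is done with the included document (here, printing the name of its root element) is
only a placeholder:</p>
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;

my $t= XML::Twig->new( twig_handlers => { include => \&include });
$t->parsefile( "main_file.xml");

sub include
  { my( $t, $include)= @_;
    # $t is still busy parsing main_file.xml, so use a fresh twig
    my $included= XML::Twig->new;
    $included->parsefile( $include->att( 'src'));
    # placeholder: do something useful with the included document
    print "included root: ", $included->root->tag, "\n";
  }
]]></code>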
</answer>
</q>
<q id="22">
<question>I want to output the XML with the same format (indentation and line returns) as the
input file. I have tried <tt>pretty_print</tt> but I cannot get what I want.</question>
<answer>
<p>You can get the same formatting as in the original file by using the <tt>keep_spaces => 1</tt> option when you create the twig. Note that this will create <tt>#PCDATA</tt> (text) elements that contain the whitespace in your tree.</p>
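<p>A minimal sketch, using a small inline document instead of a file:</p>
<code><![CDATA[#!/usr/bin/perl -w
use strict;
use XML::Twig;

# keep_spaces keeps the whitespace between elements, so the output
# should be laid out like the input
my $t= XML::Twig->new( keep_spaces => 1);
$t->parse( "<doc>\n  <elt>text</elt>\n  <elt>more text</elt>\n</doc>\n");
$t->print;
]]></code>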
</answer>
</q>
<q id="23">
<question>What does the error message <tt>*** glibc detected *** double free or corruption (!prev):</tt> mean,
and how do I get rid of it?</question>
<answer><p>You are using the UTF8 perlIO layer on your input stream, usually because the environment
variable <tt>PERL_UNICODE</tt> or the <tt>-C</tt> option includes <tt>D</tt>. This causes
problems when reading from a pipe, due to a flaw in IO::Handle, used in XML::Parser in this case.</p>
<p>The workaround is to remove the <tt>D</tt> option, by setting <tt>PERL_UNICODE</tt> or using <tt>-C</tt>
with a value that does not include <tt>D</tt>.</p>
<p>More info at <a href="http://rt.cpan.org/Ticket/Display.html?id=17500">http://rt.cpan.org/Ticket/Display.html?id=17500</a>.</p>
</answer>
</q>
<q id="24">
<question>I want to pass additional arguments to XML::Twig handlers, not just the twig and the element, and I'd rather not use
global variables. Can I do this?</question>
<answer><p>Sure, use a closure:</p>
<code><![CDATA[
my @additional_args= more_args();
my $t=XML::Twig->new( twig_handlers => { foo => sub { bar( @_, @additional_args) } });
sub bar
{ my( $t, $foo, @more_args)= @_;
...
};]]></code>
<p>A good explanation of what closures are can be found in <a href="http://www.perl.com/pub/a/2002/05/29/closure.html">Achieving
Closure</a>.</p>
</answer>
</q>
<copyright>Copyright (c)2000-2008 Michel Rodriguez. All rights reserved. Permission is hereby granted to freely distribute this document provided that all credits and copyright notices are retained.</copyright>
</faq>