shameless pule for help
Jul. 21st, 2006 05:05 pm
update: fixed with help from plautus (and an interesting alternate explanation from tylerpistol)
I'm writing some PHP to take XML from a file. The file's in ISO-8859-1, and contains French accents and typesetter's quotes in its character data and some of its attributes. I've taken care to make the parser adapt itself to the character set and demanded that its target character set is also ISO-8859-1:
$xml_parser = xml_parser_create("");
xml_parser_set_option($xml_parser,XML_OPTION_TARGET_ENCODING,"ISO-8859-1");
xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING, 0);
...
I set up a tag handler to turn an "in_prose" variable on and off (based, not surprisingly, on whether the parser is inside a "prose" element). When it's on, I want to suck the character data into an array element using my cdataHandler function:
...
function cdataHandler($parser,$data){
if ($_SESSION["in_prose"]){
$_SESSION["evenements"][$_SESSION["current_event"]]["prose"]=$data;
}
}
Alas, it tends to chop off the beginning of the text. It's not predictably before any given character, or any predictable length, as far as I can tell. It's just dismembered when I echo it back. Any idea what might be causing this? Any safety measures I can take?
no subject
Date: 2006-07-21 10:21 pm (UTC)
"Immigants! I knew it was them! Even when it was the bears, I knew it was them."
XML parsing
Date: 2006-07-21 10:37 pm (UTC)
If it's anything like SAX then you need to do something like $data_string .= $data; on every cdataHandler call and wait until the CLOSE of the tag to collect the data (only then are you sure you are finished collecting all the data). Otherwise, you'll get unpredictable results such as the one you are describing.
Hey I could be wrong, but check out this page:
http://www.zend.com/zend/art/parsing.php
I've done XML parsing in Java and Perl and they all seem to act pretty much the same.
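Here's a rough sketch of that append-then-collect pattern with PHP's expat functions, in case it helps. The ProseCollector class, its method names and the sample XML string are all made up for illustration (the original post keeps its state in $_SESSION instead), so treat it as one possible arrangement rather than the fix itself:

```php
<?php
// Hypothetical helper class, not from the original post.
class ProseCollector {
    public $prose = [];        // completed text of each <prose> element
    private $inProse = false;
    private $buffer = '';

    public function startTag($parser, $name, $attrs) {
        if ($name === 'prose') {
            $this->inProse = true;
            $this->buffer = '';          // fresh buffer for this element
        }
    }

    public function characterData($parser, $data) {
        if ($this->inProse) {
            $this->buffer .= $data;      // append: the text may arrive in several calls
        }
    }

    public function endTag($parser, $name) {
        if ($name === 'prose') {
            $this->prose[] = $this->buffer;  // only complete at the closing tag
            $this->inProse = false;
        }
    }
}

// Made-up sample document; the entity makes it likely the text is split into chunks.
$xml = '<evenement><prose>Fish &amp; chips</prose></evenement>';

$collector = new ProseCollector();
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, [$collector, 'startTag'], [$collector, 'endTag']);
xml_set_character_data_handler($parser, [$collector, 'characterData']);
xml_parse($parser, $xml, true);

echo $collector->prose[0], "\n";
```

How many times the character data handler fires for one element is up to the parser, which is why the sketch never relies on a single call carrying the whole text.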
hehe
Date: 2006-07-21 10:45 pm (UTC)
function cdataHandler($parser,$data){
if ($_SESSION["in_prose"]){
$_SESSION["evenements"][$_SESSION["current_event"]]["prose"] .= $data;
}
}
Re: hehe
Date: 2006-07-21 11:13 pm (UTC)
sweet :)
no subject
Date: 2006-07-22 01:18 am (UTC)
expat is an event-based parser. It parses based on tag and attribute values, which is useful if you need to skip over large amounts of data and pick out certain tags and attributes. And even then, as I said, its only real advantage comes when you're parsing very large XML files.
Otherwise, I would stick with DOM. It's completely OOP and much more standardized. Granted, it's a bit more difficult to install, but it's so much faster to code, and the code is far easier to read and manage.
no subject
Date: 2006-07-22 01:55 am (UTC)
http://bugs.php.net/bug.php?id=11643
I'm doing session management, but it's working just fine, really.
no subject
Date: 2006-07-22 02:00 am (UTC)
When I do the next project... how much of a pain is it to set up DOM to try out?
no subject
Date: 2006-07-22 02:02 am (UTC)
no subject
Date: 2006-07-22 04:13 pm (UTC)
you got it backwards
if you want to quickly get to a specific element, DOM works better.
2006-07-23
something
foobar
2006-07-22
something else
barfoo
// Find most recent piece of data
$dom = domxml_open_file('data.xml');
$root = $dom->document_element(); // the document's root element
$databits = $root->get_elements_by_tagname("data"); // collect all the data tags
$children = $databits[0]->child_nodes(); // collect the child nodes of the first element in the $databits array
echo $children[0]->get_content(); // the date
To me, that's better. It requires you to know the structure of the XML beforehand, but after all, how do you parse something you know nothing about?
In terms of installing, it depends on your OS.
If you're using FreeBSD then it's as simple as installing a port.
/usr/ports/textproc/php4-domxml.
If you're running PHP5, then DOM is part of the core. No installation required. And PHP5's DOM is much nicer too, with support for xpath and various other goodies.
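For comparison, the same lookup might look something like this with PHP 5's DOM extension and XPath. Only the "data" tag name comes from the snippet above; the log, date, title and body element names and the inline sample document are assumptions made up for this sketch:

```php
<?php
// Hypothetical sample: <log>, <date>, <title> and <body> are invented names.
$xml = <<<XML
<log>
  <data><date>2006-07-23</date><title>something</title><body>foobar</body></data>
  <data><date>2006-07-22</date><title>something else</title><body>barfoo</body></data>
</log>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// Select the <date> child of the first <data> element in document order.
$nodes = $xpath->query('/log/data[1]/date');
$mostRecent = $nodes->item(0)->nodeValue;

echo $mostRecent, "\n"; // prints 2006-07-23
```

Asking for the date element by name also avoids having to count child nodes, which can include whitespace text nodes when the file is indented.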
http://www.php.net/manual/en/ref.domxml.php
http://www.php.net/manual/en/ref.dom.php