shameless pule for help
Jul. 21st, 2006 05:05 pm
update: fixed with help from plautus (and an interesting alternate explanation from tylerpistol)
I'm writing some PHP to take XML from a file. The file's in ISO-8859-1, and contains French accents and typesetter's quotes in its character data and some of its attributes. I've taken care to make the parser adapt itself to the character set and demanded that its target character set is also ISO-8859-1:
$xml_parser = xml_parser_create("");
xml_parser_set_option($xml_parser,XML_OPTION_TARGET_ENCODING,"ISO-8859-1");
xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING, 0);
...
I set up a tag handler to turn an "in_prose" variable on and off (based, not surprisingly, on whether the parser is inside a "prose" element). When it's on, I want to suck the character data into an array element using my cdataHandler function:
...
function cdataHandler($parser, $data) {
    if ($_SESSION["in_prose"]) {
        $_SESSION["evenements"][$_SESSION["current_event"]]["prose"] = $data;
    }
}
Alas, it tends to chop off the beginning of the text. The cut doesn't fall predictably before any given character, or after any predictable length, as far as I can tell. The text is just dismembered when I echo it back. Any idea what might be causing this? Any safety measures I can take?
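The post doesn't record what the fix turned out to be, so treat this as a guess rather than the answer. One well-known way to lose the front of the text with this kind of parser is that the character data handler may be called several times for a single element (for example around entities or line breaks), and a plain = keeps only the last piece. A minimal sketch, assuming that's the cause, appends with .= instead. The sample XML, the $state array (standing in for $_SESSION) and the closures are made up for illustration, and closures need PHP 5.3 or later.

```php
<?php
// Hypothetical input: the element names are assumptions, not the real file.
$xml = '<evenements><evenement><prose>Tom &amp; Jerry</prose></evenement></evenements>';

// Stand-in for the $_SESSION values used in the post.
$state = array('in_prose' => false, 'prose' => '');

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$state) {
        if ($name === 'prose') {
            $state['in_prose'] = true;
            $state['prose'] = '';      // start a fresh buffer for this element
        }
    },
    function ($p, $name) use (&$state) {
        if ($name === 'prose') {
            $state['in_prose'] = false;
        }
    }
);

xml_set_character_data_handler($parser, function ($p, $data) use (&$state) {
    if ($state['in_prose']) {
        $state['prose'] .= $data;      // append each chunk instead of overwriting
    }
});

xml_parse($parser, $xml, true);
echo $state['prose'], "\n";
```

With = in place of .= the handler would keep only whatever chunk arrived last, which would look like the beginning being chopped off at unpredictable places.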
no subject
Date: 2006-07-22 02:00 am (UTC)
When I do the next project... how much of a pain is it to set up DOM to try out?
no subject
Date: 2006-07-22 04:13 pm (UTC)
you got it backwards
if you want to quickly get to a specific element, DOM works better.
2006-07-23
something
foobar
2006-07-22
something else
barfoo
// Find most recent piece of data
$dom = domxml_open_file('data.xml');
$root = $dom->document_element(); // the document's root element
$databits = $root->get_elements_by_tagname("data"); // collect all the data tags
$children = $databits[0]->child_nodes(); // collect the contents of the first element in the $databits array
echo $children[0]->get_content(); // the date
To me, that's better. It requires you to know the structure of the XML beforehand, but after all, how do you parse something you know nothing about?
In terms of installing, it depends on your OS.
If you're using FreeBSD then it's as simple as installing a port:
/usr/ports/textproc/php4-domxml.
If you're running PHP5, then DOM is part of the core. No installation required. And PHP5's DOM is much nicer too, with support for xpath and various other goodies.
http://www.php.net/manual/en/ref.domxml.php
http://www.php.net/manual/en/ref.dom.php
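For comparison, here is a rough sketch of the same lookup with PHP 5's built-in DOM and XPath. Only the "data" tag name comes from the example above. The "root", "date", "title" and "body" element names and the inline XML string are assumptions made up for illustration, so adjust the path to the real file.

```php
<?php
// Hypothetical XML: only <data> is taken from the comment; other tag names are assumed.
$xml = <<<XML
<root>
  <data><date>2006-07-23</date><title>something</title><body>foobar</body></data>
  <data><date>2006-07-22</date><title>something else</title><body>barfoo</body></data>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// XPath positions are 1-based, so data[1] is the first <data> element.
$xpath = new DOMXPath($dom);
$latest = $xpath->query('/root/data[1]/date')->item(0);

echo $latest->nodeValue, "\n"; // prints 2006-07-23
```

Selecting by path like this also sidesteps the whitespace text nodes that child_nodes() can return when the file is indented.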