metawidget: A platypus looking pensive. (Default)
[personal profile] metawidget

update: fixed with help from [livejournal.com profile] plautus (and an interesting alternate explanation from [livejournal.com profile] tylerpistol)


I'm writing some PHP to take XML from a file. The file's in ISO-8859-1, and contains French accents and typesetter's quotes in its character data and some of its attributes. I've taken care to make the parser adapt itself to the character set and demanded that its target character set is also ISO-8859-1:

$xml_parser = xml_parser_create("");
xml_parser_set_option($xml_parser,XML_OPTION_TARGET_ENCODING,"ISO-8859-1");
xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE, 1);
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING, 0);

...

I set up a tag handler to turn an "in_prose" variable on and off (based, not surprisingly, on whether the parser is inside a "prose" element). When it's on, I want to suck the character data into an array element using my cdataHandler function:

...

function cdataHandler($parser,$data){
	if ($_SESSION["in_prose"]){
		$_SESSION["evenements"][$_SESSION["current_event"]]["prose"]=$data;
	}
}

Alas, it tends to chop off the beginning of the text. It's not predictably before any given character, or any predictable length, as far as I can tell. It's just dismembered when I echo it back. Any idea what might be causing this? Any safety measures I can take?
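
One plausible cause, offered as a guess rather than as the fix that was actually applied: PHP's XML parser may deliver the text of a single element in several calls to the character data handler, often splitting around accented characters and entities. With a plain assignment, each call overwrites the previous piece, so only the tail survives. A minimal sketch of the accumulating version follows; the $state array stands in for the $_SESSION variables, and the sample document and the startHandler/endHandler names are made up for illustration.

```php
<?php
// Sketch only: $state replaces the $_SESSION variables from the post, and the
// sample document below is hypothetical.
$state = array('in_prose' => false, 'prose' => '');

function startHandler($parser, $name, $attrs) {
    global $state;
    if ($name === 'prose') {
        $state['in_prose'] = true;
        $state['prose'] = '';      // start each prose element with an empty buffer
    }
}

function endHandler($parser, $name) {
    global $state;
    if ($name === 'prose') {
        $state['in_prose'] = false;
    }
}

function cdataHandler($parser, $data) {
    global $state;
    if ($state['in_prose']) {
        $state['prose'] .= $data;  // append: the text may arrive in several pieces
    }
}

// Accents are written as character references so this source stays pure ASCII.
$xml = '<?xml version="1.0" encoding="ISO-8859-1"?>'
     . '<evenement><prose>Le caf&#233; ouvre &#224; midi.</prose></evenement>';

$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($xml_parser, 'startHandler', 'endHandler');
xml_set_character_data_handler($xml_parser, 'cdataHandler');
xml_parse($xml_parser, $xml, true);

// $state['prose'] now holds the whole sentence, encoded as ISO-8859-1.
```

Resetting the buffer in the start handler matters as much as the append: without it, text from one prose element would run into the next.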

Date: 2006-07-22 04:13 pm (UTC)
From: [identity profile] v0idnull.livejournal.com
eh
you got it backwards

if you want to quickly get to a specific element, DOM works better.



2006-07-23
something
foobar


2006-07-22
something else
barfoo



// Find most recent piece of data
$dom = domxml_open_file('data.xml'); // $dom is the name the following lines use
$root = $dom->document_element(); // everything in
$databits = $root->get_elements_by_tagname("data"); // collect all the data tags
$children = $databits[0]->child_nodes(); // collect the contents of the first element in $databits array.
echo $children[0]->get_content(); // the date


To me, that's better. It requires you to know the structure of the XML beforehand, but after all, how do you parse something you know nothing about?

In terms of installing, it depends on your OS.
If you're using FreeBSD then it's as simple as installing a port:
/usr/ports/textproc/php4-domxml

If you're running PHP5, then DOM is part of the core. No installation required. And PHP5's DOM is much nicer too, with support for xpath and various other goodies.
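
For comparison, here is a rough PHP 5 sketch of the same lookup using DOMDocument and DOMXPath. Only the "data" tag name comes from the snippet above; the root element and the child element names are assumptions made up for the example.

```php
<?php
// Hypothetical document: "log", "date", "title" and "body" are invented names;
// only "data" appears in the original PHP 4 snippet.
$xml = <<<XML
<log>
  <data><date>2006-07-23</date><title>something</title><body>foobar</body></data>
  <data><date>2006-07-22</date><title>something else</title><body>barfoo</body></data>
</log>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// XPath addresses the element directly, so whitespace text nodes between
// tags do not get in the way as they can with child_nodes().
$xpath = new DOMXPath($dom);
$dates = $xpath->query('/log/data[1]/date');
$firstDate = $dates->item(0)->textContent;

echo $firstDate, "\n"; // prints 2006-07-23
```

Note that this picks the first data element in document order, the same as $databits[0] above; it is only the most recent one if the file is sorted newest first.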


http://www.php.net/manual/en/ref.domxml.php

http://www.php.net/manual/en/ref.dom.php
