A Path Less Taken

Breaking with convention in a very conventional fashion.

"What would you attempt to do if you knew you could not fail?"
Dr. Robert Schuller

Tuesday, February 24, 2009

Category: PHP Development Author: JJ

Yes, I know, the title is kind of funny. How much is there to demystify in a solution called SimpleXML? For me, as it turns out, quite a bit. I had used it with PHP on several occasions in an experimental mode, but it wasn’t until I decided to dig under the hood that I discovered just how simple SimpleXML really is. In the interest of helping others (and the future forgetful me) I share my insights below.

SimpleXML is a tree-based XML parser built into PHP 5. Unlike an event-based parser, it loads the whole document into memory and exposes it as a tree of objects. SimpleXMLElement is the class that delivers the solution. It implements Traversable, which means that you can iterate across it with a foreach construct. One of the first things you discover about SimpleXML is that many of the functions you care about return SimpleXMLElement objects rather than scalars or arrays. This can be confusing, because a SimpleXMLElement behaves a lot like an array since it is Traversable. Let’s take a look at how we can use SimpleXMLElement to help with our XML parsing needs.
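To make that last point concrete, here is a minimal sketch. It assumes the same small sample document that appears later in this post; the variable names are only for illustration.

```php
<?php
// Sample document, used only for illustration.
$xml = '<tree><top height="28" unit="feet">Cat</top></tree>';
$element = new SimpleXMLElement($xml);

// Property access returns another SimpleXMLElement, not a string.
$top = $element->top;
var_dump($top instanceof SimpleXMLElement);

// Because the class is Traversable, foreach works directly on the object.
foreach ($element as $name => $child) {
    // Cast to string to get a scalar value out of the child.
    print $name . ' => ' . (string) $child . "\n";
}
```

The cast is the important part: without it you are still holding an object.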

Getting Started

The SimpleXMLElement constructor accepts either an XML string or a file name. If you pass a string, it is the only parameter you need. If you pass a file name, you need to set the second and third parameters as well. The second parameter takes libxml option flags, which I did not use in my tests. The third parameter must be set to TRUE (the default is FALSE) to indicate that the first parameter is a path or URL rather than a string of XML. Two more parameters can be passed in positions 4 and 5 (they deal with XML namespaces), but as with the second one I did not experiment with them. Below is an example constructor call for a SimpleXMLElement.

$strXml = '<tree><top height="28" unit="feet">Cat</top></tree>' ;
$element = new SimpleXMLElement ($strXml) ;
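For the file name case, a minimal sketch might look like the following. The temporary file is created here only so the example is self-contained; in practice you would pass the path or URL of your own document.

```php
<?php
// Write the sample document to a temporary file (generated for this example only).
$path = tempnam(sys_get_temp_dir(), 'sxe');
file_put_contents($path, '<tree><top height="28" unit="feet">Cat</top></tree>');

// Second parameter: libxml option flags (0 means none).
// Third parameter: TRUE says the first parameter is a path or URL, not XML.
$fromFile = new SimpleXMLElement($path, 0, true);
print $fromFile->getName() . "\n";

// The document is already parsed into memory, so the file can be removed.
unlink($path);
```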

Using SimpleXMLElement

You can obtain most of the information you need from SimpleXMLElement using just a few method calls. I outline the ones that I tested below.

SimpleXMLElement::getName returns the name of the current XML element as a string. Remember that if several sibling elements in the XML tree share the same name, this method is of little use in telling them apart. You will need to look at attributes or content to determine which element you are after. Its use is demonstrated below.

$name = $element->getName () ;
print $name ;  // Would print tree from the example above
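The same-name problem can be sketched as follows. The <branch> elements and the side attribute are hypothetical and exist only to show siblings that getName cannot tell apart.

```php
<?php
// Hypothetical document with two sibling elements that share a name.
$xml = '<tree><branch side="left">Owl</branch><branch side="right">Cat</branch></tree>';
$tree = new SimpleXMLElement($xml);

$foundName = null;
$foundText = null;
foreach ($tree->children() as $branch) {
    // getName() is "branch" for both siblings, so check an attribute instead.
    if ((string) $branch['side'] === 'right') {
        $foundName = $branch->getName();
        $foundText = (string) $branch;
    }
}
print $foundName . ': ' . $foundText . "\n";
```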

SimpleXMLElement::attributes, according to the documentation “… provides the attributes and values defined within an xml tag.” In reality what it returns is a SimpleXMLElement object. You can iterate across that object with a foreach construct to get the name and value of each attribute. You can access a specific attribute by name using array access syntax on the element. You can pass the object to the count function to determine how many attributes the XML element has; it returns 0 if there are none. It is interesting to note that if you call count on the original SimpleXMLElement, it returns the number of child elements rather than the number of attributes. Below is a simple example.

$attributes = $element->attributes () ;
/*
     note that for the <tree> element this loop will run 0 times
     for the <top> element it will run 2 times
*/
foreach ($attributes as $key => $value) {
     print "Attribute: " . $key . " Value: " . $value ;
}
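Access by name and the two flavors of count can be sketched like this, again assuming the sample document from the Getting Started section.

```php
<?php
$element = new SimpleXMLElement('<tree><top height="28" unit="feet">Cat</top></tree>');
$top = $element->top;

// Array-style access by attribute name. The result is again a
// SimpleXMLElement, so cast it to get a scalar.
$height = (int) $top['height'];
$unit   = (string) $top['unit'];

// count() on the attributes object gives the number of attributes ...
$attrCount = count($top->attributes());
// ... while count() on an element gives the number of child elements.
$childCount = count($element);
$noAttrs    = count($element->attributes());

print "$height $unit, $attrCount attributes, $childCount child\n";
```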

SimpleXMLElement::children, according to the documentation “… finds the children of the element of which it is a member.” Once again the result is a SimpleXMLElement object. You can iterate across this object and treat each value as a separate SimpleXMLElement with its own attributes, value and children. You can pass each returned child to the count function to determine how many child elements it has in turn; it returns 0 if an element has no children. Below is a simple example.

foreach ($element->children () as $child) {
     // Process the child element as appropriate
}
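Since every child is itself a SimpleXMLElement, walking a whole document is naturally recursive. Here is a rough sketch; describeElement is a hypothetical helper of my own naming, not part of SimpleXML.

```php
<?php
/**
 * Hypothetical helper: returns one line per element, indented by depth.
 */
function describeElement(SimpleXMLElement $node, $depth = 0)
{
    $lines = array(
        str_repeat('  ', $depth) . $node->getName()
            . ' (' . count($node->children()) . ' children)'
    );
    foreach ($node->children() as $child) {
        // Each child gets the same treatment, one level deeper.
        $lines = array_merge($lines, describeElement($child, $depth + 1));
    }
    return $lines;
}

$element = new SimpleXMLElement('<tree><top height="28" unit="feet">Cat</top></tree>');
print implode("\n", describeElement($element)) . "\n";
```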

Now that we have accessed the name, attributes and children of the element, there is only one more item needed to get a complete picture of the XML string or document we are parsing: the internal content of the element. I’m not talking about child elements. I’m talking about the content that sits between the opening and closing XML tags and is not wrapped in any other XML markup. You obtain it as a scalar by casting the SimpleXMLElement object in question to a string. The cast ignores any child elements. The only problem is that ALL the text content is returned. If you use carriage return / line feeds or tabs to make your XML document more human readable, they become part of the content of the XML tags that contain them. The upshot is that you should know your data well and only attempt this with XML elements that you expect to contain just content. Below is an example.

$tagContent = (string) $element ;
/*
     from the example above this would return an empty string for <tree> and the string "Cat" for <top>.
*/
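The whitespace issue is easy to see with an indented version of the sample document. This is a small sketch; using trim is my own suggestion for coping with it, not something SimpleXML does for you.

```php
<?php
// The same sample document, indented for readability.
$xml = "<tree>\n\t<top height=\"28\" unit=\"feet\">Cat</top>\n</tree>";
$element = new SimpleXMLElement($xml);

// <tree> has no text of its own, yet the cast returns the indentation.
$raw     = (string) $element;
$trimmed = trim($raw);

// <top> contains only content, so the cast is safe to use directly.
$topText = trim((string) $element->top);

var_dump($raw === '', $trimmed === '', $topText);
```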

That’s all you need to know to successfully parse XML using SimpleXML. In my experiments it became apparent that I was better off leaving the XML parsing to SimpleXML and building classes to manage the specific data I was trying to capture from the XML. This approach is sound because the SimpleXML extension is written in C and is very fast. By letting it do what it does well I was free to focus on the needs of my application. If you want to view the complete documentation for SimpleXML you can find it in the PHP manual. Good luck in your own projects and let me know if you see any glaring issues or omissions above.
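As a postscript on the point about building classes around the data, here is a minimal sketch of what I mean. TreeTop and its properties are hypothetical names invented for the sample document; your own classes would follow your own data.

```php
<?php
/**
 * Hypothetical value class for the <top> element in the sample document.
 */
class TreeTop
{
    public $occupant;
    public $height;
    public $unit;

    // Let SimpleXML do the parsing; this class only copies out scalars.
    public static function fromXml(SimpleXMLElement $node)
    {
        $top = new self();
        $top->occupant = trim((string) $node);
        $top->height   = (int) $node['height'];
        $top->unit     = (string) $node['unit'];
        return $top;
    }
}

$element = new SimpleXMLElement('<tree><top height="28" unit="feet">Cat</top></tree>');
$top = TreeTop::fromXml($element->top);
print $top->occupant . ' at ' . $top->height . ' ' . $top->unit . "\n";
```

Once the values are copied out, the rest of the application never has to touch a SimpleXMLElement.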

Peace
