xml_parse_into_struct

(PHP 3 >= 3.0.8, PHP 4, PHP 5)

xml_parse_into_struct -- Parse XML data into an array structure

Description

int xml_parse_into_struct ( resource parser, string data, array &values [, array &index] )

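This function parses an XML string into two parallel array structures. The index array contains pointers to the location of the corresponding values in the values array. The last two array parameters must be passed by reference.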

Note: xml_parse_into_struct() returns 0 on failure and 1 on success. This is not the same as FALSE and TRUE, so be careful with operators such as ===.
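
Because the return value is an integer, a loose truthiness check is the least surprising way to detect failure. The following is a minimal sketch rather than part of the reference text: the helper name parseOrNull is hypothetical, and xml_get_error_code() and xml_error_string() are used only to report what went wrong.

```php
<?php
// Hypothetical helper: returns the values array, or null when parsing fails.
function parseOrNull($xml)
{
    $p = xml_parser_create();
    $ok = xml_parse_into_struct($p, $xml, $vals, $index);
    if (!$ok) {  // 0 on failure; a strict comparison with false would not match
        echo "XML error: ", xml_error_string(xml_get_error_code($p)), "\n";
        return null;
    }
    // On PHP versions before 8.0, also call xml_parser_free($p) here.
    return $vals;
}

$good = parseOrNull("<para><note>simple note</note></para>");
$bad  = parseOrNull("<para><note>unclosed</para>");
```

Here $good holds the parsed values, while $bad is null because the note tag is never closed.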

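The example below illustrates the internal structure of the arrays generated by the function. We use a simple note tag embedded inside a para tag, then parse it and print out the structures generated:

Example 1. xml_parse_into_struct() example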

<?php
$simple = "<para><note>simple note</note></para>";
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);
xml_parser_free($p);
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);
?>

When we run that code, the output will be:

Index array
Array
(
    [PARA] => Array
        (
            [0] => 0
            [1] => 2
        )

    [NOTE] => Array
        (
            [0] => 1
        )

)

Vals array
Array
(
    [0] => Array
        (
            [tag] => PARA
            [type] => open
            [level] => 1
        )

    [1] => Array
        (
            [tag] => NOTE
            [type] => complete
            [level] => 2
            [value] => simple note
        )

    [2] => Array
        (
            [tag] => PARA
            [type] => close
            [level] => 1
        )

)
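
To illustrate how the two arrays relate: each entry in the index array is a position in the values array. The short sketch below reuses the input from Example 1; the variable names $noteEntry and $paraPositions are only illustrative.

```php
<?php
$simple = "<para><note>simple note</note></para>";
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);

// $index['NOTE'][0] is the position of the first NOTE entry in $vals.
$noteEntry = $vals[$index['NOTE'][0]];
echo $noteEntry['value'], "\n";

// PARA appears twice in the index: once for its open entry, once for its close.
$paraPositions = $index['PARA'];
echo $vals[$paraPositions[0]]['type'], " / ", $vals[$paraPositions[1]]['type'], "\n";
```

Looking elements up through the index avoids scanning the whole values array for a given tag name.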

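Event-driven parsing (based on the expat library) can get complicated when you have a complex XML document. This function does not produce a DOM-style object, but it generates structures that can be traversed in a tree fashion. Thus, we can easily create objects that represent the data in the XML file. Let's consider the following XML file, which represents a small database of amino acid information:

Example 2. moldb.xml - small database of molecular information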

<?xml version="1.0"?>
<moldb>

    <molecule>
        <name>Alanine</name>
        <symbol>ala</symbol>
        <code>A</code>
        <type>hydrophobic</type>
    </molecule>

    <molecule>
        <name>Lysine</name>
        <symbol>lys</symbol>
        <code>K</code>
        <type>charged</type>
    </molecule>

</moldb>

And here is some code to parse the document and generate the appropriate objects:

Example 3. parsemoldb.php - parses moldb.xml into an array of molecular objects

<?php

class AminoAcid {
    var $name;    // amino acid name
    var $symbol;  // three-letter symbol
    var $code;    // one-letter code
    var $type;    // hydrophobic, charged or neutral

    function AminoAcid ($aa)
    {
        foreach ($aa as $k=>$v)
            $this->$k = $aa[$k];
    }
}

function readDatabase($filename)
{
    // read the XML database of amino acids
    $data = implode("", file($filename));
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $data, $values, $tags);
    xml_parser_free($parser);

    // loop through the structures
    $tdb = array();
    foreach ($tags as $key=>$val) {
        if ($key == "molecule") {
            $molranges = $val;
            // each contiguous pair of array entries are the
            // lower and upper range for each molecule definition
            for ($i=0; $i < count($molranges); $i+=2) {
                $offset = $molranges[$i] + 1;
                $len = $molranges[$i + 1] - $offset;
                $tdb[] = parseMol(array_slice($values, $offset, $len));
            }
        } else {
            continue;
        }
    }
    return $tdb;
}

function parseMol($mvalues)
{
    $mol = array();
    for ($i=0; $i < count($mvalues); $i++) {
        $mol[$mvalues[$i]["tag"]] = $mvalues[$i]["value"];
    }
    return new AminoAcid($mol);
}

$db = readDatabase("moldb.xml");
echo "** Database of AminoAcid objects:\n";
print_r($db);

?>

After executing parsemoldb.php, the variable $db contains an array of AminoAcid objects, and the output of the script confirms that:

** Database of AminoAcid objects:
Array
(
    [0] => aminoacid Object
        (
            [name] => Alanine
            [symbol] => ala
            [code] => A
            [type] => hydrophobic
        )

    [1] => aminoacid Object
        (
            [name] => Lysine
            [symbol] => lys
            [code] => K
            [type] => charged
        )

)


User Contributed Notes
A3
04-Nov-2006 09:28
XML -> Array
<?php
$data = '<root><a><b x="s" a="2">asdf</b><c></c></a></root>';

$p = xml_parser_create();
xml_parse_into_struct($p, $data, $vals);
xml_parser_free($p);

$key = $output = array();
foreach ($vals as $id=>$item) {
    if ($item["type"]=="open" || $item["level"]>count($key)) {// && count($key)<=$item["level"])
        array_push($key, $id);
        $temp = array("tag"=>$item["tag"], "value"=>"", "attributes"=>array());
        eval("\$output[".implode("][", $key)."] = \$temp;");
    }
    if ($item["type"]=="close" || $item["level"]<count($key))// && $item["level"]>=count($key))
        array_pop($key);
    if (isset($item["attributes"]))
        eval("\$output[".implode("][", $key)."]['attributes'] = array_merge(\$output[".implode("][", $key)."]['attributes'], \$item['attributes']);");
    if (isset($item["value"]))
        eval("\$output[".implode("][", $key)."]['value'] .= \$item['value'];");
}
?>
Elad Elrom
13-Sep-2006 05:14
This is a quick fix for parsing XML from a remote URL. Some of the examples above work when parsing a file on your local server (without "http://") but not when parsing from a remote server using "http://www.URL"...

<?php
$file = "http://www.URL.com/file.XML";

$xml_parser = xml_parser_create();

$handle = fopen($file, "rb");
$data = '';
while (!feof($handle)) {
    $data .= fread($handle, 8192);
}
fclose($handle);

xml_parse_into_struct($xml_parser, $data, $vals, $index);
xml_parser_free($xml_parser);

$params = array();
$level = array();
foreach ($vals as $xml_elem) {
    if ($xml_elem['type'] == 'open') {
        if (array_key_exists('attributes',$xml_elem)) {
            list($level[$xml_elem['level']],$extra) = array_values($xml_elem['attributes']);
        } else {
            $level[$xml_elem['level']] = $xml_elem['tag'];
        }
    }
    if ($xml_elem['type'] == 'complete') {
        $start_level = 1;
        $php_stmt = '$params';
        while($start_level < $xml_elem['level']) {
            $php_stmt .= '[$level['.$start_level.']]';
            $start_level++;
        }
        $php_stmt .= '[$xml_elem[\'tag\']] = $xml_elem[\'value\'];';
        eval($php_stmt);
    }
}

echo "<pre>";
print_r ($params);
echo "</pre>";
?>
mad dot cat at mcmadcat dot com
06-Sep-2006 07:55
This is my favorite function:
<?php
function mc_parse_xml($filename)
{
    $xml = file_get_contents($filename);
    $p = xml_parser_create();
    xml_parse_into_struct($p, $xml, $values, $index);
    xml_parser_free($p);
    $content = array();
    for ($i=0;$i<count($values);$i++) {
        if (isset($values[$i]['attributes'])) {
            $parent = $values[$i]['tag'];
            $keys = array_keys($values[$i]['attributes']);
            for ($z=0;$z<count($keys);$z++)
            {
                $content[$parent][$i][$keys[$z]] = $values[$i]['attributes'][$keys[$z]];
                // copy the element's text as well, if it has any
                if (isset($values[$i]['value'])) $content[$parent][$i]['VALUE'] = $values[$i]['value'];
            }
        }
    }
    foreach ($content as $key => $values) {
        $content[$key] = array_values($content[$key]);
    }
    if (count($content) > 0) return $content;
    else return false;
}
?>
webmaster at unitedscripters dot com
17-Jul-2006 08:29
PS: keep in mind that some RSS feeds include spurious tags as HTML entities (see the Google News RSS feeds: they include tables as &lt;table blah blah!).

If so, add this to my rssSnapper below:

<?php
$input=preg_replace("/(<!\\[CDATA\\[)|(\\]\\]>)/", '', $input);
$input=html_entity_decode($input); //<-- added line
?>

You may play around with the code and perfect it by testing it on various feeds. Not _all_ XML is worthy of an XML parser and the sleepless nights it entails.
webmaster at unitedscripters dot com
17-Jul-2006 05:43
It may be worth stressing that when you are dealing with incoming XML files such as RSS feeds, and you are about to include several of them in a page of yours, resorting to PHP's XML functions is neither _necessarily_ the best idea nor _strictly_ indispensable.

I also have in mind a note posted on this page some time ago by info at gramba dot tv:

QUOTE
I was working with the xml2array functions below and had big performance problems. I fired them on a 20MB XML file and had to quit since all approaches of parsing were just too slow (more than 20 minutes of parsing etc.). The solution was parsing it manually with preg_match, which increased performance by more than 20 times (processing time about 1 minute).
UNQUOTE

Calling a specific XML function and arranging a whole class, when all you want from an incoming file may be the contents of a few tags, is not the only option you have in PHP.

Here is a simple function that parses an XML RSS feed without any XML function. Keeping it in mind may spare you the need to create extremely complex classes like the ones we see here when _all_ you want is a few titles and descriptions from an RSS feed (if that's your goal, you don't need XML parsers):

<?php
function rssSnapper($input='', $limit=0, $feedChannel='Yahoo!News'){
$input=file_get_contents($input);
   if(!$input){return '';};
$input=preg_replace("/[\\n\\r\\t]+/", '', $input);
$input=preg_replace("/(<!\\[CDATA\\[)|(\\]\\]>)/", '', $input);
preg_match_all("/<item>(.*?)<\\/item>/", $input, $items, PREG_SET_ORDER);
$limit=(int)$limit;
$limit=($limit && is_numeric($limit) && abs($limit)<sizeof($items))? sizeof($items)-abs($limit): 0;
while(sizeof($items)>$limit){
   $item=array_shift($items);
   $item=$item[1];
   preg_match_all("/<link>(.*?)<\\/link>/", $item, $link, PREG_SET_ORDER);
   preg_match_all("/<title>(.*?)<\\/title>/", $item, $title, PREG_SET_ORDER);
   preg_match_all("/<author>(.*?)<\\/author>/", $item, $author, PREG_SET_ORDER);
   preg_match_all("/<pubDate>(.*?)<\\/pubDate>/", $item, $pubDate, PREG_SET_ORDER);
   preg_match_all("/<description>(.*?)<\\/description>/", $item, $description, PREG_SET_ORDER);
   if(sizeof($link)){ $link = strip_tags($link[0][1]); };
   if(sizeof($title)){ $title = strtoupper( strip_tags($title[0][1]) ); };
   if(sizeof($author)){ $author = strip_tags($author[0][1]); };
   if(sizeof($pubDate)){ $pubDate = strip_tags($pubDate[0][1]); };
   if(sizeof($description)){ $description = strip_tags($description[0][1]); };
   print <<<USAVIT

   <!-- ITEM STARTS -->
   <div class="news_bg_trick">
   <a href="$link" class="item" target="_blank">
   <span class="title">$title<span class="channel">$feedChannel</span></span>
   <span class="title_footer">
   by <span class="author">$author</span> -
   <span class="date">$pubDate</span>
   </span>
   <span class="description">$description</span>
   </a>
   </div>
   <!-- ITEM ENDS -->

USAVIT;
}
//out of loop
/*unitedscripters.com*/
}
?>

The printing phase assigns CSS class names, so the output is fully customizable with a mere style sheet.

The use of strip_tags is a reminder from Chris Shiflett: distrust incoming data, always, anyway.
I hope no typos slipped in during transcription. Arguably not perfect, but I hope it is a good alternative to spending three days on a full-fledged XML parser just to grab... three tags from an RSS feed!
bye, ALberto
efredricksen at gmail dot com
24-May-2006 03:55
Perhaps the one true parser? I modified xademax's fine code to tidy it up, code-wise and style-wise, rationalize some minor craziness, and make the names fit the nomenclature of the XML spec. (There are no uses of eval, and shame on you people who use it.)

<?php
class XmlElement {
    var $name;
    var $attributes;
    var $content;
    var $children;
}

function xml_to_object($xml) {
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $xml, $tags);
    xml_parser_free($parser);

    $elements = array();  // the currently filling [child] XmlElement array
    $stack = array();
    foreach ($tags as $tag) {
        $index = count($elements);
        if ($tag['type'] == "complete" || $tag['type'] == "open") {
            $elements[$index] = new XmlElement;
            $elements[$index]->name = $tag['tag'];
            // 'attributes' and 'value' are only present when the element has them
            $elements[$index]->attributes = isset($tag['attributes']) ? $tag['attributes'] : null;
            $elements[$index]->content = isset($tag['value']) ? $tag['value'] : null;
            if ($tag['type'] == "open") {  // push
                $elements[$index]->children = array();
                $stack[count($stack)] = &$elements;
                $elements = &$elements[$index]->children;
            }
        }
        if ($tag['type'] == "close") {  // pop
            $elements = &$stack[count($stack) - 1];
            unset($stack[count($stack) - 1]);
        }
    }
    return $elements[0];  // the single top-level element
}

// For example:
$xml = '
<parser>
   <name language="en-us">Fred Parser</name>
   <category>
       <name>Nomenclature</name>
       <note>Noteworthy</note>
   </category>
</parser>
';
print_r(xml_to_object($xml));
?>

will give:

xmlelement Object
(
   [name] => parser
   [attributes] =>
   [content] =>
   [children] => Array
       (
           [0] => xmlelement Object
               (
                   [name] => name
                   [attributes] => Array
                       (
                           [language] => en-us
                       )

                   [content] => Fred Parser
                   [children] =>
               )

           [1] => xmlelement Object
               (
                   [name] => category
                   [attributes] =>
                   [content] =>
                   [children] => Array
                       (
                           [0] => xmlelement Object
                               (
                                   [name] => name
                                   [attributes] =>
                                   [content] => Nomenclature
                                   [children] =>
                               )

                           [1] => xmlelement Object
                               (
                                   [name] => note
                                   [attributes] =>
                                   [content] => Noteworthy
                                   [children] =>
                               )

                       )

               )

       )

)
Stuart
24-May-2006 06:26
This is a great little function for a lot of XML work, but note that this function does not handle XML entities properly.

The basic XML entities &lt; &gt; &amp; &quot; are fine, but anything else will not work:

If the entity is defined in the XML header, the parser will drop it completely from the struct it creates.

If the entity is not defined in the XML then the parser will crash out with an undefined entity error.

You should be able to work around this limitation by using a preg_replace on your XML string before passing it to the parser.

(Further details in Bug #35271; this is just a warning to those thinking of using this function for parsing real XML documents not just trivial XML examples)
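
One way to apply the workaround described above is to rewrite named entities as numeric character references, which the parser accepts without a DTD. This is only a sketch: the function name replaceNamedEntities and its small entity map are assumptions made for illustration, a real document may need a much larger map, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical helper: replace a few named HTML entities with numeric
// character references, leaving the predefined XML entities alone.
function replaceNamedEntities($xml)
{
    $map = array(
        'nbsp'   => '&#160;',
        'copy'   => '&#169;',
        'eacute' => '&#233;',
    );
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function ($m) use ($map) {
            // Names not in the map (including lt, gt, amp, quot) are kept as-is.
            return isset($map[$m[1]]) ? $map[$m[1]] : $m[0];
        },
        $xml
    );
}

$raw = '<p>Tom &amp; Jerry&nbsp;&copy; 2006</p>';

// Parsing the raw string is expected to fail on the undefined entities.
$p1 = xml_parser_create();
$before = xml_parse_into_struct($p1, $raw, $vals1);

// After the replacement only numeric references and &amp; remain.
$p2 = xml_parser_create();
$after = xml_parse_into_struct($p2, replaceNamedEntities($raw), $vals2);

echo "before: $before, after: $after\n";
```

A whitelist map like this keeps the substitution predictable; anything not listed is passed through untouched and will still be reported by the parser.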
donna at coloma dot com
29-Apr-2006 03:15
I needed a very simple parser for a set of name-value pairs to be stored in a single database field. I started with the moldb example, pared it down, and picked up the "id" attributes. Perhaps it will be useful for someone else.

<?php
/* simple conversion for name-value fields */

$xmlInput = "<?xml version=\"1.0\"?>
<mcw_settings>
  <field id=\"imageAlign\">left</field>
  <field id=\"caption\">What a nice picture.</field>
</mcw_settings>";

$desiredResult = array (
    'imageAlign' => "left",
    'caption' => "What a nice picture."
);

function parseFields ($data)
{
    // read the XML database of fields
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $data, $values, $tags);
    xml_parser_free($parser);
    $fields = null;

    // loop through the structures
    $fieldIndices = $tags['field'];
    for ($i=0; $i < count($fieldIndices); $i++ ) {
        $fieldInfo = $values[$fieldIndices[$i]];
        $fields[$fieldInfo['attributes']['id']] = $fieldInfo['value'];
    }
    return $fields;
}

$test = parseFields($xmlInput);
echo "** Result:\n";
print_r($test);
?>
matt at australiangamer dot com
03-Apr-2006 10:58
I liked VampBoy's code as it gave me structure pretty much as I wanted it. Just two notes, though:

There is a bug in this code. subdivide() checks for pre-existing values, but not in the case of a "complete" element.

In the following (appallingly bad) XML

<Names>
<Name>Matt</Name>
<Name>Stacy</Name>
</Names>

only Stacy is added to Names in the array, wiping Matt in the process.

To fix this replace the following code
<?php
elseif ($dat['level'] === $level && $dat['type'] === "complete"){
    $newarray[$dat['tag']]=$dat['value'];
}
?>
with
<?php
elseif ($dat['level'] === $level && $dat['type'] === "complete"){
    if (isset($newarray[$dat['tag']]) && is_array($newarray[$dat['tag']])){
        $newarray[$dat['tag']][] = $dat['value'];
    } elseif (isset($newarray[$dat['tag']]) && !is_array($newarray[$dat['tag']])){
        $newarray[$dat['tag']] = array($newarray[$dat['tag']], $dat['value']);
    } else {
        $newarray[$dat['tag']]=$dat['value'];
    }
}
?>

Also note that unquoted keys such as $dat[level] WILL generate warnings and are more correctly written as $dat['level'].

The other thing I thought I should point out is that the array keys when created using xml_parse_into_struct will be an UPPERCASE version of your existing element names. If case is important, especially if you, like me, need mixedCase, do the following:

<?php
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser,XML_OPTION_CASE_FOLDING,0);
?>
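
As a small illustration of the difference, the sketch below parses the same string twice, once with the default case folding and once with it switched off. The element names userProfile and firstName are made up for the example.

```php
<?php
$xml = '<userProfile><firstName>Matt</firstName></userProfile>';

// Default: case folding is on, so tag names come back in upper case.
$folded = xml_parser_create();
xml_parse_into_struct($folded, $xml, $foldedVals);

// Case folding off: tag names keep the case used in the document.
$exact = xml_parser_create();
xml_parser_set_option($exact, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($exact, $xml, $exactVals);

echo $foldedVals[1]['tag'], " vs ", $exactVals[1]['tag'], "\n";
```

The option has to be set before parsing, since it affects how the tag names are recorded in both the values and the index arrays.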
30-Mar-2006 03:57
<?php
$simple = '<?xml version="1.0"?>
<moldb>
   <molecule>
       <name>Alanine</name>
       <symbol>ala</symbol>
       <code>A</code>
       <type>hydrophobic</type>
   </molecule>
   <molecule>
       <name>Lysine</name>
       <symbol>lys</symbol>
       <code>K</code>
       <type>charged</type>
   </molecule>
</moldb>';

$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);
xml_parser_free($p);
echo "<pre>";
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);

// group the values into one record per molecule
$arr = array();
$i = 0;
foreach ($vals as $k => $v) {
    if (isset($v['value']) && trim($v['value']) != '') {
        // a tag that is already set means the next record has started
        if (isset($arr[$i][$v['tag']])) {
            $i++;
        }
        $arr[$i][$v['tag']] = $v['value'];
    }
}
?>
xademax at gmail dot com
09-Jan-2006 04:15
This is just another simple xml parser :)

<?php

class Xml
{
    var $tag;
    var $value;
    var $attributes;
    var $next;
}

function xml2array($xml_string)
{
    $Parser = xml_parser_create();
    xml_parser_set_option($Parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($Parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($Parser, $xml_string, $Xml_Values);
    xml_parser_free($Parser);
    $XmlClass = array();
    $LastObj = array();
    $NowObj = &$XmlClass;

    foreach($Xml_Values as $Xml_Key => $Xml_Value)
    {
        $Index = count($NowObj);
        if($Xml_Value["type"] == "complete" || $Xml_Value["type"] == "open")
        {
            $NowObj[$Index] = new Xml;
            $NowObj[$Index]->tag = $Xml_Value["tag"];
            // "value" and "attributes" are only present when the element has them
            $NowObj[$Index]->value = isset($Xml_Value["value"]) ? $Xml_Value["value"] : null;
            $NowObj[$Index]->attributes = isset($Xml_Value["attributes"]) ? $Xml_Value["attributes"] : null;
            if($Xml_Value["type"] == "open")
            {
                // descend: children of this element go into ->next
                $NowObj[$Index]->next = array();
                $LastObj[count($LastObj)] = &$NowObj;
                $NowObj = &$NowObj[$Index]->next;
            }
        }
        elseif($Xml_Value["type"] == "close")
        {
            // ascend: go back to the parent's list of children
            $NowObj = &$LastObj[count($LastObj) - 1];
            unset($LastObj[count($LastObj) - 1]);
        }
    }

    return $XmlClass;
}

$String = "
<parser>
   <parseur_name>MyParser</parseur_name>
   <category>
       <name>Name 1</name>
       <note>A note 1</note>
   </category>
</parser>
";
$Xml = xml2array($String);

print_r($Xml);
?>

This example will show:
Array
(
   [0] => Xml Object
       (
           [tag] => parser
           [value] =>
           [attributes] =>
           [next] => Array
               (
                   [0] => Xml Object
                       (
                           [tag] => parseur_name
                           [value] => MyParser
                           [attributes] =>
                           [next] =>
                       )

                   [1] => Xml Object
                       (
                           [tag] => category
                           [value] =>
                           [attributes] =>
                           [next] => Array
                               (
                                   [0] => Xml Object
                                       (
                                           [tag] => name
                                           [value] => Name 1
                                           [attributes] =>
                                           [next] =>
                                       )

                                   [1] => Xml Object
                                       (
                                           [tag] => note
                                           [value] => A note 1
                                           [attributes] =>
                                           [next] =>
                                       )

                               )

                       )

               )

       )

)
VampBoy
15-Dec-2005 06:45
WHUPS! That was a broken test version. Here is the real one:

class xml2array{
/* This class parses XML tags into a recursive, associative array with the tags as the associative array elements names.

if it encounters multiples of the same tag within a stream, it enumerates them as a sub array under the tag thus:

Array (
   [Lvl1tag] => Array (
       [0] => Array(
           [Lvl2tag] = "foo")
       [1]=> Array(
           [Lvl2tag] = "bar")
   )
)

It tries to detect when there is only one copy of a tag under another, and concatenate properly.
*/

   function readxmlfile($xmlfile){ // reads XML file in and returns it
     $xmlstream = fopen($xmlfile, 'r');
     $xmlraw = fread($xmlstream, 1000000);
     fclose($xmlstream);
     return $xmlraw;
   }

   function parseXMLintoarray ($xmldata){ // starts the process and returns the final array
     $xmlparser = xml_parser_create();
     xml_parse_into_struct($xmlparser, $xmldata, $arraydat);
     xml_parser_free($xmlparser);
     $semicomplete = $this->subdivide($arraydat);
     $complete = $this->correctentries($semicomplete);
     return $complete;
   }

   function subdivide ($dataarray, $level = 1){
     foreach ($dataarray as $key => $dat){
       if ($dat['level'] === $level && $dat['type'] === "open"){
         $toplvltag = $dat['tag'];
       } elseif ($dat['level'] === $level && $dat['type'] === "close" && $dat['tag'] === $toplvltag){
         $newarray[$toplvltag][] = $this->subdivide($temparray,($level +1));
         unset($temparray,$nextlvl);
       } elseif ($dat['level'] === $level && $dat['type'] === "complete"){
         $newarray[$dat['tag']]=$dat['value'];
       } elseif ($dat['type'] === "complete"||$dat['type'] === "close"||$dat['type'] === "open"){
         $temparray[]=$dat;
       }
     }
     return $newarray;
   }

   function correctentries($dataarray){
     if (is_array($dataarray)){
       $keys = array_keys($dataarray);
       if (count($keys)== 1 && is_int($keys[0])){
         $tmp = $dataarray[0];
         unset($dataarray[0]);
         $dataarray = $tmp;
       }
       $keys2 = array_keys($dataarray);
       foreach($keys2 as $key){
         $tmp2 = $dataarray[$key];
         unset($dataarray[$key]);
         $dataarray[$key] = $this->correctentries($tmp2);
         unset($tmp2);
       }
     }
     return $dataarray;
   }
}
p dot gasiorowski at axent dot pl
13-Dec-2005 10:31
Something similar to kieran's _xml2array, but much faster.
------
class ObjectFromXML
{
   var $parser;
   var $iter = 0;
   var $path = array();
   var $xml = array();

   function ObjectFromXML($XML)
   {
       $this->parser = xml_parser_create();
      
       xml_set_object($this->parser, $this);
      
       xml_parser_set_option($this->parser, XML_OPTION_SKIP_WHITE, 1);
       xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, 0);

       xml_set_element_handler($this->parser, "hanleTagStart", "hanleTagEnd");
       xml_set_character_data_handler($this->parser, "hanleTagCData");

       xml_parse($this->parser, $XML);
       xml_parser_free($this->parser);
      
       $this->xml = $this->xml['_children'][0];
   }

   function getEvalPath()
   {
       return '$this->xml[' . "'" . implode("']['", $this->path) . "'" . ']';
   }

   function hanleTagStart($parser, $tag, $attributes)
   {
       array_push($this->path, '_children');
       array_push($this->path, ($this->iter++));

       $e = $this->getEvalPath();
       eval ($e . "['_name'] = \$tag;");
       if ($attributes !== array())
       {
           eval ($e . "['_attributes'] = \$attributes;");
       }
   }

   function hanleTagCData($parser, $cdata)
   {
       $e = $this->getEvalPath();
       eval ($e . "['_value'] = \$cdata;");
   }

   function hanleTagEnd($parser, $tag)
   {
       array_pop($this->path);
       array_pop($this->path);
   }
}
mbirth at webwriters dot de
08-Nov-2005 07:48
Searching for a nice and working way to get a RSS feed into an array-structure, I found the solution posted by kieran but disliked those several eval()s. So I wrote my own using references/pointers.

<?php

class RSSParser {

    var $struct = array();  // holds final structure
    var $curptr;  // current branch on $struct
    var $parents = array();  // parent branches of current branch

    function RSSParser($url) {
        $this->curptr =& $this->struct;  // set ref to base
        $xmlparser = xml_parser_create();
        xml_set_object($xmlparser, $this);
        xml_set_element_handler($xmlparser, 'tag_open', 'tag_close');
        xml_set_character_data_handler($xmlparser, 'cdata');
        $fp = fopen($url, 'r');

        while ($data = fread($fp, 4096))
            xml_parse($xmlparser, $data, feof($fp))
                || die(sprintf("XML error: %s at line %d",
                    xml_error_string(xml_get_error_code($xmlparser)),
                    xml_get_current_line_number($xmlparser)));

        fclose($fp);
        xml_parser_free($xmlparser);
    }

    function tag_open($parser, $tag, $attr) {
        $i = count($this->curptr['children']);
        $j = count($this->parents);
        $this->curptr['children'][$i]=array();  // add new child element
        $this->parents[$j] =& $this->curptr;  // store current position as parent
        $this->curptr =& $this->curptr['children'][$i];  // submerge to newly created child element
        $this->curptr['name'] = $tag;
        if (count($attr)>0) $this->curptr['attr'] = $attr;
    }

    function tag_close($parser, $tag) {
        $i = count($this->parents);
        if ($i>0) $this->curptr =& $this->parents[$i-1];  // return to parent element
        unset($this->parents[$i-1]);  // clear from list of parents
    }

    function cdata($parser, $data) {
        $data = trim($data);
        if (!empty($data)) {
            $this->curptr['value'] .= $data;
        }
    }

}

$myparser = new RSSParser('getitems.xml');
$anotherparser = new RSSParser('http://johndoe:secret@myfeeds.com/getfeed.xml');

print_r($myparser->struct);
print_r($anotherparser->struct);

?>
kieran at kieran dot ca
21-Oct-2005 12:20
<?
/*
|
| _xml2array - another abstraction layer on xml_parse_into_struct
|              that returns a nice nested array.
|
|      @param: $xml is a string containing a full xml document
|
|    returns: a nested php array that looks like this:
|                 
|              array
|              (
|                  [_name] => the name of the tag
|                  [_attributes] => an array of 'attribute'=>'value' combos
|                  [_value] => the text contents of the node
|                  [_children] => an array of these arrays, one for each node.
|              )
|
|      notes: thanks to 'jeffg at activestate dot com' who inspired
|              me to essentially re-write his example code from php.net
|
|          me: Kieran Huggins < kieran[at]kieran[dot]ca >
|
*/
function _xml2array($xml){
    global $keys;
    global $level;
    if(!is_array($xml)){ // init on first run
        $raw_xml = $xml;
        $p = xml_parser_create();
        xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
        xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, 1);
        xml_parse_into_struct($p, $raw_xml, $xml, $idx);
        xml_parser_free($p);
    }
    for($i=0;$i<count($xml,1);$i++){
        // set the current level
        $level = $xml[$i]['level'];

        if($level<1)break;

        // mark this level's tag in the array
        $keys[$level] = '['.$i.']';

        // if we've come down a level, sort output and destroy the upper level
        if(count($keys)>$level) unset($keys[count($keys)]);

        // ignore close tags, they're useless
        if($xml[$i]['type']=="open" || $xml[$i]['type']=="complete"){

            // build the evalstring
            $e = '$output'.implode('[\'_children\']',$keys);

            // set the tag name
            eval($e.'[\'_name\'] = $xml[$i][\'tag\'];');

            // set the attributes
            if($xml[$i]['attributes']){
                eval($e.'[\'_attributes\'] = $xml[$i][\'attributes\'];');
            }

            // set the value
            if($xml[$i]['value']){
                eval($e.'[\'_value\'] = trim($xml[$i][\'value\']);');
            }

        }

    }

    return $output;
}
?>
info at gramba dot tv
23-Aug-2005 11:54
I was working with the xml2array functions below and had big performance problems. I fired them on a 20MB XML file and had to quit since all approaches of parsing were just too slow (more than 20 minutes of parsing etc.). The solution was parsing it manually with preg_match, which increased performance by more than 20 times (processing time about 1 minute).

Rough example function with high performance:

<?php

function customXMLtoARRAY($xmlstring) {

    // get all nodes
    preg_match_all("#<node>(.*?)</node>#s",$xmlstring,$nodes);
    $xmlstring = NULL;

    $allnodes = array();

    // put subnodes into node
    while (($node = array_pop($nodes[1])) !== null) {

        $nodecontent = array();

        // Content1
        preg_match("#<content1>(.*?)</content1>#",$node,$val);
        $nodecontent['content1'] = $val[1];

        $allnodes[] = $nodecontent;

    }
    return $allnodes;
}

?>
Chris Hester
28-Jul-2005 08:06
The array generated from XML stores not only the elements but also any spaces and linebreaks between the tags. This results in a much longer array. (I had 24 array fields instead of 10!) To cure this use the following code when creating the parser:

<?php
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser,XML_OPTION_SKIP_WHITE,1);
?>
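
To see the effect, the sketch below parses the same indented document with and without the option and compares the number of entries. The helper name parseValues and the sample document are made up for this illustration.

```php
<?php
$xml = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>";

// Hypothetical helper: parse $xml and return the values array.
function parseValues($xml, $skipWhite)
{
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_SKIP_WHITE, $skipWhite);
    xml_parse_into_struct($p, $xml, $vals);
    return $vals;
}

$withWhitespace = parseValues($xml, 0);     // whitespace between tags is kept
$withoutWhitespace = parseValues($xml, 1);  // whitespace-only text is skipped

echo count($withWhitespace), " entries without the option, ",
     count($withoutWhitespace), " with it\n";
```

With the option on, only the open, complete and close entries for the real elements remain, which is what code that walks the index array in pairs usually expects.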
grusin at gmail dot com
20-Jul-2005 02:54
Here is a simple patch to peter's xml2array function.

Replace:

<?
case 'open':
$tag_or_id = (array_key_exists ('attributes', xml_elem)) ? $xml_elem['attributes']['ID'] : $xml_elem['tag'];
$ptrs[$level][$tag_or_id] = array ();
$ptrs[$level+1] = & $ptrs[$level][$tag_or_id];
break;
?>

with:

<?
case 'open':
$tag_or_id = (array_key_exists ('attributes', $xml_elem)) ? $xml_elem['attributes']['ID'] : $xml_elem['tag'];
$ptrs[$level][$tag_or_id][] = array ();
$ptrs[$level+1] = & $ptrs[$level][$tag_or_id][count($ptrs[$level][$tag_or_id])-1];
break;
?>

and now code should handle multiple element case :)
Dustin
13-Jul-2005 11:39
If you happen to have problems with weird characters, I added this code as the first line of dUDA's function:

$XML = utf8_decode($XML);
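
If the weird characters come from UTF-8 output being displayed as ISO-8859-1, which is only an assumption about the cause here, an alternative is to ask the parser itself for ISO-8859-1 output through the XML_OPTION_TARGET_ENCODING option. A minimal sketch with a made-up one-element document:

```php
<?php
// The accented "e" is encoded as UTF-8 in the input (two bytes: 0xC3 0xA9).
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><word>caf\xC3\xA9</word>";

$p = xml_parser_create();
// Ask the parser to hand back values as ISO-8859-1 instead of UTF-8.
xml_parser_set_option($p, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');
xml_parse_into_struct($p, $xml, $vals);

$value = $vals[0]['value'];
echo strlen($value), " bytes\n";
```

This leaves the input untouched, so the document's own encoding declaration is still what the parser uses to read it.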
PhF at madmac dot fr
24-Jun-2005 04:11
The code previously posted by noob at noobsrule dot com doesn't work when the same tag name is used at different levels.
(but perhaps "$php_used_prefix" was intended for that ?)
For example:
<RatedShipment>
  <TransportationCharges>
   ...
  </TransportationCharges>
  <RatedPackage>
   <TransportationCharges>
     ...
   </TransportationCharges>
  </RatedPackage>
</RatedShipment>
<?php
$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);