Sunday, May 11, 2014

Shredding XML using OPENXML: Convert XML to Rowset part I

After learning about XPath expression, today we'll use what we learned in previous posts. The examples below show converting XML document to tabular format. This process is sometimes called shredding XML. We'll use the same sample XML document as last post with some modifications.

Here's the OPENXML function syntax:
  OPENXML(idoc int [in], rowpattern nvarchar [in], [flags byte [in]])
  [WITH (ColumnName ColumnType [ColumnPattern | MetaProperty] | TableName)]

Arguments
  idoc:            Document handle created by sys.sp_xml_preparedocument stored procedure
  rowpattern: XPath expression to identify the nodes.
  flags:           Is optional.  This flag indicates what type of nodes we want to retrieve from the XML document, attributes only, elements only, or both.
    byte value 0 = defaults. Select attributes.
    byte value 1 = select attributes. If combined with XML elements, select attributes first then elements.
    byte value 2 = select elements. If combined with XML attributes, select attributes first then elements.
    byte value 3 = select attributes and elements.
    byte value 8 = this flag's value is used to combine with byte value 0 and 2.

Note: the column pattern in the WITH clause will override the 3rd parameters in OPENXML.

Here's a picture provides overview of the process of converting xml to table format.



















Image source: Microsoft Technet.


-- step 1:
-- prepare XML document
DECLARE @XmlDoc XML =
N'<bookstore specialty="novel">
  <book style="autobiography">
    <title>Create a Vision</title>
    <author>
      <first-name>Joe</first-name>
      <last-name>Smith</last-name>
    </author>
    <price>12</price>
  </book>
  <book style="textbook">
    <title>XML Reference</title>
    <author first-name="Mary" last-name="Jane" />
    <price>55</price>
  </book>
  <book style="novel" id="myfave">
    <title>Paths to Success</title>
    <author>
      <first-name>Toni</first-name>
      <last-name>Wilson</last-name>
    </author>
    <price intl="Canada" exchange="0.7">6.50</price>
  </book>
</bookstore>';
--
--An integer to point to the internal representation(XML Document Object Model) of the XML document above.
DECLARE @DocHandle INT; 

--parse the XML document to XML DOM tree representation and returns a handle
EXECUTE sys.sp_xml_preparedocument @DocHandle OUTPUT, @XmlDoc;

--step 2: convert XML to grid format using OPENXML function.
--when not using the WITH clause, the result is an Edge Table Format
SELECT *
FROM OPENXML(@DocHandle, '/bookstore');













The following examples, we will convert the authors first-names to tabular format.
--
----------------------------------------------------
-- All these 3 queries return the same result set --
----------------------------------------------------
-- when the 3rd parameter of OPENXML is missing, 
--the default mapping is used. Select the attributes only 
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author')
WITH("first-name" VARCHAR(20));

--3rd parameter of OPENXML = 0,
--Select the attributes only 
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author', 0)
WITH("first-name" VARCHAR(20));

--3rd parameter of OPENXML = 1, select the attributes only 
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author', 1)
WITH("first-name" VARCHAR(20));






--3rd parameter of OPENXML = 2, 
--select the elements only 
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author', 2)
WITH("first-name" VARCHAR(20)); 






--3rd parameter of OPENXML = 3, 
--select both attributes and elements  
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author', 3)
WITH("first-name" VARCHAR(20)); 






--
--3rd parameter of OPENXML = 9 (this is flag value 8 + flag value 1,
-- result the same as using flag value 0 or 1), select attributes only  
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author', 9)
WITH("first-name" VARCHAR(20));






--
--3rd parameter of OPENXML = 10(this is flag value 8 + flag value 2, 
-- result the same as using flag value 2), select elements only.
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author', 10)
WITH("first-name" VARCHAR(20)); 






--
--3rd parameter of OPENXML = 11 (this is flag value 8 + flag value 3,
-- result the same as using flag value 3), select both attributes and elements. 
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author', 11)
WITH("first-name" VARCHAR(20));




--
--3rd parameter of OPENXML not in (0, 1, 2, 3, 9, 10, 11), 
-- default is used -- select only attributes.  
SELECT * 
FROM OPENXML(@DocHandle, '/bookstore/book/author', 444)
WITH("first-name" VARCHAR(20));





--
-- Final step: Removes the internal representation
--  and invalidate the document handle.
EXECUTE sys.sp_xml_removedocument @DocHandle;


For all queries above, the column pattern in the WITH clause is not needed because the column name(first-name) is the same as first-name attribute and first-name element in the XML document above.
That's all for now. I don't want to make this post too long. I'll provide more examples next 2 posts.

Reference:
OPENXML
OPENXML Function
OPENXML Examples
sp_xml_preparedocument
sp_xml_removedocument
XML Edge Table
XML Document Object Model(XML DOM)

No comments:

Post a Comment