Wednesday, July 16, 2014

XQuery: Aggregration With Multiple Levels

On the last post, the examples aggregate data with single level(group by customers). Today we'll explore how to aggregate data with multiple levels in XQuery.
For the example below, we'll use the same orders.xml document.

USE TEMPDB;
GO

-- load xml files to SQL Server as CLOB = Character Large Object 
DECLARE @orders XML =
(SELECT * 
 FROM OPENROWSET(BULK 'C:\Self_Dev\XML\orders2.xml', SINGLE_CLOB) AS orders
);
--Aggregation with multiple levels.
--Group by region, city
SELECT @orders.query(
   ' (: The 1st iteration get distinct regions.
        The 2nd iteration get distinct city for each region.
        The let clause get the orders that match with each region/city combination. :)
    for $region in distinct-values(//orders/order/shipaddress/region)
    for $city   in distinct-values(//orders/order/shipaddress[region=$region]/city)
    let $orderbyRegionCity := //orders/order[shipaddress/region = $region and shipaddress/city = $city]
    return
    <orders_summary>
       <region>{$region}</region>
       <city>{$city}</city>
       <ordercount>{count($orderbyRegionCity)}</ordercount>
       <total_amount>{sum($orderbyRegionCity/subtotal)}</total_amount>
       <min_amount>{min($orderbyRegionCity/subtotal)}</min_amount>
       <max_amount>{max($orderbyRegionCity/subtotal)}</max_amount>
    </orders_summary>
    ');

<orders_summary>
  <region>WA</region>
  <city>Seattle</city>
  <ordercount>1</ordercount>
  <total_amount>10</total_amount>
  <min_amount>10</min_amount>
  <max_amount>10</max_amount>
</orders_summary>
<orders_summary>
  <region>OR</region>
  <city>Portland</city>
  <ordercount>3</ordercount>
  <total_amount>100</total_amount>
  <min_amount>10</min_amount>
  <max_amount>50</max_amount>
</orders_summary>
--Aggregation with multiple levels.
--This query is the same as above, but region and city 
--   are constructed as attributes for the output.
--Group by region, city
SELECT @orders.query(
   ' (: The 1st iteration get distinct regions.
        The 2nd iteration get distinct city for each region.
        The let clause get the orders that match with each region/city combination. :)
    for $region in distinct-values(//orders/order/shipaddress/region)
    for $city   in distinct-values(//orders/order/shipaddress[region=$region]/city)
    let $orderbyRegionCity := //orders/order[shipaddress/region = $region and shipaddress/city = $city]
    return
    <orders_summary region="{$region}" city="{$city}">
       <ordercount>{count($orderbyRegionCity)}</ordercount>
       <total_amount>{sum($orderbyRegionCity/subtotal)}</total_amount>
       <min_amount>{min($orderbyRegionCity/subtotal)}</min_amount>
       <max_amount>{max($orderbyRegionCity/subtotal)}</max_amount>
    </orders_summary>
    ');
<orders_summary region="WA" city="Seattle">
  <ordercount>1</ordercount>
  <total_amount>10</total_amount>
  <min_amount>10</min_amount>
  <max_amount>10</max_amount>
</orders_summary>
<orders_summary region="OR" city="Portland">
  <ordercount>3</ordercount>
  <total_amount>100</total_amount>
  <min_amount>10</min_amount>
  <max_amount>50</max_amount>
</orders_summary>

Unlike SQL language, currently XQuery doesn't have a GROUP BY clause.  As of XQuery 3.0 Recommendation, XQuery does have a group by clause.

Tuesday, July 15, 2014

XQuery: Aggregation with Join

Today we'll learn about aggregation using XQuery FLWOR expressions.
Assume that we have a customers and an orders XML document. The examples below demonstrate grouping, aggregation and join in XQuery.

-- load xml files to SQL Server as CLOB = Character Large Object 
DECLARE @customers XML = 
(SELECT * FROM OPENROWSET(BULK 'C:\Self_Dev\XML\customers.xml', SINGLE_CLOB) AS customers);

DECLARE @orders XML =
(SELECT * FROM OPENROWSET(BULK 'C:\Self_Dev\XML\orders2.xml', SINGLE_CLOB) AS orders);

-- Concatenate XML documents using XML PATH mode.
DECLARE @customers_orders XML = (SELECT @customers, @orders
                                 FOR XML PATH(''));
--Aggregation without join.
--count # of order for each customer (group by customer).
SELECT @customers_orders.query(
  'for $custid in distinct-values(//orders/order/custid)
   let $o := //orders/order[custid = $custid]
   return 
   <customer_order>
     <custid>{$custid}</custid>
     <ordercount>{count($o)}</ordercount>
   </customer_order>
   ');

<customer_order>
  <custid>1</custid>
  <ordercount>1</ordercount>
</customer_order>
<customer_order>
  <custid>2</custid>
  <ordercount>3</ordercount>
</customer_order>

--aggregation with inner join.
--count # of order for each customers and list the customer name.
SELECT @customers_orders.query(
    'for $custid in distinct-values(//orders/order/custid)
     let $o := //orders/order[custid = $custid]
     let $c := //customers/customer[@custid = $custid]
     return
     <customer_order>
       <custid>{$custid}</custid>
       {$c/first-name}
       {$c/last-name}
       <ordercount>{count($o)}</ordercount>
     </customer_order>
    ');
<customer_order>
  <custid>1</custid>
  <first-name>Bobby</first-name>
  <last-name>Nguyen</last-name>
  <ordercount>1</ordercount>
</customer_order>
<customer_order>
  <custid>2</custid>
  <first-name>SonHa</first-name>
  <last-name>Pham</last-name>
  <ordercount>3</ordercount>
</customer_order>

--Aggregation with outer join.
--count # of order for each customers and list the customer name,
-- and include customer didn't have any order.
SELECT @customers_orders.query(
    'for $c in //customers/customer
     let $o := //orders/order[custid = $c/@custid]
     return
     <customer_order>
       {$c/first-name}
       {$c/last-name}
       <ordercount>{count($o)}</ordercount>
     </customer_order>
    ');
<customer_order>
  <first-name>Bobby</first-name>
  <last-name>Nguyen</last-name>
  <ordercount>1</ordercount>
</customer_order>
<customer_order>
  <first-name>SonHa</first-name>
  <last-name>Pham</last-name>
  <ordercount>3</ordercount>
</customer_order>
<customer_order>
  <first-name>Ngan</first-name>
  <last-name>Le</last-name>
  <ordercount>0</ordercount>
</customer_order>

--aggregation with outer join.
--count # of order, and sum the orders amount
-- for each customers and list the customer name,
--  and include customer didn't have any order.
SELECT @customers_orders.query(
    'for $c in //customers/customer
     let $o := //orders/order[custid = $c/@custid]
     return
     <customer_order>
       {$c/first-name}
       {$c/last-name}
       <ordercount>{count($o)}</ordercount>
       <totalordersamount>{sum($o/subtotal)}</totalordersamount>
     </customer_order>
    ');

<customer_order>
  <first-name>Bobby</first-name>
  <last-name>Nguyen</last-name>
  <ordercount>1</ordercount>
  <totalordersamount>10</totalordersamount>
</customer_order>
<customer_order>
  <first-name>SonHa</first-name>
  <last-name>Pham</last-name>
  <ordercount>3</ordercount>
  <totalordersamount>100</totalordersamount>
</customer_order>
<customer_order>
  <first-name>Ngan</first-name>
  <last-name>Le</last-name>
  <ordercount>0</ordercount>
  <totalordersamount>0.0E0</totalordersamount>
</customer_order>

Thursday, July 3, 2014

Outer Join XML Data Using XQuery

Today we'll learn about outer join in XQuery.
--
-- assume that we have 2 XML documents, customers orders.
-- load xml files to SQL Server as CLOB = Character Large Object 
DECLARE @customers XML = 
(SELECT * FROM OPENROWSET(BULK 'C:\Self_Dev\XML\customers.xml', SINGLE_CLOB) AS customers);

DECLARE @orders XML =
(SELECT * FROM OPENROWSET(BULK 'C:\Self_Dev\XML\orders.xml', SINGLE_CLOB) AS orders);

-- Concatenate XML documents using XML PATH mode.
DECLARE @customers_orders XML = (SELECT @customers, @orders
                                 FOR XML PATH(''));
SELECT @customers_orders;
Here's the XML data that we'll use for the examples below.
--outer join
-- all customers regardless if they have an order or not
SELECT @customers_orders.query(
       'for $c in //customers/customer
       let $o := //orders/order[custid = $c/@custid]
       return
       <order>
         <custid>{$c/@custid}</custid>
         {$c/first-name}
         {$c/last-name}
         <orderid>{$o/@orderid}</orderid>
         {$o/orderdate}
       </order>');
The result is what we expected. Customer with custid = 3 also is included in the result even she didn't make any order.

-- Notice the result for the query above.
-- The orderdate for custid = 3 is not shown, since she doesn't have any order.
-- What if we want to display a tag <orderdate> with no value?
-- This query provide a tag <orderdate> with no value for custid = 3.
-- However, it also nested the <orderdate> for other customers' order.
SELECT @customers_orders.query(
       'for $c in //customers/customer
       let $o := //orders/order[custid = $c/@custid]
       return
       <order>
         <custid>{$c/@custid}</custid>
         {$c/first-name}
         {$c/last-name}
         <orderid>{$o/@orderid}</orderid>
         <orderdate>{$o/orderdate}</orderdate>
       </order>');
--
-- This query fix the nesting issue by display a tag <orderdate>
--   with no value for customer with custid = 3.
-- Use text() to only return value not tag.
-- Note: data() is not supported in sql server 2012
SELECT @customers_orders.query(
       'for $c in //customers/customer
       let $o := //orders/order[custid = $c/@custid]
       return
       <order>
         <custid>{$c/@custid}</custid>
         {$c/first-name}
         {$c/last-name}
         <orderid>{$o/@orderid}</orderid>
         <orderdate>{$o/orderdate/text()}</orderdate>
       </order>');
--
-- Another option for outer join is include another FLWOR expression inside of the return clause.
SELECT @customers_orders.query(
       'for $c in //customers/customer
       return
       <order>
         <custid>{$c/@custid}</custid>
         {$c/first-name}
         {$c/last-name}
         {
            for $o in //orders/order[custid = $c/@custid]
            return (
               <orderid>{$o/@orderid}</orderid>
                <orderdate>{$o/orderdate/text()}</orderdate>
            )
          }
        </order>');

--
--same as query above but using where clause for join conditions
SELECT @customers_orders.query('for $c in //customers/customer
SELECT @customers_orders.query(
       'for $c in //customers/customer
       return
       <order>
         <custid>{$c/@custid}</custid>
         {$c/first-name}
         {$c/last-name}
         {
            for $o in //orders/order
            where $c/@custid = $o/custid
            return (
               <orderid>{$o/@orderid}</orderid>
                <orderdate>{$o/orderdate/text()}</orderdate>
            )
          }
       </order>');
Note the result for 2 queries above doesn't include <orderid> and <orderdate> tag for customer with custid = 3.