Wednesday, July 16, 2014

XQuery: Aggregration With Multiple Levels

On the last post, the examples aggregate data with single level (group by customers). Today we'll explore how to aggregate data with multiple levels in XQuery.
For the example below, we'll use the same orders.xml document.
<?xml version="1.0"?>
<orders>
  <order orderid="1">
    <custid>1</custid>
    <orderdate>2014-05-15</orderdate>
    <shipaddress>
       <address>343 2nd Ave S.</address>
       <city>Seattle</city>
       <region>WA</region>
       <postalcode>98142</postalcode>
    </shipaddress>
    <items>
       <item itemid="a1">
          <quantity>1</quantity>
          <unitprice>10.00</unitprice>
       </item>
    </items>
    <subtotal>10.00</subtotal>
  </order>
  <order orderid="2">
    <custid>2</custid>
    <orderdate>2014-06-01</orderdate>
    <shipaddress>
       <address>1000 Johnson St.</address>
       <city>Portland</city>
       <region>OR</region>
       <postalcode>97209</postalcode>
    </shipaddress>
    <items>
       <item itemid="a1">
          <quantity>1</quantity>
          <unitprice>10.00</unitprice>
       </item>
       <item itemid="b1">
          <quantity>2</quantity>
          <unitprice>20.00</unitprice>
       </item>
    </items>
    <subtotal>50.00</subtotal>
  </order>
  <order orderid="3">
    <custid>2</custid>
    <orderdate>2014-07-01</orderdate>
    <shipaddress>
       <address>1000 Johnson St.</address>
       <city>Portland</city>
       <region>OR</region>
       <postalcode>97209</postalcode>
    </shipaddress>
    <items>
       <item itemid="a1">
          <quantity>2</quantity>
          <unitprice>10.00</unitprice>
       </item>
       <item itemid="b1">
          <quantity>1</quantity>
          <unitprice>20.00</unitprice>
       </item>
    </items>
    <subtotal>40.00</subtotal>
  </order>
  <order orderid="4">
    <custid>2</custid>
    <orderdate>2014-07-10</orderdate>
    <shipaddress>
       <address>1000 Johnson St.</address>
       <city>Portland</city>
       <region>OR</region>
       <postalcode>97209</postalcode>
    </shipaddress>
    <items>
       <item itemid="a1">
          <quantity>1</quantity>
          <unitprice>10.00</unitprice>
       </item>
    </items>
    <subtotal>10.00</subtotal>
  </order>
</orders>



USE TEMPDB;
GO

-- load xml files to SQL Server as CLOB = Character Large Object 
DECLARE @orders XML =
(SELECT * 
 FROM OPENROWSET(BULK 'C:\Self_Dev\XML\orders2.xml', SINGLE_CLOB) AS orders
);
--Aggregation with multiple levels.
--Group by region, city
SELECT @orders.query(
   ' (: The 1st iteration get distinct regions.
        The 2nd iteration get distinct city for each region.
        The let clause get the orders that match with each region/city combination. :)
    for $region in distinct-values(//orders/order/shipaddress/region)
    for $city   in distinct-values(//orders/order/shipaddress[region=$region]/city)
    let $orderbyRegionCity := //orders/order[shipaddress/region = $region and shipaddress/city = $city]
    return
    <orders_summary>
       <region>{$region}</region>
       <city>{$city}</city>
       <ordercount>{count($orderbyRegionCity)}</ordercount>
       <total_amount>{sum($orderbyRegionCity/subtotal)}</total_amount>
       <min_amount>{min($orderbyRegionCity/subtotal)}</min_amount>
       <max_amount>{max($orderbyRegionCity/subtotal)}</max_amount>
    </orders_summary>
    ');

<orders_summary>
  <region>WA</region>
  <city>Seattle</city>
  <ordercount>1</ordercount>
  <total_amount>10</total_amount>
  <min_amount>10</min_amount>
  <max_amount>10</max_amount>
</orders_summary>
<orders_summary>
  <region>OR</region>
  <city>Portland</city>
  <ordercount>3</ordercount>
  <total_amount>100</total_amount>
  <min_amount>10</min_amount>
  <max_amount>50</max_amount>
</orders_summary>
--Aggregation with multiple levels.
--This query is the same as above, but region and city 
--   are constructed as attributes for the output.
--Group by region, city
SELECT @orders.query(
   ' (: The 1st iteration get distinct regions.
        The 2nd iteration get distinct city for each region.
        The let clause get the orders that match with each region/city combination. :)
    for $region in distinct-values(//orders/order/shipaddress/region)
    for $city   in distinct-values(//orders/order/shipaddress[region=$region]/city)
    let $orderbyRegionCity := //orders/order[shipaddress/region = $region and shipaddress/city = $city]
    return
    <orders_summary region="{$region}" city="{$city}">
       <ordercount>{count($orderbyRegionCity)}</ordercount>
       <total_amount>{sum($orderbyRegionCity/subtotal)}</total_amount>
       <min_amount>{min($orderbyRegionCity/subtotal)}</min_amount>
       <max_amount>{max($orderbyRegionCity/subtotal)}</max_amount>
    </orders_summary>
    ');
<orders_summary region="WA" city="Seattle">
  <ordercount>1</ordercount>
  <total_amount>10</total_amount>
  <min_amount>10</min_amount>
  <max_amount>10</max_amount>
</orders_summary>
<orders_summary region="OR" city="Portland">
  <ordercount>3</ordercount>
  <total_amount>100</total_amount>
  <min_amount>10</min_amount>
  <max_amount>50</max_amount>
</orders_summary>

Unlike SQL language, currently XQuery doesn't have a GROUP BY clause.  As of XQuery 3.0 Recommendation, XQuery does have a group by clause.

2 comments:

  1. Perhaps it's time you catch up to XQuery 3.0

    http://www.w3.org/TR/xquery-30/#id-group-by

    ReplyDelete
    Replies
    1. Thanks for the info. I didn't know this. I should update my blog accordingly.

      Delete