Monday, March 31, 2014

Replace Function With Recursive CTE

Last 2 posts we learned some cool ways to use the STUFF function. This post, we will combine the power of recursive common table expression (CTE), STUFF and PATINDEX function to write our own script to replace a string pattern. Some of you may ask why bother when SQL Sever already has the built-in REPLACE function? Well, this is just for fun and it doesn't hurt to practice writing sql. : )

/***********************************************************************
 Replace a pattern with another string pattern.
 This is almost the same as the built-in TSQL REPLACE function.
 Ex: remove all commas within a string. 
************************************************************************/
--
DECLARE @original_str VARCHAR(100) = 'The, quick, brown, fox, jumps, over, the, lazy, dog'
      , @pattern_str VARCHAR(100)  = ','
      , @replacement_str VARCHAR(100) = '';

 -- recursive CTE: replace @pattern_str with @replacement_str 
; WITH phrase(string, row_id) AS (
   -- anchor member 
   SELECT @original_str AS string, row_id = 1
   UNION ALL
   -- recursive member */
   -- Note: If start_position <= 0, STUFF function returns NULL 
   SELECT
      --Convert to VARCHAR(100) for data type and length to match with anchor member
      -- Case expression to handle NULL and string with no match found
      CAST(CASE WHEN PATINDEX('%'+@pattern_str+'%', string) <= 0 THEN string
             ELSE STUFF(string, PATINDEX('%'+@pattern_str+'%', string), 1,@replacement_str)
            END AS VARCHAR(100)
      ) AS string
      , row_id = row_id + 1
   FROM phrase
   WHERE PATINDEX('%'+@pattern_str+'%', string) > 0 -- stop when pattern is not found
)
SELECT string AS replaced_string
FROM phrase
ORDER BY row_id DESC
OFFSET 0 ROW FETCH NEXT 1 ROW ONLY -- get last output from recursive CTE above
OPTION(MAXRECURSION 0);










If you have a solution please share. I'm sure there're better and simpler solutions than this.
For me, it's fun to manipulate character strings. I will provide more examples on string manipulation on future posts. Also we'll learn more about CTE and recursive CTE.

Saturday, March 29, 2014

TSQL STUFF with XML PATH: Delimit a Series of Strings

Last post, we explored some useful cases where STUFF function can be used. This post we'll take a step further by combining STUFF with XML PATH to combine rows to a single string and separate the values with commas.

Let's remind ourselves the STUFF function syntax:
STUFF ( original_string, start_position , length , replaceWith_string )

 /**************************************************************/
 /* Delimit a string of text. Insert a comma between strings.*/
/**************************************************************/
 DECLARE @LogicalQueryProccessingOrder TABLE (
phase VARCHAR(10)
);

INSERT INTO @LogicalQueryProccessingOrder 
VALUES ('FROM'), ('WHERE'), ('GROUP BY'), ('HAVING'), ('SELECT'), ('ORDER BY');

/* this doesn't work because result has an extra comma at the end. Note this return in xml format */
SELECT phase + ', '
FROM @LogicalQueryProccessingOrder
FOR XML PATH('');








/* this one works */
SELECT STUFF (
       /* 1st parameter: original string */
       (SELECT phase + ', ' FROM @LogicalQueryProccessingOrder FOR XML PATH(''))
       /* 2nd parameter:  start position. This is at last comma position.
        , LEN((SELECT phase + ', ' FROM @LogicalQueryProccessingOrder FOR XML PATH('')))
       /* 3rd parameter: length: number of characters to be replace. This will delete the extra comma at the end. */
      , 1
       /* 4th parameter: string to be replace with. An empty string will act like an eraser. This will replace the last comma with an empty string */
    , '') AS LogicalQueryOrder; 








/* same result here */
SELECT STUFF((SELECT ', ' + phase FROM @LogicalQueryProccessingOrder FOR XML PATH('')) 
      , 1 
     , 1 
      ,'') AS LogicalQueryOrder; 








Friday, March 28, 2014

TSQL Handy STUFF Function


STUFF ( original_string, start_position , length , replaceWith_string )
Insert a string into another.
Start_position: position where the replace starts on the originalString
Length: # of character to be replaced(deleted) on the originalString
If start_position <= 0, then return NULL
If length < 0, then returns NULL
If start_position >  length of originalString, then returns NULL
If length >= length of originalString, will be deleted to 1st character of originalString.

/***************************************/
/* STUFF function rules with examples */
/***************************************/
DECLARE @original_str VARCHAR(50) = 'Hello World',/* length = 11 */ 
        @replaceWith_str VARCHAR(50) = 'ABC';
SELECT 
  -- If start_position <= 0, then return NULL 
  STUFF(@original_str, -1, 1, @replaceWith_str) AS output1, 
  -- If length < 0, then returns NULL  
  STUFF(@original_str, 1, -1, @replaceWith_str) AS output2,
  -- If length = 0, then returns @replaceWith_str will be inserted
  --      between start_position and start_position + 1 
  STUFF(@original_str, 2, 0, @replaceWith_str) AS output3, 
  -- If start_position >  length of originalString, then returns NULL 
  STUFF(@original_str, LEN(@original_str) + 1, 1, @replaceWith_str) AS output4, 
  -- If length >= length of originalString, will be deleted to 1st character of originalString.  
  STUFF(@original_str, 1, LEN(@original_str) + 1, @replaceWith_str) AS output5, 
  -- Notice that even though length of @replaceWith_str < length of @original_str. Does not matter because there's 10 character to be deleted from the @original_str 
  STUFF(@original_str, 1, 10, @replaceWith_str) AS output6,
  -- if replace with string is empty, this act as an earaser. This will delete the first 2 character of the original string  
  STUFF(@original_str, 1, 2, '') AS output7;
/*************************************************/
/* Mask a SSN. Format 123889999 to 123-88-9999 */
/************************************************/
DECLARE @SSN CHAR(9) = '123889999';
SELECT STUFF(STUFF(@SSN, 4, 0,'-'), 7, 0, '-') AS SSN
GO
/***************************************************/
/* Mask a SSN. Format 123889999 to XXXXX9999 */
/*************************************************/
DECLARE @SSN CHAR(9) = '123889999';
SELECT STUFF(@SSN, 1, 5, 'XXXXX') AS Last4_SSN
     , STUFF(@SSN, 1, 5, REPLICATE('X',5)) AS Last4_SSN;
/******************************/
/* Concatenate strings    */
/*****************************/
USE AdventureWorks2012;
GO
SELECT TOP(5) FirstName, LastName
, STUFF(FirstName + ' ', LEN(FirstName) + 1, 0, ' ' + LastName) AS Name
FROM PERSON.PERSON
ORDER BY FirstName, LastName;

Thursday, March 27, 2014

TSQL String Concatenation with NULL and CONCAT Function


String Concatenation with plus (+) operator
Example: the result of this query will return NULL because @mname is NULL. 
Note: NULL mark represent missing data.
Thus known data + missing data = unknown

DECLARE @lname VARCHAR(20) = 'x', @fname VARCHAR(20) = 'y', @mname VARCHAR(20);
SELECT @lname + ', ' + @fname + ' ' + @mname AS fullName;







Example: this query result an error because can't convert 'x, y' string to an TINYINT.
DECLARE
@lname VARCHAR(20) = 'x', @fname VARCHAR(20) = 'y', @mname VARCHAR(20),
@a_number TINYINT = 2;

SELECT@lname + ', ' + @fname + ' ' + @a_number AS fullName_and_number;
Result: Conversion failed when converting the varchar value 'y, x ' to data type tinyint.






To avoid the error above we can convert number 2 to a string like this:
DECLARE @lname VARCHAR(20) = 'x', @fname VARCHAR(20) = 'y', @mname VARCHAR(20), @a_number TINYINT = 2;

SELECT @lname + ', ' + @fname + ' ' + CAST(@a_number AS VARCHAR(5)) AS fullName_and_number;






String Concatenation with CONCAT Function
Available on SQL Server 2012
Concatenate input parameters to a string
All parameters are implicitly converted to string
NULL parameter are implicitly convert to empty string
Require minimum 2 parameters
Example:

DECLARE @lname VARCHAR(20) = 'x', @fname VARCHAR(20) = 'y', @mname VARCHAR(20), @a_number TINYINT = 2;
SELECT CONCAT(@lname, ', ', @fname, ' ', @mname, @a_number) AS fullName_and_number;



Wednesday, March 26, 2014

70-461 Training Kit Chapter 2, Lesson 1: SELECT

SELECT
Not good practice to use SELECT * because 
     All columns will returned (I'v seen tables with 100+ columns)
     Reduce performance
     Table structure can be changed later.

Good practice to use schema name.
Example: Sales = schema, Orders = table
SELECT TOP(10) orderid, custid FROM Sales.Orders.

Alias
Once the assigned an alias to a table,  the original table name is NOT visible for the duration of the query.
Example: this query will fail with error: The multi-part identifier "Orders.OrderId" could not be bound.
SELECT Orders.OrderId, CustId
FROM Sales.Orders AS O;

Good practice to use keyword AS to name alias - help improve code readability 
SELECT empid, lastname + ', ' + firstname AS FullName
FROM HR.Employees;

Example: this query only return the OrderId column with alias Custid because missing a comma between OrderId and CustId.
SELECT OrderId CustId
FROM Sales.Orders;
  
Alias Forms
<expression> AS <alias>
<expression> <alias>
<alias> = <expression>

Standard SQL require a query to have FROM and SELECT clause
TSQL query can have SELECT clause without FROM clause
Example:
SELECT 2 * 4 AS value;

Regular Identifier rules:
First characters in [a-z] or [A-Z], underscore, at sign, number sign
Can't include TSQL reserved keyword
Can't have space
Can't have supplementary characters??
Example: table name that follow the regular identifier rules.
Employees, Employees2014, _Employees, @Employees, #Employees

Identifier doesn't follow the rules above must be delimited with double quotes(Standard) or brackets (TSQL specific)
Examples:
"Employees this year", "2014 Employees", "2014"

[Employees 2014], [2014Employees], [2014]

Credits: Microsoft 70-461 Training Kit

70-461 Training Kit Chapter 1, Lesson 2: Logical Query Processing Phase


I think this is one the most import concept to master when learning SQL.

Keyed-in order: the syntax order
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

Logical Query Processing (with easy to remember phrase, credit to my co-worker): the order logically query is processed
FROM           -- For
WHERE        -- Wives
GROUP BY   -- Good
HAVING        -- Husbands
SELECT        -- Sing
ORDER BY   -- Often

Logical Query Processing detail
   FROM
Operators: CROSS JOIN, INNER JOIN, OUTER JOIN, CROSS APPLY, OUTER APPLY, PIVOT, UNPIVOT        
  WHERE    
  GROUP BY  
  HAVING      
SELECT
EXPRESSION
DISTINCT  
TOP    
ORDER BY  

Each phase output is served as input for the next phase
For optimization, SQL Server may process the query in any order.

Different between WHERE and HAVING clause:
WHERE evaluated before rows are grouped.
HAVING evaluated after rows are grouped.

SELECT statement:
All-at-once manner, no left to right
That mean this query will fail
SELECT 2*4 AS x, 2*x AS [2x]; --error invalid column x

ORDER BY
Returns a cursor, result is not a set and not relational
Add cost to query

Credits: Microsoft 70-461 Training Kit

Tuesday, March 25, 2014

70-461 Training Kit Chapter 1, Lesson 1: Foundation

SQL
  • based on set theory & predicate logic
  • Declarative language not procedural.
  • Is 3-valued predicate logic: true, false, unknown
Relational Model
  • not equal Relationship
  • based on 2 mathematics branch: set theory and predicate logic
  • All attribute must has a name
  • All attribute names must be unique
  • No ordering for column or row
T-SQL
  • is a dialect extension of SQL base on ISO standard.
  • Based on multiset theory (may include duplicates).
  • Allows column ordering based on the order that column is create/defined
Recommendation
  •  to use standard syntax such as CAST (standard), CONVERT(T-SQL specific)
  • Your code and knowledge more portable.
A Relation:
  • Is a like a Tables in SQL
  • include a heading and a body
  • Heading of a relation is a set of attributes
  • Body of a relation is set or rows(tuples)
Tuples:
    like rows

A Set:
  • consider as a whole (interact with a set as a whole not row by row, RBAR)
  • no order
  • no duplicate
A Predicate:
  • a proposition either true or false
  • for filter
  • enforce data integrity
  • define data model (???)
An attribute
  • identify a name and a type
Relational thinking
  • Avoid thinking iterative (cursor, loop).
  • Think in set.
When use order by clause
  • The result is considered as a cursor, not relational.
NULL
  • Missing
  • NULL doesn't equal blank
  • NULL doesn't equal NULL
  • Need to handle with care.
  • Is a mark for a missing value - not a value itself.
Using correct Terminology

Fields and Records
  • Are physical
  • Field <> column
  • Record <> row
Tables are logical
  • Have logical rows and columns.
Geeks speak: Not NULL value, said NULL mark instead

Credits: Microsoft 70-461 Training Kit

Microsoft exam 70-461 blog Series


I was browsing the Microsoft website to see what subjects are in 70-461 exam, and I stumbled upon this blog yesterday. The blog author is very motivated and eager to learn even he failed the 70-461 exam.  As right now I have no intention taking this exam. I just want to blog about random topics that would be covered in the exam. I've been working with SQL Server and database for 3 years now.  The main reason I want to blog about this exam is I want to strengthen my knowledge and pick up new things along the way.

Also, I promised my previous co-worker, she now lives in San Francisco, that I would starting writing more about database stuff.