Real life use of these scripts example:
Let's assume that we have an application(or a spreadsheet) tracking people annual income but the income column data type is varchar or text, and users may enter these:
100000.00
$100000
$100000
$100000/yr
$ 100000/annual
If we try to convert the values above(except the first one) to a decimal or money, the conversion will fail. Our goal is to remove all character accept character 0 to 9 and the period(.). After that we can convert it to a different data type such decimal or money.
Notice that I use the same technique on the example in this old post, Replace Function with Recursive CTE
--Script 1: replace a range of characters for a single string. DECLARE @original_str VARCHAR(50) = '$100000.00/yr' , @characters_range_str VARCHAR(50) = '[^0-9.]'--regex range, any character except for number 0 to 9 and period (.) , @replacement_str VARCHAR(10) = ''; -- an empty string. Note: if use CHAR, recursion will max out. I don't know why. I will do research and let you know. ; WITH T AS (--recursive cte --anchor member SELECT @original_str AS string , row_id = 1 UNION ALL --Convert to VARCHAR(50) for data type and length to match with anchor member SELECT CAST(REPLACE(string, SUBSTRING(string, PATINDEX('%'+@characters_range_str+'%', string), 1), @replacement_str) AS VARCHAR(50)) , row_id = row_id + 1 FROM T WHERE PATINDEX('%'+@characters_range_str+'%', string) <> 0 --PATINDEX return 1 when match is found ) SELECT string FROM T ORDER BY row_id DESC OFFSET 0 ROW FETCH NEXT 1 ROW ONLY --get final result from cte above --OPTION(MAXRECURSION 0); --change recursion levels. default = 100. Use 0 for max limit = 32767.
--Script 2: Replace a range of characters for a table --temp table to hold test data CREATE TABLE #salary ( empid INT IDENTITY(1,1) PRIMARY KEY, income VARCHAR(20) ); GO --insert some rows for testing INSERT INTO #salary (income) VALUES ('80000.00') ,('$90000') ,('$110000/year') ,('$200000/yr') ,('$ 150000 per year'); GO --remove all characters except character 0 to 9. DECLARE @characters_range_str VARCHAR(10) = '[^0-9.]' --regular expression character range, any character that not number 0 to 9 and period (.) , @replacement_str VARCHAR(10) = ''; -- an empty string. Note: if use CHAR, recursion will max out. I don't know why. I will do research and let you know. ; WITH T AS ( --recursive cte --anchor member SELECT empid , income , 1 AS row_id -- will be used to identify final output. FROM #salary UNION ALL SELECT empid --Convert to VARCHAR(20) for data type and length to match with anchor member , CAST(REPLACE(income, SUBSTRING(income,PATINDEX('%'+@characters_range_str+'%', income),1), @replacement_str) AS VARCHAR(20)) , row_id + 1 FROM T WHERE PATINDEX('%'+@characters_range_str+'%', income) <> 0 --PATINDEX returns 1 When match is found ) , T2 AS (--rank the rowset from cte above to get final results SELECT empid , income --rank the row in descending order to get the result , ROW_NUMBER() OVER(PARTITION BY empid ORDER BY row_id DESC) AS result_row_id FROM T ) SELECT empid, income FROM T2 WHERE result_row_id = 1 ORDER BY empid --OPTION(MAXRECURSION 0); --change recursion levels. default = 100. Use 0 for max limit = 32767. --drop temp table DROP TABLE #salary;
Note that if the data type for the variable @replacement_str is CHAR, the queries above yield this error:
Msg 530, Level 16, State 1, Line 5
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
I don't know why this happens. I think because Replace function returns nvarchar or varchar, thus to avoid implicit casting, all arguments need to be either nvarchar or varchar. If you know the reason why this error occurs, please share. Anyway, please let me know if you have other way to implement this.
Reference:
PATINDEX function: http://technet.microsoft.com/en-us/library/ms188395.aspx
No comments:
Post a Comment