Character Scrubbing in T-SQL

Author:JonK
Last Updated:November 15, 2016 10:52 AM

Feel free to repurpose this as needed (create a function for it)

DECLARE @original NVARCHAR(MAX) = N'';
DECLARE @cleaned NVARCHAR(MAX) = N'';
WITH LowAscii AS
(
    SELECT CASE WHEN ASCII(SUBSTRING(@original, 1, 1)) > 127 
                THEN ''
                WHEN PATINDEX('%[^A-Z0-9 _-]%', SUBSTRING(@original, 1, 1)) > 0
                THEN ''
                ELSE SUBSTRING(@original, 1, 1) 
            END AS OneChar, 1 AS Idx
    UNION ALL
    SELECT CASE WHEN OneChar = '' AND SUBSTRING(@original, Idx+1, 1) = ' '
                THEN ''
                WHEN ASCII(SUBSTRING(@original, Idx+1, 1)) > 127 
                THEN ''
                WHEN PATINDEX('%[^A-Z0-9 _-]%', SUBSTRING(@original, Idx+1, 1)) > 0
                THEN ''
                ELSE SUBSTRING(@original, Idx+1, 1)
                END, Idx+1
        FROM LowAscii
    WHERE Idx < LEN(@original)
)
SELECT @cleaned = @cleaned + OneChar from LowAscii Option (MaxRecursion 32767);

Items of interest:

  • The CTE (common table expression) works great in situations that require recursion. Here we're recursively calling ourself in order to parse individual characters in a string.
  • PATINDEX is the way you perform RegEx-ish LIKE queries.
top