We start by creating a table that will hold the special characters and the equivalent text character, for instance, é is equivalent to e.
CREATE TABLE dbo.tblSpecialCharacters (SeqId int identity(1,1),SpecialCharacter nvarchar(50),TextCharacter nvarchar(50))
INSERT INTO dbo.tblSpecialCharacters (SpecialCharacter,TextCharacter)
VALUES ('é','e')
INSERT INTO dbo.tblSpecialCharacters (SpecialCharacter,TextCharacter)
VALUES ('ë','e')
INSERT INTO dbo.tblSpecialCharacters (SpecialCharacter,TextCharacter)
VALUES ('ö','o')
INSERT INTO dbo.tblSpecialCharacters (SpecialCharacter,TextCharacter)
VALUES ('ü','u')
-- Create a Tally table that we are going to use to loop through the names with special characters.
-- For this exmaple we are just going to end at 200, you can however, go as far as 20000 or more
-- depending on your needs.
SELECT TOP 100 IDENTITY(INT,1,1) AS N INTO dbo.tblTally FROM Master.dbo.SysColumns sc1,Master.dbo.SysColumns sc2
--- Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
Now create a function that will be removing the special characters:
CREATE FUNCTION [dbo].[RemoveCharacter]
(
@name nvarchar(200)
)
RETURNS nvarchar(200)
AS
BEGIN
declare @numCharacters int
-- Get the number of characters that the name has
set @numCharacters = (select count(s.SpecialCharacter)
FROM SpecialCharacters s,Tally t
WHERE t.n <= len(@name)
and s.SpecialCharacter=substring(@name,t.n,1))
declare @count int
-- Set counter to 1
set @count = 1
-- Loop through the number of characters that needs to be
-- replaced
while @count<=@numCharacters
begin
-- Replace the character with the equivalent text.
-- Make sure you select only one character at a time. This way the subquery does't return more than
-- one record set.
set @name = replace(@name,(SELECT TOP 1(s.SpecialCharacter) FROM SpecialCharacters s,Tally t WHERE t.n <= len(@name) and s.SpecialCharacter=substring(@name,t.n,1)),
(SELECT TOP 1(s.TextCharacter) FROM SpecialCharacters s,Tally t WHERE t.n <= len(@name) and s.SpecialCharacter=substring(@name,t.n,1)))
set @count = @count + 1
end
RETURN
-- return the name without the characters.
@name
END
Now lets say you have a table of people, in this example we will use a temporary table of employees who have special characters as follows:
CREATE TABLE #tblEmployees (SeqId int identity(1,1), Firstname nvarchar(200), Surname nvarchar(200))
INSERT INTO #tblEmployees (Firstname,Surname)
VALUES ('Tinashe','Zeélie')
INSERT INTO #tblEmployees (Firstname,Surname)
VALUES ('Takura','Wentzël')
INSERT INTO #tblEmployees (Firstname,Surname)
VALUES ('Tafadzwa','Oösthüizen')
INSERT INTO #tblEmployees (Firstname,Surname)
VALUES ('Peter','Oosthuÿsen')
SELECT FIRSTNAME,
dbo.RemoveCharacter(SURNAME) AS SURNAME
FROM #tblEmployees
DROP TABLE #tblEmployees
Results:
Firstname | Surname |
Tinashe | Zeelie |
Takura | Wentzel |
Tafadzwa | Oosthuizen |
Peter | Oosthuysen |
From the results, the characters are removed. This is just a small record set. You can test to see how the function fares, when lets say you have to pass through more than 10000 records of employees. Or that you have to do an inner join with another table, for instance the salary table.
This is just one of the ways you can remove special characters in SQL, I am sure there are other ways this can be achieved. If you have any additions, subtractions or suggestions please do let me know. Hope this helps someone.
No comments:
Post a Comment