A blog about designing and developing Business Intelligence solutions using Microsoft SQL Server
Header

Transact-SQL IN

August 20th, 2012 | Posted by David Stewart in T-SQL - (0 Comments)

The T-SQL IN operator can be used to test whether a given value matches any item in a particular list or sub-query. If the given value equals an item in the list or sub-query the expression returns true, otherwise it returns false.

The following example (based on AdventureWorks) uses IN to return all the departments which are in the research and development, or quality assurance groups. Below is the full list of departments:

DepartmentID Name                                               GroupName                                          ModifiedDate
------------ -------------------------------------------------- -------------------------------------------------- -----------------------
1            Engineering                                        Research and Development                           1998-06-01 00:00:00.000
2            Tool Design                                        Research and Development                           1998-06-01 00:00:00.000
3            Sales                                              Sales and Marketing                                1998-06-01 00:00:00.000
4            Marketing                                          Sales and Marketing                                1998-06-01 00:00:00.000
5            Purchasing                                         Inventory Management                               1998-06-01 00:00:00.000
6            Research and Development                           Research and Development                           1998-06-01 00:00:00.000
7            Production                                         Manufacturing                                      1998-06-01 00:00:00.000
8            Production Control                                 Manufacturing                                      1998-06-01 00:00:00.000
9            Human Resources                                    Executive General and Administration               1998-06-01 00:00:00.000
10           Finance                                            Executive General and Administration               1998-06-01 00:00:00.000
11           Information Services                               Executive General and Administration               1998-06-01 00:00:00.000
12           Document Control                                   Quality Assurance                                  1998-06-01 00:00:00.000
13           Quality Assurance                                  Quality Assurance                                  1998-06-01 00:00:00.000
14           Facilities and Maintenance                         Executive General and Administration               1998-06-01 00:00:00.000
15           Shipping and Receiving                             Inventory Management                               1998-06-01 00:00:00.000
16           Executive                                          Executive General and Administration               1998-06-01 00:00:00.000

(16 row(s) affected)

Now to get only the departments that are in the groups research and development or quality assurance. We need to use the IN operator in the WHERE clause to restrict the rows returned:

1
2
3
4
5
6
7
8
9
SELECT
   *
FROM
   HumanResources.Department
WHERE
   GroupName IN (
      'Research and Development'
   ,  'Quality Assurance'
   )

This gives us the following result:

DepartmentID Name                                               GroupName                                          ModifiedDate
------------ -------------------------------------------------- -------------------------------------------------- -----------------------
1            Engineering                                        Research and Development                           1998-06-01 00:00:00.000
2            Tool Design                                        Research and Development                           1998-06-01 00:00:00.000
6            Research and Development                           Research and Development                           1998-06-01 00:00:00.000
12           Document Control                                   Quality Assurance                                  1998-06-01 00:00:00.000
13           Quality Assurance                                  Quality Assurance                                  1998-06-01 00:00:00.000

(5 row(s) affected)

We can also include NOT with the IN operator, which logically returns false when the value matches an item in the list and true otherwise. So we could get all departments that aren’t in the above groups:

1
2
3
4
5
6
7
8
9
SELECT
   *
FROM
   HumanResources.Department
WHERE
   GroupName NOT IN (
      'Research and Development'
   ,  'Quality Assurance'
   )
DepartmentID Name                                               GroupName                                          ModifiedDate
------------ -------------------------------------------------- -------------------------------------------------- -----------------------
3            Sales                                              Sales and Marketing                                1998-06-01 00:00:00.000
4            Marketing                                          Sales and Marketing                                1998-06-01 00:00:00.000
5            Purchasing                                         Inventory Management                               1998-06-01 00:00:00.000
7            Production                                         Manufacturing                                      1998-06-01 00:00:00.000
8            Production Control                                 Manufacturing                                      1998-06-01 00:00:00.000
9            Human Resources                                    Executive General and Administration               1998-06-01 00:00:00.000
10           Finance                                            Executive General and Administration               1998-06-01 00:00:00.000
11           Information Services                               Executive General and Administration               1998-06-01 00:00:00.000
14           Facilities and Maintenance                         Executive General and Administration               1998-06-01 00:00:00.000
15           Shipping and Receiving                             Inventory Management                               1998-06-01 00:00:00.000
16           Executive                                          Executive General and Administration               1998-06-01 00:00:00.000

(11 row(s) affected)

We can also use sub-queries with the IN operator. For example we might want to get a list of employee id’s that have been assigned to a department within the Quality Assurance group:

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT
   EmployeeID
FROM
   HumanResources.EmployeeDepartmentHistory
WHERE
   DepartmentID IN (
      SELECT
         DepartmentID
      FROM
         HumanResources.Department
      WHERE
         GroupName = 'Quality Assurance'
   )

I saw a thread on MSDN Forums recently asking for T-SQL to get a list of the top 10 most frequent words in a given string. While this could be done using cursors, I have a general dislike of them! It got me interested in whether or not this could be achieved using a common table expression…

So I had a think and put together some T-SQL, with some success (it didn’t have a syntax error or run forever). After a few changes I had broken the string up into a table of words, with a bit to indicate whether or not each record is a valid word. In this case, I just defined a word as a string of characters containing only a-z or A-Z, followed by any other character (or no character). This could easily be adjusted to include other characters though, for example ‘ to allow we’re, haven’t, etc.

Essentially the logic behind the CTE is…

  1. You have two integers a and b – one indicating the start of the current word (a), the other the possible length of the word (b).
  2. If the character at a isn’t an alphabetic character, then this isn’t the start of a word. So we need to move a to the next character.
  3. If the character at a + b (one character after the end of the possible current word) isn’t an alphabetic character then we’ve found a word. So we should mark the string starting at a of length b as a word. In this case we set a = a + b + 1, and reset b = 1. If a + b is an alphabetic character, then we haven’t reached the end of the word yet and we need to increment b by 1.

Below is the T-SQL I used, applied over this post.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
DECLARE @Text var char(max) = ''; -- remove space from var char

-- A - Z = 65 = 90
-- a - z = 97 = 122

WITH
Chars AS (

SELECT
   1 AS a_in
,  1 AS b_in
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, 1, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, 1, 1)) NOT BETWEEN 97 AND 122
      THEN
         2
      ELSE
         1
   END AS a
,  1 AS b
,  SUBSTRING(@Text, 1, 0) AS Word
,  0 AS IsWord

UNION ALL

SELECT
   a AS a_in
,  b AS b_in
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
      THEN
         a + 1
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      THEN
         a
      ELSE
         a + 1 + b
   END AS a
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
      THEN
         b
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      THEN
         b + 1
      ELSE
         1
   END AS b
,  SUBSTRING(@Text, a, b) AS Word
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      OR (
            ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
         AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
         )
      THEN
         0
      ELSE
         1
   END AS IsWord
FROM
   Chars
WHERE
   a < LEN(@Text)

)

SELECT TOP 10
   Word
,  COUNT(*) AS NumberOfTimesUsed
FROM
   Chars
WHERE
   IsWord = 1
GROUP BY
   Word
ORDER BY
   2 DESC
OPTION
   (MAXRECURSION 0)
…and a screen shot of the ten most frequently used words. Apparently I like short words…
Top 10 Most Frequent Words