A blog about designing and developing Business Intelligence solutions using Microsoft SQL Server

Every now and then I get the requirement to add row numbers to a table in SQL Server Reporting Services. Usually this is simple enough using the RowNumber function - http://msdn.microsoft.com/en-us/library/ms159225(v=SQL.100).aspx. However, it typically becomes a lot less straightforward when the requirements change slightly!

In this case I was developing a monthly report which returned summarised information by account code for input into a finance system, and included several detail sections for auditing and reconciliation. The summary table required a sequence number, which reflected the row number of the table. I had no joy scoping the RowNumber function to the table or group, so started thinking about alternative solutions. The report contained multiple sections using the same source query, so returning multiple queries would have been inefficient and created duplicate code. Given the tablix is grouped by account code, it’s effectively returning a distinct list of account codes. Therefore, the row/sequence number is actually the row number of the account code in a distinct list of the account codes. Once you look at it this way, it’s actually fairly simple to get the desired row numbers.

For this example I created a simple table of sample data containing two columns – Account Code and Amount.

I then created a simple tablix grouped by account code – displaying the account code, amount, and a distinct count of the account code. Looking at the output below it’s pretty easy to recognise that the row number is a cumulative sum of the distinct count of account codes.

Conveniently there’s a function in SSRS to do just that! RunningValue - http://msdn.microsoft.com/en-us/library/ms159136.aspx. RunningValue accepts three parameters…
  1. expression – the expression or field to aggregate; in this case the account code field.
  2. function – the aggregate function to apply; in this case CountDistinct.
  3. scope – the name of the dataset, data region, etc.; in this case the tablix.
Having not renamed the tablix, I replaced the distinct count field with the following expression…

=RunningValue(Fields!AccountCode.Value, CountDistinct, "Tablix1")

…and there we have it, row numbers in a tablix group.
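The same idea can be modelled outside SSRS. Here's a minimal Python sketch (sample data and names are made up for illustration): over rows sorted by the group key, a running distinct count of that key gives each group its sequence number, which is what RunningValue with CountDistinct computes in the tablix.

```python
# Minimal model of RunningValue(Fields!AccountCode.Value, CountDistinct, "Tablix1"):
# over rows sorted by the group key, a running distinct count of the key
# equals the row number of the key's group.
def running_distinct_count(rows, key):
    seen = set()
    result = []
    for row in rows:
        seen.add(row[key])
        result.append((row[key], len(seen)))
    return result

# Hypothetical sample data, already sorted by account code
rows = [
    {"AccountCode": "1000", "Amount": 50},
    {"AccountCode": "1000", "Amount": 25},
    {"AccountCode": "2000", "Amount": 10},
    {"AccountCode": "3000", "Amount": 75},
]
# Within each group the count repeats, so a tablix grouped by account
# code displays 1, 2, 3 - one sequence number per account code.
print(running_distinct_count(rows, "AccountCode"))
```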

I saw a thread on MSDN Forums recently asking for T-SQL to get a list of the top 10 most frequent words in a given string. While this could be done using cursors, I have a general dislike of them! It got me interested in whether or not this could be achieved using a common table expression…

So I had a think and put together some T-SQL, with some success (it didn’t have a syntax error or run forever). After a few changes I had broken the string up into a table of words, with a bit to indicate whether or not each record is a valid word. In this case, I just defined a word as a string of characters containing only a-z or A-Z, followed by any other character (or no character). This could easily be adjusted to include other characters though, for example ‘ to allow we’re, haven’t, etc.

Essentially the logic behind the CTE is…

  1. You have two integers a and b – one indicating the start of the current word (a), the other the possible length of the word (b).
  2. If the character at a isn’t an alphabetic character, then this isn’t the start of a word. So we need to move a to the next character.
  3. If the character at a + b (one character after the end of the possible current word) isn’t an alphabetic character then we’ve found a word. So we should mark the string starting at a of length b as a word. In this case we set a = a + b + 1, and reset b = 1. If a + b is an alphabetic character, then we haven’t reached the end of the word yet and we need to increment b by 1.
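The steps above can be sketched outside T-SQL. Here's a minimal Python model of the same two-pointer scan — a hypothetical sketch, not the CTE itself, using the same a-z/A-Z definition of a word (both pointers are kept 1-based to match the T-SQL):

```python
from collections import Counter

def split_words(text):
    """Two-pointer scan mirroring the CTE: a marks the start of the
    current candidate word, b its length (both 1-based, as in the T-SQL).
    A word is a run of a-z/A-Z characters."""
    words = []
    n = len(text)

    def is_alpha(i):
        return 1 <= i <= n and text[i - 1].isascii() and text[i - 1].isalpha()

    a, b = 1, 1
    while a <= n:
        if not is_alpha(a):
            a += 1                               # not the start of a word: advance a
        elif is_alpha(a + b):
            b += 1                               # still inside a word: grow b
        else:
            words.append(text[a - 1:a - 1 + b])  # word found: keep substring (a, b)
            a, b = a + b + 1, 1                  # jump past the delimiter, reset b
    return words

def top_words(text, k=10):
    # Equivalent of the final SELECT TOP 10 ... GROUP BY ... ORDER BY
    return Counter(split_words(text)).most_common(k)
```

As in the T-SQL, apostrophes split words ("we're" becomes "we" and "re") unless the alphabetic test is widened to allow them.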

Below is the T-SQL I used, applied over this post.

DECLARE @Text varchar(max) = '';

-- A-Z = ASCII 65-90
-- a-z = ASCII 97-122

WITH
Chars AS (

SELECT
   1 AS a_in
,  1 AS b_in
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, 1, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, 1, 1)) NOT BETWEEN 97 AND 122
      THEN
         2
      ELSE
         1
   END AS a
,  1 AS b
,  SUBSTRING(@Text, 1, 0) AS Word
,  0 AS IsWord

UNION ALL

SELECT
   a AS a_in
,  b AS b_in
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
      THEN
         a + 1
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      THEN
         a
      ELSE
         a + 1 + b
   END AS a
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
      THEN
         b
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      THEN
         b + 1
      ELSE
         1
   END AS b
,  SUBSTRING(@Text, a, b) AS Word
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      OR (
            ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
         AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
         )
      THEN
         0
      ELSE
         1
   END AS IsWord
FROM
   Chars
WHERE
   a < LEN(@Text)

)

SELECT TOP 10
   Word
,  COUNT(*) AS NumberOfTimesUsed
FROM
   Chars
WHERE
   IsWord = 1
GROUP BY
   Word
ORDER BY
   2 DESC
OPTION
   (MAXRECURSION 0)

…and here’s a screenshot of the ten most frequently used words. Apparently I like short words…
Top 10 Most Frequent Words

Recently, a couple of people I’m working with came across a strange issue using the new version of the Dimension Merge SCD component. During the first load all rows would be output as new records (as you’d expect), but when running the same load again all rows would be output as both unchanged and new (not as you’d expect!). The behaviour seemed very strange, and only a couple of packages exhibited the issue.

Initially I had little joy trying to replicate the issue with Adventure Works, but found the source of the problem after chatting to Bhavik Merchant about it. The problem arose for two main reasons…

  1. Effective and expiry dates were being managed externally to the Dimension Merge SCD component.
  2. The default for the last record’s expiry date in the Dimension Merge SCD is the SQL Server datetime maximum value (9999-12-31), while the warehouse was using 2199-12-31.

So, in the initial load the records were being inserted with an expiry date of 2199-12-31, which from the Dimension Merge SCD component’s point of view means the records aren’t current (it expects current records to expire at the datetime maximum). For the second load the Dimension Merge SCD would assign an expiry date of 9999-12-31 to the same source records, making them more recent than those inserted in the first load. So the second load would result in all the records being output again with a different active-to date. Since the effective and expiry dates were then managed externally, this resulted in two identical records being inserted into the destination table.
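A hypothetical sketch of why the mismatch causes duplicates — the component effectively treats a record as current only when its expiry date matches its configured end-of-time value (the dates and function name below are illustrative, not the component's actual API):

```python
from datetime import date

# Illustrative values only
COMPONENT_END_OF_TIME = date(9999, 12, 31)  # Dimension Merge SCD default
WAREHOUSE_END_OF_TIME = date(2199, 12, 31)  # default used in this warehouse

def is_current(expiry, end_of_time):
    # A record only counts as current when its expiry date matches
    # the configured "end of time" value.
    return expiry == end_of_time

# Records from the first load carry the warehouse default, so on the
# second load the component sees no current record for each business key
# and outputs the same source rows as new again.
assert not is_current(WAREHOUSE_END_OF_TIME, COMPONENT_END_OF_TIME)
assert is_current(COMPONENT_END_OF_TIME, COMPONENT_END_OF_TIME)
```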

The “fix” was simple. Either change the default for the last record’s expiry date in the data warehouse to 9999-12-31, or set the last record’s expiry date in the Dimension Merge SCD to the value being used in the data warehouse. The screenshot below shows where to change this value in the component.

Dimension Merge SCD Effective/Expiry Settings

I came across an annoying error when using another client computer to connect to an Analysis Services cube via Management Studio. The error received was – “The query could not be processed: XML for Analysis parser: The LocaleIdentifier property is not overwritable and cannot be assigned a new value.”

SSAS LocaleIdentifier Issue

To resolve this, simply select your language from the language drop-down in the cube browser window.

SSAS LocaleIdentifier Issue Resolution