A blog about designing and developing Business Intelligence solutions using Microsoft SQL Server

In some cases we need to bring comments or descriptions into a dimension in SSAS for detail reporting. For example the comment from a maintenance worker against a machine breakdown, or a geological survey description. These fields are typically free text in the source system and as such don’t contain restrictions or constraints on data entry (e.g. upper and lower case, empty values, white-space, uniqueness). When you try and process these attributes you’ll run into a couple of issues.

To demonstrate this I’ve knocked up a few rows with variations in case of the same text, a null value and an empty string.
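The table might look something like the sketch below. The table and column names (dbo.FreeText, Comment) are taken from the error messages quoted later in the post; the sample values themselves are illustrative guesses.

```sql
-- Sketch of the sample table: case variations of the same text,
-- a NULL and an empty string. Sample values are illustrative.
CREATE TABLE dbo.FreeText (
   FreeTextID INT IDENTITY(1, 1) PRIMARY KEY
,  Comment VARCHAR(100) NULL
);

INSERT INTO dbo.FreeText (Comment)
VALUES
   ('Some comment')
,  ('some comment')
,  ('SOME COMMENT')
,  (NULL)
,  ('');
```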

Free Text Sample Data

Creating a dimension based on this table in SSAS gives the following error when we try to process it – “Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘dbo_FreeText’, Column: ‘Comment’, Value: ”. The attribute is ‘Comment’”.

SSAS Duplicate Key - Empty and NULL Values

This is due to the NullProcessing property on the dimension attribute, for which the default setting is Automatic. This specifies that a NULL value is converted to zero (for numeric data items) or, in this case, a blank string (for string data items). This means we’ve now got a NULL value converted to an empty string, plus an actual empty string, resulting in duplicate key values. Setting NullProcessing to Preserve fixes this, giving us two distinct key values.

If your database or this particular field is case sensitive you’ll then likely run into a second issue with a very similar error message – “Errors in the OLAP storage engine: A duplicate attribute key has been found when processing: Table: ‘dbo_FreeText’, Column: ‘Comment’, Value: ‘Some comment’. The attribute is ‘Comment’”. This comes down to a difference in case sensitivity between the underlying database and Analysis Services, which by default is case insensitive. In SSAS this is set at the attribute level. If you select the Comment attribute and expand the KeyColumns group you’ll find the Collation property, under which you’ll find a case sensitive check box.

SSAS Attribute Collation

Once you’ve checked this, the dimension should process successfully. When you browse the Comment attribute you’ll notice two “blank” values (NULL and empty) and multiple values for “some comment”. This example assumes these are all distinct values. If that isn’t the case it should be handled in ETL, i.e. converting NULL values to empty strings and standardising on case.
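If the values shouldn’t be distinct, the ETL standardisation mentioned above might look something like this (a sketch, assuming the dbo.FreeText table from the error messages; TRIM/UPPER are one way to do it, not the only way):

```sql
-- One way to standardise free text in ETL: convert NULLs to empty
-- strings and normalise case and white-space, so SSAS sees a single
-- key per comment.
SELECT
   UPPER(LTRIM(RTRIM(ISNULL(Comment, '')))) AS Comment
FROM
   dbo.FreeText;
```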

SSAS Browser Comment Attribute

Transact-SQL IN

August 20th, 2012 | Posted by David Stewart in T-SQL - (0 Comments)

The T-SQL IN operator can be used to test whether a given value matches any item in a particular list or sub-query. If the given value equals an item in the list or sub-query the expression returns true, otherwise it returns false.

The following example (based on AdventureWorks) uses IN to return all the departments in the Research and Development or Quality Assurance groups. Below is the full list of departments:

DepartmentID Name                                               GroupName                                          ModifiedDate
------------ -------------------------------------------------- -------------------------------------------------- -----------------------
1            Engineering                                        Research and Development                           1998-06-01 00:00:00.000
2            Tool Design                                        Research and Development                           1998-06-01 00:00:00.000
3            Sales                                              Sales and Marketing                                1998-06-01 00:00:00.000
4            Marketing                                          Sales and Marketing                                1998-06-01 00:00:00.000
5            Purchasing                                         Inventory Management                               1998-06-01 00:00:00.000
6            Research and Development                           Research and Development                           1998-06-01 00:00:00.000
7            Production                                         Manufacturing                                      1998-06-01 00:00:00.000
8            Production Control                                 Manufacturing                                      1998-06-01 00:00:00.000
9            Human Resources                                    Executive General and Administration               1998-06-01 00:00:00.000
10           Finance                                            Executive General and Administration               1998-06-01 00:00:00.000
11           Information Services                               Executive General and Administration               1998-06-01 00:00:00.000
12           Document Control                                   Quality Assurance                                  1998-06-01 00:00:00.000
13           Quality Assurance                                  Quality Assurance                                  1998-06-01 00:00:00.000
14           Facilities and Maintenance                         Executive General and Administration               1998-06-01 00:00:00.000
15           Shipping and Receiving                             Inventory Management                               1998-06-01 00:00:00.000
16           Executive                                          Executive General and Administration               1998-06-01 00:00:00.000

(16 row(s) affected)

Now, to get only the departments in the Research and Development or Quality Assurance groups, we use the IN operator in the WHERE clause to restrict the rows returned:

SELECT
   *
FROM
   HumanResources.Department
WHERE
   GroupName IN (
      'Research and Development'
   ,  'Quality Assurance'
   )

This gives us the following result:

DepartmentID Name                                               GroupName                                          ModifiedDate
------------ -------------------------------------------------- -------------------------------------------------- -----------------------
1            Engineering                                        Research and Development                           1998-06-01 00:00:00.000
2            Tool Design                                        Research and Development                           1998-06-01 00:00:00.000
6            Research and Development                           Research and Development                           1998-06-01 00:00:00.000
12           Document Control                                   Quality Assurance                                  1998-06-01 00:00:00.000
13           Quality Assurance                                  Quality Assurance                                  1998-06-01 00:00:00.000

(5 row(s) affected)

We can also include NOT with the IN operator, which logically returns false when the value matches an item in the list and true otherwise. So we could get all departments that aren’t in the above groups:

SELECT
   *
FROM
   HumanResources.Department
WHERE
   GroupName NOT IN (
      'Research and Development'
   ,  'Quality Assurance'
   )

This returns the following result:
DepartmentID Name                                               GroupName                                          ModifiedDate
------------ -------------------------------------------------- -------------------------------------------------- -----------------------
3            Sales                                              Sales and Marketing                                1998-06-01 00:00:00.000
4            Marketing                                          Sales and Marketing                                1998-06-01 00:00:00.000
5            Purchasing                                         Inventory Management                               1998-06-01 00:00:00.000
7            Production                                         Manufacturing                                      1998-06-01 00:00:00.000
8            Production Control                                 Manufacturing                                      1998-06-01 00:00:00.000
9            Human Resources                                    Executive General and Administration               1998-06-01 00:00:00.000
10           Finance                                            Executive General and Administration               1998-06-01 00:00:00.000
11           Information Services                               Executive General and Administration               1998-06-01 00:00:00.000
14           Facilities and Maintenance                         Executive General and Administration               1998-06-01 00:00:00.000
15           Shipping and Receiving                             Inventory Management                               1998-06-01 00:00:00.000
16           Executive                                          Executive General and Administration               1998-06-01 00:00:00.000

(11 row(s) affected)
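One caveat worth noting here (my addition, not something the examples above run into, but a common trap): if the list or sub-query used with NOT IN contains a NULL, every comparison evaluates to unknown and no rows are returned at all. For example:

```sql
-- NOT IN with a NULL in the list returns no rows, because
-- GroupName <> NULL evaluates to UNKNOWN for every row.
SELECT
   *
FROM
   HumanResources.Department
WHERE
   GroupName NOT IN (
      'Research and Development'
   ,  NULL
   )
```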

We can also use sub-queries with the IN operator. For example, we might want to get a list of employee IDs that have been assigned to a department within the Quality Assurance group:

SELECT
   EmployeeID
FROM
   HumanResources.EmployeeDepartmentHistory
WHERE
   DepartmentID IN (
      SELECT
         DepartmentID
      FROM
         HumanResources.Department
      WHERE
         GroupName = 'Quality Assurance'
   )

Previously I wrote a post about the LocaleIdentifier error when browsing an Analysis Services cube via Management Studio. Whilst it’s annoying, it really only affects developers, so it quickly slips down the list of priorities to address! However, when it starts to affect users it becomes a serious issue. Working on another project with an Excel component, this issue came up in a couple of places – connecting to the cube from Excel (sometimes), and using drill-through in Excel.

After looking into this for a while I came across a solution, although it’s not one I feel completely comfortable with.

1. Go to Windows > Control Panel > Region and Language

2. Update the Format to “English (United States)”

Region and Language Format

3. Click Apply

4. Update the Format back to your original language, in my case “English (Australia)”

5. Click Apply

6. Click OK

Every now and then I get the requirement to add row numbers to a table in SQL Server Reporting Services. Usually this is simple enough using the RowNumber function - http://msdn.microsoft.com/en-us/library/ms159225(v=SQL.100).aspx. However, it typically becomes a lot less straightforward when requirements change slightly!

In this case I was developing a monthly report which returned summarised information by account code for input into a finance system and included several detail sections for auditing and reconciliation. The summary table required a sequence number, which reflected the row number of the table. I had no joy scoping the row number function to the table or group, so started thinking about alternative solutions. The report contained multiple sections using the same source query, so returning multiple queries was inefficient and created duplicate code. Given the tablix is grouped by account code, it’s effectively returning a distinct list of account codes. Therefore, the row/sequence number is actually the row number of the account code in a distinct list of the account codes. Once you start looking to utilise this, it’s actually fairly simple to get the desired row numbers.

For this example I created a simple table of sample data containing two columns – Account Code and Amount.

I then created a simple tablix grouped by account code – displaying the account code, amount, and a distinct count of the account code. Looking at the output below it’s pretty easy to recognise that the row number is a running distinct count of the account codes.

Conveniently there’s a function in SSRS to do just that: RunningValue - http://msdn.microsoft.com/en-us/library/ms159136.aspx. RunningValue accepts three parameters…

  1. expression - the expression or field to aggregate, in this case the account code field.
  2. function - the aggregate function to apply, in this case CountDistinct.
  3. scope - the name of the dataset, data region, etc., in this case the tablix.

Having not renamed the tablix, I replaced the distinct count field with the following expression…

=RunningValue(Fields!AccountCode.Value, CountDistinct, "Tablix1")

…and there we have it, row numbers in a tablix group.

Recently I had a requirement to set the colour of a couple of non-calculated measures in Analysis Services, to differentiate between an actual and an inferred value. It was another one of those things I’d remembered being as simple as selecting a value from the properties screen… Unfortunately my memory deceived me and I was greeted with a properties window totally devoid of any colour settings!

Conveniently it’s easy (enough) to set the colours using MDX in the cube calculation script. SSAS defines colour values using a somewhat strange system, there’s a great post from Boyan Penev explaining this – Colour Codes in SSAS.

To set the colour of the Amount measure in the Adventure Works cube to a lovely shade of pink, you’d add the following line to the cube calculation script…

FORE_COLOR ( [Measures].[Amount] ) = 16711935;

With the resulting err… lovely pink measure when browsed from Management Studio.

Pink Coloured Non-Calculated Measure
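For reference, the encoding (as I understand it from Boyan’s post – worth checking there for the details) is the same as the VBA RGB function: red + (green × 256) + (blue × 65536). So the pink above works out as:

```sql
-- 16711935 decodes to red = 255, green = 0, blue = 255 (magenta/pink).
SELECT 255 + (0 * 256) + (255 * 65536) AS ForeColorValue; -- 16711935
```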

I saw a thread on MSDN Forums recently asking for T-SQL to get a list of the top 10 most frequent words in a given string. While this could be done using cursors, I have a general dislike of them! It got me interested in whether or not this could be achieved using a common table expression…

So I had a think and put together some T-SQL, with some success (it didn’t have a syntax error or run forever). After a few changes I had broken the string up into a table of words, with a bit to indicate whether or not each record is a valid word. In this case, I just defined a word as a string of characters containing only a-z or A-Z, followed by any other character (or no character). This could easily be adjusted to include other characters though, for example ‘ to allow we’re, haven’t, etc.

Essentially the logic behind the CTE is…

  1. You have two integers a and b – one indicating the start of the current word (a), the other the possible length of the word (b).
  2. If the character at a isn’t an alphabetic character, then this isn’t the start of a word. So we need to move a to the next character.
  3. If the character at a + b (one character after the end of the possible current word) isn’t an alphabetic character then we’ve found a word. So we should mark the string starting at a of length b as a word. In this case we set a = a + b + 1, and reset b = 1. If a + b is an alphabetic character, then we haven’t reached the end of the word yet and we need to increment b by 1.

Below is the T-SQL I used, applied over this post.

DECLARE @Text varchar(max) = '';

-- A-Z = ASCII 65-90
-- a-z = ASCII 97-122

WITH
Chars AS (

SELECT
   1 AS a_in
,  1 AS b_in
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, 1, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, 1, 1)) NOT BETWEEN 97 AND 122
      THEN
         2
      ELSE
         1
   END AS a
,  1 AS b
,  SUBSTRING(@Text, 1, 0) AS Word
,  0 AS IsWord

UNION ALL

SELECT
   a AS a_in
,  b AS b_in
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
      THEN
         a + 1
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      THEN
         a
      ELSE
         a + 1 + b
   END AS a
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
      AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
      THEN
         b
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      THEN
         b + 1
      ELSE
         1
   END AS b
,  SUBSTRING(@Text, a, b) AS Word
,  CASE
      WHEN
         ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 65 AND 90
      OR ASCII(SUBSTRING(@Text, a + b, 1)) BETWEEN 97 AND 122
      OR (
            ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 65 AND 90
         AND   ASCII(SUBSTRING(@Text, a, 1)) NOT BETWEEN 97 AND 122
         )
      THEN
         0
      ELSE
         1
   END AS IsWord
FROM
   Chars
WHERE
   a < LEN(@Text)

)

SELECT TOP 10
   Word
,  COUNT(*) AS NumberOfTimesUsed
FROM
   Chars
WHERE
   IsWord = 1
GROUP BY
   Word
ORDER BY
   2 DESC
OPTION
   (MAXRECURSION 0)

…and a screenshot of the ten most frequently used words. Apparently I like short words…
Top 10 Most Frequent Words

Recently, a couple of people I’m working with came across a strange issue using the new version of the Dimension Merge SCD component. During the first load all rows would be output as new records (as you’d expect), but when running the same load again all rows would be output as both unchanged and new (not as you’d expect!). The behaviour seemed very strange, as only a couple of packages exhibited the issue.

Initially I had little joy trying to replicate the issue with Adventure Works, but found the source of the problem after chatting to Bhavik Merchant about it. The problem came down to two main causes…

  1. Effective and expiry dates were being managed externally to the Dimension Merge SCD component.
  2. The default for the last record’s expiry date in the Dimension Merge SCD is the SQL Server datetime maximum value (9999-12-31), while the warehouse was using 2199-12-31.

So, in the initial load the records were being inserted with an expiry date of 2199-12-31, which from the Dimension Merge SCD component’s point of view means the records aren’t current, since they expire before its end-of-time value. For the second load the Dimension Merge SCD would assign an expiry date of 9999-12-31 to the same source records, making them more recent than those inserted in the first load. So the second load would output all the records again with a different active-to date. Since the effective and expiry dates were managed externally, this resulted in two identical records being inserted into the destination table.
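To illustrate the effect (a simplified sketch with made-up data, not the actual warehouse schema):

```sql
-- Sketch: with mismatched end-of-time dates, a current-record filter
-- sees both loads' rows as active, so the same source record ends up
-- in the dimension twice.
DECLARE @Dimension TABLE (
   BusinessKey INT
,  ExpiryDate DATETIME
);

INSERT INTO @Dimension (BusinessKey, ExpiryDate)
VALUES
   (1, '2199-12-31')  -- first load, warehouse default
,  (1, '9999-12-31'); -- second load, component default

-- Both rows for BusinessKey 1 look "current" relative to today:
SELECT BusinessKey, ExpiryDate
FROM @Dimension
WHERE ExpiryDate > GETDATE();
```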

The “fix” was simple: either change the default for the last record’s expiry date in the data warehouse to 9999-12-31, or set the last record’s expiry date in the Dimension Merge SCD to the same value being used in the data warehouse. The screenshot below shows where to change this value in the component.

Dimension Merge SCD Effective/Expiry Settings

I came across an annoying error when using another client computer to connect to an Analysis Services cube via Management Studio. The error received was – “The query could not be processed: XML for Analysis parser: The LocaleIdentifier property is not overwritable and cannot be assigned a new value.”

SSAS LocaleIdentifier Issue

To resolve this you can simply select your language from the language drop down in the cube browse window.

SSAS LocaleIdentifier Issue Resolution

Recently I was working on a little side project, where I needed to allow users to connect to a variety of data sources using several connection types – SQL Server, OLE DB and ODBC. Unless I wanted users to manually type in the OLE DB provider (errr… I had no intention of even doing this myself…), I needed to populate a list of the OLE DB providers installed on the current machine.

Having had very little experience in .NET programming, I searched around a bit and found a few less than elegant solutions, but was surprised just how simple it was to do in the end. In fact there’s a class to do it… OleDbEnumerator.

There are a couple of methods of interest in the class – GetElements(), which returns a DataTable containing information about the installed OLE DB providers, and GetRootEnumerator(), which returns a data reader with the same information.

In a Windows Forms application, populating a DataGridView is as simple as one line:

' Set the data source of the data grid view to the
' data table returned by GetElements()
DataGridView1.DataSource = New OleDbEnumerator().GetElements()

The resulting data grid view is shown below.

OLE DB Provider List