Sunday, February 2, 2014

Week 2 Reflections



Dimensional Modeling

This week's textbook reading on Dimensional Modeling was quite eye opening. At first, my reaction to non-normalized database tables (with respect to the dimension tables) was "noooooooo!", not that I'm a huge fan of crazy ER diagrams...
"my head hurts"

However, the star schema approach described makes a lot of sense. When one focusses on the ease of data access and the ability to accommodate changing requirements, the simplicity of the Fact and Dimension tables and their relationships is compelling. 

The pitfalls outlined at the end of the chapter also seem particularly vital. As a technologist, I find it easy to become excited by the technologies involved and focus on the back-end processing. It's good to be reminded that technologies are just tools and that ease-of-use and front-end processing are more important. After all, you can build the most awesome data warehouse, but if nobody uses it because it is too cumbersome or slow, then what's the point?

The Four Step Process

The four-step process outlined in the lecture materials really helped solidify the concepts of dimensional modeling. Steps 1 and 2 seem especially important, as they guide the rest of the process.

One surprising thing in the lecture was the idea of a "Date/Time Dimension" being present in almost every data warehouse and how it is one of the most important dimensions in a data mart. I understand why a date/time is important, but I would have thought that having a table to contain the values described would be unnecessary. To me, the data in this dimension can be easily and quickly calculated and to actually store this in a table would have been a waste of space. It is easy to convert from UNIX epoch time to things like the week number, like this in Perl:

my $weekNum = POSIX::strftime("%V", someTimeValue);

Of course, I will admit that 'holiday indicator' would be a little more difficult to determine programatically. After the discussion on slicing and dicing, it does make a little more sense to have this information ready and available for use in queries (although I'm not totally sold yet on the idea!)

OLAP Operations

I enjoyed the section on data cube operations (slice, dice, roll-up, drill-down, and pivot) and found that the examples really helped me understand the concepts.

Overall Impressions

The approach to dimensional modeling reminds me a lot of the principles outlined in agile software development (e.g. see Armson, 2012), specifically the ability for the system to accommodate change. Approaching dimensional modeling as an iterative process also harkens to the agile software methodology.

I think that the publishing metaphor at the beginning of the textbook was especially apt, where the responsibilities were framed for the role of data warehouse manager ((chapter 1, page 6). The responsibilities outlined really help drive home the point that data warehouses are there to support the needs of the business and thus need to be business-user focussed.

References

Armson, Kathryn “The Agile Method Explained: Beginners Guide & Summary of Benefits”. Linchpin.com. July 5, 2012. http://www.linchpinseo.com/the-agile-method

No comments:

Post a Comment