The first lecture this week expanded on the theme established in week 2: the design of star and snowflake schemas using fact and dimension tables. For fact tables, we learned about the different types of facts (additive, semi-additive, and non-additive), and what I found really interesting was the concept of factless fact tables. As described by Datawarehouse Concepts (2012), a factless fact table records an event or condition rather than a numeric measure: it simply ties various dimension tables together, so it contains nothing but the foreign keys of those dimensions. Very interesting stuff!
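To make the idea concrete, here is a minimal sketch in Python (the tables, keys, and column names are my own invention, not from the lecture) of a factless fact table recording class attendance: it holds nothing but foreign keys into the dimension tables, and analysing it amounts to counting rows.

```python
from collections import Counter

# Invented, minimal "dimension tables".
date_dim = {1: "2014-02-03", 2: "2014-02-04"}             # date_key -> calendar date
student_dim = {10: "Mary Jones", 11: "Bob Smith"}         # student_key -> name
course_dim = {100: "Data Warehousing", 101: "Databases"}  # course_key -> title

# The factless fact table: no measures at all, just foreign keys recording
# the event "this student attended this course on this date".
attendance_fact = [
    {"date_key": 1, "student_key": 10, "course_key": 100},
    {"date_key": 1, "student_key": 11, "course_key": 100},
    {"date_key": 2, "student_key": 10, "course_key": 101},
]

# Analysing a factless fact table is typically just counting rows,
# e.g. attendances per course.
per_course = Counter(course_dim[row["course_key"]] for row in attendance_fact)
print(per_course)  # Counter({'Data Warehousing': 2, 'Databases': 1})
```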
With respect to dimensions, we learned about different types of dimensions used in data warehouses:
- degenerate dimensions
- role-playing dimensions
- junk dimensions
- slowly changing dimensions
Degenerate dimensions are interesting because they are dimensions that live in the fact table itself! While this may seem counter-intuitive, it makes sense when you think about it: these dimensions provide information about a particular transaction.
Degenerate dimensions commonly occur when the fact table’s grain is a single transaction (or transaction line). Transaction control header numbers assigned by the operational business process are typically degenerate dimensions, such as order, ticket, credit card transaction, or check numbers. These degenerate dimensions are natural keys of the “parents” of the line items. (Becker, 2003)
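As a rough illustration of that quote (the data and column names below are invented), the sketch shows an order number sitting directly in a line-item-grain fact table as a degenerate dimension: there is no separate order dimension to join to, yet the number still groups line items back into their parent order.

```python
# Invented line-item-grain sales fact rows. The order number is a degenerate
# dimension: it lives in the fact table itself and has no dimension table of
# its own, but it still ties line items back to their "parent" order.
sales_fact = [
    {"order_number": "ORD-1001", "product_key": 1, "date_key": 1, "quantity": 2, "amount": 19.98},
    {"order_number": "ORD-1001", "product_key": 2, "date_key": 1, "quantity": 1, "amount": 5.49},
    {"order_number": "ORD-1002", "product_key": 1, "date_key": 2, "quantity": 3, "amount": 29.97},
]

# Rolling up by the degenerate dimension reconstructs order-level totals.
order_totals = {}
for row in sales_fact:
    order_totals[row["order_number"]] = order_totals.get(row["order_number"], 0.0) + row["amount"]
print(order_totals)  # order-level revenue keyed by order number
```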
For me, one of the really interesting ideas was that of "slowly changing dimensions". I suppose this is because I currently work for a company that produces versioning software, so the idea of tracking how dimensions change over time is inherently interesting. The real challenge here is deciding which dimensions are worth versioning, how much history is worth keeping, and so on. Choosing incorrectly, especially failing to version a dimension, might prove problematic later, when someone decides that they do want a historical view of that dimension. Of course, versioning everything would be ideal, but is it realistic, especially given the storage requirements of versioned dimensions? Margy Ross (2013) outlines various techniques for dealing with different types of slowly changing dimensions, and in a real solution it is likely that a combination of these techniques would be used.
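To make one of those techniques concrete, here is a minimal sketch of a Type 2 change, one of the approaches Ross describes, where a changed attribute expires the current row and adds a new version instead of overwriting it. The column names, dates, and helper function are my own invention.

```python
from datetime import date

# A tiny, hypothetical Type 2 slowly changing dimension: instead of
# overwriting an attribute, we expire the current row and add a new one,
# so history is preserved.
customer_dim = [
    {"customer_sk": 1, "customer_id": "G100", "city": "Auckland",
     "valid_from": date(2012, 1, 1), "valid_to": None, "is_current": True},
]

def apply_type2_change(dim, customer_id, new_city, change_date):
    """Expire the current row for customer_id and append a new version."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["valid_to"] = change_date
            row["is_current"] = False
    dim.append({
        "customer_sk": max(r["customer_sk"] for r in dim) + 1,  # new surrogate key
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
        "is_current": True,
    })

apply_type2_change(customer_dim, "G100", "Wellington", date(2014, 2, 9))
# customer_dim now holds both the historical Auckland row and the current
# Wellington row: same business key, different surrogate keys.
```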
In the first lecture, we also learned about surrogate keys, and how they are useful because, unlike natural (business) keys, they carry no "embedded intelligence". An example of embedding intelligence in a key can be seen in the Customer table below: while the CustomerID values are unique (and hence usable as primary keys), they have "G" or "S" embedded in them, which indicates the type of customer: "Gold" or "Silver".
| Customer ID (PK) | First Name | Last Name |
|------------------|------------|-----------|
| G100 | Mary | Jones |
| S100 | Bob | Smith |
| G101 | Yvette | Lancaster |
| S101 | Sonja | Spenser |
| G102 | Matt | Dawson |
| S102 | Larry | Melrose |
By adding another column holding a surrogate key, we decouple the embedded meaning of the business key from the key actually used to identify and join records. For example, introducing a surrogate key into the above table might result in something that looks like this:
| CID (PK) | Customer ID | First Name | Last Name |
|----------|-------------|------------|-----------|
| 0001 | G100 | Mary | Jones |
| 0002 | S100 | Bob | Smith |
| 0003 | G101 | Yvette | Lancaster |
| 0004 | S101 | Sonja | Spenser |
| 0005 | G102 | Matt | Dawson |
| 0006 | S102 | Larry | Melrose |
Other benefits of using surrogate keys outlined in the lecture are that they improve operational efficiency and reduce the impact of changes to the 'real' business key: if Matt Dawson in the example above is demoted to Silver, his "G"-prefixed Customer ID no longer makes sense and would have to change, but the surrogate key 0005, and anything that references it, stays the same.
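A rough sketch of how this decoupling might work during loading (illustrative only; the function and lookup structure are my own, not from the lecture): natural keys are mapped to meaningless, system-generated surrogate keys, so whatever is "embedded" in the business key never matters to the warehouse.

```python
import itertools

# Illustrative only: the warehouse hands out meaningless integer surrogate keys
# and keeps a lookup from the natural (business) key to the surrogate key.
surrogate_counter = itertools.count(1)
key_lookup = {}  # natural key -> surrogate key

def get_surrogate_key(natural_key):
    """Return the existing surrogate key for a natural key, or assign a new one."""
    if natural_key not in key_lookup:
        key_lookup[natural_key] = next(surrogate_counter)
    return key_lookup[natural_key]

print(get_surrogate_key("G100"))  # 1
print(get_surrogate_key("S100"))  # 2
print(get_surrogate_key("G100"))  # 1 again -- stable, and nothing is implied by "G" or "S"

# Fact rows store only the surrogate key, so a change to the business key
# (e.g. Matt Dawson losing his "G" prefix on demotion) is handled inside the
# dimension table and never touches the fact rows.
```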
The second lecture focussed on data quality, specifically on the use of data profiling to help determine data quality. The lecture outlined the data profiling process and the basic steps involved:
- Creating the Profiling Plan -- planning how the data will be analyzed: understanding the nature of the data (tables, columns) and determining how to check for primary keys, foreign keys, and business rule violations.
- Interpreting the Profiling Results -- determining whether the data is of high or low quality and what needs to be done with it (i.e. cleansing).
- Cleansing the Data -- preparing the data for ETL.
Here, I think it is easy to fall into the trap of becoming too reliant on the particular tools being used. It is better to keep the 'big picture' in mind: a combination of tools and techniques will likely be needed to profile the data properly. The process may also be iterative, in that the profiling results may lead one to re-run the profiling software (or use different software) on the datasets to look at the data from another vantage point.
The lecture then went on to discuss some examples of data profiling, demonstrating how one would go about identifying primary keys, spotting strange data that needs to be cleaned, and performing referential integrity and business rule checks. For the initial analysis, tools and automation will likely be needed, as the process can be quite time-consuming and error-prone when done by hand. The lecture also presented the Gartner Magic Quadrant analysis of various data profiling tools; each tool has strengths and weaknesses, so some investigation would be needed to determine which tool(s) would best suit a given analysis.
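To make those checks a little more tangible, here is a small sketch (plain Python, with invented data) of the kinds of things a profiling pass might compute: uniqueness of a candidate primary key, null counts per column, and a simple referential integrity check. A real profiling tool would do far more, but the idea is the same.

```python
# A toy profiling pass over an invented "orders" extract (names are made up).
orders = [
    {"order_id": "ORD-1001", "customer_id": "G100", "amount": 19.98},
    {"order_id": "ORD-1002", "customer_id": "S100", "amount": None},  # missing amount
    {"order_id": "ORD-1002", "customer_id": "X999", "amount": 5.00},  # duplicate id, unknown customer
]
known_customers = {"G100", "S100", "G101"}

# 1. Candidate primary key check: are order_id values unique?
ids = [row["order_id"] for row in orders]
print("order_id unique:", len(ids) == len(set(ids)))  # False -- ORD-1002 appears twice

# 2. Completeness check: count nulls per column.
for col in ("order_id", "customer_id", "amount"):
    nulls = sum(1 for row in orders if row[col] is None)
    print(col, "nulls:", nulls)

# 3. Referential integrity check: do all customer_ids exist in the customer data?
orphans = [row["order_id"] for row in orders if row["customer_id"] not in known_customers]
print("rows violating referential integrity:", orphans)  # ['ORD-1002'] (customer X999)
```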
One of the main takeaways from this week's materials is that the quality of the data is paramount. You need to be able to distinguish good data from bad data (and it's not always as easy as just looking for the one in the turtleneck)...
The primary reason that 40% of business initiatives fail is due to poor quality data. Data inconsistencies, lack of completeness, duplicate records, and incorrect business rules often result in inefficiencies, excessive costs, compliance risks and customer satisfaction issues. Therefore improving the quality of your enterprise data will have a huge impact on your business. (IBM Whitepaper, 2012)
The lecture concluded with a discussion of Master Data Management (MDM), and how data quality analysis fits into this larger process. MDM is defined as "the processes, governance, policies, standards and tools that consistently defines and manages the critical data of an organization to provide a single point of reference" (see http://en.wikipedia.org/wiki/Master_data_management). How an organization approaches MDM can vary greatly, as organizations of different sizes face different challenges (as described by Graham, 2010):
| Organization Size | Central Challenge |
|-------------------|-------------------|
| Small | Small amounts of master data. Data integration is not a top priority. |
| Mid-size | Data integration starts to become difficult for an organization. Data stewards can be clearly defined. |
| Large | Huge amounts of master data and system integration. Mostly homogeneous data silos with relatively consistent attributes. Data stewards may now have a full-time role. |
| Conglomerate | Many disparate businesses that may create many groups of data (i.e., multiple product lines, general ledgers, and so on). |
So one question that arises is how MDM fits into the "big picture" of information management within an organization. In his blog article, Weigel (2013) emphasizes Information Governance as a "discipline that oversees the management of your enterprise’s information". He goes on to describe how business goals and information management initiatives can be aligned under Information Governance:
Master Data Management is a key initiative for the success of overall information governance. Ultimately, the business needs to be able to rely on the data in order to make strategic decisions. If the data cannot be trusted, can the decisions based on that data be?
References
Becker, Bob. "Fact Table Core Concepts." Kimball Group, 3 June 2003. Accessed 9 Feb. 2014. <http://www.kimballgroup.com/2003/06/03/design-tip-46-another-look-at-degenerate-dimensions/>.
Datawarehouse Concepts. "What Is a Factless Fact Table? Where We Use Factless Fact." 4 Aug. 2012. Accessed 9 Feb. 2014. <http://dwhlaureate.blogspot.com/2012/08/factless-fact-table.html>.
"Garbage in, quality out. Now that's different." IBM, Oct. 2012. Accessed 9 Feb. 2014. <http://www-01.ibm.com/software/info/rte/bdig/ii-5-post.html>.
Graham, Tyler. "Organizational Approaches to Master Data Management." Microsoft, 1 Apr. 2010. Accessed 9 Feb. 2014. <http://msdn.microsoft.com/en-us/library/ff626496.aspx>.
Harris, Jim. "Adventures in Data Profiling." OCDQ Blog, 3 Aug. 2009. Accessed 9 Feb. 2014. <http://www.ocdqblog.com/adventures-in-data-profiling/>.
Ross, Margy. "Dimension Table Core Concepts." Kimball Group, 5 Feb. 2013. Accessed 9 Feb. 2014. <http://www.kimballgroup.com/2013/02/05/design-tip-152-slowly-changing-dimension-types-0-4-5-6-7/>.
Weigel, Niels. "Data Profiling and Data Cleansing - Use Cases and Solutions at SAP." SAP Community Network, 12 June 2013. Accessed 9 Feb. 2014. <http://scn.sap.com/community/enterprise-information-management/blog/2013/06/12/data-profiling-and-data-cleansing--use-cases-and-solutions-at-sap>.