Online Business Intelligence Spring 2014: Week 6 Reflections

LECTURE 11 -- Introduction to Networking

This week's first lecture was an introduction to the basics of networking. First, we learned that a network is a complex system of people or things that interact with each other.

Next, we learned some of the terminology, such as
- Vertices -- the nodes or entities in the network
- Edges -- the relationships between the nodes.

The basic idea behind analyzing complex systems as networks is that doing so helps us understand them better, which in turn puts us in a better position to control them.

We then learned about the nature of relationships in a network. The edges can be directed or undirected. Directed means that the edge between the vertices has a directional attribute (i.e. from one node to the next). What exactly this means will vary from one network graph to another. Undirected means that the edge simply connects the two vertices, but does not convey a direction to the relationship (described in Hernandez-Lopez, 2010).

Edges in networks can also have weights. The weight is a representation of the strength of the relationship.
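To make the directed/undirected and weighted ideas concrete, here is a minimal Python sketch. It is not from the lecture: the function name build_graph, the node names, and the weights are all made up for illustration. Each edge is a (source, target, weight) triple, and an undirected edge is simply stored in both directions.

```python
from collections import defaultdict

def build_graph(edges, directed):
    """Return {node: {neighbor: weight}} from (source, target, weight) triples."""
    graph = defaultdict(dict)
    for source, target, weight in edges:
        graph[source][target] = weight
        graph[target]  # touch the target so it exists as a node even with no outgoing edges
        if not directed:
            # An undirected edge carries no direction, so store it both ways.
            graph[target][source] = weight
    return dict(graph)

# Hypothetical example data: the weight stands for the strength of the relationship.
edges = [("Ann", "Bob", 3.0), ("Bob", "Cat", 1.0)]
directed_graph = build_graph(edges, directed=True)
undirected_graph = build_graph(edges, directed=False)
```

In the directed version Bob can be reached from Ann but not the other way around, while in the undirected version both of them list each other as neighbors.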

A couple of questions I have at this point:

Can we represent the strength of a relationship by the length of the edge or the line width?

One thing that came to mind at this point is how one can visually represent nodes and edges. A weak relationship could be represented by a thin line or a long line between the nodes (or both?)

For directed edges: if the relationship is bidirectional, would we put arrowheads on both ends?

Maybe these questions will be addressed when we get into Network Visualization in Lecture 12.

The lecture then discussed single-mode versus two-mode networks. Single-mode networks are those in which all of the entities (nodes) are the same type, whereas two-mode networks are those in which there are two kinds of entities. For both of these types of networks, edges can be directed or undirected and weighted or unweighted. In discussing two-mode networks, Opsahl (2013) says that such networks are rarely analyzed without transforming them to single-mode networks, as most tools for analyzing networks have been designed for the latter type.
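The transformation from a two-mode to a single-mode network can be sketched in a few lines of Python. This is only one simple way to do it (the weight of a projected edge is the number of shared second-mode nodes), and it is an assumption on my part rather than the specific method Opsahl describes. The function name and the people/project data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def project_to_single_mode(two_mode_edges):
    """Project (primary, secondary) pairs onto the primary nodes.

    Two primary nodes get an edge if they share a secondary node;
    the edge weight counts how many secondary nodes they share.
    """
    members = defaultdict(set)
    for primary, secondary in two_mode_edges:
        members[secondary].add(primary)

    weights = defaultdict(int)
    for group in members.values():
        # Every pair of primary nodes in the same group becomes (or strengthens) an edge.
        for a, b in combinations(sorted(group), 2):
            weights[(a, b)] += 1
    return dict(weights)

# Hypothetical two-mode data: people (first mode) and projects (second mode).
two_mode_edges = [
    ("Ann", "Project X"), ("Bob", "Project X"),
    ("Ann", "Project Y"), ("Bob", "Project Y"), ("Cat", "Project Y"),
]
single_mode = project_to_single_mode(two_mode_edges)
```

Here Ann and Bob end up with a weight of 2 because they share two projects, while each of them is tied to Cat with a weight of 1.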

Next, we looked at how we can represent networks. The lecture compared and contrasted two formats (edge list and adjacency matrix) for both directed and undirected graphs.

One question I had here was how weights would be represented in either of these. I guess that in an edge list it would simply be a matter of adding a "weights" column to the table, and for an adjacency matrix, rather than just 1's and 0's representing an edge or no edge, the weight values could be represented in the cells of the table itself.
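The guess above can be sketched in Python. This is my own illustration under that assumption, with made-up node names and weights: the edge list gets a third "weight" value per row, and the adjacency matrix stores the weight in the cell instead of a 1.

```python
def to_adjacency_matrix(edge_list, nodes, directed=True):
    """Convert (source, target, weight) rows into a weighted adjacency matrix.

    A cell holds the edge weight, and 0 means "no edge".
    """
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target, weight in edge_list:
        matrix[index[source]][index[target]] = weight
        if not directed:
            # Undirected graphs give a symmetric matrix.
            matrix[index[target]][index[source]] = weight
    return matrix

# Hypothetical weighted edge list: (source, target, weight).
nodes = ["A", "B", "C"]
edge_list = [("A", "B", 5), ("B", "C", 2), ("C", "A", 1)]

directed_matrix = to_adjacency_matrix(edge_list, nodes)
undirected_matrix = to_adjacency_matrix(edge_list, nodes, directed=False)
```

One caveat with this layout is that a weight of 0 cannot be told apart from a missing edge, so a real tool might use a different marker for "no edge".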

Ultimately, the representation of data in tables or matrices makes computational sense, but for a network of any degree of complexity it doesn't help a human understand it. This is where visualization (and the next lecture) come in.

LECTURE 12 -- Network Visualization

In this module, we started off looking at what exactly visualization is. By definition, it is the transformation of data into a representation that can be imagined or seen. Of course, it's way more complex than just pretty graphs. In the lecture, a high-level view of some of the different visualization techniques was discussed.

My master's degree is in Cognitive Psychology, so this area is inherently interesting to me. Cognitively, our brains are wired to understand visual patterns, not lists of data in tables, so visualization is critical to understanding complex data.

The AT&T Labs Research (2009) article discusses some of the considerations when choosing the visualization for a given set of data. One interesting quote came at the end:

"The distinction between visualization and interface is blurring, and the line between visualization and analysis is being crossed" (AT&T, 2009)

Basically, the idea is that visualizations are no longer just static pictures, but instead can be dynamic interfaces that change as different operations are performed on the data. That ability makes tools and tool choice very important.

The last part of the lecture had us perform a network visualization on our LinkedIn network using LinkedIn InMaps. I've always found it interesting when I see my connections in LinkedIn who have common connections (e.g. the "people you may know" function illustrates this very easily when you see that totally unrelated connections are connected to some new, random person). The InMaps feature was especially interesting. I used it on my connections and readily saw clusters of connections in my network.

What is interesting is how InMaps clustered the orange and green blobs close together, because historically the individuals in those groups came from a common company.

LECTURE 13 -- Structural Properties of Networks

The final lecture of the week focussed on properties of networks, namely properties of entities and how they relate to one another.

Measures of centrality discussed were:

- Degree Centrality -- the number of connections (in/out for directed graphs) that a node has
- Betweenness Centrality -- the number of shortest paths between other nodes that go through a node
- Closeness Centrality -- the average length of the shortest paths from a node to all other nodes (often reported as the reciprocal, so that a higher value means a more central node)
- Eigenvector Centrality -- a measure in which a node's score depends on the eigenvector centrality scores of the nodes it is connected to

Obviously, for complex networks, these computations are best done by software packages!
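For a tiny network, though, three of the four measures can be worked out with a short Python sketch. This is my own illustration, not the lecture's method or any particular package's API. It assumes an undirected, connected, unweighted graph, and the function names and the five-node graph are made up. Betweenness is computed here as the sum, over pairs of other nodes, of the fraction of shortest paths that pass through the node.

```python
from collections import deque

def bfs(graph, start):
    """Return (distance, number of shortest paths) from start to every reachable node."""
    dist = {start: 0}
    count = {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                count[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                # Every shortest path to node extends to a shortest path to neighbor.
                count[neighbor] += count[node]
    return dist, count

def degree(graph):
    """Number of direct connections per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}

def average_distance(graph):
    """Average shortest-path length from each node to all others (lower = closer)."""
    result = {}
    for node in graph:
        dist, _ = bfs(graph, node)
        result[node] = sum(dist.values()) / (len(graph) - 1)
    return result

def betweenness(graph):
    """Sum over node pairs (s, t) of the share of shortest s-t paths through each node."""
    info = {node: bfs(graph, node) for node in graph}
    score = {node: 0.0 for node in graph}
    nodes = list(graph)
    for i, s in enumerate(nodes):
        dist_s, count_s = info[s]
        for t in nodes[i + 1:]:
            dist_t, count_t = info[t]
            for v in nodes:
                if v in (s, t):
                    continue
                # v lies on a shortest s-t path when the two legs add up exactly.
                if dist_s[v] + dist_t[v] == dist_s[t]:
                    score[v] += count_s[v] * count_t[v] / count_s[t]
    return score

# Hypothetical undirected network: A-B, B-C, B-D, D-E.
graph = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B"],
    "D": ["B", "E"],
    "E": ["D"],
}
```

In this toy graph B comes out on top on all three measures: it has the most connections, the smallest average distance to everyone else, and the most shortest paths running through it.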

Next, the lecture discussed how the various centrality measures can be interpreted. An example was given for social networks:

- Degree: how many people can this person reach directly?
- Betweenness: how likely is this person to be the most direct route between two people in the network?
- Closeness: how fast can this person reach everyone in the network?
- Eigenvector: how well is this person connected to other well-connected people?

The discussion of closeness centrality made me think of "Six Degrees of Kevin Bacon" (http://www.oracleofbacon.org). The two-mode directed graph of nodes (actors and movies) lets one visualize the relationships between actors. For example, here is a small sample:

The Oracle of Bacon site is cool because it will calculate the "Bacon Number" between Kevin Bacon and another actor, which is a shortest-path distance, the quantity that closeness centrality is built on. I don't have an IMDB entry, but a friend of mine does, so I calculated his Bacon Number:

So I guess that would give me a "Bacon Number" of 3 ;^)
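The idea behind a Bacon Number can be sketched as a breadth-first search over an actor/movie network. This is a guess at the general technique, not the Oracle of Bacon's actual implementation, and the cast lists below are placeholders rather than real films. The sketch treats the links between actors and movies as undirected and counts each movie on the path as one step.

```python
from collections import defaultdict, deque

def actor_distance(cast_lists, start, goal):
    """Number of films linking start to goal, or None if they are not connected."""
    # Build the two-mode network: actors connect only to movies and vice versa.
    neighbors = defaultdict(set)
    for movie, cast in cast_lists.items():
        for actor in cast:
            neighbors[("actor", actor)].add(("movie", movie))
            neighbors[("movie", movie)].add(("actor", actor))

    source, target = ("actor", start), ("actor", goal)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Each actor-to-actor step crosses one movie, i.e. two hops in the two-mode graph.
            return dist[node] // 2
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

# Placeholder data (hypothetical movies and actors).
cast_lists = {
    "Movie 1": ["Star", "Actor A"],
    "Movie 2": ["Actor A", "Actor B"],
    "Movie 3": ["Actor B", "Actor C"],
}
```

With this data, actor_distance(cast_lists, "Star", "Actor C") walks through three movies, so "Actor C" would have a number of 3 relative to "Star".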

The lecture then examined measures that were more network-centric, such as:

- Reciprocity -- the degree of mutuality in a directed network
- Density -- the ratio of the number of edges to the number of possible edges
- Clustering Coefficient -- a measure of how densely the neighbors of a node are connected to one another, either across the whole network or within a community
- Distances -- the average distance of a network is the average of the shortest path lengths over all pairs of nodes. The diameter of the network is the longest of those shortest paths.
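Three of these network-level measures are easy to sketch in Python for a small directed network. This is my own illustration with made-up nodes and function names. Note the assumptions: reciprocity is taken here as the fraction of edges whose reverse edge also exists (one common convention, and others exist), density uses n * (n - 1) possible edges for a directed graph without self-loops, and only reachable pairs are counted for distances.

```python
from collections import deque

def density(nodes, edges):
    """Ratio of actual edges to possible edges in a directed graph without self-loops."""
    n = len(nodes)
    return len(edges) / (n * (n - 1))

def reciprocity(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    mutual = sum(1 for (source, target) in edges if (target, source) in edges)
    return mutual / len(edges)

def shortest_path_lengths(nodes, edges):
    """Shortest-path length for every ordered pair of distinct, reachable nodes."""
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)

    lengths = []
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in successors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        lengths.extend(d for node, d in dist.items() if node != start)
    return lengths

# Hypothetical directed network.
nodes = ["A", "B", "C"]
edges = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")}

lengths = shortest_path_lengths(nodes, edges)
average_distance = sum(lengths) / len(lengths)
diameter = max(lengths)
```

For this network, 4 of the 6 possible edges exist, half of the edges are reciprocated (A and B point at each other), and the longest shortest path takes two steps.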

Finally, the lecture examined components and the connectedness of components. Connectedness can be used to identify cliques within networks.
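Finding the components of an undirected network can be sketched with a simple traversal. This is a minimal illustration with a made-up graph, not the lecture's procedure. A component only guarantees that its members can reach each other, which is a looser condition than a clique, where every member is directly connected to every other.

```python
def connected_components(graph):
    """Return a list of node sets, one per connected component of an undirected graph."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(n for n in graph[node] if n not in component)
        seen |= component
        components.append(component)
    return components

# Hypothetical network with two groups and one isolated node.
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": ["E"],
    "E": ["D"],
    "F": [],
}
components = connected_components(graph)
```

Here the result is three components: A, B and C together, D and E together, and F on its own.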

References

"AT&T Researchers - Inventing the Science Behind the Service." AT&T Labs Research. N.p., 11 Oct. 2009. Web. Accessed 2 Mar. 2014. <http://www.research.att.com/articles/featured_stories/2009/200910_more_than_just_a_picture.html?fbid=v4xybf4OUtS>.

Hernandez-Lopez, Rogelio. "Complex networks and collective behavior in nature." Complex networks and collective behavior in nature. 2010. Web. Accessed 2 Mar. 2014. <http://web.mit.edu/8.334/www/grades/projects/projects10/Hernandez-Lopez-Rogelio/structure_1.html>.

Opsahl, T., 2013. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks 35, doi:10.1016/j.socnet.2011.07.001 <http://toreopsahl.com/tnet/two-mode-networks/defining-two-mode-networks>


Sunday, March 2, 2014
