Thursday, November 6, 2008

Star Plots

A star plot is a graphical method of displaying multivariate data with an arbitrary number of variables. Each star represents a single observation, and each variable is represented by a separate spoke. The length of each spoke is proportional to the magnitude of the variable for that data point relative to the maximum magnitude of the variable across all data points. A line connects the data values on adjacent spokes, which gives the plot its star-like appearance and its name.
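The spoke scaling described above can be sketched in a few lines of Python. This is only a minimal illustration: the data values are made up, the helper name `star_vertices` is hypothetical, and it assumes non-negative values with spokes spaced evenly around the circle.

```python
import math

def star_vertices(observation, maxima):
    """Return the (x, y) vertices of one star.

    Spoke i has length value / column maximum, so the longest
    possible spoke is 1.0. Spokes are spaced evenly around the circle.
    """
    n = len(observation)
    vertices = []
    for i, (value, vmax) in enumerate(zip(observation, maxima)):
        length = value / vmax if vmax else 0.0
        angle = 2 * math.pi * i / n
        vertices.append((length * math.cos(angle), length * math.sin(angle)))
    return vertices

# Made-up data: each row is one observation, each column one variable.
data = [
    [10.0, 200.0, 3.0],
    [5.0, 400.0, 6.0],
    [2.5, 100.0, 1.5],
]

# The maximum of each variable across all observations.
maxima = [max(column) for column in zip(*data)]

# One list of vertices per observation; connecting them draws the star.
stars = [star_vertices(row, maxima) for row in data]
```

Connecting the vertices of each entry in `stars` in order, and closing the loop, would give the outline of one star per observation.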

Star plots can be used to answer questions such as: Which variables are dominant for a given observation? Which observations are most similar, i.e., are there clusters of observations? Are there any outliers?

The star plot above depicts crime rates for major cities across the US using 7 separate variables.

Correlation Matrix

A correlation is a single number that describes the degree of relationship between two variables. A correlation matrix lists the correlation for every pair of variables in a data set.

To locate the correlation for any pair of variables within the matrix, find the value in the table where the row for one variable intersects the column for the other. Correlation matrices are quite similar to similarity matrices, except that correlations range only from -1 to 1: -1 represents a perfect negative relationship, 0 represents no linear relationship, and 1 represents a perfect positive relationship.
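As a rough sketch of how such a matrix is built and read, the Python below computes Pearson correlations for three small made-up variables. The function names and the data are assumptions for illustration only, and Pearson's coefficient is assumed as the correlation measure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def correlation_matrix(columns):
    """Nested dict: matrix[row][col] is the correlation of that pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# Made-up variables: y rises with x, z falls as x rises.
columns = {
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
}
matrix = correlation_matrix(columns)

# Look up a pair by row and column, as in a printed table.
x_vs_y = matrix["x"]["y"]   # close to 1: perfect positive relationship
x_vs_z = matrix["x"]["z"]   # close to -1: perfect negative relationship
```

The diagonal entries are all 1, since every variable is perfectly correlated with itself, and the matrix is symmetric about that diagonal.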

Similarity Matrix

A similarity matrix is a matrix of scores that express the similarity between two data points. Similarity matrices are used in sequence alignment: higher scores are given to more similar characters, and lower or negative scores are given to dissimilar characters.
Above is a similarity matrix of common amino acids. The similarity or dissimilarity of each amino acid pair can be found by reading the value where the row for one amino acid intersects the column for the other.
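To make the lookup concrete, here is a toy sketch in Python. The scores are invented for illustration and are not taken from the matrix pictured above or from any published scoring matrix: identical residues score 2, residues in the same broad chemical group score 1, and everything else scores -1. All names here are hypothetical.

```python
# Toy grouping of a few amino acids by one-letter code.
GROUPS = {"aliphatic": "AVLI", "acidic": "DE", "basic": "KR"}
GROUP_OF = {residue: name
            for name, members in GROUPS.items()
            for residue in members}

def toy_score(a, b):
    """Invented scoring rule, for illustration only."""
    if a == b:
        return 2
    if GROUP_OF[a] == GROUP_OF[b]:
        return 1
    return -1

# Build the full matrix so that scores are found by row and column.
residues = sorted(GROUP_OF)
similarity = {a: {b: toy_score(a, b) for b in residues} for a in residues}

def alignment_score(seq1, seq2):
    """Sum the pairwise scores of two equal-length, pre-aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(similarity[a][b] for a, b in zip(seq1, seq2))

similar_pair = similarity["L"]["I"]      # same group, so 1
dissimilar_pair = similarity["D"]["K"]   # different groups, so -1
total = alignment_score("LDK", "IEK")    # 1 + 1 + 2
```

A real alignment tool would use an empirically derived matrix and would also handle gaps; this sketch only shows the row-and-column lookup.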

Stem and Leaf Plot

A stem-and-leaf plot is a display, similar to a histogram, that organizes data to assist in visualizing the shape of a distribution. A basic stem-and-leaf plot contains two columns separated by a vertical line. The left column contains the stems (the leading digit or digits of each value) and the right column contains the leaves (usually the final digit of each value).
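A minimal Python sketch of the construction follows. It assumes non-negative whole numbers, takes the tens as the stem and the ones digit as the leaf, and uses made-up scores; the function names are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10)."""
    plot = defaultdict(list)
    for value in sorted(values):
        plot[value // 10].append(value % 10)
    return dict(plot)

def render(plot):
    """Draw the two columns, keeping empty stems so gaps stay visible."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)

# Made-up data with no values in the thirties.
scores = [12, 15, 21, 24, 24, 40, 47]
plot = stem_and_leaf(scores)
text = render(plot)
print(text)
```

Keeping the empty stem in the output preserves the shape of the distribution, which is the main reason to draw the plot in the first place.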

Box Plot

A box plot, also known as a box-and-whisker diagram, is an efficient method for displaying five-number data summaries. Box plots summarize the following statistical measures: the median, the upper and lower quartiles, and the minimum and maximum data values. An example of a box plot is shown below:
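The five numbers that a box plot draws can also be computed directly. The sketch below uses Python's standard `statistics` module on made-up data; note that several quartile conventions exist, and the "inclusive" method used here is just one assumption among them.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Made-up data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(data)

# The box spans the quartiles; its width is the interquartile range.
interquartile_range = summary[3] - summary[1]
```

In the plot, the box runs from the lower to the upper quartile with a line at the median, and the whiskers extend to the minimum and maximum.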

Wednesday, October 29, 2008


Histogram

A histogram is a graphical display of frequencies, shown as bars. It shows what proportion of observations fall into each of several preset categories. A histogram differs from a bar chart in that the area of each bar denotes the value, not the height as in a bar chart. The histogram above shows frequencies of exam scores, and the area of each bar is directly proportional to the number of scores falling into that category.
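The relationship between counts, bar height, and bar area can be sketched as follows. The exam scores and bin edges are made up, the function name is hypothetical, and the convention that the last bin includes its right edge is an assumption.

```python
def histogram_counts(values, edges):
    """Count values per bin [edges[i], edges[i + 1]).

    The last bin also includes its right edge, so the maximum is counted.
    """
    counts = [0] * (len(edges) - 1)
    for value in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = edges[i] <= value < edges[i + 1]
            if in_bin or (is_last and value == edges[-1]):
                counts[i] += 1
                break
    return counts

# Made-up exam scores and preset categories.
scores = [55, 62, 67, 71, 74, 78, 83, 88, 91, 100]
edges = [50, 60, 70, 80, 90, 100]

counts = histogram_counts(scores, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Height is count divided by bin width, so that area equals the count.
heights = [count / width for count, width in zip(counts, widths)]
areas = [height * width for height, width in zip(heights, widths)]
```

With equal-width bins, as here, height and area tell the same story; the distinction only matters when the bins have different widths.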

Parallel Coordinate Graph

A parallel coordinate graph is a data visualization technique used in analyzing large sets of multivariate data. Each variable in the data set is represented as its own Y axis on the graph. A maximum point is selected for each Y axis, and the axes are scaled relative to each other so that each variable takes up the same area in the graph space. Each observation is drawn as a single line that crosses every axis at the height of its value for that variable.
The parallel coordinate graph above illustrates correlations in gene expression data for different species of Drosophila (fruit flies).
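The axis scaling and the one-line-per-observation idea can be sketched in Python. This is an illustration under assumptions: the data is made up and unrelated to the gene expression graph above, the function names are hypothetical, and min-max scaling to the range 0 to 1 is one common choice of scaling, not necessarily the one used in that graph.

```python
def scale_columns(rows):
    """Min-max scale each variable (column) to the range 0 to 1."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so place it mid-axis.
            (value - lo) / (hi - lo) if hi != lo else 0.5
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled

def polylines(rows):
    """One polyline per observation: (axis index, scaled height) points."""
    return [list(enumerate(scaled_row)) for scaled_row in scale_columns(rows)]

# Made-up data: each row is one observation, each column one variable.
rows = [
    [1.0, 100.0, 0.2],
    [2.0, 300.0, 0.4],
    [3.0, 200.0, 0.8],
]
lines = polylines(rows)
```

Each entry in `lines` lists where one observation crosses every axis, so drawing straight segments between consecutive points gives that observation's line across the graph.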