Thursday, November 6, 2008

Star Plots


A star plot is a graphical method of displaying multivariate data with an arbitrary number of variables. Each star represents a single observation, and each variable is represented by a separate spoke. The length of each spoke is proportional to the magnitude of the variable for that observation relative to the maximum magnitude of the variable across all observations. A line is drawn connecting the ends of adjacent spokes, which gives the plot its star-like appearance and its name.

Star plots can be used to answer questions such as: Which variables are dominant for a given observation? Which observations are most similar, i.e., are there clusters of observations? Are there any outliers?
The star plot linked below depicts crime rates in major cities across the US, with each of its 7 spokes representing a separate variable.
http://www.math.yorku.ca/SCS/Gallery/images/starcrim2.gif
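
The spoke-length rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `spoke_lengths` and the city figures are made up for the example and are not taken from the linked plot, and the values are assumed to be non-negative.

```python
def spoke_lengths(observations):
    """Scale each variable by its maximum across all observations.

    `observations` maps a label to a list of non-negative values,
    one value per variable (all lists must have the same length).
    """
    # Transpose rows into per-variable columns to find each maximum.
    columns = list(zip(*observations.values()))
    maxima = [max(column) for column in columns]
    return {
        label: [v / m if m else 0.0 for v, m in zip(values, maxima)]
        for label, values in observations.items()
    }


# Hypothetical data: three variables for two made-up cities.
cities = {
    "City A": [10.0, 200.0, 50.0],
    "City B": [5.0, 400.0, 25.0],
}
lengths = spoke_lengths(cities)
for label, spokes in lengths.items():
    print(label, spokes)
```

A spoke of length 1.0 means that observation holds the maximum for that variable, so long spokes point to the dominant variables of a star.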

Correlation Matrix


A correlation is a single number that describes the degree of relationship between two variables. A correlation matrix lists the individual correlations between every pair of variables in a set.

To locate the correlation for any pair of variables within the matrix, find the value at the intersection of the row for one variable and the column for the other. Correlation matrices are quite similar to similarity matrices, except that correlations range only from -1 to 1: -1 represents a perfect negative relationship, 0 represents no linear relationship, and 1 represents a perfect positive relationship.
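
As a rough sketch of how such a matrix is built and read, the Python below computes the Pearson correlation for every pair of variables. Pearson's coefficient is assumed here because it is the most common choice; the post does not say which coefficient is meant. The names `pearson` and `correlation_matrix` and the sample data are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the
    denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(data):
    """Return a nested dict: matrix[row][column] -> correlation."""
    names = list(data)
    return {r: {c: pearson(data[r], data[c]) for c in names} for r in names}


# Made-up variables: y rises with x, z falls as x rises.
data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [8, 6, 4, 2]}
matrix = correlation_matrix(data)

# Look up a pair by row and column.
print(round(matrix["x"]["y"], 6))
print(round(matrix["x"]["z"], 6))
```

The matrix is symmetric and its diagonal is always 1, since every variable is perfectly correlated with itself.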

Similarity Matrix

A similarity matrix is a matrix of scores that express the similarity between pairs of data points. Similarity matrices are used in sequence alignment: higher scores are given to more similar characters, and lower or negative scores are given to dissimilar characters.
Above is a similarity matrix of common amino acids. The similarity or dissimilarity of each amino acid pair can be found by reading the value where the row for one amino acid crosses the column for the other.
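
The lookup can be sketched as follows. The scores in `TOY_SCORES` are invented for illustration and are not values from any published amino acid matrix; `similarity` and `alignment_score` are hypothetical helper names.

```python
# Toy scores for three one-letter codes. Only one orientation of each
# pair is stored because a similarity matrix is symmetric.
TOY_SCORES = {
    ("A", "A"): 2, ("A", "G"): 0, ("A", "W"): -3,
    ("G", "G"): 3, ("G", "W"): -2,
    ("W", "W"): 5,
}


def similarity(a, b, scores=TOY_SCORES):
    """Look up the score for a pair, in either order."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def alignment_score(seq1, seq2, scores=TOY_SCORES):
    """Sum the pair scores of two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(similarity(a, b, scores) for a, b in zip(seq1, seq2))


print(similarity("W", "A"))
print(alignment_score("AGW", "AWW"))
```

Identical characters score highest and mismatched ones score low or negative, so a higher total suggests a better alignment under the chosen matrix.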

Stem and Leaf Plot


A stem-and-leaf plot is a display, similar to a histogram, that organizes data to assist in visualizing the shape of a distribution. A basic stem-and-leaf plot contains two columns separated by a vertical line. The left column contains the stems (the leading digit or digits of each value) and the right column contains the leaves (typically the final digit of each value).
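
A small text version of such a plot can be produced as below. This is a sketch that assumes non-negative whole numbers, with the tens (and higher) digits as the stem and the ones digit as the leaf; the function name `stem_and_leaf` and the sample values are made up.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for non-negative integers.

    Stem = value // 10, leaf = value % 10. Empty stems are kept so the
    shape of the distribution, including gaps, stays visible.
    """
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)

    lines = []
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


plot = stem_and_leaf([27, 12, 43, 21, 15, 29])
print("\n".join(plot))
```

Turned on its side, the rows of leaves form the same outline a histogram with bins of width 10 would show, while still preserving every original value.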


Box Plot

A box plot, also known as a box-and-whisker diagram, is an efficient method for displaying 5-number data summaries. Box plots summarize the following statistical measures: the median, the upper and lower quartiles, and the minimum and maximum data values. An example of a boxplot is shown below:
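
The five numbers behind a box plot can be computed with Python's standard library, as in the sketch below. Quartile conventions vary between textbooks and software; the "inclusive" method of `statistics.quantiles` is one assumption among several reasonable ones, and `five_number_summary` and the sample data are made up for the example.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Needs at least two values. Quartiles use the "inclusive" method,
    which treats the data as the whole population.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

In the drawn plot, the box spans the lower to upper quartile with a line at the median, and in the basic form described here the whiskers extend to the minimum and maximum.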