How to read and interpret a box plot

by Gunter Chang

Created on: January 23, 2009   Last Updated: January 27, 2012

The box plot, also known as the box-and-whisker plot, is a graphical method of displaying five descriptive statistics: the median, the upper and lower quartiles, and the minimum and maximum data values. Popularized by John Tukey in his 1977 book Exploratory Data Analysis, the box plot has become a familiar and useful standard in data interpretation. Interpreting one is relatively straightforward.

The "box" itself represents the middle 50 percent of the data. The upper boundary of the box (the upper "hinge") marks the 75th percentile of the data set, while the lower boundary (the lower "hinge") marks the 25th percentile. Quite simply, the 25th percentile is the value that 25 percent of the data falls below, and likewise, the 75th percentile is the value that 75 percent of the data falls below. The distance between these two boundaries is known as the "inter-quartile range", and it gives a useful indication of the "spread" of the middle 50 percent of the data. This is a more robust measure of spread than the full range, because the middle 50 percent is not affected by outliers or extreme values.
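As a rough illustration of the box boundaries described above, the quartiles and the inter-quartile range can be computed with Python's standard library. The sample values are made up for this sketch, and the "inclusive" quantile method is only one of several conventions; other software may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample, invented for illustration only.
data = [2, 4, 5, 7, 8, 9, 11, 12, 15]

# With n=4, statistics.quantiles returns the three quartile cut points.
# method="inclusive" is an assumption of this sketch; other conventions
# can give slightly different values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# The inter-quartile range is the height of the box.
iqr = q3 - q1

print("25th percentile (lower edge of box):", q1)
print("75th percentile (upper edge of box):", q3)
print("inter-quartile range:", iqr)
```

Here the box would run from 5 to 11, so the middle half of the values spans 6 units.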

There is also a line in the box that indicates the "median" (or centermost value) of the data. Not to be confused with the "mean", the median is the middle value of the data set when the values are ranked in order, so that the same number of values lie above it as below it. This is a measure of "central tendency", or in layman's terms, where the center of the data is. Where the median sits within the box also hints at the shape of the distribution: a median near the middle of the box suggests roughly symmetric data, while a median close to one edge suggests skew.
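A small sketch may help show why the median and the mean should not be confused. The numbers are hypothetical; the point is only that a single extreme value moves the mean a great deal and the median very little.

```python
import statistics

# Hypothetical values, invented for illustration only.
values = [3, 5, 6, 7, 9]
with_outlier = values + [100]  # add one extreme value

median_before = statistics.median(values)
mean_before = statistics.mean(values)

median_after = statistics.median(with_outlier)
mean_after = statistics.mean(with_outlier)

print("without extreme value: median", median_before, "mean", mean_before)
print("with extreme value:    median", median_after, "mean", round(mean_after, 2))
```

Before the extreme value is added, the median and the mean are both 6. Afterward the median moves only to 6.5, while the mean jumps to about 21.67.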

The "whiskers" of the box plot are the lines extending from the box (vertical when the plot is drawn upright). When the data contain no "outliers", the whiskers reach the minimum and maximum values in the dataset. If there are outliers, each whisker instead stops at the most extreme value that lies within 1.5 times the inter-quartile range of the box edge, and the points beyond it are plotted individually. Now that the pieces of the box plot have been identified, it is useful to understand that the box, the whiskers, and even the median can reveal much information about a dataset by virtue of their position, length, or size.
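The whisker rule can be sketched in a few lines. This assumes the common convention in which a point is treated as an outlier when it lies more than 1.5 times the inter-quartile range beyond the box; the function name `whiskers_and_outliers`, the sample data, and the "inclusive" quartile method are all choices made for this illustration.

```python
import statistics

def whiskers_and_outliers(data, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers).

    Assumes the 1.5 x IQR convention; k is adjustable.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    # Whiskers end at the most extreme data values inside the fences,
    # not at the fences themselves.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return min(inside), max(inside), outliers

# Hypothetical sample with one extreme value.
sample = [2, 4, 5, 7, 8, 9, 11, 12, 40]
low, high, outliers = whiskers_and_outliers(sample)
print("whiskers end at", low, "and", high, "; outliers:", outliers)
```

In this sample the box runs from 5 to 11, so the upper limit is 11 + 1.5 x 6 = 20. The upper whisker therefore ends at 12, the largest value inside that limit, and 40 is drawn as a separate point.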

The strong point of the box plot is its ability to compare two populations without knowing anything about the underlying statistical distributions of those populations. The distribution that defines a population also determines the type of statistical analyses that can be properly applied, so the box plot lets you graphically compare "apples and oranges": populations that might not otherwise be directly comparable.
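As a minimal sketch of such a comparison, the five statistics behind each box can be placed side by side for two groups without assuming anything about their distributions. The groups, the helper name `five_number_summary`, and the "inclusive" quartile method are assumptions made for this example.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))

# Two hypothetical groups, invented for illustration only.
group_a = [12, 15, 17, 18, 20, 22, 23, 25, 30]
group_b = [5, 9, 14, 19, 24, 31, 38, 44, 60]

summary_a = five_number_summary(group_a)
summary_b = five_number_summary(group_b)

labels = ("min", "Q1", "median", "Q3", "max")
for label, a, b in zip(labels, summary_a, summary_b):
    print(f"{label:>6}: group A = {a:<6} group B = {b}")
```

The two medians are fairly close (20 and 24), but the box for group B would be four times as tall as the box for group A (an inter-quartile range of 24 against 6), which is exactly the kind of difference a pair of box plots makes visible at a glance.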
