Box Plot Explained: Interpretation, Examples, & Comparison (2024)

In descriptive statistics, a box plot or boxplot (also known as a box-and-whisker plot) is a type of chart often used in exploratory data analysis. Box plots visually show the distribution and skewness of numerical data by displaying the data quartiles (or percentiles) and the median.

Box plots show the five-number summary of a set of data: the minimum score, first (lower) quartile, median, third (upper) quartile, and maximum score.
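As a minimal sketch, the five-number summary can be computed with Python's standard library. The data and the function name `five_number_summary` are hypothetical. The "inclusive" quartile method is only one of several conventions, so other software may report slightly different quartiles. This version returns the raw minimum and maximum and does not exclude outliers.

```python
import statistics

def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(scores)
    # method="inclusive" treats the data as the whole population, so the
    # quartiles always lie within the observed range of the data.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]  # hypothetical scores
summary = five_number_summary(scores)
print(summary)  # (2, 4.5, 7.0, 9.5, 15)
```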

Definitions

Minimum Score

The lowest score, excluding outliers (shown at the end of the left whisker).

Lower Quartile

Twenty-five percent of scores fall below the lower quartile value (also known as the first quartile).

Median

The median marks the mid-point of the data and is shown by the line that divides the box into two parts (sometimes known as the second quartile). Half the scores are greater than or equal to this value, and half are less.

Upper Quartile

Seventy-five percent of the scores fall below the upper quartile value (also known as the third quartile). Thus, 25% of data are above this value.

Maximum Score

The highest score, excluding outliers (shown at the end of the right whisker).

Whiskers

The upper and lower whiskers represent scores outside the middle 50% (i.e., the lower 25% of scores and the upper 25% of scores).

The Interquartile Range (or IQR)

The box spans the middle 50% of scores (i.e., the range between the 25th and 75th percentiles). The length of the box is the interquartile range.

Why are box plots useful?

Box plots divide the data into four sections, each containing approximately 25% of the data in that set.

Box plots are useful as they provide a visual summary of the data, enabling researchers to quickly identify the median, the dispersion of the data set, and signs of skewness.

Note that the image above represents perfectly symmetric data, in which each quarter of the plot is the same length. Most box plots will not conform to this symmetry.

Box plots are useful as they show the central value of a data set

The median is the middle value of a set of data and is shown by the line that divides the box into two parts. Half the scores are greater than or equal to this value, and half are less.

Box plots are useful as they show the skewness of a data set

The shape of a box plot shows whether a data set is roughly symmetric or skewed. A box plot cannot by itself confirm that data are normally distributed.

When the median is in the middle of the box, and the whiskers are about the same on both sides of the box, then the distribution is symmetric.

When the median is closer to the bottom of the box (the lower quartile), and the whisker is shorter on the lower end of the box, then the distribution is positively skewed (skewed right).

When the median is closer to the top of the box (the upper quartile), and the whisker is shorter on the upper end of the box, then the distribution is negatively skewed (skewed left).
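The rules above can be sketched as a rough check on where the median sits inside the box. This is only an illustration: the function name `box_shape` and the `tolerance` parameter are hypothetical, the data are made up, and the check looks at the box alone, not at the whisker lengths.

```python
import statistics

def box_shape(scores, tolerance=0.1):
    """Classify a sample by the position of the median within the box."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    lower_half = median - q1   # box length below the median
    upper_half = q3 - median   # box length above the median
    iqr = q3 - q1
    # Treat the two halves as equal if they differ by at most
    # tolerance * IQR (an arbitrary threshold chosen for illustration).
    if iqr == 0 or abs(upper_half - lower_half) <= tolerance * iqr:
        return "roughly symmetric"
    if upper_half > lower_half:
        return "positively skewed"   # median closer to the lower quartile
    return "negatively skewed"       # median closer to the upper quartile

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 2, 3, 3, 4, 6, 9, 14]
left_skewed = [1, 6, 9, 11, 12, 12, 13, 13, 14]

print(box_shape(symmetric))
print(box_shape(right_skewed))
print(box_shape(left_skewed))
```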

Box plots are useful as they show the dispersion of a data set

In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.

The smallest and largest values (excluding outliers) are found at the ends of the ‘whiskers’ and provide a visual indicator of the spread of scores (e.g., the range).

The interquartile range (IQR) is the length of the box, which covers the middle 50% of scores. It is calculated by subtracting the lower quartile from the upper quartile (i.e., Q3 − Q1).
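A short worked example of the Q3 − Q1 calculation, using made-up scores and the "inclusive" quartile convention from Python's `statistics` module (other conventions can give slightly different quartiles):

```python
import statistics

scores = [52, 55, 58, 60, 61, 63, 66, 70, 74]  # hypothetical scores

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)  # 58.0 66.0 8.0
```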

Box plots are useful as they show outliers within a data set

An outlier is an observation that is numerically distant from the rest of the data.

When reviewing a box plot, an outlier is defined as a data point that is located outside the whiskers of the box plot.

Source: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

A common convention places the whisker limits 1.5 times the interquartile range beyond the quartiles, so a data point is treated as an outlier if it lies below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
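The 1.5 × IQR rule can be sketched as follows. The function name `whiskers_and_outliers`, the parameter `k`, and the data are hypothetical, and the quartiles use the "inclusive" convention of Python's `statistics` module. Plotting libraries may compute quartiles differently and so flag slightly different points.

```python
import statistics

def whiskers_and_outliers(scores, k=1.5):
    """Return (lower whisker end, upper whisker end, outliers)."""
    ordered = sorted(scores)
    q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return inside[0], inside[-1], outliers

scores = [10, 12, 13, 14, 15, 16, 17, 18, 40]  # hypothetical scores
result = whiskers_and_outliers(scores)
print(result)  # (10, 18, [40])
```

Here Q1 = 13 and Q3 = 17, so the IQR is 4 and the limits are 7 and 23. The score of 40 lies above the upper limit and is flagged as an outlier.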

How to compare box plots

Box plots are a useful way to visualize differences among samples or groups. They provide a lot of statistical information, including medians, ranges, and outliers.

Note that although box plots have been presented horizontally in this article, it is more common to view them vertically in research papers.

Step 1: Compare the medians of box plots

Compare the respective medians of each box plot. If the median line of a box plot lies outside of the box of a comparison box plot, then there is likely to be a difference between the two groups.
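The median comparison can be sketched as a simple check. The function name `median_outside_box` and both groups of data are hypothetical. This mirrors the visual rule of thumb only; it is not a statistical significance test.

```python
import statistics

def median_outside_box(sample, comparison):
    """True if the median of `sample` lies outside the box (Q1 to Q3)
    of `comparison`."""
    median = statistics.median(sample)
    q1, _, q3 = statistics.quantiles(comparison, n=4, method="inclusive")
    return median < q1 or median > q3

group_a = [12, 14, 15, 16, 18, 19, 21, 23, 25]  # hypothetical scores
group_b = [20, 22, 24, 25, 27, 28, 30, 31, 33]  # hypothetical scores

a_outside_b = median_outside_box(group_a, group_b)
b_outside_a = median_outside_box(group_b, group_a)
print(a_outside_b, b_outside_a)  # True True
```

In this example the median of group A (18) falls below the box of group B (24 to 30), and the median of group B (27) falls above the box of group A (15 to 21), which suggests the groups are likely to differ.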

Source: https://blog.bioturing.com/2018/05/22/how-to-compare-box-plots/

Step 2: Compare the interquartile ranges and whiskers of box plots

Compare the interquartile ranges (that is, the box lengths) to examine how the data are dispersed within each sample. The longer the box, the more dispersed the data. The shorter the box, the less dispersed the data.

Next, look at the overall spread as shown by the extreme values at the ends of the two whiskers. This shows the range of scores (another measure of dispersion). Larger ranges indicate a wider distribution, that is, more scattered data.
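As an illustration of this step, the two measures of spread can be computed side by side for two samples. The function name `spread` and the data are hypothetical. The range here uses the raw minimum and maximum, so it would include any outliers that a box plot draws beyond the whiskers.

```python
import statistics

def spread(scores):
    """Return the interquartile range and the overall range of a sample."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"iqr": q3 - q1, "range": max(scores) - min(scores)}

tight = [48, 49, 50, 50, 51, 51, 52, 52, 53]  # hypothetical scores
wide = [30, 38, 44, 49, 51, 55, 60, 66, 75]   # hypothetical scores

tight_spread = spread(tight)
wide_spread = spread(wide)
print(tight_spread)  # {'iqr': 2.0, 'range': 5}
print(wide_spread)   # {'iqr': 16.0, 'range': 45}
```

The second sample has both a longer box and a wider range, so its scores are more dispersed.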

Step 3: Look for potential outliers (see the above image)

When reviewing a box plot, an outlier is defined as a data point that is located outside the whiskers of the box plot.

Step 4: Look for signs of skewness

If the data do not appear to be symmetric, does each sample show the same kind of asymmetry?

FAQs

How do you interpret a box plot?

The box itself indicates the range in which the middle 50% of all values lie. The lower end of the box is the first quartile (Q1) and the upper end is the third quartile (Q3). Therefore, 25% of the data lie below Q1, 25% lie above Q3, and 50% lie within the box itself.

How do you describe the results of a boxplot?

The middle “box” represents the middle 50% of scores for the group. The range of scores from lower to upper quartile is referred to as the inter-quartile range. The middle 50% of scores fall within the inter-quartile range. Seventy-five percent of the scores fall below the upper quartile.

How do you interpret a boxplot with two variables?

Examine your boxplot to look at the center and spread of your data and compare differences between grouping variables within your data. Examine the median, the interquartile box, and identify outliers as you interpret the distribution of your data.

How do you know if there is a significant difference in a boxplot?

A rough visual rule: if the boxes of two box plots overlap, the plot alone gives no clear evidence of a difference between the two samples. If the boxes do not overlap, there may be a statistically significant difference. A box plot cannot establish significance on its own; a formal statistical test is needed to confirm it.

How do you interpret the shape of a box plot?

The box length gives an indication of the sample variability and the line across the box shows where the sample is centred. The position of the box in its whiskers and the position of the line in the box also tells us whether the sample is symmetric or skewed, either to the right or left.

How do you describe the distribution of a box plot?

Negatively skewed: If the distance from the median to the minimum is greater than the distance from the median to the maximum, then the box plot is negatively skewed.

Symmetric: The box plot is said to be symmetric if the median is equidistant from the maximum and minimum values.

What insights can we extract from the boxplot?

Box plots provide a quick visual summary of the variability of values in a dataset. They show the median, upper and lower quartiles, minimum and maximum values, and any outliers in the dataset. Outliers can reveal mistakes or unusual occurrences in data.

How do you interpret outliers in boxplots?

If no values fall beyond the whisker limits, the whiskers extend to the minimum and maximum data values. If there are values that fall above or below the ends of the whiskers, they are plotted as dots. These points are often called outliers. An outlier is more extreme than the expected variation.

What is a comparative boxplot?

It is used to compare multiple sets of data describing the same, single variable. It uses separate box plots for each data set. It allows comparisons of the median (center), upper and lower extremes, quartiles, interquartile range (IQR), and range between and among multiple data sets.

How do you interpret the spread of a boxplot?

A boxplot represents spread in two ways: by conveying the interquartile range (IQR) and the range. The interquartile range (IQR) is the difference between the third and first quartiles. The box of a boxplot spans from the first quartile to the third quartile (with the line intersecting the box marking the median).

How do you interpret a difference plot?

A difference plot shows the differences between two observations on the same sampling unit. The difference plot shows the difference between two observations on the vertical axis against the average of the two observations on the horizontal axis. A gray identity line represents equality; no difference.

Article information

Author: Saturnina Altenwerth DVM
