Overplotting

Learn about overplotting and how to address it using some simple techniques.

The large mass of points near (0, 0) in the following figure can be confusing because it’s hard to tell how many points are actually plotted there. This is the result of a phenomenon called overplotting: points are plotted on top of one another, over and over again, so many observations look like a single dark blob. There are two ways to address overplotting:

  • Adjusting the transparency of the points

  • Adding a little random jitter or random nudges to each of the points

Method 1: Changing the transparency

The first way of addressing overplotting is to change the transparency of the points by setting the alpha argument in geom_point(). The alpha argument can be any value between 0 and 1, where 0 makes the points 100% transparent and 1 makes them 100% opaque. By default, points are drawn fully opaque, which is equivalent to alpha = 1. In other words, if we don’t explicitly set an alpha value, ggplot2 draws the points as if alpha = 1. With a lower value, regions where many points overlap appear darker than regions with only a few points, so the density of the data becomes visible.

Note how the following code is identical to the code that created the scatterplot with overplotting (in the previous lesson) but with alpha = 0.2 added to the geom_point() function:
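A minimal sketch of this change, assuming a synthetic data frame named delays with columns x and y (both are hypothetical stand-ins for the lesson's dataset and variables):

```r
library(ggplot2)

# Synthetic stand-in data (an assumption, not the lesson's dataset):
# 2000 points clustered around (0, 0) so that many of them overlap.
set.seed(42)
delays <- data.frame(
  x = rnorm(2000, mean = 0, sd = 10),
  y = rnorm(2000, mean = 0, sd = 10)
)

# Same scatterplot call as before, with alpha = 0.2 added to geom_point().
# Because alpha is set outside aes(), it applies equally to every point.
p <- ggplot(data = delays, mapping = aes(x = x, y = y)) +
  geom_point(alpha = 0.2)

print(p)
```

Note that alpha is passed directly to geom_point() rather than inside aes(), because the transparency is a fixed setting here and not mapped to a variable in the data. Values other than 0.2 may work better depending on how many points overlap.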
