Introduction to Data Visualization
Learn about the importance of data visualization and its benefits in data analysis. Explore the base R packages for data visualization.
What is data visualization?
Data visualization is the graphical representation of information. Data visualization tools, which include visual elements such as charts and graphs, make it easy to see trends and patterns in data at a glance. They are also a great way to communicate data to both technical and non-technical audiences. The first step is to understand the data before deciding which visualization technique suits the intended purpose. Then, apply the type of visualization that conveys the data most effectively. For example, time series data, such as the price of a stock, is usually displayed with a line chart where dates are placed on the horizontal axis (x-axis) and the stock price on the vertical axis (y-axis). This way, one can quickly see how the stock price has moved over time and spot trends that may inform expectations about future prices.
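As a rough illustration of the line chart described above, here is a minimal sketch in base R. The dates and prices are made-up values for an imaginary stock, and the variable names are only placeholders.

```r
# Hypothetical daily closing prices for an imaginary stock (made-up numbers).
dates  <- seq(as.Date("2024-01-01"), by = "day", length.out = 10)
prices <- c(100, 102, 101, 105, 107, 106, 110, 108, 112, 115)

# type = "l" draws a line; dates go on the x-axis, prices on the y-axis.
plot(dates, prices,
     type = "l",
     xlab = "Date",
     ylab = "Closing price",
     main = "Stock price over time (illustrative data)")
```

Changing type to "b" would draw both points and lines, which can help when there are only a few observations.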
Similarly, to compare a product’s year-on-year (YoY) sales performance, data analysts often use bar charts to display current and previous values. Bar charts make it quick to compare these two values and to see whether the product’s performance has improved in the current year. Newer chart types, such as waffle charts, can help display a percentage distribution, where each box represents a set percentage.
Designing effective data visualizations is therefore both an art and a science, which makes it a challenging skill for novices and experts alike. It is also a crucial skill to have if you want to tell stories with data through impactful visualizations.
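The year-on-year comparison described above could be sketched in base R roughly as follows. The product names and sales figures are invented for illustration only.

```r
# Hypothetical unit sales for three products in two years (made-up numbers).
sales <- rbind(
  "Previous year" = c(120, 95, 140),
  "Current year"  = c(150, 90, 165)
)
colnames(sales) <- c("Product A", "Product B", "Product C")

# beside = TRUE places the two years side by side for each product;
# legend.text = TRUE labels the bars using the row names of the matrix.
barplot(sales,
        beside = TRUE,
        legend.text = TRUE,
        ylab = "Units sold",
        main = "Year-on-year sales (illustrative data)")

# Year-on-year change in percent, rounded to one decimal place.
yoy_change <- round((sales["Current year", ] / sales["Previous year", ] - 1) * 100, 1)
print(yoy_change)
```

Placing the bars side by side makes it easy to see which products improved and which declined without reading the exact values.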
Where do we use data visualization?
Every day, massive amounts of data are generated. All of this information is difficult for humans to analyze and process manually. The human brain perceives visual information more efficiently than text. That is why data visualization is crucial: it converts complex numbers and other information into visualizations that are easier to understand and use. It also allows decision-makers to find new trends in data and get data-driven insights to make better decisions. Data visualization finds applications in a wide range of industries, such as:
- Banking and finance: Banking and financial organizations use multiple dashboards consisting of various data visualizations to keep track of their customers, services, and products. Data visualizations are also a part of the dashboards implemented for fraud and risk management operations.
- Marketing and advertising: Sales figures and customer reviews form a significant chunk of the data that businesses visualize to design and optimize their marketing and advertising strategies.
- Insurance claims: Insurance firms track all aspects of insurance claim settlements with comprehensive key performance indicator (KPI) dashboards created using their customer, agent, and sales data.
- Healthcare: Patient records, staff, and services management are some areas where data visualizations provide an organized and convenient way for healthcare firms to manage their resources effectively for better patient care.
- Manufacturing: In the manufacturing industry, departments such as inventory, production, testing, quality assurance, and sales share process data with each other. Visualization of this data gives a better understanding of the processes and allows efficient workflow planning.
How to design an effective visualization
Data visualization is crucial in communicating the information uncovered from a dataset. But is it necessary to use a specific chart, or can any other chart be used instead? This will depend on the data and vary with each case. So, we must thoroughly understand our data in order to develop an effective visualization with a clear objective. Knowing our audience and understanding the purpose of the visualization before preparing the chart is necessary. The information we want to convey through our visualization can be discovered using the following questions:
- What type of data is it? Is it quantitative or qualitative?
- Is our data raw or clean?
- What do we want to communicate with our data?
- How do the different elements of the data relate to each other?
After we have answered the questions above, it will be easier to decide which visualization we should use and how it will communicate with the audience. It is also a good practice to cross-check whether our designed visualization conveys the expected message before presenting it to an audience.
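Some of the questions above can be explored directly in R before choosing a chart. The following is a minimal sketch that assumes a small, hypothetical data frame named survey; its columns and values are invented purely for illustration.

```r
# Hypothetical data set (made-up values) with one qualitative and two
# quantitative columns, including a few missing entries.
survey <- data.frame(
  region  = factor(c("North", "South", "South", "East", "North")),
  revenue = c(12.5, 9.8, NA, 14.1, 11.0),
  units   = c(30, 25, 28, NA, 27)
)

# What type of data is it? str() lists each column with its type.
str(survey)

# Quantitative or qualitative? Numeric columns are treated as quantitative here.
is_quantitative <- vapply(survey, is.numeric, logical(1))
print(is_quantitative)

# Raw or clean? Count the missing values in each column.
missing_per_column <- colSums(is.na(survey))
print(missing_per_column)

# How do elements relate? Correlation between two quantitative columns,
# using only the rows where both values are present.
print(cor(survey$revenue, survey$units, use = "complete.obs"))
```

Checks like these do not choose the chart for us, but they narrow the options: a qualitative column suggests grouping or bars, while two quantitative columns suggest a scatter plot.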
Which data visualization techniques to use?
The choice of data visualization technique depends on the type of data we’re working with, as well as the story we’re telling with our data. For example, to compare different categories of data, a bar chart is a good option and has the distinct advantage of conveying the required information to the audience without requiring them to read individual data values. Overall, there are several techniques available to fulfill our visualization requirements. Some of them can be seen in the image below:
This course uses the R programming language for data visualization. The R language is a great tool for data visualization since it has a wide range of built-in functions and libraries that can be used for almost any data-related task.
Data visualization with R
With just a few lines of R code, it is possible to produce attractive data visualizations. With a standard R installation, three built-in graphics packages are available, namely base graphics, grid, and lattice, for generating a wide range of plots.
The base graphics package
The base graphics package is R’s original and oldest visual framework for creating data graphics. It has various plotting functions such as plot(), hist(), boxplot(), and many more. A base graphics plot is created by first initializing a new plot and then annotating that existing plot. Many global settings can be adjusted and modified in the base graphics system.
In this course, you’ll be working on some basic plots using base graphics. This will help you better understand the advantages of the ggplot2 package for data visualization in R.
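The initialize-then-annotate workflow of base graphics can be sketched as follows. This is only an illustrative example using the mtcars data set that ships with R. The choice of variables, colors, and margins is arbitrary.

```r
# Adjust two global settings (margins and axis label orientation) and keep
# the previous values so they can be restored afterwards.
old_par <- par(mar = c(4, 4, 2, 1), las = 1)

# Step 1: initialize a new plot.
plot(mtcars$wt, mtcars$mpg,
     pch  = 19,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel economy vs. weight (mtcars)")

# Step 2: annotate the existing plot with a fitted line and a legend.
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)
legend("topright", legend = "Linear fit", col = "red", lwd = 2)

# Restore the previous global settings.
par(old_par)
```

Functions such as abline() and legend() only add to a plot that already exists, which is why plot() has to be called first.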
The grid package
The grid graphics package is a low-level system for plotting within R and is separate from the base graphics system. It was developed by Paul Murrell and added to R at a later stage. It allows the drawing and arranging of basic geometric shapes like polygons, curves, etc.
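To give a feel for how grid draws and arranges basic shapes, here is a small sketch. The shapes, positions, and colors are arbitrary choices made for illustration.

```r
library(grid)

# Start a fresh page.
grid.newpage()

# A viewport is a rectangular drawing region; this one covers the
# central 80% of the page.
vp <- viewport(x = 0.5, y = 0.5, width = 0.8, height = 0.8)
pushViewport(vp)

# Draw a border around the viewport.
grid.rect(gp = gpar(col = "grey40", fill = NA))

# Create a circle as a graphical object (grob) first, then draw it.
circle <- circleGrob(x = 0.5, y = 0.5, r = 0.3, gp = gpar(fill = "lightblue"))
grid.draw(circle)

# Draw a triangle and a text label directly.
grid.polygon(x = c(0.2, 0.5, 0.8), y = c(0.2, 0.8, 0.2),
             gp = gpar(col = "darkred", fill = NA))
grid.text("Drawn with grid", y = 0.08)

# Leave the viewport again.
popViewport()
```

Coordinates inside a viewport run from 0 to 1 by default, so the same shapes scale with the region they are drawn in.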
The lattice package
The lattice package, developed by Deepayan Sarkar, is built on the grid graphics package. It provides high-level functions for each type of plot. These functions return objects of class "trellis", which are drawn by their print() or plot() methods. This package handles multivariate data very well.
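A short sketch of how lattice handles multivariate data is shown below. It uses the mtcars data set that ships with R. The formula and the layout are illustrative choices.

```r
library(lattice)

# One scatter plot panel per number of cylinders: the part of the formula
# after "|" is the conditioning variable.
p <- xyplot(mpg ~ wt | factor(cyl),
            data   = mtcars,
            layout = c(3, 1),
            xlab   = "Weight (1000 lbs)",
            ylab   = "Miles per gallon",
            main   = "Fuel economy vs. weight by cylinder count")

# The call above returns an object rather than drawing immediately.
print(class(p))

# Printing the object draws the plot.
print(p)
```

At the interactive console the object is printed automatically. Inside a function, a loop, or a script, an explicit print() is needed for the plot to appear.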