Open In App

How to Create Anscombe’s Quartet in Excel?

Last Updated : 13 Nov, 2022
Improve
Improve
Like Article
Like
Save
Share
Report

Anscombe Quartet developed a situation that, despite the different datasets, with different scatter charts, data could have the same correlation values among them. Anscombe Quartet is famous to provide four data sets that tell us the importance of graphing and trend lines in the data. In this article, we will learn about Anscombe’s Quartet in excel.

Anscombe’s Quartet

In 1973, Francis Anscombe told us the importance of graphing data before analyzing the data yet they have similar arithmetical identities. Anscombe’s Quartet is a group of four data sets that look identical with the same mean and correlation values but when creating a scatter chart they appear very differently. We are considering eleven data values in each data set.

Anscombe’s Quartet in Excel

Step 1: Given the Anscombe Quartet dataset. The below 4 images, show the 4 datasets created by Anscombe. 

DataSet 1:

Dataset1

 

DataSet 2:

Dataset2

 

DataSet 3:

Dataset3

 

Dataset 4:

Dataset4

 

Step 2: By physically viewing the data above. We are not able to infer what type of correlation data could have. To have better visual knowledge, we will create Scatter charts for each dataset. We will be showing the steps of creating a scatter chart, for dataset1, a similar process can be used to create charts for the rest of the datasets. Select the dataset1, A2:B13. Go to the Insert tab. 

Clicking-insert-tab

 

Step 3: Under the Charts section, click on the Scatter option. Select the scatter chart from the list that appeared. 

Selecting-scatter-chart

 

Step 4: A scatter chart is created for dataset1

scatter-chart-for-dataset1

 

Step 5: Repeat Steps 2,3, and 4, and create similar charts for dataset2, dataset3, and dataset4

Dataset 2:

scatter-chart-for-dataset2

 

Dataset 3:

scatter-chart-for-dataset3

 

Dataset 4:

scatter-chart-for-dataset4

 

Step 6: We can observe from the charts that, each chart has some different visualization in the scatter plot. Dataset1 has an evenly distribution, dataset2 has a parabolic shape, dataset3 is nearly a y = x line, and dataset 4 is nearly a straight line parallel to Y-Axis. Our next task is to add a trend line for each of the graphs, to have further insights into Anscombe’s quartet. Select the chart created for dataset1. Go to the Layout tab, and click on the Trendline option. 

clicking-trendline-from-layout-option

 

Step 7: A drop-down list appears. Click on the Linear Trendline option. 

selecting-linear-trendline

 

Step 8: A trendline line appears in the graph.

trendline-in-graph-for-dataset1

 

Step 9: Repeat steps 6, 7, and 8, and add similar trendlines to dataset2, dataset3, and dataset4.  

Dataset 2:

trendline-in-graph-for-dataset2

 

Dataset 3:

trendline-in-graph-for-dataset3

 

Dataset 4:

trendline-in-graph-for-dataset4

 

Step 10: We can observe that the trend line looks similar for each of the datasets. Let us add the equation of the trend line to know more about the current datasets. Select the chart for dataset1. Go to the Layout tab, and click on Trendline.

clicking-trendline-from-layout-option

 

Step 11: A drop-down list appears. Click on the More Trendline Options. 

clicking-on-more-trendline-options

 

Step 12: A new dialogue box, name, and Format Trendline appear on the screen. Check the box, Display equation on chart

Clicking-option-display-equation-on-chart

 

Step 13: The trend line equation appears on the chart. The equation is y = 0.5x + 3 for dataset1

trendline-equation-for-dataset1

 

Step 14: Repeat steps 10, 11, 12, and 13, and similarly add trendlines equation for dataset2, dataset3, and dataset4. 

Dataset 2:

trendline-equation-for-dataset2

 

Dataset 3:

trendline-equation-for-dataset3

 

Dataset 4:

trendline-equation-for-dataset4

 

We can observe that the equation of trendline is the same for all the datasets i.e. y = 0.5x + 3, despite the fact, that the scatter charts are very different from each other. Hence, the Anscombe quartet proved that only the summary statistics are not completely reliable, we need to look up the graphs also. 

Conclusion

By this, we conclude that graphical representation is important before analyzing the data whether they are identical or whether statistical identities are the same in nature. Graphical representations are different and have different relationships in all data sets.


Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads