There are three main ways of measuring central tendency: the median, the mean, and the mode. Each has its own strengths and weaknesses, which is why data scientists choose among them depending on the dataset they are examining. In this article, we will explore what each measure is and how it can be used to gain insights into data.
What is the median?
The median is the value that falls in the middle of a dataset when it is sorted from smallest to largest. To calculate the median, simply sort the data and find the value in the middle. If there are an even number of values, the median is calculated as the average of the two middle values.
The median is largely unaffected by outliers, which makes it a good choice for datasets with extreme values. It is also easy to calculate by hand, which makes it a good choice for small datasets.
How to calculate the median
The median can be calculated by hand for small datasets, or using a spreadsheet program or statistical software for larger datasets.
To calculate the median by hand, simply sort the data from smallest to largest and find the value in the middle. If there are an even number of values, the median is calculated as the average of the two middle values.
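For example, if the dataset is {12, 13, 14, 15, 16}, there are five values, so the median is the third (middle) value: 14. For a dataset with an even number of values, such as {12, 13, 14, 15}, the median is the average of the two middle values: (13 + 14) / 2 = 13.5.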
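The by-hand procedure translates directly into a few lines of Python. This is only a minimal sketch: the function name `median` is our own choice for illustration, and it assumes the dataset is a non-empty list of numbers.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)      # step 1: sort from smallest to largest
    n = len(ordered)
    mid = n // 2                  # index of the middle (or upper-middle) value
    if n % 2 == 1:
        return ordered[mid]       # odd count: take the single middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([12, 13, 14, 15, 16]))  # 14
print(median([12, 13, 14, 15]))      # 13.5
```

Sorting a copy with `sorted` leaves the original list untouched, which is usually what you want when the same data is reused for other statistics.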
The median can also be easily calculated using a spreadsheet program or statistical software. Simply enter the data into the spreadsheet and use the built-in median function to find the value.
The median is a robust statistic, meaning that it changes very little when a few extreme values are added to the data. This makes it a good choice when working with datasets that may contain outliers.
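To see this robustness in practice, here is a small sketch using Python's standard `statistics` module. The salary figures are made up purely for illustration. Adding one extreme value moves the median only slightly, while the mean jumps.

```python
import statistics

# Hypothetical salaries in thousands; invented for this example.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]   # add a single extreme value

print(statistics.median(salaries))               # 35
print(statistics.median(with_outlier))           # 36.5
print(statistics.mean(salaries))                 # 35
print(round(statistics.mean(with_outlier), 2))   # 95.83
```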
What is the mean?
The mean is the arithmetic average of a dataset: the sum of all the values divided by the number of values.
How to calculate the mean
Calculating the mean is a simple process that can be done by hand or with a spreadsheet program. To calculate the mean by hand, simply add up all of the values in a dataset and then divide by the number of values.
For example, if you have the following dataset:
- 12
- 18
- 24
- 30
The mean would be calculated as follows:
(12 + 18 + 24 + 30) / 4 = 84 / 4 = 21.
So, the mean of this data set is 21.
If you’re using a spreadsheet program like Microsoft Excel, you can use the AVERAGE function to calculate the mean. Simply select all of the cells that contain data, click on the Formulas tab, and then select AVERAGE from the Statistical functions drop-down menu.
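Outside a spreadsheet, the same calculation takes one line of Python. The sketch below uses the four values from the example above and shows both the by-hand formula and the `mean` function from the standard `statistics` module. The variable names are our own.

```python
import statistics

data = [12, 18, 24, 30]

# By hand: add up all of the values, then divide by how many there are.
by_hand = sum(data) / len(data)
print(by_hand)                  # 21.0

# The standard library gives the same result.
print(statistics.mean(data))
```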
What is the mode?
The mode is the most common value in a dataset. To calculate the mode, simply sort the data and find the value that appears most often. The mode is not affected by outliers and can be used to get an idea of the general shape of a dataset.
However, it is important to note that a dataset can have more than one mode. This happens when two or more values tie for the highest number of occurrences. A dataset in which every value appears exactly once has no meaningful mode at all.
This means that the mode is not always a good measure of central tendency.
How to calculate the mode
The mode can be calculated by hand or using a spreadsheet. To calculate the mode by hand, simply sort the data and find the value that appears most often.
For example, let’s say we have the following dataset:
- 14
- 20
- 32
- 20
- 16
- 20
To calculate the mode, we would sort the data and find that the value “20” appears most often. Therefore, the mode of this dataset is 20.
However, if two or more values tie for the highest count, the dataset has more than one mode. This means that the mode is not always a good measure of central tendency.
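Counting occurrences is easy to automate. The following sketch uses `collections.Counter` to tally each value and returns every value that shares the highest count, so datasets with more than one mode are handled too. The helper name `modes` and the second dataset are our own, chosen only to illustrate a tie; `statistics.multimode` (available in Python 3.8 and later) does the same job.

```python
from collections import Counter
import statistics


def modes(values):
    """Return every value that ties for the highest count, in first-seen order.

    Assumes a non-empty dataset; an empty one raises ValueError from max().
    """
    counts = Counter(values)            # value -> number of occurrences
    highest = max(counts.values())      # the largest count in the dataset
    return [value for value, count in counts.items() if count == highest]


print(modes([14, 20, 32, 20, 16, 20]))         # [20]
print(modes([1, 1, 2, 2, 3]))                  # [1, 2]
print(statistics.multimode([1, 1, 2, 2, 3]))   # [1, 2]
```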
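To calculate the mode using a spreadsheet, select all of the cells that contain data, click on the Formulas tab, and then select MODE from the Statistical functions drop-down menu. In newer versions of Excel this function is listed as MODE.SNGL, and MODE.MULT returns every mode when there is a tie.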
What is the range?
The range is the difference between the largest and smallest values in a dataset. Unlike the median, mean, and mode, it describes how spread out the data is rather than where its center lies.
How to calculate the range
To calculate the range by hand, simply subtract the smallest value from the largest value.
Excel has no built-in RANGE function, so to calculate the range in a spreadsheet, combine the MAX and MIN functions. For example, if your data is in cells A1 to A10, enter =MAX(A1:A10)-MIN(A1:A10).
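In Python the range is just as short. This is a minimal sketch: the function is named `value_range` (our own name) to avoid clashing with Python's built-in `range`, and the dataset is the one used in the mode example.

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


print(value_range([14, 20, 32, 20, 16, 20]))  # 18
```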
When are the mean, median and mode used?
All three of these measures are important in data science and computing. When choosing which measure to use, it is important to consider what insights you are hoping to gain from your data.
If you are looking for a general overview of your data, the mean or the median is usually a good choice. The mean uses every value in the dataset, which makes it informative for roughly symmetric data but also sensitive to outliers. The median is a better measure of central tendency when the data is skewed or contains outliers, because extreme values shift it far less than they shift the mean.
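The mode is a good measure to use when you are interested in finding the most common value in the data set. It is also the only one of the three that works for categories such as colors or product names, which cannot be averaged or sorted into a meaningful middle.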
I hope this article has helped you to understand the difference between median, mean, and mode. As always, if you have any questions or comments, please feel free to reach out to us on our website or on social media. We would be happy to chat with you about your data!