Machine Learning – Mean, Median, and Mode

Mean, Median, and Mode? What are these?

In Machine Learning, statistical measures like Mean, Median, and Mode are used to summarize and analyze data. These measures help us understand the distribution of values in a data set and identify patterns, which is essential for predicting future outcomes.

Key Concepts:

  • Mean: The average value of all the numbers.
  • Median: The middle value when the numbers are sorted in order.
  • Mode: The value that appears most frequently.

Example: Jeepney Travel Speeds

Let’s say we have recorded the travel speeds (in kilometers per hour) of 10 jeepneys in a busy area in Metro Manila:

speed = [20, 18, 25, 22, 28, 19, 30, 25, 27, 25]

Our goal is to calculate the average speed, the middle speed, and the most common speed.

Step 1: Calculating the Mean

The mean is the sum of all the values divided by the total number of values. It gives us the average.

Formula:

Mean = (20 + 18 + 25 + 22 + 28 + 19 + 30 + 25 + 27 + 25) / 10 = 23.9

In Python, you can calculate the mean using the NumPy module:

import numpy

speed = [20, 18, 25, 22, 28, 19, 30, 25, 27, 25]
x = numpy.mean(speed)

print(x)

The average speed is 23.9 km/h.
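To make the formula concrete, here is a minimal sketch that computes the same mean with only built-in Python functions. The variable names total, count, and manual_mean are chosen for illustration and are not part of the original example.

```python
# Travel speeds in km/h, copied from the example above
speed = [20, 18, 25, 22, 28, 19, 30, 25, 27, 25]

total = sum(speed)   # sum of all the values
count = len(speed)   # total number of values

# Mean = sum of the values divided by the number of values
manual_mean = total / count

print(manual_mean)
```

This should agree with the value returned by numpy.mean for the same list.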

Step 2: Finding the Median

The median is the middle value of the data when it is sorted in ascending order. If there is an even number of data points, the median is the average of the two middle numbers.

Sorted Data:
[18, 19, 20, 22, 25, 25, 25, 27, 28, 30]

There are 10 values, so the median is the average of the two middle values (the 5th and 6th). Both are 25, so the median is (25 + 25) / 2 = 25.

To calculate the median in Python, you can use the NumPy module:

import numpy

speed = [20, 18, 25, 22, 28, 19, 30, 25, 27, 25]
x = numpy.median(speed)

print(x)

The median speed is 25 km/h.
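The rule for finding the median (sort, then take the middle value or the average of the two middle values) can also be sketched by hand. This is an illustrative version only; the names ordered, mid, and manual_median are assumptions made for this example.

```python
# Travel speeds in km/h, copied from the example above
speed = [20, 18, 25, 22, 28, 19, 30, 25, 27, 25]

ordered = sorted(speed)   # ascending order
n = len(ordered)
mid = n // 2              # index of the upper middle value

if n % 2 == 0:
    # Even number of values: average the two middle values
    manual_median = (ordered[mid - 1] + ordered[mid]) / 2
else:
    # Odd number of values: take the single middle value
    manual_median = ordered[mid]

print(manual_median)
```

With 10 values, the even branch runs and averages the 5th and 6th sorted values.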

Step 3: Finding the Mode

The mode is the value that appears most frequently in the data set. In this case, 25 appears three times in the list, which is more often than any other value.

To calculate the mode in Python, you can use the SciPy module:

from scipy import stats

speed = [20, 18, 25, 22, 28, 19, 30, 25, 27, 25]
x = stats.mode(speed)

print(x)
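As a cross-check that needs only the standard library, the mode can also be found by counting how often each value occurs. The sketch below uses collections.Counter; the names counts, value, and frequency are chosen for illustration.

```python
from collections import Counter

# Travel speeds in km/h, copied from the example above
speed = [20, 18, 25, 22, 28, 19, 30, 25, 27, 25]

# Count how many times each speed appears
counts = Counter(speed)

# most_common(1) returns a list holding the single most frequent (value, count) pair
value, frequency = counts.most_common(1)[0]

print(value, frequency)
```

Here value is the mode and frequency is the number of times it appears in the list.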

These statistical concepts are vital in Machine Learning. Understanding how to calculate and interpret the mean, median, and mode helps us analyze real-world data in any data-driven task, whether that is tracking jeepney speeds in Metro Manila or predicting store sales.
