Tuesday, 13 December 2022

Statistical fundamentals and terminology for model building and validation

1. Mean: This is the simple arithmetic average, computed by dividing the sum of all values by the number of values. The mean is sensitive to outliers in the data. An outlier is a value in a dataset or column that deviates strongly from most of the other values, being either unusually high or unusually low; even a single outlier can pull the mean noticeably toward it.

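2. Median: This is the midpoint of the data, found by first sorting the values in ascending or descending order. If there are N observations and N is odd, the median is the middle value, at position (N + 1)/2; if N is even, it is the average of the two middle values, at positions N/2 and N/2 + 1. Because it depends only on the order of the values, the median is much less affected by outliers than the mean.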

3. Mode: This is the most frequently occurring value in the data. A dataset can have more than one mode when several values tie for the highest count.
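To make the three definitions concrete, here is a minimal plain-Python sketch that computes them directly, without NumPy or the statistics module. The function names mean, median and modes are illustrative only and do not come from any library; the median follows the odd/even rule described above, and modes returns every value tied for the highest count.

```python
from collections import Counter

def mean(values):
    # Arithmetic average: sum of the values divided by how many there are
    return sum(values) / len(values)

def median(values):
    # Sort first; the median is the middle of the ordered data
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the single middle value, at 1-based position (N + 1) / 2
        return ordered[mid]
    # Even N: average of the two middle values, positions N/2 and N/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    # Count occurrences and keep every value tied for the highest count
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

data = [4, 5, 1, 2, 7, 2, 6, 9, 3, 9, 9, 2]
print("Mean :", round(mean(data), 2))   # Mean : 4.92
print("Median :", median(data))         # Median : 4.5
print("Mode :", modes(data))            # Mode : [2, 9]
```

With 12 observations (an even N), the median is the average of the 6th and 7th sorted values, 4 and 5, giving 4.5. Both 2 and 9 appear three times, so the data has two modes.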


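import numpy as np
import statistics as stats

data = np.array([4, 5, 1, 2, 7, 2, 6, 9, 3, 9, 9, 2])

# Calculate Mean (rounded to 2 decimal places)
dt_mean = np.mean(data)
print("Mean :", round(dt_mean, 2))

# Calculate Median
dt_median = np.median(data)
print("Median :", dt_median)

# Calculate Mode: multimode returns every value tied for the highest count;
# .tolist() converts to plain Python ints so the list prints as [2, 9]
dt_mode = stats.multimode(data.tolist())
print("Mode :", dt_mode)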

Result:
Mean : 4.92
Median : 4.5
Mode : [2, 9]
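To illustrate the outlier sensitivity mentioned under Mean, the sketch below appends one extreme value to the same data and compares the two statistics before and after. The value 100 is a hypothetical outlier chosen purely for illustration; it is not part of the original data.

```python
import numpy as np

data = np.array([4, 5, 1, 2, 7, 2, 6, 9, 3, 9, 9, 2])
# Hypothetical outlier appended for illustration only
with_outlier = np.append(data, 100)

mean_before = round(float(np.mean(data)), 2)
mean_after = round(float(np.mean(with_outlier)), 2)
median_before = float(np.median(data))
median_after = float(np.median(with_outlier))

print("Mean   :", mean_before, "->", mean_after)      # Mean   : 4.92 -> 12.23
print("Median :", median_before, "->", median_after)  # Median : 4.5 -> 5.0
```

A single extreme value more than doubles the mean, while the median moves only from 4.5 to 5.0, because it depends on the position of the middle values rather than their magnitude.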