socceranalysis.outlier_identification

Module Contents

Functions

get_outliers(df, col[, method, thresh])

Returns outliers in the dataset based on values of a variable

socceranalysis.outlier_identification.get_outliers(df, col, method='SD', thresh=3)

Returns outliers in the dataset based on values of a variable

This function identifies outliers in the dataset based on either of the following methods:

1. Interquartile Range (IQR) Method: Values less than Q1 - 1.5*IQR or greater than Q3 + 1.5*IQR are identified as outliers, where Q1 and Q3 are the first and third quartiles of the variable and IQR = Q3 - Q1.

2. Mean and Standard Deviation Method: Values less than mean - k*standard_deviation or greater than mean + k*standard_deviation are identified as outliers, where k is set by the thresh parameter.
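The two rules above can be sketched with pandas as follows. This is an illustration only, not the package's implementation: the helper name sketch_outliers, the toy data, and the "Name" column are hypothetical, and the strict inequalities, the default quantile interpolation, and the sample standard deviation (ddof=1) are assumptions that the documentation does not specify.

```python
import pandas as pd


def sketch_outliers(df, col, method="SD", thresh=3):
    """Hypothetical re-creation of the two documented rules (not the package code)."""
    values = df[col]
    if method == "IQR":
        # Fences at Q1 - 1.5*IQR and Q3 + 1.5*IQR
        q1 = values.quantile(0.25)
        q3 = values.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    elif method == "SD":
        # Fences at mean - k*sd and mean + k*sd, with k = thresh
        mean = values.mean()
        sd = values.std()  # assumption: pandas default, sample SD (ddof=1)
        lower, upper = mean - thresh * sd, mean + thresh * sd
    else:
        raise ValueError('method must be "IQR" or "SD"')
    # Keep only the rows whose value falls outside the fences
    return df[(values < lower) | (values > upper)]


# Toy data: seven similar wages and one extreme value
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Wages_Euros": [10, 12, 11, 13, 12, 11, 10, 200],
})

iqr_outliers = sketch_outliers(players, "Wages_Euros", method="IQR")
sd_outliers = sketch_outliers(players, "Wages_Euros", method="SD", thresh=2)
```

On this toy data both calls flag only the row with a wage of 200. With thresh=3 the SD rule flags nothing, because in such a small sample the extreme value inflates the standard deviation enough to stay within three deviations of the mean; the IQR rule is less sensitive to this effect.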

Parameters:
  • df (dataframe) – Dataframe in which outliers are to be identified.

  • col (str) – Variable in the dataframe based on which outliers are to be identified.

  • method (str) – Name of the outlier identification method to use: “IQR” for the IQR method or “SD” for the mean and standard deviation method. Defaults to “SD”.

  • thresh (int) – The value of k in the Mean and Standard Deviation Method formula above. Defaults to 3.

Returns:

Subset of original dataframe containing only rows corresponding to outliers.

Return type:

dataframe

Examples

>>> get_outliers(df, "Wages_Euros", "SD", 3)