socceranalysis.outlier_identification
Module Contents
Functions
get_outliers(df, col[, method, thresh]) – Returns outliers in the dataset based on values of a variable
- socceranalysis.outlier_identification.get_outliers(df, col, method='SD', thresh=3)
Returns outliers in the dataset based on values of a variable
This function identifies outliers in the dataset based on either of the following methods:
1. Interquartile Range (IQR) Method: Values less than Q1 - 1.5*IQR or greater than Q3 + 1.5*IQR, where IQR = Q3 - Q1, are identified as outliers.
2. Mean and Standard Deviation Method: Values less than mean - k*standard_deviation or greater than mean + k*standard_deviation are identified as outliers.
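The two rules above can be sketched in plain Python as follows. This is an illustrative sketch, not the package's implementation: the helper names iqr_bounds, sd_bounds and outlier_positions are hypothetical, and it assumes quartiles are computed with linear interpolation (the default in NumPy and pandas) and that the sample standard deviation (ddof=1, the pandas Series.std() default) is used. get_outliers returns rows of the dataframe rather than positions, so in pandas the equivalent step would be filtering df by a boolean mask built from the same bounds.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_bounds(values: Sequence[float]) -> Tuple[float, float]:
    # Assumption: Q1/Q3 via linear interpolation; method="inclusive"
    # reproduces the NumPy/pandas default quantile interpolation.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def sd_bounds(values: Sequence[float], k: float = 3) -> Tuple[float, float]:
    # Assumption: sample standard deviation (ddof=1), as in pandas Series.std().
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd


def outlier_positions(
    values: Sequence[float], method: str = "SD", thresh: float = 3
) -> List[int]:
    """Hypothetical helper: positions of values outside the chosen method's bounds."""
    if method == "IQR":
        low, high = iqr_bounds(values)
    elif method == "SD":
        low, high = sd_bounds(values, thresh)
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Strict inequalities: a value lying exactly on a bound is not an outlier.
    return [i for i, v in enumerate(values) if v < low or v > high]


wages = [10, 12, 11, 13, 12, 11, 10, 95]
print(outlier_positions(wages, "IQR"))   # [7]
print(outlier_positions(wages, "SD", 3)) # []
print(outlier_positions(wages, "SD", 2)) # [7]
```

The toy run shows why the choice of method matters. With the sample standard deviation, no value in a sample of n points can lie more than (n - 1)/sqrt(n) standard deviations from the mean, which is about 2.47 for n = 8, so the SD method with k = 3 cannot flag anything here, while the IQR method flags the obvious outlier.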
- Parameters:
df (dataframe) – Dataframe in which outliers are to be identified.
col (str) – Variable in the dataframe based on which outliers are to be identified.
method (str) – Name of the outlier identification method to use: “IQR” for the IQR method or “SD” for the mean and standard deviation method. Defaults to “SD”.
thresh (int) – The value of k in the mean and standard deviation formula above. Defaults to 3.
- Returns:
Subset of the original dataframe containing only the rows identified as outliers.
- Return type:
dataframe
Examples
>>> get_outliers(df, "Wages_Euros", "SD", 3)