Well actually these can give you different insights into your model's errors. If y is your target, p your prediction, and e = p − y the errors:
- Mean Error: ME = mean(e).
In (−∞, ∞), the closer to 0 the better.
Measures additive bias in the error. Unbiased predictions should have the same mean as the target, so ME should be close to 0: if it is positive your predictions overestimate the target, if it is negative they underestimate it.
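As a quick sketch with made-up NumPy arrays (y and p are hypothetical targets and predictions, not from any specific dataset):

```python
import numpy as np

# Hypothetical data: predictions that systematically overshoot the target.
y = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([1.5, 2.5, 3.5, 4.5])

e = p - y          # errors
me = np.mean(e)    # mean error: positive -> overestimation
print(me)          # 0.5: an additive bias of +0.5
```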
- Root Mean Squared Error: RMSE = √mean(e²).
In [0, ∞), the smaller the better.
Measures the root mean square magnitude of the errors. The square root is taken so that the units of the error are the same as the units of the target. This measure gives more weight to large deviations such as outliers, since squaring makes large differences larger and small (smaller than 1) differences smaller.
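A small illustration of that outlier sensitivity, with one deliberately large miss in otherwise perfect predictions (toy numbers, chosen for the example):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 10.0])
p = np.array([1.0, 2.0, 3.0, 4.0])   # one large miss of 6 units

e = p - y
rmse = np.sqrt(np.mean(e ** 2))      # 3.0: dominated by the single large error
mae = np.mean(np.abs(e))             # 1.5: same errors, less weight on the outlier
print(rmse, mae)
```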
- Mean Absolute Error: MAE = mean(|e|).
In [0, ∞), the smaller the better.
Measures the absolute magnitude of the errors, and its units are the same as the units of the target. This makes the errors more easily interpretable and gives less weight to outliers. However, a model with a good MAE can still make occasional very large errors.
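The interpretability point in code form (again with toy numbers): MAE reads directly as "how far off, on average, in the target's own units".

```python
import numpy as np

y = np.array([100.0, 200.0, 300.0])
p = np.array([110.0, 190.0, 330.0])

mae = np.mean(np.abs(p - y))
print(mae)   # about 16.67: "off by roughly 17 units on average"
```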
- (Root) Mean Squared Log Error: MSLE = mean((log(p + 1) − log(y + 1))²).
In [0, ∞), the smaller the better.
This is useful when dealing with right-skewed targets, since the log transform makes the target more normally distributed. In practice it is usually achieved by transforming the target to ŷ = log(y + 1), fitting the model on ŷ, and then recovering predictions as y = e^ŷ − 1.
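A sketch of both the metric and the transform-then-invert workflow, using a hypothetical right-skewed target (NumPy's log1p/expm1 are the numerically stable forms of log(x + 1) and e^x − 1):

```python
import numpy as np

y = np.array([1.0, 10.0, 100.0, 1000.0])   # right-skewed target
p = np.array([2.0, 8.0, 120.0, 900.0])

# Errors are measured on the log scale, so (R)MSLE penalizes
# relative rather than absolute differences.
rmsle = np.sqrt(np.mean((np.log1p(p) - np.log1p(y)) ** 2))

# The transform-then-invert workflow described above:
y_transformed = np.log1p(y)             # fit the model on this
y_recovered = np.expm1(y_transformed)   # expm1 inverts log1p exactly
```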
- Median Absolute Deviation: MAD = median(|e − median(e)|).
In [0, ∞), the smaller the better.
This is a spread metric similar to the standard deviation, but meant to be more robust to outliers: instead of taking the mean of squares as the SD does, MAD takes the median of absolute deviations.
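The robustness claim can be checked on a toy error vector with a single outlier:

```python
import numpy as np

e = np.array([0.1, -0.2, 0.0, 0.3, 50.0])   # errors with one outlier

mad = np.median(np.abs(e - np.median(e)))
sd = np.std(e)
# The 50.0 outlier inflates the standard deviation (~20) but
# barely moves MAD (~0.2), showing MAD's robustness.
print(mad, sd)
```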
- R², coefficient of determination:
In (−∞, 1], the closer to 1 the better.
It measures the ratio of the variability your model can capture to the natural variability of the target variable.
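That ratio is usually computed as 1 minus the residual sum of squares over the total sum of squares; a sketch with invented numbers:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
p = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

ss_res = np.sum((y - p) ** 2)           # variability the model fails to capture
ss_tot = np.sum((y - np.mean(y)) ** 2)  # natural variability of the target
r2 = 1 - ss_res / ss_tot
# Close to 1 here; a model predicting mean(y) everywhere scores 0,
# and a model worse than that scores negative.
print(r2)
```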
In practice I usually use a combination of ME, R² and: RMSE if there are no outliers in the data, MAE if I have a large dataset and there may be outliers, RMSLE if the target is right-skewed.
This link offers a very nice overview of the topic: http://www.cawcr.gov.au/projects/verification/#Methods_for_foreasts_of_continuous_variables