
Machine Learning Loss Functions in Practice

 

Error (loss) functions measure how far a model's predictions are from the target values, so that the weights can be updated to reduce that error on the next iteration.
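To make the idea of "updating the weights to reduce the error" concrete, here is a minimal sketch in plain Python. It assumes a toy one-parameter model y = w * x, made-up data, and a hand-picked learning rate; none of these come from a real pipeline, and the helper names are hypothetical.

```python
# Toy data generated with w = 2 (an assumption made for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mse(w):
    """Mean squared error of the model y = w * x on the toy data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w):
    """Derivative of the MSE with respect to w: mean(2 * x * (w * x - y))."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


w = 0.0                 # initial weight
learning_rate = 0.05    # hand-picked step size for this toy problem
history = []            # loss recorded before each update

for step in range(10):
    history.append(mse(w))
    w -= learning_rate * mse_gradient(w)  # move against the gradient

print(history[0], history[-1])  # the loss shrinks from one iteration to the next
print(w)                        # approaches 2, the value used to generate ys
```

The loss is only a number; it is the gradient of the loss with respect to the weights that tells the optimizer which way to move.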

Since you have opened this article, I am assuming you know the fundamentals of machine learning pipelines and want to learn about loss functions specifically. So, let's jump directly to loss functions. I will also show how to use them with NumPy, Scikit-learn, and PyTorch.

Broadly, we can group loss functions into two categories.

  1. Loss functions for Regression problems.
  2. Loss functions for Classification problems.

Regression problems

The two most common loss functions for regression problems are:

  1. MSE (Mean Squared Error)
  2. MAE (Mean Absolute Error)

MSE / Quadratic Loss / L2 Loss

If the target values follow a Gaussian (normal) distribution, MSE is the preferred loss function for regression problems. MSE is the mean of the squared differences between the target values (ground truth) and the predicted values.

The implementation of MSE in NumPy and Scikit-learn is given below:

For convenience, let's say our target and predicted values are as follows:

"""
y_true: ground truth values 
y_pred: predicted values

"""

y_true = [1.23, 1.09, 0.24, 0.26, 0.78, 2.90]
y_pred = [1.29, 1.01, 0.34, 0.20, 0.79, 1.90]

Now, if we want to calculate the MSE: 

# numpy implementation
# y_true: ground truth
# y_pred: predicted values
import numpy as np

def mse_np(y_true, y_pred):
    # mean of the element-wise squared differences
    error = np.square(np.subtract(y_true, y_pred)).mean()
    return error


# scikit learn implementation
from sklearn.metrics import mean_squared_error
def mse_sklearn(y_true, y_pred):
    error = mean_squared_error(y_true, y_pred)  
    return error
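As a sanity check, the same calculation can be done step by step in plain Python on the sample values above. This is only a sketch to make each step visible; mse_plain is a hypothetical helper name, not a library function.

```python
# Sample values from this article.
y_true = [1.23, 1.09, 0.24, 0.26, 0.78, 2.90]
y_pred = [1.29, 1.01, 0.34, 0.20, 0.79, 1.90]


def mse_plain(y_true, y_pred):
    """Mean of the squared differences, without NumPy."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Per-sample squared errors: the last sample (2.90 vs 1.90) dominates the sum.
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
print([round(e, 4) for e in squared_errors])

mse_value = mse_plain(y_true, y_pred)
print(round(mse_value, 4))  # about 0.1706
```

The NumPy and Scikit-learn versions should give the same value up to floating-point rounding.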

In PyTorch, the nn module contains the built-in loss functions. Because our example inputs are Python lists, we first convert them to PyTorch tensors, and then we pass the predicted values and the ground truth values to our criterion (loss function).

The code is given below: 

import torch
from torch import nn


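def mse_torch(true_labels, predicted_labels):
    loss = nn.MSELoss()
    # convert the Python lists to float tensors
    true_labels = torch.tensor(true_labels, dtype=torch.float32)
    predicted_labels = torch.tensor(predicted_labels, dtype=torch.float32)
    # PyTorch losses take (input, target): predictions first, ground truth second
    error = loss(predicted_labels, true_labels)
    return error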


MAE / L1 Loss

If the target values contain outliers (values that lie far from the mean), MAE is typically used, because it does not square the errors and is therefore less dominated by a few large ones.

It is calculated as the average of the absolute differences between the ground truth and predicted values.

The implementation of MAE in NumPy and Scikit-learn is given below:

# numpy implementation of mae 
def mae_np(y_true, y_pred):
    error = np.abs(np.subtract(y_true, y_pred)).mean() 
    return error 


# scikit learn implementation of mae
from sklearn.metrics import mean_absolute_error

def mae_sklearn(y_true, y_pred):
    error = mean_absolute_error(y_true, y_pred)  
    return error

In PyTorch, we can calculate the MAE as follows:

import torch
from torch import nn


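def mae_torch(true_labels, predicted_labels):
    loss = nn.L1Loss()
    # convert the Python lists to float tensors
    true_labels = torch.tensor(true_labels, dtype=torch.float32)
    predicted_labels = torch.tensor(predicted_labels, dtype=torch.float32)
    # PyTorch losses take (input, target): predictions first, ground truth second
    error = loss(predicted_labels, true_labels)
    return error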
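To see why outliers matter when choosing between the two regression losses, here is a rough plain-Python sketch on the sample values used earlier. It treats the last sample, whose prediction is off by 1.0, as the "outlier", and measures how much of each total loss comes from that one sample. The variable names are my own.

```python
# Sample values from this article; the last prediction is off by 1.0.
y_true = [1.23, 1.09, 0.24, 0.26, 0.78, 2.90]
y_pred = [1.29, 1.01, 0.34, 0.20, 0.79, 1.90]

abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
sq_errors = [e ** 2 for e in abs_errors]

mae = sum(abs_errors) / len(abs_errors)
mse = sum(sq_errors) / len(sq_errors)

# Fraction of each total that is due to the single largest error.
largest = max(abs_errors)
share_in_mae = largest / sum(abs_errors)
share_in_mse = largest ** 2 / sum(sq_errors)

print(f"MAE = {mae:.4f}, largest error contributes {share_in_mae:.1%}")
print(f"MSE = {mse:.4f}, largest error contributes {share_in_mse:.1%}")
```

On this data the single large error accounts for roughly three quarters of the MAE but almost all of the MSE, because squaring magnifies large errors.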

 

Classification Problems

The most common loss function for classification problems is cross-entropy loss, also called log loss.

The NumPy and Scikit-learn implementations are given below:

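import numpy as np
from sklearn.metrics import log_loss

# predictions: predicted class probabilities, one row per sample
# targets: one-hot encoded ground truth labels
predictions = np.array([[0.25, 0.25, 0.25, 0.25], [0.01, 0.01, 0.01, 0.97]])
targets = np.array([[1, 0, 0, 0], [0, 0, 0, 1]])


# numpy implementation
def cross_entropy_np(targets, predictions):
    N = predictions.shape[0]
    # only the probability assigned to the true class contributes
    error = -np.sum(targets * np.log(predictions)) / N
    return error


# scikit learn implementation
def cross_entropy_sk(targets, predictions):
    error = log_loss(targets, predictions)
    return error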
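One practical caveat: taking the log of a predicted probability of exactly 0 fails or gives infinity, so implementations usually clip the probabilities first. Below is a plain-Python sketch of the same calculation with clipping. The function name cross_entropy_plain and the value of eps are assumptions chosen for this illustration, not values taken from any library.

```python
import math


def cross_entropy_plain(targets, predictions, eps=1e-15):
    """Average cross-entropy for one-hot targets, with clipped probabilities.

    eps is an arbitrary small constant picked for this sketch.
    """
    total = 0.0
    for target_row, pred_row in zip(targets, predictions):
        for t, p in zip(target_row, pred_row):
            p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
            total -= t * math.log(p)
    return total / len(predictions)


predictions = [[0.25, 0.25, 0.25, 0.25], [0.01, 0.01, 0.01, 0.97]]
targets = [[1, 0, 0, 0], [0, 0, 0, 1]]

loss_value = cross_entropy_plain(targets, predictions)
print(round(loss_value, 4))  # about 0.7084

# A confident but wrong prediction gives a large, finite loss instead of an error.
worst_case = cross_entropy_plain([[1, 0]], [[0.0, 1.0]])
print(worst_case)
```

The first sample contributes -log(0.25) and the second -log(0.97), so the uncertain prediction accounts for most of the loss.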

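The PyTorch implementation is a bit different, because nn.CrossEntropyLoss expects the targets as a 1D tensor of class indices rather than one-hot vectors, and the predictions as raw, unnormalized scores (logits) of shape (batch size, number of classes). The code below is a sample of how to use the cross-entropy loss in PyTorch.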

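import torch
from torch import nn


def cross_entropy_torch(prediction, target):
    loss = nn.CrossEntropyLoss()
    output = loss(prediction, target)
    return output

# random scores for 3 samples and 5 classes
prediction = torch.randn(3, 5)
# random class indices in [0, 5) for the 3 samples
target = torch.empty(3, dtype=torch.long).random_(5)

print(cross_entropy_torch(prediction, target))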
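To connect this with the one-hot example from the NumPy version, here is a plain-Python sketch of what nn.CrossEntropyLoss computes with its default settings, as I understand it: a log-softmax over the raw scores, followed by the average negative log-likelihood of the correct class. The helper names are hypothetical, and the logits are built from the earlier probabilities only so that the two results can be compared.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of one row of raw scores."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def cross_entropy_from_logits(logits_batch, class_indices):
    """Average negative log-probability of the correct class."""
    losses = [-log_softmax(row)[idx]
              for row, idx in zip(logits_batch, class_indices)]
    return sum(losses) / len(losses)


# One-hot targets become class indices: the position of the 1 in each row.
one_hot_targets = [[1, 0, 0, 0], [0, 0, 0, 1]]
class_indices = [row.index(1) for row in one_hot_targets]

# Taking logs of probabilities gives logits whose softmax is the same probabilities.
probabilities = [[0.25, 0.25, 0.25, 0.25], [0.01, 0.01, 0.01, 0.97]]
logits = [[math.log(p) for p in row] for row in probabilities]

loss_value = cross_entropy_from_logits(logits, class_indices)
print(class_indices)         # [0, 3]
print(round(loss_value, 4))  # about 0.7084
```

Because the softmax is applied inside the loss, the model's final layer should output raw scores; passing already-normalized probabilities would apply the softmax twice.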

This is the first part of Loss Functions in Practice. In the second part, I will explain and implement more loss functions for other types of machine learning problems.



