Learn how to implement logistic regression in PyTorch and boost your ML skills

Hussain Wali
7 min read · Jun 19, 2023

What are Tensors?
Tensors are multi-dimensional arrays with a uniform type (called a dtype). They are mathematical objects that generalize scalars, vectors, and matrices to higher dimensions. They are a key concept in fields such as physics and engineering, and more recently, in machine learning and data science, particularly in the context of deep learning frameworks like TensorFlow and PyTorch.

Here’s a simple way to understand tensors:

- A scalar, such as a single number like 7, can be considered a rank-0 tensor.
- A vector, which is an ordered array of numbers like [1, 2, 3], can be considered a rank-1 tensor.
- A matrix, which is a 2D grid of numbers, can be considered a rank-2 tensor.
- Tensors can have rank 3, 4, or more. For example, a rank-3 tensor could be visualized as a cube of numbers.
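The ranks listed above can be checked directly in PyTorch. The following is a minimal sketch; the variable names and values are made up for illustration, and `ndim` reports the rank of each tensor.

```python
import torch

scalar = torch.tensor(7)                 # rank 0: a single number
vector = torch.tensor([1, 2, 3])         # rank 1: an ordered array
matrix = torch.tensor([[1, 2], [3, 4]])  # rank 2: a 2D grid
cube = torch.zeros(2, 3, 4)              # rank 3: a "cube" of numbers

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # ndim is the rank; shape lists the size along each dimension
    print(name, t.ndim, tuple(t.shape))
```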

In the context of machine learning, tensors are used to encode the inputs, outputs, and transformations within neural networks. For example, an image might be represented as a rank-3 tensor, with dimensions corresponding to the height, width, and color channels of the image. Unlike TensorFlow tensors, PyTorch tensors are mutable: you can modify their contents in place, for example with indexed assignment or with methods whose names end in an underscore, such as add_().

In PyTorch, tensors are the fundamental data structure. They can represent many kinds of data, including images, audio, and text, once that data has been encoded as numbers.

Here is an example of a tensor:

import torch
tensor = torch.tensor([1, 2, 3, 4])
print(tensor)

This code will print the following output:

tensor([1, 2, 3, 4])

As you can see, the tensor has 4 elements, and each element is a number.

What is data preprocessing?

Data preprocessing is the process of preparing data for analysis. This includes cleaning the data, formatting it, and transforming it into a format that is suitable for the analysis task at hand.

Data preprocessing is an important step in the machine learning process. It can help to improve the accuracy and efficiency of machine learning models.

Why do we need data preprocessing?

There are a number of reasons why we need data preprocessing. These include:

  • To clean the data. Real-world data often contains noise, meaning errors or irrelevant values that can interfere with the analysis, as well as missing values. Missing values usually need to be removed or filled in (imputed) before the data can be analyzed.
  • To format the data. The data may need to be formatted in a specific way before it can be used by machine learning algorithms. For example, the data may need to be converted into a numerical format or a categorical format.
  • To transform the data. The data may need to be transformed in a way that makes it more informative for machine learning algorithms. For example, the data may need to be normalized or standardized so that features measured on different scales contribute comparably.
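As a small illustration of the cleaning step, the sketch below drops every row that contains a missing value. It assumes missing entries are stored as NaN in a hypothetical feature tensor; dropping rows is only one option, and filling the gaps is another.

```python
import torch

# Hypothetical feature matrix: 4 samples, 2 features, one missing value (NaN)
features = torch.tensor([[1.0, 2.0],
                         [float("nan"), 3.0],
                         [4.0, 5.0],
                         [6.0, 7.0]])

# Keep only the rows in which no entry is NaN
complete_rows = ~torch.isnan(features).any(dim=1)
cleaned = features[complete_rows]

print(cleaned.shape)  # torch.Size([3, 2])
```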

Difference between PyTorch tensors and NumPy arrays

PyTorch tensors and NumPy arrays are both data structures that can be used to represent data. However, there are some key differences between the two data structures.

  • PyTorch tensors support automatic differentiation. PyTorch can record the operations applied to a tensor and compute gradients through them, which NumPy arrays cannot do. Both tensors and NumPy arrays can represent data with any number of dimensions.
  • PyTorch tensors are built for deep learning workloads. They plug directly into PyTorch's layers, loss functions, and optimizers, while NumPy arrays are a general-purpose tool for numerical computing.
  • PyTorch tensors can run on a GPU. A tensor can be moved to a GPU with a single call, while NumPy arrays live in CPU memory only.
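The two structures also convert into each other easily. Here is a minimal sketch of moving data between NumPy and PyTorch; the array contents are arbitrary, and the GPU step is skipped automatically when no GPU is available.

```python
import numpy as np
import torch

array = np.arange(24, dtype=np.float32).reshape(2, 3, 4)  # a 3-dimensional NumPy array
tensor = torch.from_numpy(array)   # shares memory with `array`, no copy is made
back = tensor.numpy()              # back to NumPy (works for CPU tensors)

# Move to a GPU only if one is available; otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
on_device = tensor.to(device)

print(array.ndim, tuple(tensor.shape), on_device.device)
```

Because `torch.from_numpy` shares memory, changing `array` in place also changes `tensor`; use `torch.tensor(array)` instead if you want an independent copy.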

What are categorical and numerical features?

Categorical features are features that can be divided into categories, such as gender or race. Numerical features are features that can be represented as numbers, such as height or weight.

What is data sampling?

Data sampling is the process of selecting a subset of data from a larger dataset. Working with a sample reduces the amount of data that needs to be processed, which can cut training time and memory use.

There are two main types of data sampling:

  • Random sampling: This is the simplest type of data sampling. A random sample is a subset of data that is selected randomly from the larger dataset.
  • Stratified sampling: This type of data sampling ensures that the subset of data is representative of the larger dataset. This is done by dividing the larger dataset into strata, and then randomly sampling from each stratum.
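The difference between the two approaches can be sketched as follows. The label tensor and the 25% sampling fraction are hypothetical; the strata here are simply the classes.

```python
import torch

torch.manual_seed(0)
labels = torch.tensor([0] * 80 + [1] * 20)  # hypothetical imbalanced labels
fraction = 0.25

# Random sampling: pick 25% of all indices uniformly at random
random_idx = torch.randperm(len(labels))[: int(fraction * len(labels))]

# Stratified sampling: pick 25% of the indices within each class
parts = []
for cls in labels.unique():
    cls_idx = (labels == cls).nonzero(as_tuple=True)[0]
    keep = int(fraction * len(cls_idx))
    parts.append(cls_idx[torch.randperm(len(cls_idx))[:keep]])
stratified_idx = torch.cat(parts)

# Share of class 1 in each sample; the stratified share always matches the dataset
print(labels[random_idx].float().mean().item(),
      labels[stratified_idx].float().mean().item())
```

The random sample may over- or under-represent class 1 by chance, while the stratified sample keeps the original 80/20 split.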

What is upsampling and downsampling?

Upsampling and downsampling are two techniques that can be used to change the size of a dataset. Upsampling is the process of increasing the size of a dataset, while downsampling is the process of decreasing the size of a dataset.

Both techniques are most often used to balance classes. Upsampling adds examples of an under-represented class, typically by duplicating existing ones or generating synthetic ones, so the model sees that class more often. Downsampling discards examples of an over-represented class, which also reduces the amount of data to process.
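A minimal sketch of both techniques, assuming a hypothetical label tensor with 90 examples of one class and 10 of the other. Upsampling here duplicates examples by drawing with replacement, which is only one of several possible strategies.

```python
import torch

torch.manual_seed(0)
labels = torch.tensor([0] * 90 + [1] * 10)  # hypothetical: 90 negatives, 10 positives
major_idx = (labels == 0).nonzero(as_tuple=True)[0]
minor_idx = (labels == 1).nonzero(as_tuple=True)[0]

# Upsampling: draw minority indices with replacement until both classes match
extra = minor_idx[torch.randint(len(minor_idx), (len(major_idx),))]
upsampled_idx = torch.cat([major_idx, extra])

# Downsampling: keep a random subset of the majority class
kept = major_idx[torch.randperm(len(major_idx))[: len(minor_idx)]]
downsampled_idx = torch.cat([kept, minor_idx])

print(len(upsampled_idx), len(downsampled_idx))  # 180 20
```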

How to do label encoding of categorical features?

Label encoding is a technique that can be used to convert categorical features into numerical features. This is done by assigning a unique number to each category.

For example, if you have a categorical feature with three categories, you could assign the number 0 to the first category, the number 1 to the second category, and the number 2 to the third category.
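The mapping described above can be sketched in a few lines. The `colors` list is a made-up feature, and sorting the categories is just one way to make the numbering reproducible.

```python
import torch

colors = ["red", "green", "blue", "green", "red"]  # hypothetical categorical feature

# Assign each distinct category an integer, in sorted order for reproducibility
category_to_id = {cat: i for i, cat in enumerate(sorted(set(colors)))}
encoded = torch.tensor([category_to_id[c] for c in colors])

print(category_to_id)  # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)         # tensor([2, 1, 0, 1, 2])
```

Note that the integers imply an order (blue < green < red) that the categories may not really have, which a linear model can misread.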

How to scale numerical features?

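Scaling numerical features puts them on a comparable range. A common approach, called standardization, subtracts the feature's mean from each value and then divides by the feature's standard deviation, so that every feature ends up with a mean of 0 and a standard deviation of 1.

Scaling can improve the performance of machine learning models because no single feature dominates simply by having larger values, and gradient-based optimizers often converge faster on scaled data.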
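Here is a minimal sketch of standardization in PyTorch. The height and weight values are invented for illustration.

```python
import torch

# Hypothetical numerical features on different scales: height (cm), weight (kg)
features = torch.tensor([[150.0, 50.0],
                         [160.0, 60.0],
                         [170.0, 70.0],
                         [180.0, 80.0]])

mean = features.mean(dim=0)  # per-feature mean
std = features.std(dim=0)    # per-feature standard deviation (sample estimate)
scaled = (features - mean) / std

# After scaling, each column has mean ~0 and standard deviation ~1
print(scaled.mean(dim=0), scaled.std(dim=0))
```

In practice the mean and standard deviation should be computed on the training data only and then reused for validation and test data.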

What is Logistic Regression?

Logistic regression is a type of machine learning algorithm that can be used for classification problems. It is a statistical model that predicts the probability of an event.

Logistic regression is a linear model: it first computes a weighted sum of the features plus a bias. That sum can be any real number, so it is passed through a sigmoid function, which squashes it into a value between 0 and 1 that can be read as a probability.

Logistic regression is a strong baseline for classification problems. It is relatively easy to understand and implement, and it often performs well when the classes are close to linearly separable.
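To make the computation concrete, here is a sketch of a single prediction done by hand. The weights, bias, and input are hypothetical numbers, not values learned from data.

```python
import torch

weights = torch.tensor([0.5, -0.25])  # hypothetical learned weights
bias = torch.tensor(0.1)              # hypothetical learned bias
x = torch.tensor([2.0, 4.0])          # one sample with two features

z = torch.dot(weights, x) + bias      # linear score: 1.0 - 1.0 + 0.1 = 0.1
probability = torch.sigmoid(z)        # squashed into (0, 1), roughly 0.525
predicted_class = int(probability >= 0.5)

print(round(probability.item(), 4), predicted_class)
```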

Types of Logistic Regression

There are two main types of logistic regression:

  • Binary logistic regression: This is the most common type of logistic regression. It is used to predict the probability of a binary event, such as whether a customer will click on an ad or not.
  • Multinomial logistic regression: This type of logistic regression is used to predict the probability of a multinomial event, such as which of three products a customer will buy.
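For the multinomial case, one common formulation replaces the sigmoid with a softmax over one score per class. The sketch below assumes that formulation, which the text above does not spell out; the sizes and random inputs are hypothetical and the layer is untrained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_features, num_classes = 4, 3       # hypothetical sizes
linear = nn.Linear(num_features, num_classes)

x = torch.randn(5, num_features)       # 5 hypothetical samples
logits = linear(x)                     # one score per class
probs = torch.softmax(logits, dim=1)   # each row sums to 1
predicted = probs.argmax(dim=1)        # most probable class per sample

print(probs.sum(dim=1), predicted)
```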

What is a sigmoid function?

The sigmoid function is a non-linear function that takes a real number as input and outputs a number between 0 and 1. The sigmoid function is often used in machine learning to transform the output of a linear model into a probability.

The sigmoid function is defined as:

f(x) = 1 / (1 + e^(-x))

where x is a real number.

The sigmoid function is an S-shaped curve: it approaches 0 as x becomes large and negative, equals 0.5 when x is 0, and approaches 1 as x becomes large and positive.
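The formula can be translated directly into code and compared with PyTorch's built-in `torch.sigmoid`. The helper name `sigmoid` and the sample inputs are chosen here for illustration.

```python
import torch

def sigmoid(x):
    # Direct translation of f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + torch.exp(-x))

x = torch.tensor([-10.0, -1.0, 0.0, 1.0, 10.0])
manual = sigmoid(x)
builtin = torch.sigmoid(x)

print(manual)
```

The values stay strictly between 0 and 1, sitting close to 0 at x = -10, at exactly 0.5 for x = 0, and close to 1 at x = 10.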

What is model optimization?

Model optimization is the process of finding the best parameters for a machine learning model. This is done by iteratively adjusting the parameters of the model until the model achieves the desired performance.

There are a number of different optimization algorithms that can be used for model optimization. These algorithms include gradient descent, stochastic gradient descent, and Adam.
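As a sketch of the simplest of these algorithms, the loop below runs plain gradient descent on a toy objective whose minimum is known. The objective, learning rate, and step count are illustrative assumptions.

```python
import torch

# Toy objective with a known minimum at w = 3: loss(w) = (w - 3)^2
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1

for step in range(50):
    loss = (w - 3) ** 2
    loss.backward()                  # compute d(loss)/dw
    with torch.no_grad():
        w -= learning_rate * w.grad  # step against the gradient
    w.grad.zero_()                   # clear the gradient for the next step

print(w.item())  # very close to 3.0
```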

What is the Adam optimizer?

Adam is a popular optimization algorithm for training machine learning models. The name is derived from adaptive moment estimation.

Adam is a variant of stochastic gradient descent that uses an adaptive learning rate for each parameter of the model. It keeps running averages of each parameter's gradients and squared gradients, and uses them to scale the size of every update.

Adam often converges quickly with little tuning, which makes it a common default choice.
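In PyTorch, the optimizer is available as `torch.optim.Adam`. The sketch below applies it to a toy objective with a known minimum; the learning rate of 0.1 and the 500 steps are illustrative choices, not recommendations.

```python
import torch
import torch.optim as optim

# Toy objective with a known minimum at w = 3: loss(w) = (w - 3)^2
w = torch.tensor(0.0, requires_grad=True)
optimizer = optim.Adam([w], lr=0.1)  # lr=0.1 is an illustrative choice

for step in range(500):
    optimizer.zero_grad()   # clear gradients from the previous step
    loss = (w - 3) ** 2
    loss.backward()         # compute the gradient
    optimizer.step()        # let the optimizer update w

print(w.item())
```

The same three calls (`zero_grad`, `backward`, `step`) form the core of most PyTorch training loops.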

Implementation of logistic regression in PyTorch using all the aforementioned methods.

Here is an example of how to implement logistic regression in PyTorch, using a linear layer, the sigmoid function, and the Adam optimizer:

import torch
import torch.nn as nn
import torch.optim as optim

class LogisticRegression(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        # One linear layer maps the features to a single score
        self.linear = nn.Linear(num_features, 1)

    def forward(self, x):
        # The sigmoid turns the score into a probability in (0, 1)
        return torch.sigmoid(self.linear(x))

def train(model, data, labels, epochs):
    optimizer = optim.Adam(model.parameters())
    # Binary cross-entropy is the standard loss for logistic regression
    criterion = nn.BCELoss()
    for epoch in range(epochs):
        optimizer.zero_grad()
        y_pred = model(data).squeeze(1)  # shape (N, 1) -> (N,)
        loss = criterion(y_pred, labels)
        loss.backward()
        optimizer.step()
    # Loss from the last epoch
    print(loss.item())

if __name__ == '__main__':
    data = torch.randn(100, 10)  # 100 samples, 10 features each
    # Random 0/1 labels; BCELoss expects floating-point targets
    labels = torch.randint(0, 2, (100,)).float()
    model = LogisticRegression(10)
    train(model, data, labels, 10)

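This code trains a logistic regression model on a randomly generated dataset of 100 samples with 10 features each. The model is trained for 10 epochs, and the code prints the loss after the last epoch. Because both the data and the labels are random, the example demonstrates the mechanics of training and not a meaningful prediction task.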
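Once a model has been trained, its probabilities are usually turned into class predictions. The sketch below assumes a 0.5 threshold and uses an untrained stand-in model built with `nn.Sequential` on random data, so the accuracy it prints is only expected to be near chance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for a trained model: a linear layer followed by a sigmoid
model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
data = torch.randn(100, 10)
labels = torch.randint(0, 2, (100,))

model.eval()
with torch.no_grad():                    # no gradients needed for evaluation
    probs = model(data).squeeze(1)       # shape (100,)
    predictions = (probs >= 0.5).long()  # threshold the probabilities at 0.5
    accuracy = (predictions == labels).float().mean().item()

print(accuracy)
```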

I hope this article has been helpful. Please let me know if you have any questions.

Hussain Wali

Software Engineer by profession. Data Scientist by heart. MS Data Science at National University of Science and Technology Islamabad.