Understanding the mechanics of machine learning is essential not only for data scientists but also for product managers who oversee the development and implementation of these technologies. This article aims to demystify how machines learn through neural networks, providing product managers with a foundational understanding.
Machines learn through neural networks. Different network architectures exist to solve different problems, but they all stem from the basic artificial neural network.
The basic neural network works across three layers: the input layer, the hidden layer, and the output layer.
Input Layer: This layer holds the input data, along with the weights and biases applied to each input. The weights and biases are learned by the machine during training.
Hidden Layer: This layer holds the activation function. We need an activation function to introduce non-linearity; without it, all we would be computing is the weighted sum (plus bias). Depending on the problem we are trying to solve, we use different functions, which are chosen by humans (us). The most common ones are listed below, followed by a short code sketch.
ReLU (Rectified Linear Unit): This activation function is used for both regression and classification problems. If the input is less than 0, the output is automatically 0; otherwise the input passes through unchanged, so the output is never negative. It is used in the hidden layer to help learn non-linearities and speed up the training process.
Sigmoid: This activation function is mostly used in classification problems. It outputs a value between 0 and 1, so it is usually used for binary classification. (It is useful for models where the output needs to be interpreted as a probability.)
Hyperbolic Tangent (tanh): This activation function is similar to sigmoid but outputs a value between -1 and 1. It is not as common as sigmoid but can also be used for binary classification.
Softmax: This activation function is mostly used for multi-class classification problems. It outputs values between 0 and 1 that sum to 1, making them interpretable as probabilities.
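To make these concrete, here is a minimal sketch of all four activation functions in Python with NumPy (the function names and sample scores are mine, chosen purely for illustration):

```python
import numpy as np

def relu(x):
    # Negative inputs become 0; positive inputs pass through unchanged.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes any real number into the range (-1, 1).
    return np.tanh(x)

def softmax(x):
    # Turns a vector of scores into probabilities that sum to 1.
    # Subtracting the max first is a common numerical-stability trick.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])
print(relu(scores))     # [2.  0.  0.5]
print(sigmoid(scores))  # each value between 0 and 1
print(softmax(scores))  # values sum to 1, readable as probabilities
```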
Once we have the output, we can compare it against the actual value. To compare the predicted value with the actual value, we use a loss or cost function. The loss function measures the error on a single training example, while the cost function measures the error across the entire training set. If the difference is significant, the neural network knows to go back, learn more, and adjust its parameters.
For regression problems, we can use Mean Squared Error (MSE) as the cost function (the average of the squared differences between the predicted and actual values). Please note that MSE is sensitive to outliers; in certain cases Mean Absolute Error (MAE) might be considered alongside or as an alternative to MSE.
For classification problems, we can use cross-entropy, which measures the performance of a classification model whose output is a probability between 0 and 1. It quantifies the difference between two probability distributions (predicted vs. actual). A short sketch of both loss functions follows.
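Here is a minimal sketch of both losses in NumPy (the clipping constant and sample values are illustrative assumptions, not part of any standard):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences.
    # Squaring means outliers dominate the cost, hence MSE's sensitivity to them.
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred):
    # Cross-entropy for binary classification; y_pred holds probabilities.
    # Clipping avoids log(0) when a prediction is exactly 0 or 1.
    y_pred = np.clip(y_pred, 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.6])
print(mse(y_true, y_pred))                   # 0.07: predictions are close
print(binary_cross_entropy(y_true, y_pred))  # penalizes confident wrong answers
```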
When the cost (the difference between the predicted and actual values) is high, we go back and adjust the weights. This process of propagating the error backward through the network is known as backpropagation.
Adjusting the weights is done by a method called gradient descent. Gradient descent refers to the process of walking down the surface formed by the cost function and finding the bottom. (It's complicated.) To find the bottom, we need to set two things: a direction and a step size (the learning rate).
Direction: Calculate the derivative of the cost function and take its negative. If the slope is negative, we are on the left side of the curve, and to reach the bottom we must go down and to the right. If we overshoot to the right side of the curve, we calculate the derivative again; now the slope is positive and we must move back toward the left. We recalculate the derivative of the cost function at every step to decide which direction to go, and repeat the process until we reach the bottom.
Size: Set the learning rate hyperparameter (step size). When we set a learning rate, we are deciding how fast we want to get to the bottom. If we set the learning rate too small, training might take forever; if we set it too high, we might step right over the bottom and leave the curve without ever finding it. The sketch below walks through both ideas.
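Here is a minimal sketch of gradient descent on a one-dimensional toy cost curve (the quadratic cost, starting point, and learning rate are all made up for illustration):

```python
# Toy cost curve: cost(w) = (w - 3)^2, whose bottom sits at w = 3.
def cost(w):
    return (w - 3) ** 2

def derivative(w):
    # Slope of the curve at w; its sign tells us which side of the bottom we are on.
    return 2 * (w - 3)

w = -2.0             # starting point, on the left side of the curve
learning_rate = 0.1  # step size: too small is slow, too large overshoots

for step in range(50):
    slope = derivative(w)
    # Move in the direction of the negative derivative:
    # negative slope -> step right, positive slope -> step left.
    w = w - learning_rate * slope

print(w, cost(w))  # w ends up very close to 3, the bottom of the curve
```

Try setting learning_rate to 12.0 instead: each step overshoots further and w diverges, which is exactly the "leaving the curve" failure described above.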
The most important part here is that we (humans) set the step-size hyperparameter (the learning rate). The weights and biases are parameters, which are learned by the machine during training; we have no control over these parameters except to set their initial values.
Input + activation function + output + actual value + cost function + backpropagation is an iterative process (sketched end to end after the recap below). Each full pass of this loop over the training set is called an epoch. This matters because the number of epochs is a hyperparameter set by humans (us) before training.
Learning rate: how fast the machine learns.
Epochs: how many times the learning loop iterates over the training set.
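Putting every piece together, here is a hedged end-to-end sketch: a single sigmoid neuron trained with gradient descent. The toy data, initial values, and hyperparameter settings are all invented for illustration:

```python
import numpy as np

# Toy binary-classification data: one feature, label 1 when the feature is large.
X = np.array([0.5, 1.0, 2.5, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Parameters: learned by the machine; we only choose the initial values.
w, b = 0.0, 0.0

# Hyperparameters: chosen by us (humans) before training.
learning_rate = 0.5
epochs = 1000

for epoch in range(epochs):
    # Forward pass: weighted sum plus bias, then the sigmoid activation.
    z = w * X + b
    pred = 1 / (1 + np.exp(-z))

    # Backpropagation: for sigmoid + cross-entropy, the gradient of the cost
    # with respect to z simplifies to (pred - y).
    error = pred - y
    grad_w = np.mean(error * X)
    grad_b = np.mean(error)

    # Gradient descent update, scaled by the learning rate.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
print(1 / (1 + np.exp(-(w * X + b))))  # predictions move toward 0, 0, 1, 1
```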
In practice, we choose the hyperparameters in the experimentation workstream to find the optimal combination. If you choose a Google tool like AutoML, it automatically selects the hyperparameters for you, which saves a ton of time on experimentation.
In summary, machine learning through neural networks involves a complex but structured process of learning from data. Product managers, while not directly involved in the coding or mathematical details, need to understand the essential components such as the input, hidden, and output layers, activation functions, and the iterative nature of learning through backpropagation and epochs. This knowledge enables product managers to make informed decisions on feature implementations and collaborate seamlessly with data science and machine learning teams to enhance product outcomes.