05 Neural Network
Introduction
Neural networks are a class of machine learning algorithms inspired by the structure and function of the human brain: they mimic the way the brain processes information by passing signals through networks of artificial neurons.
Biological vs. Artificial Neurons
A biological neuron consists of:
- a cell body which contains the nucleus
- dendrites which receive signals from other neurons
- an axon which transmits signals to other neurons
The simplified mathematical model of a neuron (the artificial neuron) is structured similarly: it takes numeric inputs, processes them, and produces a numeric output.
Terminology
- Input Layer: The first layer of the neural network which receives the input data.
- Hidden Layer: Layers between the input and output layers.
- Output Layer: The final layer of the neural network which produces the output.
- Activation: The output of a neuron.
- Activation Function: A function that introduces non-linearity into the output of a neuron.
- Weights: Parameters that the neural network learns during training.
- Bias: An additional parameter that allows the activation function to be shifted.
Variable Naming:
- $g$: Sigmoid (activation) function
- $w_{n}$: Vector of weights of the $n$-th neuron in a layer
- $b_{n}$: Bias of the $n$-th neuron in a layer
- $w_{n}^{[l]}$: Vector of weights of the $n$-th neuron in the $l$-th layer
- $a_{n}^{[l]}$: Activation (output) of the $n$-th neuron in the $l$-th layer
- $a^{[l]}$: Vector of activations (outputs) of the $l$-th layer (input of the $(l+1)$-th layer)
The general equation for the activation of a neuron is: $$a_{j}^{[l]} = g(w_{j}^{[l]} \cdot a^{[l-1]} + b_{j}^{[l]})$$
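A minimal sketch of this equation for a single neuron (assuming NumPy; the weights, activations, and bias below are made-up values):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up values for a single neuron j in layer l
w_j = np.array([0.5, -1.2, 0.8])    # w_j^[l], weight vector
a_prev = np.array([1.0, 0.3, 0.7])  # a^[l-1], activations of the previous layer
b_j = 0.1                           # b_j^[l], bias

# a_j^[l] = g(w_j^[l] . a^[l-1] + b_j^[l])
a_j = sigmoid(np.dot(w_j, a_prev) + b_j)
print(a_j)
```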
Forward Propagation
Forward propagation is the process of computing the output of a neural network for a given input. It involves passing the input through the network, applying the weights and biases, and activating the neurons.
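A minimal sketch of forward propagation in NumPy (the layer sizes and weights are illustrative; each row of a weight matrix holds the weight vector of one neuron):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, parameters):
    # parameters is a list of (W, b) pairs, one per layer
    a = x
    for W, b in parameters:
        a = sigmoid(W @ a + b)  # a^[l] = g(W^[l] a^[l-1] + b^[l])
    return a

# Illustrative 2-3-1 network with random weights
rng = np.random.default_rng(0)
parameters = [
    (rng.normal(size=(3, 2)), np.zeros(3)),  # hidden layer: 3 neurons, 2 inputs
    (rng.normal(size=(1, 3)), np.zeros(1)),  # output layer: 1 neuron
]
print(forward(np.array([0.5, -0.2]), parameters))
```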
Backward Propagation
Backward propagation is the process of updating the weights and biases of a neural network to minimize the cost function. It involves computing the gradients of the cost function with respect to the weights and biases, and using them to update the parameters.
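With the gradients in hand, the simplest update rule is gradient descent with learning rate $\alpha$: $$w_{j}^{[l]} := w_{j}^{[l]} - \alpha \frac{\partial J}{\partial w_{j}^{[l]}}, \qquad b_{j}^{[l]} := b_{j}^{[l]} - \alpha \frac{\partial J}{\partial b_{j}^{[l]}}$$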
Implementation
To create a model with TensorFlow, you can use the following code snippet:
```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy, MeanSquaredError

# Create the model
model = Sequential([
    Dense(25, activation='sigmoid'),
    Dense(15, activation='sigmoid'),
    Dense(1, activation='sigmoid'),
])

# Compile with exactly one loss function, depending on the task
# (calling compile twice would simply override the first loss)
model.compile(loss=BinaryCrossentropy())  # for binary classification
# model.compile(loss=MeanSquaredError())  # for regression

# Train the model (X_train and y_train hold the training data)
model.fit(X_train, y_train, epochs=100)
```
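After training, the model can be used for prediction; a small usage sketch continuing from the snippet above (assuming `X_new` holds samples with the same features as `X_train`):

```python
# With a sigmoid output, predict returns probabilities for y=1
probabilities = model.predict(X_new)
predictions = (probabilities >= 0.5).astype(int)  # threshold at 0.5 to get class labels
```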
Activation Functions
Activation functions introduce non-linearity into the output of a neuron. Some common activation functions are:
- Linear: $$g(z) = z$$
- Sigmoid: $$g(z) = \frac{1}{1 + e^{-z}}$$
- ReLU (Rectified Linear Unit): $$g(z) = \max(0, z)$$
The ReLU function is the most commonly used activation function for hidden layers, because it is computationally cheap and largely avoids the vanishing gradient problem that saturating functions like the sigmoid suffer from.
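A minimal sketch of these three functions (assuming NumPy):

```python
import numpy as np

def linear(z):
    return z

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)  # element-wise max(0, z)
```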
For the output layer, the activation function depends on the task:
- Binary Classification: Sigmoid
- Multi-Class Classification: Softmax
- Regression: Linear
Logistic Regression
(2 possible output values)
Activations are calculated using the sigmoid function:
$$ a_1 = g(z) = \frac{1}{1 + e^{-z}} = P(y=1|x)\newline a_2 = 1 - a_1 = P(y=0|x) $$
where $z = w \cdot x + b$. The two probabilities add up to $1$.
Softmax Regression
(4 possible output values)
$$ a_1 = \frac{e^{z_1}}{e^{z_1}+e^{z_2}+e^{z_3}+e^{z_4}} = P(y=1|x)\newline a_2 = \frac{e^{z_2}}{e^{z_1}+e^{z_2}+e^{z_3}+e^{z_4}} = P(y=2|x)\newline a_3 = \frac{e^{z_3}}{e^{z_1}+e^{z_2}+e^{z_3}+e^{z_4}} = P(y=3|x)\newline a_4 = \frac{e^{z_4}}{e^{z_1}+e^{z_2}+e^{z_3}+e^{z_4}} = P(y=4|x) $$
The probabilities again add up to $1$; in general, $a_j = \frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}} = P(y=j|x)$ for $N$ classes.
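A minimal softmax sketch (assuming NumPy; the logits are made-up values):

```python
import numpy as np

def softmax(z):
    # Subtracting max(z) avoids overflow; it does not change the result
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1, -1.0])  # illustrative logits z_1..z_4
a = softmax(z)
print(a, a.sum())  # the probabilities sum to 1
```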
```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model = Sequential([
    Dense(25, activation='relu'),
    Dense(15, activation='relu'),
    Dense(10, activation='softmax'),
])

model.compile(loss=SparseCategoricalCrossentropy())
model.fit(X_train, y_train, epochs=100)
```
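A more numerically stable variant is to output raw scores (logits) from a linear output layer and let the loss apply the softmax internally via `from_logits=True`; probabilities can then be recovered afterwards (here `X_test` is an assumed hold-out set):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model = Sequential([
    Dense(25, activation='relu'),
    Dense(15, activation='relu'),
    Dense(10, activation='linear'),  # outputs raw logits instead of probabilities
])
model.compile(loss=SparseCategoricalCrossentropy(from_logits=True))
model.fit(X_train, y_train, epochs=100)

# Recover probabilities from the logits
probabilities = tf.nn.softmax(model.predict(X_test))
```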
Multi-Class Classification vs. Multi-Label Classification
- Multi-Class Classification: Each sample belongs to exactly one class.
- Multi-Label Classification: Each sample can belong to multiple classes.
Diagnosis
Diagnosing the performance of a neural network involves looking at the bias and variance of the model. Depending on the diagnosis, different steps can be taken to improve the model.
- High Bias (Underfit): The model is too simple and does not fit the training data well.
- High Variance (Overfit): The model is too complex and fits the training data too well.
Baseline
Establishing a baseline is needed to estimate the expected performance of the model.
The baseline can be:
- Human level performance
- Performance of a competing algorithm
- A guess based on experience
Comparison
Depending on which subset of data the model performs well on, the following can be concluded:
- High bias: $J_{train}$ is high ($J_{cv} \approx J_{train}$)
- High variance: $J_{cv} \gg J_{train}$ ($J_{train}$ may be low)
- High bias and high variance: $J_{cv} \gg J_{train}$ and $J_{train}$ is high
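For example (hypothetical numbers, with a human-level baseline error of $10\%$): $J_{train} = 15\%$ and $J_{cv} = 16\%$ points to high bias, while $J_{train} = 10.5\%$ and $J_{cv} = 20\%$ points to high variance.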
Improving
Depending on the diagnosed problem, different steps can be taken.
Fix High Bias
- Add polynomial features
- Bigger set of features
- Decrease regularization
Fix High Variance
- Get more training examples
- Smaller set of features
- Increase regularization
Note: Regularization is a technique to prevent overfitting by adding a penalty term to the cost function.
Choosing a high value for the regularization parameter $\lambda$ will lead to a high bias (underfitting).
Choosing a low value for the regularization parameter $\lambda$ will lead to a high variance (overfitting).
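One way to add regularization in Keras is an L2 penalty on the weights of each layer; a minimal sketch (the value of `lambda_` is illustrative):

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

lambda_ = 0.01  # illustrative regularization parameter
model = Sequential([
    Dense(25, activation='relu', kernel_regularizer=l2(lambda_)),
    Dense(15, activation='relu', kernel_regularizer=l2(lambda_)),
    Dense(1, activation='sigmoid'),
])
```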
Error Analysis
Error analysis is the process of manually examining the errors made by the model. This can help to identify patterns in the errors and improve the model.
If there are too many misclassifications to analyze, pick a random subset of about 100 examples.
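A sketch of picking such a subset (assuming NumPy arrays `y_pred` with the model's predictions and `y_cv` with the true labels):

```python
import numpy as np

misclassified = np.where(y_pred != y_cv)[0]  # indices of all misclassified examples
subset = np.random.choice(misclassified, size=min(100, len(misclassified)), replace=False)
# Manually inspect the examples at these indices for common error patterns
```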
Process
Neural networks are low-bias machines by nature: a large enough network can almost always fit the training set well. This leads to the following development loop:
```mermaid
flowchart TD
    start[Start] --> build[Build model]
    build --> train[Train model]
    train --> validateTrain{Does well on training set?}
    validateTrain -->|Yes| validateCross{Does well on cross validation set?}
    validateTrain -->|No| bigger[Bigger network]
    bigger --> train
    validateCross -->|Yes| done[Done]
    validateCross -->|No| more[More training data]
    more --> train
```