Introduction

Neural networks are a class of machine learning algorithms inspired by the structure and function of the human brain. They process information by passing signals through layers of artificial neurons.

Biological vs. Artificial Neurons

A biological neuron consists of:

  • a cell body which contains the nucleus
  • dendrites which receive signals from other neurons
  • axons which transmit signals to other neurons

The simplified mathematical model of a neuron (the artificial neuron) is structured similarly. It takes one or more numeric inputs, processes them, and produces a numeric output.

Terminology

  • Input Layer: The first layer of the neural network which receives the input data.
  • Hidden Layer: Layers between the input and output layers.
  • Output Layer: The final layer of the neural network which produces the output.
  • Activation: The output of a neuron.
  • Activation Function: A function that introduces non-linearity into the output of a neuron.
  • Weights: Parameters that the neural network learns during training.
  • Bias: An additional parameter that allows the activation function to be shifted.

Variable Naming:

  • $g$: Sigmoid (activation) function
  • $w_{n}$: Vector of weights of the $n$-th neuron in a layer
  • $b_{n}$: Bias of the $n$-th neuron in a layer
  • $w_{n}^{[l]}$: Vector of weights of the $n$-th neuron in the $l$-th layer
  • $a_{n}^{[l]}$: Activation (output) of the $n$-th neuron in the $l$-th layer
  • $a^{[l]}$: Vector of activations (outputs) of the $l$-th layer (input to the $(l+1)$-th layer)

The general equation for the activation of a neuron is: $$a_{j}^{[l]} = g(w_{j}^{[l]} \cdot a^{[l-1]} + b_{j}^{[l]})$$

Forward Propagation

Forward propagation is the process of computing the output of a neural network for a given input. It involves passing the input through the network, applying the weights and biases, and activating the neurons.
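
As a minimal NumPy sketch (the layer sizes and names here are illustrative, not from the notes), one layer of forward propagation is a direct translation of the activation equation above:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense(a_prev, W, b):
    # One layer of forward propagation: a = g(W a_prev + b),
    # where row j of W is the weight vector w_j of neuron j.
    return sigmoid(W @ a_prev + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)                          # input a^[0]
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer: 1 neuron

a1 = dense(x, W1, b1)    # hidden-layer activations a^[1]
a2 = dense(a1, W2, b2)   # network output a^[2]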

Backward Propagation

Backward propagation is the process of updating the weights and biases of a neural network to minimize the cost function. It involves computing the gradients of the cost function with respect to the weights and biases, and using them to update the parameters.
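
Once the gradients are computed, each parameter is typically updated with gradient descent, where $\alpha$ is the learning rate:

$$ w_{j}^{[l]} := w_{j}^{[l]} - \alpha \frac{\partial J}{\partial w_{j}^{[l]}} \qquad b_{j}^{[l]} := b_{j}^{[l]} - \alpha \frac{\partial J}{\partial b_{j}^{[l]}} $$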

Implementation

To create a model with TensorFlow, you can use the following code snippet:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy
from tensorflow.keras.losses import MeanSquaredError

# Create the model
model = Sequential([
    Dense(25, activation='sigmoid'),
    Dense(15, activation='sigmoid'),
    Dense(1, activation='sigmoid'),
])

# Loss and cost function -- compile with exactly ONE of these;
# a second compile() call would simply override the first:
model.compile(loss=BinaryCrossentropy())   # for binary classification
# model.compile(loss=MeanSquaredError())   # for regression

# Train the model (assumes X_train and y_train are already defined)
model.fit(X_train, y_train, epochs=100)

Activation Functions

Activation functions introduce non-linearity into the output of a neuron. Some common activation functions are:

  • Linear: $$g(z) = z$$
  • Sigmoid: $$g(z) = \frac{1}{1 + e^{-z}}$$
  • ReLU (Rectified Linear Unit): $$g(z) = \max(0, z)$$

The ReLU function is the most commonly used activation function for hidden layers. This is because it is computationally efficient and avoids the vanishing gradient problem.
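
A minimal NumPy sketch of these three functions, vectorized over an array of pre-activation values:

import numpy as np

def linear(z):
    return z

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 3.0])
print(linear(z))   # [-2.  0.  3.]
print(sigmoid(z))  # [0.119 0.5   0.953] (approx.)
print(relu(z))     # [0. 0. 3.]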

For the output layer, the activation function depends on the task:

  • Binary Classification: Sigmoid
  • Multi-Class Classification: Softmax
  • Regression: Linear

Logistic Regression

(2 possible output values)

Activations are calculated using the sigmoid function:

$$ a_1 = g(z) = \frac{1}{1 + e^{-z}} = P(y=1|x)\newline a_2 = 1 - a_1 = P(y=0|x) $$

Probabilities add up to $1$.

Softmax Regression

(4 possible output values)

$$ a_j = \frac{e^{z_j}}{e^{z_1}+e^{z_2}+e^{z_3}+e^{z_4}} = P(y=j|x) \quad \text{for } j = 1, \dots, 4 $$

Probabilities also add up to $1$.
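
A minimal NumPy sketch of softmax; shifting by $\max(z)$ before exponentiating is a common numerical-stability trick (an addition here, not from the notes) and does not change the result:

import numpy as np

def softmax(z):
    # The constant factor e^{-max(z)} cancels between numerator
    # and denominator, but prevents overflow for large z.
    ez = np.exp(z - np.max(z))
    return ez / ez.sum()

z = np.array([1.0, 2.0, 0.5, -1.0])
a = softmax(z)
print(a)        # per-class probabilities P(y=j|x)
print(a.sum())  # 1.0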

The corresponding TensorFlow model for a 10-class problem:

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import SparseCategoricalCrossentropy

model = Sequential([
    Dense(25, activation='relu'),
    Dense(15, activation='relu'),
    Dense(10, activation='softmax'),
])

# SparseCategoricalCrossentropy expects integer class labels (0..9).
# For better numerical stability, Keras also allows a linear output layer
# combined with SparseCategoricalCrossentropy(from_logits=True).
model.compile(loss=SparseCategoricalCrossentropy())

# Train the model (assumes X_train and y_train are already defined)
model.fit(X_train, y_train, epochs=100)

Multi-Class Classification vs. Multi-Label Classification

  • Multi-Class Classification: Each sample belongs to exactly one class.
  • Multi-Label Classification: Each sample can belong to multiple classes (see the sketch below).
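
For multi-label classification, one common setup (a sketch; `n_labels` and the layer sizes are illustrative) is one sigmoid output unit per label, so each label gets an independent probability, trained with binary cross-entropy:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import BinaryCrossentropy

n_labels = 3  # e.g. car, bus, pedestrian in the same image

model = Sequential([
    Dense(25, activation='relu'),
    Dense(n_labels, activation='sigmoid'),  # one independent probability per label
])
model.compile(loss=BinaryCrossentropy())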

Diagnosis

Diagnosing the performance of a neural network involves looking at the bias and variance of the model. Depending on the diagnosis, different steps can be taken to improve the model.

  • High Bias (Underfit): The model is too simple and does not fit the training data well.
  • High Variance (Overfit): The model is too complex and fits the training data too well.

Baseline

Establishing a baseline is necessary to estimate the expected performance of the model.

The baseline can be:

  • Human level performance
  • Performance of a competing algorithm
  • Guessed based on experience

Comparison

Depending on which subset of data the model performs well on, the following can be concluded:

  • High bias: $J_{train}$ is high ($J_{cv} \approx J_{train}$)
  • High variance: $J_{cv} \gg J_{train}$ ($J_{train}$ may be low)
  • High bias and high variance: $J_{cv} \gg J_{train}$ and $J_{train}$ will be high
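
In code, the comparison might look like this sketch (the split names and the use of scikit-learn's mean_squared_error are assumptions):

from sklearn.metrics import mean_squared_error

# Assumes a fitted `model` and train / cross-validation splits.
J_train = mean_squared_error(y_train, model.predict(X_train))
J_cv = mean_squared_error(y_cv, model.predict(X_cv))

print(f"J_train = {J_train:.3f}, J_cv = {J_cv:.3f}")
# J_train high, J_cv close to J_train  -> high bias (underfit)
# J_cv much larger than J_train        -> high variance (overfit)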

Improving

Depending on the diagnosed problem, different steps can be taken.

Fix High Bias

  • Add polynomial features
  • Bigger set of features
  • Decrease regularization

Fix High Variance

  • Get more training examples
  • Smaller set of features
  • Increase regularization (see the sketch below)
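
As an illustration of the regularization knob in Keras (the L2 factor 0.01 is an assumed starting value to tune, not from the notes):

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(25, activation='relu',
          kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # penalizes large weights
    Dense(15, activation='relu',
          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    Dense(1, activation='sigmoid'),
])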

Error Analysis

Error analysis is the process of manually examining the errors made by the model. This can help to identify patterns in the errors and improve the model.

If there are too many misclassifications to analyze, pick a random subset of about 100 examples.
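
A short sketch of drawing such a subset (array names are illustrative):

import numpy as np

# Assumes predictions y_pred and labels y_cv for the cross-validation set
wrong = np.where(y_pred != y_cv)[0]  # indices of misclassified examples

# Random subset of about 100 examples for manual review
subset = np.random.choice(wrong, size=min(100, len(wrong)), replace=False)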

Process

Neural networks are low-bias machines by nature, so the typical iterative loop is: grow the network until it does well on the training set, then add data until it does well on the cross-validation set.

flowchart TD
    start[Start] --> build[Build model]
    build --> train[Train model]
    train --> validateTrain{Does well on training set?}
    validateTrain -->|Yes| validateCross{Does well on cross validation set?}
    validateTrain -->|No| bigger[Bigger network]
    bigger --> train
    validateCross -->|Yes| done[Done]
    validateCross -->|No| more[More training data]
    more --> train