Neural networks are a kind of machine learning often conflated with deep learning: the distinction is that deep neural nets have two or more hidden layers.
Since their development, neural networks have been used to take input and generate output for classification and regression tasks, and more generally for supervised and unsupervised learning.
Supervised learning is when you have pre-established, labeled data:
Imagine you have sensor data for servers with upload/download rates, temperature, humidity, etc., taken every ten minutes. Normally the servers run fine, but sometimes parts fail, causing outages. We can collect this data and divide it into classes: one for when the server is working (the normal class) and one for when it's in an outage (the failure class).
What each sensor measures is called a feature.
A group of features is a feature-set, often represented as a vector.
The concrete values of a feature set at one point in time form a sample.
Samples are fed to neural nets to train them to produce the desired outputs, or to make predictions during inference.
The normal/failure classes are classifications or labels: the goal is to predict when a failure is imminent.
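As a rough sketch of how these terms fit together, one labeled sample might look like this in Python (the feature names and values here are hypothetical, invented purely for illustration):

```python
# One sample: a feature set (vector of sensor readings) plus its label.
# Feature names and values are hypothetical, for illustration only.
feature_names = ["upload_mbps", "download_mbps", "temperature_c", "humidity_pct"]
sample = [42.1, 310.5, 71.0, 38.2]   # the feature-set values for one reading
label = "normal"                     # or "failure" during an outage
```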
In addition to classification there's regression, which predicts numerical values.
Unsupervised learning identifies structure in data without knowledge of labels.
Neural networks have been around since the 1940s, but there was no effective way to train them until backpropagation was introduced in the 1960s.
Neural networks are inspired by the brain: while the underlying processes are quite different, the structure is comparable (neurons, connections, activations), and when large numbers of neurons are combined, they tend to outperform other machine learning methods.
Dense (fully connected) layers are the most common. Within them, each neuron of a layer is connected to every neuron of the next layer, so that its output becomes their input. Each connection has a weight, a trainable factor that determines how much of the input to use: the weight is multiplied by the input value.
These products of inputs and weights are summed at the neuron, and then a bias is added, which offsets the output positively or negatively and helps the model fit more dynamic, real-world data.
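A minimal sketch of that computation (the specific input, weight, and bias values are made up for illustration):

```python
# A single neuron: multiply each input by its weight, sum the products, add the bias.
inputs = [1.0, 2.0, 3.0]      # hypothetical outputs from three neurons in the previous layer
weights = [0.2, 0.8, -0.5]    # one trainable weight per connection
bias = 2.0                    # trainable offset

output = sum(i * w for i, w in zip(inputs, weights)) + bias
print(output)  # 0.2 + 1.6 - 1.5 + 2.0 = 2.3
```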
Weights and biases are like knobs that can be tuned to fit the model to data: neural networks often have millions of these parameters tuned during training, and weights and biases affect a neuron's output in different ways.
We can actually graph a neuron's output by mapping it onto y = mx + b, or output = weight * input + bias, where changing the weight alters the slope of the function, and changing the bias shifts the whole function upward or downward.
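A small sketch of that linear view, using hypothetical values to show how the weight controls the slope and the bias controls the vertical shift:

```python
# output = weight * input + bias, evaluated over a few inputs.
# The weight and bias values are hypothetical, just to contrast slope vs. offset.
def neuron_output(x, weight, bias):
    return weight * x + bias

xs = [0.0, 1.0, 2.0, 3.0]

print([neuron_output(x, weight=1.0, bias=0.0) for x in xs])  # [0.0, 1.0, 2.0, 3.0]
print([neuron_output(x, weight=2.0, bias=0.0) for x in xs])  # steeper slope: [0.0, 2.0, 4.0, 6.0]
print([neuron_output(x, weight=1.0, bias=1.5) for x in xs])  # same slope, shifted up: [1.5, 2.5, 3.5, 4.5]
```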
We then apply an activation function, such as the Rectified Linear Unit (ReLU), to this output.
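ReLU itself is simple: it passes positive values through unchanged and clamps negative values to zero. A minimal sketch:

```python
# Rectified Linear Unit: max(0, x).
def relu(x):
    return max(0.0, x)

print(relu(2.3))   # 2.3  (positive values pass through)
print(relu(-1.7))  # 0.0  (negative values are clamped to zero)
```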
Imagine a neural net with two hidden layers as well as an input and output layer.
The input layer takes data that has been made numeric and then preprocessed via normalization and scaling into the range 0..1 or -1..1, and the output layer is what the net returns. With classification, the output layer typically has as many neurons as the data has classes.
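To tie these pieces together, here is a minimal, hypothetical forward pass through such a network using NumPy. The layer sizes, random weights, and assumed sensor ranges for the min-max scaling are all illustrative assumptions, and there is no training here, only inference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw sensor readings for one sample (4 features).
raw = np.array([42.1, 310.5, 71.0, 38.2])

# Scale features into 0..1 (min-max scaling with assumed sensor ranges).
feature_min = np.array([0.0, 0.0, -20.0, 0.0])
feature_max = np.array([1000.0, 1000.0, 120.0, 100.0])
x = (raw - feature_min) / (feature_max - feature_min)

def dense(inputs, weights, biases):
    # Weighted sum of inputs plus a bias for every neuron in the layer.
    return inputs @ weights + biases

def relu(z):
    return np.maximum(0.0, z)

# Randomly initialized parameters: 4 inputs -> 8 -> 8 -> 2 classes (normal, failure).
w1, b1 = rng.normal(size=(4, 8)) * 0.1, np.zeros(8)
w2, b2 = rng.normal(size=(8, 8)) * 0.1, np.zeros(8)
w3, b3 = rng.normal(size=(8, 2)) * 0.1, np.zeros(2)

h1 = relu(dense(x, w1, b1))   # first hidden layer
h2 = relu(dense(h1, w2, b2))  # second hidden layer
scores = dense(h2, w3, b3)    # output layer: one score per class

print(scores)  # untrained, so these scores are meaningless until the parameters are tuned
```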
Neural nets can have millions of parameters (weight and bias pairs) and act as a function of millions of variables, for which there is some arrangement that produces correct results: finding this combination is the tricky part.
The end goal is to adjust the parameters so that unseen examples trigger the desired output, and in pursuit of this one must avoid overfitting, where the net merely memorizes the training data and performs poorly on new examples.
In-sample data is used to train the model, and out-of-sample data is used to validate it; for example, 90,000 of 100,000 samples may be used for training and the remaining 10,000 for validation.
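A minimal sketch of such a split, assuming the samples and labels are already in NumPy arrays (the variable names and the 90/10 ratio follow the example above; the data itself is random placeholder data):

```python
import numpy as np

# Hypothetical dataset: 100,000 samples with 4 features each, plus labels.
X = np.random.rand(100_000, 4)
y = np.random.randint(0, 2, size=100_000)  # 0 = normal, 1 = failure

# Shuffle, then hold out 10% as out-of-sample (validation) data.
indices = np.random.permutation(len(X))
split = int(0.9 * len(X))
train_idx, val_idx = indices[:split], indices[split:]

X_train, y_train = X[train_idx], y[train_idx]   # 90,000 in-sample examples
X_val, y_val = X[val_idx], y[val_idx]           # 10,000 out-of-sample examples

print(X_train.shape, X_val.shape)  # (90000, 4) (10000, 4)
```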
When the algorithm performs correctly on out-of-sample data, that's referred to as generalization, which is a result of training (gradual parameter adjustment over time).