How many neurons and layers for a multilayer perceptron (MLP)?
This question is often posed by machine-learning beginners. There is no exact answer: choosing an architecture is ultimately a trial-and-error process of optimizing the network for your specific application. However, there are rules of thumb that can help you get started.
One hidden layer is sufficient for the large majority of problems. The optimal size of the hidden layer (i.e., the number of neurons) is usually between the size of the input layer and the size of the output layer. A good starting point is the mean of the input and output layer sizes.
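For example, with a hypothetical network that has 20 input neurons and 3 output neurons, this heuristic suggests starting with roughly \((20 + 3) / 2 \approx 12\) hidden neurons.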
A formula for an upper bound on the number of hidden neurons that will not result in overfitting is:

\[
N_h = \frac{N_s}{\alpha \, (N_i + N_o)}
\]

where \(N_i\) is the number of input neurons, \(N_o\) the number of output neurons, \(N_s\) the number of samples in the training dataset, and \(\alpha\) a scaling factor, typically between 2 and 10.
In an MLP, you want to limit the number of free parameters in your model to a small fraction of the degrees of freedom (DoF) in your data. The DoF in the data is the number of samples multiplied by the DoF (i.e., dimensions) of each sample, that is \(N_s (N_i + N_o)\). The scaling factor \(\alpha\) controls how general the model should be: a larger \(\alpha\) constrains the model more and reduces the risk of overfitting. Start with \(\alpha = 2\) and work your way up to 10.
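To make the formula concrete, here is a minimal Python sketch that evaluates the upper bound for a few values of \(\alpha\). The dataset and layer sizes used below are hypothetical placeholders, not values from any particular problem.

```python
# Sketch of the rule-of-thumb upper bound N_h = N_s / (alpha * (N_i + N_o)).

def max_hidden_neurons(n_samples: int, n_inputs: int, n_outputs: int,
                       alpha: float = 2.0) -> int:
    """Return the upper bound on hidden neurons for a single hidden layer."""
    return int(n_samples / (alpha * (n_inputs + n_outputs)))

if __name__ == "__main__":
    # Hypothetical example: 10,000 training samples, 20 inputs, 3 outputs.
    n_s, n_i, n_o = 10_000, 20, 3
    for alpha in (2, 5, 10):
        print(f"alpha={alpha:>2}: at most "
              f"{max_hidden_neurons(n_s, n_i, n_o, alpha)} hidden neurons")
```

With these hypothetical numbers, the bound shrinks from about 217 hidden neurons at \(\alpha = 2\) to about 43 at \(\alpha = 10\), illustrating how a larger \(\alpha\) forces a smaller, more constrained model.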
We will look at automated hyperparameter tuning for MLP models in a future post.