How many neurons and layers for a multilayer perceptron (MLP)?

This question comes up often among machine-learning beginners. The honest answer is that there is no exact formula: optimizing a neural network architecture for a specific application is a trial-and-error process.

However, there are rules of thumb that may be helpful to get you started.

  • One hidden layer is sufficient for the large majority of problems.

  • The optimal size of the hidden layer (i.e., number of neurons) is usually between the size of the input layer and the size of the output layer. A good starting point is the mean of the two.

  • An upper bound on the number of hidden neurons that helps avoid overfitting is:

\begin{equation*} N_h = \frac{N_s}{\alpha \, (N_i + N_o)} \end{equation*}

  • \(N_{i}\): number of input neurons.

  • \(N_{o}\): number of output neurons.

  • \(N_{s}\): number of samples in the training dataset.

  • \(\alpha\): scaling factor, typically between 2 and 10.

In an MLP, you want to limit the number of free parameters in your model to a small proportion of the degrees of freedom (DoF) in your data. The DoF in the data is the number of samples times the DoF (i.e., dimensions) of each sample, or \(N_s(N_i+N_o)\). The scaling factor \(\alpha\) controls how general the model is designed to be: start with \(\alpha = 2\) and work your way up to 10.
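As a minimal sketch, the upper-bound formula is a one-liner; the function name, example dataset sizes, and sample values below are illustrative, not from any particular library:

```python
def max_hidden_neurons(n_samples, n_input, n_output, alpha=2):
    """Upper bound on hidden neurons: N_h = N_s / (alpha * (N_i + N_o)).

    alpha is a scaling factor, typically between 2 and 10; a larger
    alpha gives a smaller, more general (less overfitting-prone) model.
    """
    return n_samples / (alpha * (n_input + n_output))

# Hypothetical example: 10,000 training samples, 10 inputs, 1 output.
max_hidden_neurons(10_000, 10, 1, alpha=2)   # ≈ 454.5 hidden neurons
max_hidden_neurons(10_000, 10, 1, alpha=10)  # ≈ 90.9 hidden neurons
```

In practice, you would round the result down and treat it as a ceiling while tuning, not as the value to use directly.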

We will look further into the area of automated hyperparameter tuning for MLP models in another post.
