Author: Lei Huang
Publisher: Springer
Year: 2022
Pages: 117
Language: English
Format: pdf (true), epub
Size: 13.5 MB
This book presents and surveys normalization techniques with a deep analysis of their role in training deep neural networks. In addition, the author provides technical details on designing new normalization methods and network architectures tailored to specific tasks. Normalization methods can improve the training stability, optimization efficiency, and generalization ability of deep neural networks (DNNs) and have become basic components in most state-of-the-art DNN architectures. The author provides guidelines for elaborating, understanding, and applying normalization methods. This book is ideal for readers working on the development of novel deep learning algorithms and/or their applications to solving practical problems in computer vision and machine learning tasks. The book also serves as a resource for researchers, engineers, and students who are new to the field and need to understand and train DNNs.
Deep neural networks (DNNs) have been extensively used across a broad range of applications, including computer vision (CV), natural language processing (NLP), speech and audio processing, robotics, bioinformatics, etc. They are typically composed of stacked layers/modules, where the transformation between layers consists of a linear mapping with learnable parameters and a nonlinear activation function. While their deep and complex structure gives them powerful representation capacity and appealing advantages in learning feature hierarchies, it also makes their training difficult. One notorious problem in training DNNs is vanishing or exploding activations (and gradients), which is mainly caused by the compounded linear and nonlinear transformations in DNNs.
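The effect of compounding these transformations can be seen in a minimal NumPy sketch (not taken from the book; the layer width, depth, weight scales, and choice of ReLU are illustrative assumptions only):

import numpy as np

np.random.seed(0)
x = np.random.randn(64, 100)                 # a mini-batch of 64 inputs with 100 features
for scale in (0.5, 2.0):                     # under-scaled vs. over-scaled weights
    h = x
    for _ in range(20):                      # 20 stacked linear + ReLU layers
        W = np.random.randn(100, 100) * scale / np.sqrt(100)
        h = np.maximum(h @ W, 0.0)           # linear mapping followed by the nonlinearity
    print(scale, np.abs(h).mean())           # mean activation magnitude vanishes or explodes

Depending on the weight scale, the mean activation magnitude collapses toward zero or blows up after only a few dozen layers, which is exactly the kind of behavior that careful initialization and normalization are meant to control.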
In fact, the success of DNNs heavily depends on breakthroughs in training techniques, especially on controlling the distribution of activations by design, as the history of deep learning has shown. For example, Hinton and Salakhutdinov proposed layer-wise initialization, which pioneered research on good initialization methods for linear layers, aiming to implicitly design well-shaped activations/gradients at initialization. This made training deep models possible. One milestone technique in addressing the training issues of DNNs was batch normalization (BN), which explicitly standardizes the activations of intermediate DNN layers within a mini-batch of data. BN improves DNNs' training stability, optimization efficiency, and generalization ability. It is a basic component in most state-of-the-art architectures and has successfully proliferated throughout various areas of deep learning. By now, BN is used by default in most deep learning models, both in research (more than 34,000 citations on Google Scholar) and in real-world settings. Furthermore, a significant number of other normalization techniques have been proposed to address training issues in particular contexts, further evolving DNN architectures and their applications. For example, layer normalization (LN) is an essential module in the Transformer, which has advanced the state-of-the-art architectures for NLP, while spectral normalization is a basic component in the discriminator of generative adversarial networks (GANs). Importantly, the ability of most normalization techniques to stabilize and accelerate training has helped to simplify the process of designing network architectures: training is no longer the main concern, enabling more focus to be given to developing components that can effectively encode prior/domain knowledge into the architectures.
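As a rough illustration of the standardization step that BN performs over a mini-batch (a minimal NumPy sketch, not the book's code; the batch_norm function name, shapes, and eps value are assumptions for illustration):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Standardize each feature over the mini-batch dimension, then rescale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = 5.0 * np.random.randn(32, 100) + 3.0     # a mini-batch whose features are far from zero mean / unit variance
y = batch_norm(x, gamma=np.ones(100), beta=np.zeros(100))
print(y.mean(axis=0)[:3], y.std(axis=0)[:3]) # per-feature mean ~0, standard deviation ~1

The learnable gamma and beta restore the layer's representational flexibility after standardization; the book covers these design choices, along with the batch-free variants such as LN, in detail.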
Download Normalization Techniques in Deep Learning