The Principles of Deep Learning Theory

submitted by

Style Pass

2021-06-21 16:00:08

This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
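As a rough illustration of what "tuning networks to criticality" means for signal propagation, the following minimal sketch pushes a random input through a deep, bias-free ReLU network whose weights are drawn from a zero-mean Gaussian with variance c_w / width. The abstract gives no specific values; the function name `forward_norms`, the width and depth, and the three values of `c_w` are all assumptions made for this example. The choice c_w = 2 is the standard variance-preserving initialization for ReLU, used here as a stand-in for the critical point; smaller values make the signal shrink with depth and larger values make it grow.

```python
import math
import random


def forward_norms(c_w, width=64, depth=20, seed=0):
    """Return the mean squared preactivation at each layer of a bias-free ReLU MLP.

    Weights are drawn i.i.d. from a Gaussian with mean 0 and variance c_w / width.
    All parameter names and defaults are illustrative, not taken from the book.
    """
    rng = random.Random(seed)
    # Treat a standard-normal vector as the preactivations entering the first layer.
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    std = math.sqrt(c_w / width)
    norms = []
    for _ in range(depth):
        act = [max(v, 0.0) for v in x]  # ReLU
        # Fresh random weights for every neuron in every layer.
        x = [sum(rng.gauss(0.0, std) * a for a in act) for _ in range(width)]
        norms.append(sum(v * v for v in x) / width)
    return norms


if __name__ == "__main__":
    for c_w in (1.0, 2.0, 4.0):
        norms = forward_norms(c_w)
        print(f"c_w={c_w}: first layer={norms[0]:.3g}, last layer={norms[-1]:.3g}")
```

On average each layer multiplies the mean squared signal by c_w / 2, so only c_w = 2 keeps it of order one across layers. Even at that value the finite-width run fluctuates from layer to layer, and those fluctuations accumulate with depth, which is consistent with the abstract's point that the depth-to-width ratio controls deviations from the infinite-width description.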
