SimBa: Simplicity Bias for Scaling Parameters in Deep Reinforcement Learning


We introduce SimBa, an architecture designed to inject simplicity bias for scaling up parameters in deep RL. SimBa consists of three components: (i) standardizing input observations with running statistics, (ii) incorporating residual feedforward blocks to provide a linear pathway from the input to the output, and (iii) applying layer normalization to control feature magnitudes. By scaling up parameters with SimBa, the sample efficiency of various deep RL algorithms—including off-policy, on-policy, and unsupervised methods—is consistently improved. Moreover, when SimBa is integrated into SAC, it matches or surpasses state-of-the-art deep RL methods with high computational efficiency across 51 tasks from DMC, MyoSuite, and HumanoidBench, solely by modifying the network architecture. These results demonstrate SimBa's broad applicability and effectiveness across diverse RL algorithms and environments.

SimBa comprises three components: Running Statistics Normalization, Residual Feedforward Blocks, and Post-Layer Normalization. These components lower the network's functional complexity, enhancing generalization for highly overparameterized configurations.
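To make the three components concrete, here is a minimal sketch of how they could be combined into an encoder, written in a PyTorch style. This is an assumption-laden illustration, not the authors' released implementation: class names such as `RunningStatsNorm`, `ResidualFFBlock`, and `SimBaEncoder`, and hyperparameters like `hidden_dim` and `num_blocks`, are illustrative choices.

```python
# A hypothetical sketch of the three SimBa components (not the official code).
import torch
import torch.nn as nn


class RunningStatsNorm(nn.Module):
    """Component (i): standardize observations with running mean/variance."""

    def __init__(self, obs_dim: int, eps: float = 1e-5):
        super().__init__()
        self.register_buffer("mean", torch.zeros(obs_dim))
        self.register_buffer("var", torch.ones(obs_dim))
        self.register_buffer("count", torch.tensor(eps))
        self.eps = eps

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        if self.training:
            with torch.no_grad():
                batch_mean = obs.mean(dim=0)
                batch_var = obs.var(dim=0, unbiased=False)
                batch_count = obs.shape[0]
                delta = batch_mean - self.mean
                total = self.count + batch_count
                # Incrementally update running statistics (Welford-style merge).
                self.mean += delta * batch_count / total
                self.var = (self.var * self.count + batch_var * batch_count
                            + delta.pow(2) * self.count * batch_count / total) / total
                self.count = total
        return (obs - self.mean) / torch.sqrt(self.var + self.eps)


class ResidualFFBlock(nn.Module):
    """Component (ii): residual feedforward block with a skip connection,
    preserving a linear pathway from input to output."""

    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, expansion * dim),
            nn.ReLU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity skip keeps the linear path; the MLP learns the residual.
        return x + self.mlp(self.norm(x))


class SimBaEncoder(nn.Module):
    """Running-stats norm -> linear embedding -> residual blocks -> post-layer norm."""

    def __init__(self, obs_dim: int, hidden_dim: int = 512, num_blocks: int = 2):
        super().__init__()
        self.obs_norm = RunningStatsNorm(obs_dim)
        self.embed = nn.Linear(obs_dim, hidden_dim)
        self.blocks = nn.ModuleList(ResidualFFBlock(hidden_dim) for _ in range(num_blocks))
        # Component (iii): post-layer normalization to control feature magnitudes.
        self.post_norm = nn.LayerNorm(hidden_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        x = self.embed(self.obs_norm(obs))
        for block in self.blocks:
            x = block(x)
        return self.post_norm(x)
```

In this sketch, scaling up parameters would amount to increasing `hidden_dim` or `num_blocks`; the encoder's output would then feed an algorithm-specific head (e.g., SAC's actor and critic).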
