Hierarchical Organization: Features are organized hierarchically across layers: earlier layers capture low-level features computed over small contexts, while deeper layers represent increasingly abstract concepts computed over larger contexts.
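A rough way to see the "context" part of this claim is to compute receptive fields: in a convolutional stack, each additional layer widens the input region that a single feature can depend on. The layer specs below are made up purely for illustration.

```python
# Sketch with assumed layer specs (not from the text): receptive-field size
# grows with depth, so deeper layers literally see more input context per feature.
def receptive_field(layers):
    """layers: list of (kernel_size, stride); returns receptive field in input pixels."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer extends the field by (k-1) input-space steps
        jump *= s              # stride compounds the spacing between output positions
    return rf

# A small hypothetical conv stack: each entry is (kernel, stride).
stack = [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
for depth in range(1, len(stack) + 1):
    print(depth, receptive_field(stack[:depth]))  # receptive field widens with depth
```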
Superposition Hypothesis: Neural networks represent more “independent” features than a layer has neurons (dimensions) by encoding each feature as a direction in activation space, i.e., a linear combination of neurons, rather than dedicating one neuron per feature. This is possible when features are sparse, so only a few of these non-orthogonal directions are active at once.
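A minimal numerical sketch of the idea (toy dimensions, not any particular model): pack many more random unit directions than there are neurons into a layer, activate a sparse subset, and note that each active feature can still be read out because random directions in high-dimensional space are nearly orthogonal.

```python
import numpy as np

# Toy sketch, assumed sizes: 64 neurons storing 512 "features" as unit directions.
rng = np.random.default_rng(0)
d, n_features = 64, 512
W = rng.normal(size=(n_features, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm feature directions

active = rng.choice(n_features, size=3, replace=False)   # sparse set of active features
x = W[active].sum(axis=0)                                # activation = linear combination of directions

scores = W @ x                                  # read out each feature with a dot product
recovered = np.argsort(scores)[-3:]             # top-scoring directions
print(sorted(active), sorted(recovered.tolist()))  # active features score highest despite n_features > d
```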
Adversarial Vulnerability: Small, carefully chosen changes in input space can cause large shifts in embeddings, and therefore in the predictions made from them, suggesting that the learned manifolds have irregular geometry.
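A small illustration of this sensitivity, using an FGSM-style perturbation on a randomly initialized network (the model, sizes, and epsilon are placeholders, not anything from the text): a gradient-aligned perturbation typically shifts the logits far more than a random perturbation of the same size.

```python
import torch
import torch.nn as nn

# Hypothetical setup: a small random MLP and a single random input.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(1, 100, requires_grad=True)
label = torch.tensor([3])

loss = nn.functional.cross_entropy(model(x), label)
loss.backward()                                  # gradient of the loss w.r.t. the input

eps = 0.05
x_adv = x + eps * x.grad.sign()                          # gradient-aligned (FGSM-style) step
x_rand = x + eps * torch.sign(torch.randn_like(x))       # random step of the same size

with torch.no_grad():
    shift_adv = (model(x_adv) - model(x)).norm()
    shift_rand = (model(x_rand) - model(x)).norm()
print(shift_adv.item(), shift_rand.item())       # the adversarial shift is typically much larger
```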
Neural Collapse: After extensive training, the final-layer features of each class cluster tightly around their class mean, and the network's classification weights align with these mean directions. Within-class variation becomes negligible relative to between-class differences, yielding distinct, well-separated clusters for each class.
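One way to quantify this, shown here as a sketch on synthetic features rather than a trained model: compare within-class scatter to between-class scatter; a ratio near zero indicates collapsed features.

```python
import numpy as np

# Illustrative collapse metric on toy data (assumed setup, not from the text).
def collapse_ratio(features, labels):
    """features: (n, d) final-layer activations; labels: (n,) class ids."""
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        within += ((class_feats - class_mean) ** 2).sum()                    # within-class scatter
        between += len(class_feats) * ((class_mean - global_mean) ** 2).sum()  # between-class scatter
    return within / between

# Toy data: tight clusters around well-separated class means.
rng = np.random.default_rng(0)
means = 10.0 * rng.normal(size=(5, 32))
labels = np.repeat(np.arange(5), 100)
features = means[labels] + 0.1 * rng.normal(size=(500, 32))
print(collapse_ratio(features, labels))          # small value -> strongly collapsed features
```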