Grokking Modular Arithmetic Through Group Actions: A Group-Theoretic View of Machine Learning Behavior

Tomàs Bernal, Marcel; Tomàs Bernal, Marcel

Grokking Modular Arithmetic Through Group Actions: A Group-Theoretic View of Machine Learning Behavior

Author

Tomàs Bernal, Marcel

Other authors

Universitat Politècnica de Catalunya. Departament de Ciències de la Computació

University of California, San Diego

Belkin, Mikhail

Publication date

2025-05-28

Abstract

Aquesta tesi explora el fenomen del grokking—una generalització retardada després dentrenar un model fins al punt d'interpolació—en xarxes neuronals i en Recursive Feature Machines (RFM) utilitzades amb màquines kernel entrenades en tasques d’aritmètica modular. Les RFM funcionen actualitzant iterativament les matrius de característiques mitjançant el Average Gradient Outer Product (AGOP) d’un estimador. Demostrem que la clau de la generalització rau en l’emergència d’estructura algebraica en les característiques apreses. En concret, mostrem que els models generalitzen quan recuperen les accions de grup invariants subjacents en les dades. Interpretant aquestes estructures a través del prisma de la teoria de grups, podem construir particions de dades que inhibeixen la generalització i connectar el comportament de grokking amb la recuperació de simetria i estructura en les dades. Generalitzem aquests resultats al problema de composició de grups per a grups finits abelians.

Esta tesis explora el fenómeno del grokking—una generalización tardía después de entrenar un modelo hasta el punto de interpolación—en redes neuronales y en Recursive Feature Machines (RFM), utilizadas junto con máquinas kernel entrenadas en tareas de aritmética modular. Las RFM operan actualizando iterativamente matrices de características mediante el Average Gradient Outer Product (AGOP) de un estimador. Demostramos que la clave para la generalización reside en la aparición de estructura algebraica en las características aprendidas. Específicamente, mostramos que los modelos generalizan cuando recuperan las acciones de grupo invariantes subyacentes en los datos. Al interpretar estas estructuras a través de la perspectiva de la teoría de grupos, somos capaces de construir particiones de datos que inhiben la generalización y conectar el comportamiento de grokking con la recuperación de simetría y estructura en los datos. Generalizamos estos resultados al problema de composición de grupos para grupos finitos abelianos.

This thesis explores the phenomenon of grokking—delayed generalization after overfitting—in neural networks and Recursive Feature Machines (RFM) used in conjunction with kernel machines trained on modular arithmetic tasks. RFM operates by iteratively updating feature matrices through the Average Gradient Outer Product (AGOP) of an estimator. We demonstrate that the key to generalization lies in the emergence of algebraic structure in the learned features. Specifically, we show that models generalize when they recover the underlying invariant group actions inherent in the data. By interpreting these learned structures through the lens of group theory, we are able to construct data partitions that inhibit generalization and connect grokking behavior to the recovery of symmetry and structure in the data. We further generalize these results to the group composition problem for abelian finite groups.

Outgoing

Document Type

Bachelor thesis

Language

English

Subjects and keywords

Àrees temàtiques de la UPC::Física; Machine learning; Mathematical optimization; Group theory; Grokking; Group Theory; Group Actions; Machine Learning; Feature Learning; Aprenentatge automàtic; Optimització matemàtica; Grups, Teoria de; Classificació AMS::68 Computer science::68T Artificial intelligence; Classificació AMS::20 Group theory and generalizations::20C Representation theory of groups

Publisher

Universitat Politècnica de Catalunya

Recommended citation

This citation was generated automatically.

Export

DIDL MARC MARC_CCUC METS OAI_DC ORE QDC RDF

Rights

http://creativecommons.org/licenses/by-nc/4.0/

Open Access

Attribution-NonCommercial 4.0 International

This item appears in the following Collection(s)

Treballs acadèmics [82541]

Grokking Modular Arithmetic Through Group Actions: A Group-Theoretic View of Machine Learning Behavior

Author

Other authors

Publication date

Share

Abstract

Document Type

Language

Subjects and keywords

Publisher

Recommended citation

Export

Rights

This item appears in the following Collection(s)