Grokking Modular Arithmetic Through Group Actions: A Group-Theoretic View of Machine Learning Behavior

Tomàs Bernal, Marcel

dc.contributor

Universitat Politècnica de Catalunya. Departament de Ciències de la Computació

dc.contributor

University of California, San Diego

dc.contributor

Belkin, Mikhail

dc.contributor.author

Tomàs Bernal, Marcel

dc.date.issued

2025-05-28

dc.identifier

https://hdl.handle.net/2117/445511

dc.identifier

PRISMA-192165

dc.description.abstract

This thesis explores the phenomenon of grokking (delayed generalization after training a model to the point of interpolation) in neural networks and in Recursive Feature Machines (RFM) used with kernel machines trained on modular arithmetic tasks. RFMs work by iteratively updating feature matrices through the Average Gradient Outer Product (AGOP) of an estimator. We show that the key to generalization lies in the emergence of algebraic structure in the learned features. Specifically, we show that models generalize when they recover the invariant group actions underlying the data. Interpreting these structures through the lens of group theory, we can construct data partitions that inhibit generalization and connect grokking behavior to the recovery of symmetry and structure in the data. We generalize these results to the group composition problem for finite abelian groups.

dc.description.abstract

This thesis explores the phenomenon of grokking (late generalization after training a model to the point of interpolation) in neural networks and in Recursive Feature Machines (RFM), used together with kernel machines trained on modular arithmetic tasks. RFMs operate by iteratively updating feature matrices through the Average Gradient Outer Product (AGOP) of an estimator. We show that the key to generalization lies in the appearance of algebraic structure in the learned features. Specifically, we show that models generalize when they recover the invariant group actions underlying the data. By interpreting these structures from the perspective of group theory, we are able to construct data partitions that inhibit generalization and to connect grokking behavior with the recovery of symmetry and structure in the data. We generalize these results to the group composition problem for finite abelian groups.

dc.description.abstract

This thesis explores the phenomenon of grokking (delayed generalization after overfitting) in neural networks and in Recursive Feature Machines (RFM) used in conjunction with kernel machines, all trained on modular arithmetic tasks. RFM operates by iteratively updating feature matrices through the Average Gradient Outer Product (AGOP) of an estimator. We demonstrate that the key to generalization lies in the emergence of algebraic structure in the learned features. Specifically, we show that models generalize when they recover the invariant group actions underlying the data. By interpreting these learned structures through the lens of group theory, we construct data partitions that inhibit generalization and connect grokking behavior to the recovery of symmetry and structure in the data. We further generalize these results to the group composition problem for finite abelian groups.
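
The iterative AGOP update mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation, and this record does not specify its details. The Gaussian kernel with a Mahalanobis distance, the one-hot encoding of the pairs (a, b), the ridge term `reg`, the rescaling of the feature matrix, and all function names (`modular_addition_data`, `kernel`, `fit`, `agop`, `rfm`) are assumptions made here for illustration only.

```python
import numpy as np


def modular_addition_data(p):
    """All pairs (a, b) in Z_p x Z_p, one-hot encoded, with labels (a + b) mod p."""
    a, b = np.divmod(np.arange(p * p), p)
    eye = np.eye(p)
    X = np.hstack([eye[a], eye[b]])  # shape (p*p, 2p)
    Y = eye[(a + b) % p]             # shape (p*p, p)
    return X, Y


def kernel(A, B, M, bandwidth):
    """Gaussian kernel on the distance (a - b)^T M (a - b); M must be symmetric."""
    AM, BM = A @ M, B @ M
    sq = (AM * A).sum(1)[:, None] - 2.0 * AM @ B.T + (BM * B).sum(1)[None, :]
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def fit(X, Y, M, bandwidth, reg):
    """Kernel ridge regression: solve (K + reg * I) alpha = Y."""
    K = kernel(X, X, M, bandwidth)
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), Y)
    return K, alpha


def agop(X, K, alpha, M, bandwidth):
    """Average over training points x_i of J(x_i)^T J(x_i),
    where J is the Jacobian of the predictor f(x) = sum_j alpha_j k(x, x_j)."""
    n, d = X.shape
    G = np.zeros((d, d))
    XM = X @ M
    for i in range(n):
        # grad_x k(x, x_j) at x = x_i equals -k(x_i, x_j) * M (x_i - x_j) / bandwidth^2
        weighted = K[i][:, None] * (XM[i] - XM)   # shape (n, d)
        J = -(alpha.T @ weighted) / bandwidth**2  # shape (outputs, d)
        G += J.T @ J
    return G / n


def rfm(X, Y, iterations=3, bandwidth=1.0, reg=1e-3):
    """Alternate between fitting the kernel machine and replacing M by the AGOP."""
    M = np.eye(X.shape[1])
    for _ in range(iterations):
        K, alpha = fit(X, Y, M, bandwidth, reg)
        M = agop(X, K, alpha, M, bandwidth)
        scale = M.max()
        if scale > 0:
            M = M / scale  # rescaling is a choice of this sketch, not taken from the thesis
    return M


if __name__ == "__main__":
    X, Y = modular_addition_data(5)
    M = rfm(X, Y)
    print(np.round(M, 2))  # learned feature matrix, to be inspected for structure
```

The sketch trains on every pair for simplicity. Studying grokking would instead require a train/test split and tracking test accuracy across iterations, both of which are omitted here.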

dc.description.abstract

Outgoing

dc.format

application/pdf

dc.language

eng

dc.publisher

Universitat Politècnica de Catalunya

dc.rights

http://creativecommons.org/licenses/by-nc/4.0/

dc.rights

Open Access

dc.rights

Attribution-NonCommercial 4.0 International

dc.subject

Àrees temàtiques de la UPC::Física

dc.subject

Machine learning

dc.subject

Mathematical optimization

dc.subject

Group theory

dc.subject

Grokking

dc.subject

Group Actions

dc.subject

Feature Learning

dc.subject

Aprenentatge automàtic

dc.subject

Optimització matemàtica

dc.subject

Grups, Teoria de

dc.subject

Classificació AMS::68 Computer science::68T Artificial intelligence

dc.subject

Classificació AMS::20 Group theory and generalizations::20C Representation theory of groups

dc.title

Grokking Modular Arithmetic Through Group Actions: A Group-Theoretic View of Machine Learning Behavior

dc.type

Bachelor thesis

This item appears in the following Collection(s)

Treballs acadèmics [82539]