synopsis | E.T. : Empirical Theory in Representation Learning

Representation learning

Representation learning is the current de facto research paradigm in Computer Vision, building on deep learning methods trained on large datasets to learn visual representations. In this ECCV workshop we explore what makes representation learning research “scientific”. Other scientific research fields revolve around empirical theories, e.g., the theory of evolution (Darwin, 1859) in Biology, the theory of relativity (Einstein, 1916) in Physics, and dual process theory (Kahneman, 2011) in Psychology. In representation learning, however, the role of empirical theories is less clear. This workshop promotes empirical theory research in representation learning. While doing so, it explores questions such as: What is empirical theory? What empirical theories do we (implicitly) have? How can we encourage rigorous empirical research methods? How can we build empirical theories?

Machine learning theory vs empirical theory

Empirical theory differs from what is typically called theory in machine learning. In machine learning, computational learning theory (Shalev-Shwartz, 2014) is rigorously math-driven, exploring mathematical performance bounds, time complexity, and feasibility of learning. It has important and deep mathematical results, including no-free-lunch theorems (Wolpert & Macready, 1997), universal approximation (Cybenko, 1989), VC-theory (Vapnik, 1995), PAC-learning (Valiant, 1984), Rademacher complexity (Shalev-Shwartz, 2014), the neural tangent kernel (Jacot et al., 2021), etc. In representation learning, however, models are trained by searching through a high-dimensional non-convex solution space while navigating an exponential search space of hyper-parameters (Choi et al., 2019); (Feurer & Hutter, 2019). This precludes standard ML practices for hyper-parameter tuning such as (nested) cross-validation (Krstajic et al., 2014), which is prohibitively computationally expensive due to the large datasets involved and is thus simply not done in practice (Bouthillier et al., 2021); (Bouthillier et al., 2019). Precise, rigorous, theoretical optimality is considered too restrictive and is seen as a practically unnecessary artificial hurdle. Optimality is replaced with empirical ‘good enough’ approximate optimizations. This workshop does not focus on mathematical proofs; in contrast, it focuses on empirical experimental evidence.
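To make the cost argument concrete, the following minimal sketch counts how many full training runs nested cross-validation would require. It assumes a plain grid search in which every configuration is trained once per inner fold and the selected configuration is refit once per outer fold; the function names and the example numbers (4 hyper-parameters with 5 values each, 5 outer and 3 inner folds) are hypothetical and chosen only for illustration.

```python
from math import prod


def grid_size(values_per_hyperparameter):
    """Number of configurations in a full grid: the product of the
    number of candidate values for each hyper-parameter."""
    return prod(values_per_hyperparameter)


def nested_cv_training_runs(outer_folds, inner_folds, n_configs):
    """Count model fits needed by nested cross-validation.

    Assumption: for every outer fold, each configuration is trained once
    per inner fold, and the selected configuration is then refit once on
    the whole outer-training split.
    """
    if min(outer_folds, inner_folds, n_configs) < 1:
        raise ValueError("all arguments must be positive integers")
    inner_search = inner_folds * n_configs  # fits spent on model selection
    refit = 1                               # one refit of the chosen configuration
    return outer_folds * (inner_search + refit)


if __name__ == "__main__":
    # Hypothetical setup: 4 hyper-parameters with 5 candidate values each.
    configs = grid_size([5, 5, 5, 5])
    runs = nested_cv_training_runs(outer_folds=5, inner_folds=3, n_configs=configs)
    print(configs, runs)  # prints "625 9380"
```

The grid grows exponentially with the number of hyper-parameters, and each of the counted runs is a full training on a large dataset, which illustrates why this protocol is rarely feasible in representation learning.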

Empirical evidence and benchmarking in representation learning

Empirical evidence is crucial in representation learning. Many papers experimentally demonstrate that a method can be engineered to improve upon existing state-of-the-art benchmark scores. That such a benchmark-breaking method even exists is valuable existential empirical evidence and propels the field (Hardt, 2025); (Krizhevsky, 2014); (Russakovsky et al., 2015). Often, however, it is not clear where the improvements originate (de Boer et al., 2023); (Bouthillier et al., 2021); (Lucic et al., 2018); (Musgrave et al., 2020); (Musgrave et al., 2021). Without such understanding, these empirical results lack empirical theoretical hypotheses: a clear, causal link to the reasons that underlie the improvement. If, for example, an improvement is due to better-tuned hyper-parameters, this does not increase our understanding, because we already know that better tuning of hyper-parameters helps (Anand et al., 2020); (Bouthillier et al., 2019); (Brigato et al., 2021); (Picard, 2021). Such a result would become interesting if, for example, the method for finding these hyper-parameters generalized to other work and this hypothesis could be empirically justified. In this workshop, we aim to go beyond individual systems that work well, and instead aim for empirical theory: findings that generalize beyond idiosyncratic combinations of datasets, hyper-parameter settings and accidental optimization minima. We promote hypothesis-driven empirical research that gives insight; breaking SOTA is neither sufficient nor necessary.

Empirical theory: rigor in experimental evidence

With empirical theory, we aim for a sweet spot between theoretical mathematical models on one side, and purely empirical benchmark-breaking systems on the other side. It’s about tracing the sources of empirical gain (Lipton & Steinhardt, 2019), and explicitly providing experimental, hypothesis-driven, empirical evidence that separates explanation from speculation (Lipton & Steinhardt, 2019). It is about understanding the training and evaluation of deep learning models, their design and their components, optimizers, and losses, and how these generalize over problem and dataset types. We aim to understand when a method is applicable in other work, and what makes it so. This type of research, of course, already exists in the broader representation learning literature. Examples include the shift-invariance of CNNs (Chaman & Dokmanic, 2021); (Kayhan & Gemert, 2020); (Zhang, 2019), and their kernel size (Ding et al., 2022); (Grabinski et al., 2023); (Tomen & van Gemert, 2021), residual connection variants (Greff et al., 2017); (Veit et al., 2016); (Zhu et al., 2024), summing or multiplying activations (Ma et al., 2024), gating (Qiu et al., 2025), registers in transformers (Darcet et al., 2024); (Jiang et al., 2025); (Shi et al., 2026), object-centric learning (Rubinstein et al., 2025), subliminal learning (Schrodi et al., 2025) and even empirical reproducibility (Bouthillier et al., 2019); (Pineau et al., 2021); (Raff, 2019); (Yildiz et al., 2021), and many more. Beyond such insight-driven papers, there is work on explicit empirical theory building, including empirical neural scaling laws (Bahri et al., 2024), the lottery ticket hypothesis (Frankle & Carbin, 2018); (Pinson, 2026), the Platonic representation hypothesis (Huh et al., 2024), etc. This workshop shines a spotlight on such work, aiming to incentivize empirical insights and empirical theory building for the entire field of representation learning.

Relation to other workshops and initiatives

Pre-registration in machine learning was explored at NeurIPS in 2021 in the Pre-registration workshop: An alternative publication model for machine learning research (Albanie et al., 2021). The ML-Retrospectives, Surveys & Meta-Analyses workshop (Yadav et al., 2020), hosted at NeurIPS 2019, ICML 2020, and NeurIPS 2020, and the recent Metascience for Machine Learning (Hung et al., 2025) initiative are related to meta-science for representation learning. The recent Mechanistic Interpretability Workshop (Nanda et al., 2025) at ICML 2024 and NeurIPS 2025 and the Workshop on Scientific Methods for Understanding Deep Learning (Kadkhodaie et al., 2026) at ICLR 2025/2026 are great examples of what we aim to achieve, albeit now in representation learning. We will make use of pre-registration to separate interesting questions from outcomes, and are inspired by meta-science.

We aim to foster and build a community for understanding-based research that is currently scattered over multiple venues and mixed in with improvement-based research.

Bibliography

2026

Vision Transformers Need More Than Registers

Cheng Shi, Yizhou Yu, and Sibei Yang

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2026
It’s not a Lottery, it’s a Race: Understanding How Gradient Descent Adapts the Network’s Capacity to the Task

Hannah Pinson

2026
Workshop on Scientific Methods for Understanding Deep Learning

Kadkhodaie et al.

2026

Website

2025

The Emerging Science of Machine Learning Benchmarks

Moritz Hardt

2025

Website
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, and others

arXiv preprint arXiv:2505.06708, 2025
Vision Transformers Don’t Need Trained Registers

Nicholas Jiang, Amil Dravid, Alexei A Efros, and Yossi Gandelsman

In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025
Are we done with object-centric learning?

Alexander Rubinstein, Ameya Prabhu, Matthias Bethge, and Seong Joon Oh

arXiv preprint arXiv:2504.07092, 2025
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer

Simon Schrodi, Elias Kempf, Fazl Barez, and Thomas Brox

arXiv preprint arXiv:2509.23886, 2025
Metascience for machine learning

Hayley Hung, Marco Loog, and Jan Gemert

2025

Website
Mechanistic Interpretability Workshop

Neel Nanda et al.

2025

Website

2024

Hyper-Connections

Defa Zhu, Hongzhi Huang, Zihao Huang, Yutao Zeng, Yunyao Mao, Banggu Wu, Qiyang Min, and Xun Zhou

In The Thirteenth International Conference on Learning Representations, 2024
Rewrite the Stars

Xu Ma, Xiyang Dai, Yue Bai, Yizhou Wang, and Yun Fu

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2024
Vision Transformers Need Registers

Timothée Darcet, Maxime Oquab, Julien Mairal, and Piotr Bojanowski

In International Conference on Learning Representations (ICLR), 2024
Explaining neural scaling laws

Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, and Utkarsh Sharma

Proceedings of the National Academy of Sciences, 2024
Position: The platonic representation hypothesis

Minyoung Huh, Brian Cheung, Tongzhou Wang, and Phillip Isola

In Forty-first International Conference on Machine Learning, 2024

2023

Is there progress in activity progress prediction?

Frans Boer, Jan C Gemert, Jouke Dijkstra, and Silvia L Pintea

In 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023
Fix your downsampling asap! be natively more robust via aliasing and spectral artifact free pooling

Julia Grabinski, Janis Keuper, and Margret Keuper

arXiv preprint arXiv:2307.09804, 2023

2022

Scaling up your kernels to 31x31: Revisiting large kernel design in cnns

Xiaohan Ding, Xiangyu Zhang, Jungong Han, and Guiguang Ding

In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022

2021

Neural tangent kernel: convergence and generalization in neural networks

Arthur Jacot, Franck Gabriel, and Clément Hongler

In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, 2021
Accounting for variance in machine learning benchmarks

Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sepahvand, Edward Raff, Kanika Madan, Vikram Voleti, and others

Proceedings of Machine Learning and Systems, 2021
Unsupervised domain adaptation: A reality check

Kevin Musgrave, Serge Belongie, and Ser-Nam Lim

arXiv preprint arXiv:2111.15672, 2021
Tune It or Don’t Use It: Benchmarking Data-Efficient Image Classification

Lorenzo Brigato, Björn Barz, Luca Iocchi, and Joachim Denzler

In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct 2021
Torch.manual_seed(3407) is all you need: On the influence of random seeds in deep learning architectures for computer vision

David Picard

ArXiv, 2021
Truly Shift-Invariant Convolutional Neural Networks

Anadi Chaman and Ivan Dokmanic

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021
Spectral leakage and rethinking the kernel size in cnns

Nergis Tomen and Jan C Gemert

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021
Improving reproducibility in machine learning research (a report from the neurips 2019 reproducibility program)

Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d’Alché-Buc, Emily Fox, and Hugo Larochelle

Journal of machine learning research, 2021
ReproducedPapers.org: Openly teaching and structuring machine learning reproducibility

Burak Yildiz, Hayley Hung, Jesse H Krijthe, Cynthia CS Liem, Marco Loog, Gosia Migut, Frans A Oliehoek, Annibale Panichella, Przemysław Pawełczak, Stjepan Picek, and others

In International Workshop on Reproducible Research in Pattern Recognition, 2021
The pre-registration workshop: An alternative publication model for machine learning research

Samuel Albanie, Joao Henriques, Luca Bertinetto, Alex Hernandez-Garcia, Hazel Doughty, and Gul Varol

2021

Website

2020

A metric learning reality check

Kevin Musgrave, Serge Belongie, and Ser-Nam Lim

In European Conference on Computer Vision, 2020
Black magic in deep learning: How human skill impacts network training

Kanav Anand, Ziqi Wang, Marco Loog, and Jan Van Gemert

arXiv preprint arXiv:2008.05981, 2020
On translation invariance in cnns: Convolutional layers can exploit absolute spatial location

Osman Semih Kayhan and Jan C van Gemert

In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020
ML-Retrospectives, Surveys & Meta-Analyses

Chhavi Yadav et al.

2020

Website

2019

On empirical comparisons of optimizers for deep learning

Dami Choi, Christopher J Shallue, Zachary Nado, Jaehoon Lee, Chris J Maddison, and George E Dahl

arXiv preprint arXiv:1910.05446, 2019
Hyperparameter optimization

Matthias Feurer and Frank Hutter

In Automated machine learning: Methods, systems, challenges, 2019
Unreproducible Research is Reproducible

Xavier Bouthillier, César Laurent, and Pascal Vincent

In Proceedings of the 36th International Conference on Machine Learning, 2019
Research for practice: troubling trends in machine-learning scholarship

Zachary C Lipton and Jacob Steinhardt

Communications of the ACM, 2019
Making convolutional networks shift-invariant again

Richard Zhang

In International conference on machine learning, 2019
A Step Toward Quantifying Independently Reproducible Machine Learning Research

Edward Raff

2019

2018

Are gans created equal? a large-scale study

Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet

Advances in neural information processing systems, 2018
The lottery ticket hypothesis: Finding sparse, trainable neural networks

Jonathan Frankle and Michael Carbin

arXiv preprint arXiv:1803.03635, 2018

2017

Highway and Residual Networks learn Unrolled Iterative Estimation

Klaus Greff, Rupesh K Srivastava, and Jürgen Schmidhuber

In International Conference on Learning Representations, 2017

2016

Residual networks behave like ensembles of relatively shallow networks

Andreas Veit, Michael J Wilber, and Serge Belongie

Advances in neural information processing systems, 2016

2015

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei

International Journal of Computer Vision (IJCV), 2015

2014

Understanding Machine Learning

Shai Shalev-Shwartz

2014
Cross-validation pitfalls when selecting and assessing regression and classification models

Damjan Krstajic, Ljubomir J Buturovic, David E Leahy, and Simon Thomas

Journal of cheminformatics, 2014
One weird trick for parallelizing convolutional neural networks

Alex Krizhevsky

arXiv preprint arXiv:1404.5997, 2014

2011

Thinking, Fast and Slow

Daniel Kahneman

2011

1997

No Free Lunch Theorems for Optimization

David H. Wolpert and William G. Macready

IEEE Transactions on Evolutionary Computation, 1997

1995

The nature of statistical learning theory

Vladimir N. Vapnik

1995

1989

Approximation by superpositions of a sigmoidal function

G. Cybenko

In Mathematics of Control, Signals and Systems, 1989

1984

A theory of the learnable

L. G. Valiant

Commun. ACM, 1984

1916

The Foundation of the General Theory of Relativity

A. Einstein

1916

1859

On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life

Charles Darwin

1859