Somewhere on the upper floors of MIT's Stata Center, a graduate student named Makram Chahine has been working on a problem that has plagued the AI community for years, one whose seriousness nobody really wants to acknowledge. Training a large model is extremely costly. Everyone knows that.
The hardware, the electricity, the months of compute time on clusters that hum like industrial freezers: none of it is sustainable, yet the field keeps paying because the alternatives have been worse. Either you train something massive and then shave it down afterward, or you train something small and accept that it won't be as capable. There hasn't really been a third option.
| Field | Detail |
|---|---|
| Technique Name | CompreSSM |
| Lead Author | Makram Chahine, PhD candidate, EECS |
| Senior Author | Daniela Rus, Director of CSAIL |
| Institutions Involved | MIT CSAIL, Max Planck Institute for Intelligent Systems, ELLIS, ETH Zurich, Liquid AI |
| Architecture Targeted | State-space models (including Mamba) |
| Compression Stage | During training (after ~10% of training steps) |
| Mathematical Tool | Hankel singular values, drawn from control theory |
| Reported Speedup | Up to 4x on Mamba; 1.5x on image benchmarks |
| Benchmark Highlight | 85.7% accuracy on CIFAR-10 at 1/4 state dimension |
| Comparison Baselines | Pruning, knowledge distillation, Hankel nuclear norm regularization |
| Year Announced | 2026 |
Chahine and his colleagues recently published a technique called CompreSSM, an attempt at that third option. The idea, put simply, is to let the model decide, while it is still learning, which parts of itself are doing useful work and which are essentially riding along for free. Then, surprisingly early in training, the parts that aren't useful are cut, and the rest of training proceeds at the pace of a considerably smaller model. It sounds almost too neat to be true, and anyone who has worked in machine learning will immediately assume there is a hidden cost somewhere. The researchers, drawing on methods from control theory, contend that there isn't.
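In outline, the workflow looks something like the sketch below. It illustrates the idea rather than the team's code; the helper names (`build_ssm`, `train_steps`, `state_importance`, `truncate_states`) and the specific numbers are hypothetical placeholders.

```python
# Illustrative sketch of during-training compression (not the authors' implementation).
# All helpers here are hypothetical: build_ssm, train_steps, state_importance, truncate_states.

total_steps = 100_000
model = build_ssm(state_dim=128)                 # start at the full state dimension

# Train normally for roughly the first 10% of steps, while importance rankings settle.
train_steps(model, num_steps=total_steps // 10)

# Score each state dimension by how much it contributes (CompreSSM's criterion is described below).
scores = state_importance(model)

# Cut the low-importance dimensions and finish training as a much smaller model.
keep = scores.argsort()[::-1][:12]               # e.g. keep the 12 most important states
model = truncate_states(model, keep)
train_steps(model, num_steps=total_steps - total_steps // 10)
```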
The trick lies in a mathematical quantity called the Hankel singular value, which measures how much each internal state of a model contributes to its overall input-output behavior. These values come from a field that usually deals with chemical plants and aircraft rather than language models, and they turn out to stabilize remarkably early, after about 10 percent of training. That part surprised the team. Once the rankings settle, the low-importance dimensions can be removed with little impact; rather than being trimmed back later by a separate process, the model essentially grows into its smaller, faster self.
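For a single linear state-space layer with matrices A, B, and C, the Hankel singular values come from the controllability and observability Gramians, a textbook computation in control theory. The snippet below is a minimal NumPy/SciPy sketch of that computation for a small discrete-time system; it shows what the quantity is, not how CompreSSM evaluates it inside a large network.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def hankel_singular_values(A, B, C):
    """Hankel singular values of a stable discrete-time system x[k+1] = A x[k] + B u[k], y[k] = C x[k]."""
    # Controllability Gramian P solves  A P A^T - P + B B^T = 0
    P = solve_discrete_lyapunov(A, B @ B.T)
    # Observability Gramian Q solves  A^T Q A - Q + C^T C = 0
    Q = solve_discrete_lyapunov(A.T, C.T @ C)
    # The Hankel singular values are the square roots of the eigenvalues of P @ Q.
    eigs = np.linalg.eigvals(P @ Q).real
    return np.sort(np.sqrt(np.clip(eigs, 0.0, None)))[::-1]

# Tiny example: a random stable system with 8 internal states.
rng = np.random.default_rng(0)
A = 0.9 * np.linalg.qr(rng.standard_normal((8, 8)))[0]   # orthogonal matrix scaled inside the unit circle
B = rng.standard_normal((8, 1))
C = rng.standard_normal((1, 8))
print(hankel_singular_values(A, B, C))  # states with tiny values are candidates for removal
```

States whose Hankel singular values sit near zero barely influence the input-output map, which is exactly what makes them safe to drop.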

It's difficult to ignore the numbers. A compressed model operating at roughly a quarter of its original state dimension reached 85.7 percent accuracy on CIFAR-10, while a model trained at that smaller size from the beginning managed only 81.8 percent. The team reports training speedups of about 4x on Mamba, one of the more popular state-space architectures at the moment, shrinking a 128-dimensional model to about 12 dimensions while keeping competitive performance. CompreSSM also ran more than 40 times faster than a recent regularization method that does something similar in spirit, because that method requires costly eigenvalue computations at every gradient step.
It is the kind of result the AI sector seems to have been quietly waiting for. The standard way of shrinking models, knowledge distillation, trains a large teacher and then a smaller student on top of it; it roughly doubles the training expense and has always felt like duct tape, a workaround rather than a fix. CompreSSM is different: it folds compression into the learning process itself, closer to how biological systems actually develop. Daniela Rus, the senior author, described it as a radically different approach to developing AI, and that isn't just press-release rhetoric; the supporting math, which uses Weyl's theorem to show that state importance evolves smoothly over training, lends the claim real weight.
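For readers who want the flavor of that argument, here is one plausible way a Weyl-type bound gives smoothness (a sketch, with notation that is mine rather than the paper's): the squared Hankel singular values are the eigenvalues of a symmetric matrix built from the two Gramians, and Weyl's inequality says a small perturbation of a symmetric matrix moves each eigenvalue by at most the size of the perturbation.

```latex
% Sketch of the smoothness intuition; notation is illustrative, not taken from the paper.
% Let $M_t = Q_t^{1/2} P_t Q_t^{1/2}$, a symmetric positive semidefinite matrix whose
% eigenvalues are the squared Hankel singular values at training step $t$.
% Weyl's inequality for symmetric matrices gives
\[
  \bigl|\,\lambda_i(M_{t+1}) - \lambda_i(M_t)\,\bigr| \;\le\; \bigl\lVert M_{t+1} - M_t \bigr\rVert_2 ,
\]
% so if a gradient step perturbs the Gramians only slightly, every state's importance
% score moves only slightly, and the ranking cannot jump around between steps.
```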
Whether industry adopts it quickly remains to be seen. State-space models occupy a smaller share of the market than transformers, and most of the money still flows to the larger architectures. But the appetite for anything that cuts compute is obvious. It's hard not to notice that in the same week researchers were celebrating leaner training, large labs elsewhere were ordering more GPUs. I'm still getting used to the contrast.
For now, CompreSSM sits in the stage where promising research often lives: published, tested, intriguing, but not yet wired into the production pipelines that would make it matter at scale. It's still unclear whether the method will generalize as smoothly to other architectures, or whether some hidden brittleness will surface once it leaves the benchmark suite. Researchers tend to be optimistic about their own work; the field tends to be skeptical, usually with good reason. Either way, the core finding, that a model can work out what to discard before it has even finished learning, feels like the kind of small idea that travels farther than anyone expects.
