Large Language Models No Longer Require Powerful Servers

Scientists from Yandex Research, HSE University, MIT, KAUST, and ISTA have developed a method for rapidly compressing large language models (LLMs) without compromising quality. With this method, a smartphone or laptop is enough to work with LLMs, with no need for expensive servers or high-powered GPUs.

This method enables faster testing and more efficient implementation of new neural network-based solutions, reducing both development time and costs. As a result, LLMs are more accessible not only to large corporations, but also to smaller companies, non-profit laboratories and institutes, as well as individual developers and researchers.

Previously, running a language model on a smartphone or laptop required quantising on an expensive server—a process that could take anywhere from a few hours to several weeks. Quantisation can now be performed directly on a smartphone or laptop in just a few minutes.

Challenges in implementing LLMs

The main obstacle to using LLMs is that they require considerable computational power. This applies to open-source models as well. For example, the popular DeepSeek-R1 is too large to run even on high-end servers built for AI and machine learning workloads, meaning that very few companies can effectively use LLMs, even if the model itself is publicly available.

The new method reduces a model's size while maintaining its quality, making it possible to run on more accessible devices. It can also compress very large models, such as DeepSeek-R1 (671 billion parameters) and Llama 4 Maverick (400 billion parameters), which until now could only be quantised with basic methods that caused significant quality loss.
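As a rough illustration of why parameter count matters, the snippet below estimates the memory needed just to store a model's weights at different precisions. The parameter counts come from the text above; the bit widths (16 and 4 bits per parameter) are assumptions chosen for illustration and are not stated in the article. The estimate ignores activations, caches, and any per-group quantisation metadata.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Parameter counts as quoted in the article.
models = {
    "DeepSeek-R1": 671e9,
    "Llama 4 Maverick": 400e9,
}

# Assumed precisions: 16-bit (typical unquantised) and 4-bit (quantised).
for name, num_params in models.items():
    for bits in (16, 4):
        size = weight_memory_gb(num_params, bits)
        print(f"{name}: {bits}-bit weights need about {size:.1f} GB")
```

Under these assumptions, moving from 16 to 4 bits per parameter cuts weight storage by a factor of four, which is the kind of reduction that decides whether a model fits on a given device.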

The new quantisation method opens up more opportunities to use LLMs across various fields, particularly in resource-limited sectors such as education and the social sphere. Startups and independent developers can now implement compressed models to create innovative products and services without the need for costly hardware investments. Yandex is already applying the new method for prototyping—creating working versions of products and quickly validating ideas. Testing compressed models takes less time than testing the original versions.

Key details of the new method

The new quantisation method is named HIGGS (Hadamard Incoherence with Gaussian MSE-Optimal GridS). It enables the compression of neural networks without the need for additional data or computationally intensive parameter optimisation. This is especially useful in situations where there is not enough relevant data available to train the model. HIGGS strikes a balance between the quality, size, and complexity of the quantised models, making them suitable for use on a variety of devices.
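The two ideas in the method's name can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: it assumes a scalar grid, a group size of 64, and 16 levels (4 bits), and all function names are hypothetical. The published method may differ in its grids, grouping, and kernels. The sketch rotates a weight group with a randomised Hadamard transform so that its entries look roughly Gaussian, then rounds them to a grid fitted to the standard normal distribution. No calibration data is used.

```python
import numpy as np


def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)


def gaussian_grid(levels, rng, samples=100_000, iters=30):
    """Approximate MSE-optimal scalar grid for N(0, 1) via Lloyd's algorithm on samples."""
    x = rng.standard_normal(samples)
    grid = np.quantile(x, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        mids = (grid[1:] + grid[:-1]) / 2      # cell boundaries
        idx = np.searchsorted(mids, x)         # nearest grid point per sample
        grid = np.array([x[idx == k].mean() for k in range(levels)])
    return grid


def quantize_group(w, signs, h, grid):
    """Random sign flips, Hadamard rotation, scaling, then rounding to the grid."""
    rotated = h @ (signs * w)
    scale = np.sqrt(np.mean(rotated ** 2)) + 1e-12
    mids = (grid[1:] + grid[:-1]) / 2
    codes = np.searchsorted(mids, rotated / scale)
    return codes, scale


def dequantize_group(codes, scale, signs, h, grid):
    """Invert the steps: look up grid values, rescale, rotate back, undo sign flips."""
    return signs * (h.T @ (grid[codes] * scale))


rng = np.random.default_rng(0)
group_size, levels = 64, 16                    # assumed: groups of 64, 4 bits
h = hadamard(group_size)
grid = gaussian_grid(levels, rng)
signs = rng.choice([-1.0, 1.0], size=group_size)

w = rng.standard_normal(group_size)
w[:2] *= 10.0                                  # a couple of outlier weights
codes, scale = quantize_group(w, signs, h, grid)
w_hat = dequantize_group(codes, scale, signs, h, grid)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The rotation spreads outlier weights across the whole group, which is why a single grid fitted to a Gaussian can be reused for every group without looking at any data.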

The method has already been validated on the widely used Llama 3 and Qwen2.5 models. Experiments have shown that HIGGS outperforms all existing data-free quantisation methods, including NF4 (4-bit NormalFloat) and HQQ (Half-Quadratic Quantisation), in terms of both quality and model size.

Scientists from HSE University, the Massachusetts Institute of Technology (MIT), the Institute of Science and Technology Austria (ISTA), and King Abdullah University of Science and Technology (KAUST, Saudi Arabia) all contributed to the development of the method.

The HIGGS method is already accessible to developers and researchers on Hugging Face and GitHub, with a research paper available on arXiv.

Response from the academic community, and other methods

The paper describing the new method has been accepted for presentation at one of the world's largest AI conferences, the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL). The conference will be held from 29 April to 4 May 2025 in Albuquerque, New Mexico, USA, and Yandex will be among the attendees, along with other companies and universities such as Google, Microsoft Research, and Harvard University. The paper has been cited by Red Hat AI, an American software company, as well as by Peking University, Hong Kong University of Science and Technology, Fudan University, and others.

Previously, scientists from Yandex presented 12 studies focused on LLM quantisation. The company aims to make the application of LLMs more efficient, less energy-consuming, and accessible to all developers and researchers. For example, the Yandex Research team has developed methods for compressing LLMs that reduce computational costs by a factor of nearly eight without significantly compromising the quality of the neural network’s responses. The team has also developed a solution that allows a model with 8 billion parameters to run on a regular computer or smartphone through a browser interface, even without major computational power.

Date

1 July 2025

Topics

HSE Development Programme until 2030

Keywords

research projects, international cooperation, frontiers of science, large language models
