
Scientists Present New Solution to Imbalanced Learning Problem

Specialists at the HSE Faculty of Computer Science and Sber AI Lab have developed a geometric oversampling technique known as Simplicial SMOTE. Tests on various datasets have shown that it significantly improves classification performance. This technique is particularly valuable in scenarios where rare cases are crucial, such as fraud detection or the diagnosis of rare diseases. The study's results are available on arXiv.org, an open-access archive, and will be presented at the International Conference on Knowledge Discovery and Data Mining (KDD) in summer 2025 in Toronto, Canada.

The problem of imbalanced learning is becoming increasingly relevant across various fields, including banking and medicine. Conventional methods, such as random oversampling, often generate low-quality samples or fail to accurately model rare class data.

Simplicial SMOTE (Synthetic Minority Oversampling Technique), a novel solution proposed by scientists from HSE University and Sber AI Lab, addresses these issues by enabling more accurate modelling of complex topological data structures and improving classifier performance on imbalanced datasets.

It generates new examples of a rare class by leveraging information from several mutually close instances (a 'simplex'), rather than from just two close points, as in the original SMOTE and its well-known modifications. This captures the structure of the data more faithfully and improves performance. The technique improves training on imbalanced data, where one class (eg, normal transactions) has many examples, while another class (eg, fraud) has few.
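The idea of sampling from a simplex rather than from a line segment can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `simplex_oversample`, its parameters, and the simplification of treating each point together with its `k` nearest minority-class neighbours as a simplex are all assumptions made for this example. With `k=1` the sketch falls back to interpolating between two close points, as in the original SMOTE.

```python
import numpy as np

def simplex_oversample(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points (illustrative sketch only).

    Each synthetic point is a random convex combination of a minority
    point and its k nearest minority neighbours, i.e. a random point
    inside the simplex spanned by those k + 1 vertices.
    Assumes len(X_min) >= k + 1.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)

    # Pairwise squared Euclidean distances between minority points.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)

    # For every point, the indices of its k + 1 closest points
    # (normally the point itself followed by its k nearest neighbours).
    simplices = np.argsort(d2, axis=1)[:, : k + 1]

    # Pick a random simplex for each new sample.
    anchors = rng.integers(0, n, size=n_new)
    vertices = X_min[simplices[anchors]]          # shape (n_new, k + 1, d)

    # Barycentric coordinates: non-negative weights that sum to 1.
    weights = rng.dirichlet(np.ones(k + 1), size=n_new)

    # Weighted average of the simplex vertices.
    return np.einsum("ij,ijd->id", weights, vertices)


# Hypothetical minority class: the four corners of the unit square.
X_rare = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
synthetic = simplex_oversample(X_rare, n_new=10, k=2)
print(synthetic.shape)
```

Because every synthetic point is a convex combination of existing minority points, it always lies inside the region those points span.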

Experiments on a large number of test datasets show that the proposed approach achieves significantly better scores on metrics such as the F1 score and the Matthews correlation coefficient than both the basic SMOTE and its modifications. In particular, an improvement was observed for gradient boosting, a classifier commonly used in practice.
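For readers unfamiliar with these two metrics, the sketch below computes them from the four cells of a binary confusion matrix, using their standard definitions. The function name `f1_and_mcc` and the counts in the example are invented for illustration and are not taken from the study.

```python
import math

def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1 score, Matthews correlation coefficient).

    tp, fp, fn, tn are the true-positive, false-positive,
    false-negative and true-negative counts, with the rare class
    treated as positive. Both metrics are reported as 0.0 when
    their denominator is zero.
    """
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return f1, mcc


# Made-up example: 1,000 transactions, 10 of them fraudulent.
# A classifier that labels everything "normal" is 99% accurate
# yet scores zero on both metrics.
print(f1_and_mcc(tp=0, fp=0, fn=10, tn=990))

# A classifier that catches 8 of the 10 frauds with 2 false alarms.
print(f1_and_mcc(tp=8, fp=2, fn=2, tn=988))
```

The first call shows why plain accuracy is misleading on imbalanced data: ignoring the rare class entirely still yields high accuracy, whereas the F1 score and the Matthews correlation coefficient both drop to zero.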

'Our technique is particularly effective for tasks involving imbalanced data, where the rare class holds greater significance. Banks can use Simplicial SMOTE to detect fraud more effectively, and medical centres can apply it to diagnose rare diseases,' says Andrey Savchenko, co-author of the article and Leading Research Fellow at the Laboratories for Theoretical Modelling in AI of the HSE AI and Digital Science Institute.

The new technique can be integrated into existing oversampling algorithms (such as Borderline-SMOTE, Safe-level-SMOTE, and ADASYN), enabling better accuracy without significantly increasing computational complexity. According to the researchers, the developed approach could contribute to the creation of more accurate and reliable machine learning models, thereby improving the quality of analytics.

The study was conducted with support from the HSE Basic Research Programme.
