Aleksei Naumov, Lead AI Engineer at Terra Quantum, is leading advancements in AI model optimisation, tackling energy consumption challenges and envisioning the future of large language models.
Aleksei Naumov, a Lead AI Engineer at Terra Quantum, has become a prominent figure in artificial intelligence, particularly in AI model optimisation. His academic journey began at Lomonosov Moscow State University, where he earned a degree in Physics with a specialisation in robotics and applied mathematics. That foundation led Naumov into deep learning, which he first explored in his bachelor’s thesis: an algorithm for automatic quadcopter landing using computer vision.
Transitioning from academia to the industry, Naumov joined Terra Quantum, a Swiss deeptech leader supported by over $100 million in investment. Within the company, he quickly advanced to lead an AI research team that has published significant work in AI model optimisation, focusing on techniques such as tensor decomposition and tensor network methods. Naumov and his team recently presented their findings on language model compression at the IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR) in San Jose, California.
In a discussion about the energy demands of large language models (LLMs), Naumov highlighted concerns about rising energy consumption. He observed that, should their use become more ingrained in daily routines, LLMs could reach energy consumption levels akin to 160 companies the size of Meta. As a solution, Naumov pointed to model optimisation techniques such as knowledge distillation, which could facilitate a shift towards smaller, specialised models that are more energy-efficient.
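The article does not specify which distillation method Naumov has in mind. As a rough illustration only, the standard soft-target loss from the distillation literature trains a small student model to mimic a large teacher's temperature-softened output distribution; the function names and temperature below are illustrative, not taken from Terra Quantum's work:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Minimising this loss pushes the student to reproduce the teacher's
    output distribution -- the core idea of knowledge distillation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    log_p_teacher = np.log(p_teacher + 1e-12)
    # KL(teacher || student), scaled by T^2 as is conventional
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)

# A student whose logits already match the teacher incurs (near-)zero loss.
teacher = np.array([[2.0, 0.5, -1.0]])
uniform_student = np.zeros((1, 3))
print(distillation_loss(teacher, teacher))          # close to 0
print(distillation_loss(uniform_student, teacher))  # positive
```

In practice this term is combined with the ordinary cross-entropy on ground-truth labels, so the student benefits from both the data and the teacher's "dark knowledge".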
Among his career achievements, Naumov takes pride in leading two notable projects: TQCompressor and TetraAML. TQCompressor is a method for compressing LLMs that reduced the size of GPT-2 by approximately 35% while using only a fraction of the original dataset for retraining. The approach delivers savings in training time, compute, and cost, and the resulting model, TQCompressedGPT-2, has been made publicly available to encourage further research.
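TQCompressor's specific decomposition and the 35% figure come from the team's own work; the snippet below is only a generic sketch of the underlying idea, using truncated SVD (the simplest decomposition-based compression) on a hypothetical GPT-2-sized weight matrix to show how factorised storage cuts parameter counts by a comparable fraction:

```python
import numpy as np

def low_rank_compress(weight, rank):
    """Replace a dense weight matrix with a rank-`rank` factorisation.

    Truncated SVD yields the best rank-r approximation in Frobenius norm;
    storing the two thin factors instead of the full matrix is the
    simplest instance of decomposition-based model compression.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (out_dim, rank)
    b = vt[:rank, :]             # shape (rank, in_dim)
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768))  # 768 is GPT-2's hidden dimension
a, b = low_rank_compress(w, rank=256)

original = w.size                # 768 * 768 = 589_824 parameters
compressed = a.size + b.size     # 2 * 768 * 256 = 393_216 parameters
print(f"params: {original} -> {compressed} ({1 - compressed / original:.0%} smaller)")
```

Here rank 256 shrinks the layer by about a third; after factorising, models are typically fine-tuned briefly to recover accuracy, which is where training on only a fraction of the original data pays off.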
The TetraAML project presents a framework to optimise computer vision models, enabling their deployment on devices with limited resources. A notable success of this framework was compressing the ResNet-18 model by 14.5 times with minimal loss in quality.
Looking to the future, Naumov is keen on the evolution of LLMs, particularly their deployment on devices. He envisions a future where LLMs are habitual features of user devices, providing streamlined, secure, and integrated experiences much like today’s autocorrect technologies. He also anticipates advancements in image and video generation, driven by models like FLUX and APIs from companies such as Runway and Kling, which might transform industries spanning cinema to consumer apps.
Naumov predicts significant changes in AI hardware over the next decade, with a rise in specialised hardware for various AI applications. This new wave could include training clusters, new architectures for cloud AI inferences, and specialised chips for on-device AI and generative networks aimed at image and video applications.
Aleksei Naumov’s journey from academic research to industry innovation underscores his role in steering AI towards a more sustainable, resource-efficient future, and the transformative potential of model optimisation across diverse technological endeavours.
Source: Noah Wire Services


