Reliability and sustainability: my takeaways from dotAI 2025

This year’s dotAI was a great opportunity to review the progress in AI engineering, and also a moment to step back and look at what we have achieved and where we are really going. The organization of the conference was impeccable, the venue was superb, and the talks were all well prepared and staged. Rather than going through the talks one by one, I would like to review the main topics discussed on and off the stage.

Reliability #reliability

As Natalia Segal (NVIDIA) reminded us, we live in a physical world that is orderly, yet too complex for us to reliably predict the effects of every action. Throughout the ages, humans have learned to harness this unreliability, first through the use of simple tools, then by developing science and engineering.

Manufacturing engineers design the value chain to make the best use of resources and to deliver the product that the client expects. This involves constant supervision of the production line and quality assurance that sorts through the finished items, separating the wheat from the chaff. In electrical engineering, circuits are designed to operate at working points far from the unpredictability of quantum mechanics and are built with redundancy for greater reliability.

For the last sixty years¹ of software engineering, we have learned to write code that safeguards against inevitable bugs (error handling) and badly formatted inputs (input validation). Now, with the advent of AI, we need to deal with another form of unreliability: like physical systems, AI tools are nondeterministic, generating different outputs for the same inputs.

Unsurprisingly, many of the speakers at dotAI talked about turning unreliable AI models into trustworthy companions. Katia Gil Guzman (OpenAI) demonstrated how coding agents are like virtual teammates that can implement new features over lunchtime, saving you many hours of work. However, just like their human counterparts, these agents can make errors, so AI-generated code cannot be shipped without “looking under the hood.”

In the same spirit, Viktoria Semaan from Databricks advocated for benchmarks that are integrated into the AI agent platform. The same message was reiterated by many other speakers, who stressed that AI is not ready to be fully autonomous and still needs a human in the loop.

Paradigms #paradigms

Another approach to achieving reliability is to develop dedicated paradigms that embrace the powers of AI while protecting the user from its faults. One such approach was put forward by Vaibhav Gupta, the mastermind behind the up-and-coming BAML language, which leverages interactivity and a strong type system for AI engineering. While BAML’s interactivity supports processes with a human in the loop, its type system enforces hard guarantees on the expected outputs of LLMs. If the model fails to conform to the type specification, the inference can be automatically retried until it succeeds, in a sort of try/repeat loop.
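To make the pattern concrete, here is a minimal Python sketch of such a try/repeat loop. This is not BAML syntax, and the `call_llm` helper is a hypothetical stand-in for a real, nondeterministic model call:

```python
import json
from dataclasses import dataclass

@dataclass
class Invoice:
    customer: str
    total: float

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a nondeterministic model call.
    return '{"customer": "ACME", "total": 42.0}'

def parse_invoice(raw: str) -> Invoice:
    # The "type check": reject anything that does not match the schema.
    data = json.loads(raw)
    return Invoice(customer=str(data["customer"]), total=float(data["total"]))

def extract_invoice(prompt: str, max_retries: int = 3) -> Invoice:
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            return parse_invoice(raw)   # success: output conforms to the type
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue                    # nonconforming output: retry the inference
    raise RuntimeError("model never produced a well-typed output")

print(extract_invoice("Extract the invoice from this email: ..."))
```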

An alternative approach, proposed by Rémi Louf from .txt, is to enforce strict type contracts at the token level, ensuring that the model’s outputs always conform to the expected format. This paradigm frees developers from what Rémi calls “the hell of error-handling code that occasionally does AI.” But he goes further: Rémi envisions a “Unix revolution” of AI built from small, composable tools with clean interfaces that can be combined into more complex systems.
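The core trick behind such token-level contracts is to mask, at every decoding step, every token that would break the format, so an invalid output can never be emitted in the first place. Here is a toy, self-contained illustration with a fake model; it is my own sketch of the idea, not .txt’s actual library:

```python
import math
import random

rng = random.Random(0)

# Toy vocabulary and a "contract": the output must be digits, ended by '<'.
VOCAB = list("0123456789abcdef<")
ALLOWED = set("0123456789<")

def fake_logits(prefix: str) -> list[float]:
    # Stand-in for a real model's next-token scores (hypothetical).
    return [rng.uniform(-1.0, 1.0) for _ in VOCAB]

def constrained_next_token(prefix: str) -> str:
    logits = fake_logits(prefix)
    # Token-level enforcement: zero out forbidden tokens *before* sampling.
    weights = [math.exp(l) if t in ALLOWED else 0.0
               for t, l in zip(VOCAB, logits)]
    return rng.choices(VOCAB, weights=weights)[0]

out = ""
for _ in range(32):                 # cap the length for the demo
    tok = constrained_next_token(out)
    if tok == "<":
        break
    out += tok
print(out)                          # a digit string, by construction
```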

Yet another voice in the discussion belonged to Claire Gouze (nao labs), who called for clean data models and strict document rules. Only with these two elements in place, she argued, “AI can become a powerful tool to draw business insights from in-house data.”

Normal technology #normaltech

Claire’s point may be the crux of the entire discussion: AI is a powerful tool, but its benefits are by no means automatic. We need to put in effort, time, money, and creativity to realize its potential. In that, it is not much different² from any other technology we have acquired over the decades. To take examples close to my heart: version control systems, Jupyter notebooks, and language servers were not that common when I started professionally, first as a research scientist and then as a data scientist.

Today, all of them are “boring technologies,” as Fabien Potencier, the creator of the popular Symfony framework and, more recently, the founder of Upsun (formerly Platform.sh), calls them. These technologies may not be fancy or hyped now, but they are extremely popular and good at what they were designed to do. Can we reach the same level of “boredom” with AI? Fabien believes we can; all it takes are small, task-specific models, optimized contexts, and the right tools and processes to ensure accuracy.

The path to this goal might not be straightforward, though. Elisa Gilles and Giulia Bianchi from leboncoin leave us no doubt: in spite of the hype, only 36% of leboncoin’s use cases involve GenAI; the rest is the old, boring, but efficient machine learning. This number echoed the statistics cited by Gaël Varoquaux from :probabl and Inria: despite the maturity of the data stack and well-established practices, “80% of ML projects never go into production.”

This subject boomeranged at the end of the conference: in a very engaging way, Alex Palcuie, one of the first two employees at Anthropic, deconstructed for us what is hype and what is real. He urged us “not to fire the SRE engineers just yet” and instead offered a framework through which we can work towards the autonomy of AI systems: from level 1 (fully human controlled) to level 4 (AI autonomous at the detection, response, and prevention stages). Current AI-based systems sit at level 2 (autonomous in detection only), as sketched below.
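One might encode this ladder roughly as follows; the level names, and the assumption that level 3 covers detection plus response, are my own paraphrase rather than Alex’s exact wording:

```python
from enum import IntEnum

class IncidentAutonomy(IntEnum):
    HUMAN_CONTROLLED = 1       # humans detect, respond, and prevent
    AUTONOMOUS_DETECTION = 2   # AI detects incidents; humans do the rest
    AUTONOMOUS_RESPONSE = 3    # AI detects and responds; humans prevent (assumed)
    AUTONOMOUS_PREVENTION = 4  # AI handles detection, response, and prevention

current = IncidentAutonomy.AUTONOMOUS_DETECTION
print(f"Most AI-based systems today: level {current.value} ({current.name})")
```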

Sustainability #sustainability

Gaël came up with yet another precious insight: the machine learning stack rests on publicly funded research, partially conducted by his Inria team and by volunteers. It is public digital infrastructure, and continued financial and organizational support is necessary for steady progress and for building data sovereignty on European soil. Similar arguments were advanced by Alex Laterre, head of AI research at InstaDeep, who demonstrated how an in-house-built supercomputer and orchestration platform can accelerate drug discovery. And it’s not only about software and hardware; as Elisa reminded us, it is also essential, from both privacy and regulation perspectives, that the data used to train and run models has European residency.

Efficiency #efficiency

Sustainability and reliability ensure the uninterrupted flow of trustworthy data from AI to the public. At the other end of the engineering pipeline, however, are the resources, which are never unlimited. To decrease cost and maximize value, we need to use the available resources efficiently. This applies to factories, our health and transport systems, and even more so to compute technology.

LLMs (especially frontier models) are resource-heavy and require careful management of energy, cost, and hardware. Two speakers covered these aspects from two different angles: Tejas Chopra (Netflix) and Bertrand Charpentier (Pruna AI).

Tejas argued that LLMs are becoming more and more memory-constrained: the memory of modern AI chips does not scale as fast as their compute. Therefore, if we don’t take appropriate precautions in the training infrastructure and model architectures, we risk under-utilizing the compute resources.
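A back-of-envelope calculation shows why memory, rather than raw compute, often sets the ceiling at inference time; the numbers below are illustrative, not from the talk:

```python
# During autoregressive decoding, every generated token requires streaming
# all model weights through memory, so bandwidth, not FLOPs, bounds speed.
params = 70e9           # illustrative model size: 70B parameters
bytes_per_param = 2     # fp16/bf16 weights
bandwidth = 3.35e12     # illustrative HBM bandwidth: ~3.35 TB/s

weight_bytes = params * bytes_per_param        # 140 GB of weights
max_tokens_per_s = bandwidth / weight_bytes    # ~24 tokens/s, ignoring KV cache
print(f"Upper bound: {max_tokens_per_s:.1f} tokens/s per model replica")
```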

Bertrand’s focus was more on the compute side. He showed that careful implementation choices (quantization, caching, compilation, flash attention, factorization) can drastically decrease the energy consumption at inference time (which accounts for 80% of all AI energy usage), reduce CO2 emissions and … let him rest after having finished only a handful of half-marathons.
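For a flavor of what two of these choices look like in practice, here is a minimal PyTorch sketch of my own (not Pruna’s stack), applying dynamic int8 quantization and compilation to a toy model:

```python
import torch

# Toy stand-in for an LLM block (illustrative only).
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).eval()

# Quantization: store Linear weights in int8, cutting their memory traffic ~4x.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Compilation: fuse kernels and remove Python overhead (PyTorch 2.x).
compiled = torch.compile(model)

x = torch.randn(1, 1024)
with torch.no_grad():
    print(quantized(x).shape, compiled(x).shape)
```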

However, all of this may not be enough if we run into the rebound effect, i.e., increased usage driven by reduced costs. Therefore, complementary approaches, such as employing smaller, task-specific models as highlighted by other speakers, remain essential.

Real-world use cases #usecase

The dotAI stage provided a valuable opportunity to showcase real-world use cases across a wide range of domains. One particularly poignant example came from Nnenna Ndukwe of Qodo AI, who captured the shared anxiety many of us feel when delivering software and argued that AI solutions, when “started small and scaled smart,” can help ease that stress.

Unfortunately, most of these examples were created by IT specialists for IT specialists, limiting their broader appeal. To gain public acceptance, AI developers must demonstrate that these technologies are not here to replace human workers, but rather to free people from everyday repetitive, mundane tasks and enhance overall quality of life. While this is no small challenge, AI is undoubtedly here to stay; it’s up to us to ensure it serves the common good.

  1. https://en.wikipedia.org/wiki/History_of_software_engineering ↩︎
  2. https://www.normaltech.ai/p/ai-as-normal-technology ↩︎