Language Models Tackling Weather Forecasting Challenges
Language Models and Their Limitations in Meteorology
Large language models (LLMs) have become common tools for delivering weather forecasts. However, they are not designed to determine precisely when a weather change becomes significant. This gap may seem trivial, but it has important practical implications.
Modern numerical weather prediction systems, such as the ECMWF IFS, produce remarkably accurate forecasts. With a resolution of about 9 km and frequent updates, the data is of exceptional quality. The real challenge lies in interpreting changes between forecasts. The problem is not the forecast itself, but deciding when a change in the data is significant enough to warrant attention.
The Impact of Chaos Theory
My interest in this problem does not stem from my experience in software engineering, but from a previous study of chaos theory at the Instituto Balseiro. There, I discovered a fascinating idea: a system can be entirely deterministic while remaining practically unpredictable. This notion has profoundly influenced my understanding of AI systems.
In developing artificial intelligence systems, I found that many of them did not account for this inherent complexity of chaotic systems. Chaos theory taught me that an apparently orderly system can become unpredictable, which is an essential lesson for anyone working with forecasting models.
The Errors of Intuitive Approaches
Observing how developers build weather agents, I noticed a tendency to simplify the process: retrieve forecast data, pass it to an LLM, and ask whether the weather has changed significantly. This seems logical, but it is problematic for systems where decision thresholds are well-defined.
In a chaotic system, the significance of a change is not a matter of language, but of precise thresholds on variables such as temperature or precipitation. An LLM is a stochastic process: excellent at generating language, but not suited to imposing deterministic boundaries on physical systems.
The Failures of LLMs
LLMs exhibit several subtle failure modes:
- They may infer trends from wording rather than actual data.
- They sometimes make inconsistent decisions for similar inputs.
- Their outputs are often difficult to test or reproduce.
In sectors like agriculture or energy, where a temperature variation of 3°C can have major consequences, these failures are unacceptable. Decisions must be stable and explainable. For example, a drop in temperature can represent a phase transition for a crop or a peak in energy demand.
A Simple Rule for Using LLMs
I have established a simple rule: if a decision can be formulated as an explicit conditional statement, it should not be an LLM prompt. This rule comes from working with systems where precision and traceability of decisions are crucial.
My Professional Journey
My professional background is diverse, ranging from a Marie Curie PhD in climate dynamics to leading R&D at the national meteorological institute of Uruguay. I have worked on wildfire prevention, seasonal forecasting, and climate adaptation, before turning to machine learning at Microsoft and Mercado Libre.
This experience has allowed me to understand the physics of data and what a significant change in a physical system truly means, beyond software abstractions. I have learned to view data not as abstractions, but as measurable deltas on variables with known uncertainty limits.
The Architecture of Skygent
Skygent is structured in five distinct layers, only one of which calls the LLM.
The Deterministic Guardian
At the heart of Skygent is a Python evaluator that does not interpret but performs precise calculations. It compares validated forecasts, evaluates deltas against configurable thresholds, and integrates context and the forecast horizon.
Decisions are made on this basis, and each alert is traceable: it indicates which variable changed and which threshold was crossed. In a professional setting, this traceability is essential. An alert is triggered only if a threshold is crossed, which makes the decision a binary, testable condition rather than a subjective judgment.
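A minimal sketch of this kind of threshold evaluator is shown here. It is not Skygent's actual code: the variable names, the threshold values, and the `Alert` and `evaluate` names are assumptions chosen for illustration, and the sketch leaves out context and forecast horizon.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the minimum absolute change between two forecast
# runs that counts as significant. Names and values are assumptions.
THRESHOLDS = {
    "temperature_c": 3.0,             # degrees Celsius
    "precip_probability_pct": 20.0,   # percentage points
}


@dataclass(frozen=True)
class Alert:
    """A traceable record of which variable moved and which threshold it crossed."""
    variable: str
    previous: float
    current: float
    delta: float
    threshold: float


def evaluate(previous: dict, current: dict, thresholds: dict) -> list:
    """Return one Alert per variable whose absolute delta exceeds its threshold."""
    alerts = []
    for variable, threshold in thresholds.items():
        if variable not in previous or variable not in current:
            continue  # skip variables missing from either forecast
        delta = current[variable] - previous[variable]
        if abs(delta) > threshold:
            alerts.append(
                Alert(variable, previous[variable], current[variable], delta, threshold)
            )
    return alerts


previous_run = {"temperature_c": 18.0, "precip_probability_pct": 10.0}
current_run = {"temperature_c": 17.0, "precip_probability_pct": 50.0}

for alert in evaluate(previous_run, current_run, THRESHOLDS):
    print(f"{alert.variable}: delta {alert.delta:+.1f} exceeds threshold {alert.threshold}")
```

With these sample values only the rain probability produces an alert. The one-degree temperature change stays below its threshold and is ignored.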
The Limited Role of the LLM
The LLM only comes into play after the decision-making process, to translate structured data into natural language. For example, an increase in the probability of rain from 10% to 50% is explained in clear terms, but the LLM makes no decisions. It merely transforms structured JSON data into an understandable narrative.
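The narration step could look roughly like the sketch below. The function names, the alert fields, and the prompt wording are all assumptions, and the model client is replaced by a stand-in callable so the example runs offline. The point is only that the model receives an alert that has already been decided and is asked to describe it.

```python
import json


def build_narration_prompt(alert: dict) -> str:
    """Build a prompt that asks the model to describe an alert, not to judge it."""
    payload = json.dumps(alert, sort_keys=True)
    return (
        "Explain the following weather alert in one or two plain sentences. "
        "Do not decide whether it matters; that decision has already been made.\n"
        f"Alert data: {payload}"
    )


def narrate(alert: dict, call_llm) -> str:
    """Send the prompt to an injected LLM callable and return its text."""
    return call_llm(build_narration_prompt(alert))


# Hypothetical structured alert produced by the deterministic layer.
alert = {
    "variable": "precip_probability_pct",
    "previous": 10,
    "current": 50,
    "delta": 40,
    "threshold": 20,
}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model client; returns a canned narrative."""
    return "Rain is now much more likely: the probability rose from 10% to 50%."


print(narrate(alert, fake_llm))
```

Injecting the model as a callable keeps the prompt construction testable on its own, without network access.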
Why This Architecture is Testable
A pure LLM agent is difficult to test exhaustively, because its outputs are probabilistic. Skygent's hybrid approach keeps the decision logic purely in Python, which allows comprehensive unit testing: the test suite contains 204 unit tests and has no LLM dependency.
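To illustrate why deterministic decision logic is easy to test, here is a small sketch using the standard `unittest` module. The `threshold_crossed` function and the test cases are hypothetical and are not taken from Skygent's test suite.

```python
import unittest


def threshold_crossed(previous: float, current: float, threshold: float) -> bool:
    """True when the absolute change between two values exceeds the threshold."""
    return abs(current - previous) > threshold


class ThresholdCrossedTest(unittest.TestCase):
    def test_large_increase_triggers(self):
        self.assertTrue(threshold_crossed(10.0, 50.0, 20.0))

    def test_small_change_does_not_trigger(self):
        self.assertFalse(threshold_crossed(10.0, 25.0, 20.0))

    def test_change_equal_to_threshold_does_not_trigger(self):
        self.assertFalse(threshold_crossed(10.0, 30.0, 20.0))

    def test_large_decrease_triggers(self):
        self.assertTrue(threshold_crossed(50.0, 10.0, 20.0))

    def test_same_input_gives_same_decision(self):
        # Repeating the call never changes the answer, unlike a sampled LLM output.
        results = {threshold_crossed(18.0, 14.5, 3.0) for _ in range(100)}
        self.assertEqual(results, {True})


# Run the suite programmatically so the example also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ThresholdCrossedTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Every case, including the boundary where the change equals the threshold, has a single correct answer that can be asserted exactly.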
Event-Based LLM Invocation
Unlike a naive agent that calls the LLM at every cycle, Skygent evaluates data every six hours and calls the model only if a threshold is crossed. This significantly reduces the number of calls, making the system more efficient and cost-effective. At GPT-4o-mini pricing, the cost is negligible and proportional to the amount of genuinely new information.
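The event-based pattern can be sketched as follows. The function names and the sample probabilities are assumptions, and the narrator is a stand-in that records how often it is invoked instead of calling a real model.

```python
def run_cycles(rain_probabilities, threshold_pct, narrate):
    """Compare each forecast cycle with the previous one.

    The narrate callable (standing in for an LLM call) is invoked only when
    the absolute change exceeds threshold_pct. Returns the narrations produced.
    """
    narrations = []
    for previous, current in zip(rain_probabilities, rain_probabilities[1:]):
        delta = current - previous
        if abs(delta) > threshold_pct:
            narrations.append(narrate(previous, current, delta))
    return narrations


calls = []  # records every invocation of the stand-in narrator


def recording_narrator(previous, current, delta):
    """Stand-in for an LLM call that logs its inputs and returns a fixed sentence."""
    calls.append((previous, current))
    return f"Rain probability moved from {previous}% to {current}% ({delta:+d} points)."


# Hypothetical rain probabilities (%) from five consecutive six-hourly evaluations.
cycles = [10, 12, 15, 50, 48]
messages = run_cycles(cycles, threshold_pct=20, narrate=recording_narrator)

print(f"{len(cycles) - 1} comparisons, {len(calls)} LLM call(s)")
print(messages[0])
```

In this sample, four comparisons lead to a single model call, for the jump from 15% to 50%. A naive agent would have called the model on every cycle.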
A Concrete Example
Let's take an example. Suppose an alert is configured to trigger when the change in rain probability exceeds 20 percentage points. If the probability of rain increases from 10% to 50%, the 40-point delta crosses that threshold, an alert is triggered, and the LLM generates a narrative explanation. The evaluation is repeated every six hours.
Limitations of This Approach
This method is not universal. It fails when inputs are ambiguous or when decision boundaries cannot be defined by clear thresholds. In such cases, LLM-based architectures, like ReAct, are more appropriate.
Conclusion
The most crucial decision in building this system was determining where not to use an LLM. While LLMs are increasingly used to solve various problems, some decisions require a clear and defined structure. In these cases, it is better to explicitly encode decisions rather than approach them through language.
The complete implementation of this system is available at: github.com/ferariz/skygent