The power of AI in climate modelling and beyond

Written by Laure Poncet and Dr. Sanaa Hobeichi (ARC Centre of Excellence for 21st Century Weather, ARC Centre of Excellence for Climate Extremes)

Climate modelling is vital to understand how our climate will change in the future, guiding decision-making and the development of strategies to mitigate impacts. However, our ability to model climate is often hampered by the enormous amount of computational power required to run climate models.

This is where machine learning, a subfield of AI which involves learning from patterns in the data, can help. Machine learning models can be used to mimic the behaviour of climate models without having to crank through all the mathematical calculations, thus speeding up simulations and reducing the need for computational power. Another way machine learning can speed up climate model simulations is by replacing bits of models that are computationally expensive – typically the modelling of small-scale complex processes such as cloud formation and vegetation changes.

But the potential of machine learning extends beyond this. Machine learning models can also be used to provide early warnings of extreme weather events such as fires, droughts, and floods. By identifying patterns in historical and current data, these models have the potential to generate timely predictions of future extreme events. This could improve early warning systems and offer better protection for affected communities.

Here, we provide a guide to machine learning applications in the climate space. We begin by introducing the concepts of machine learning and climate modelling before providing an overview of various applications of machine learning in this field and the associated challenges.

What is machine learning?

Machine learning involves building models that can analyse and identify patterns in large volumes of data, and then apply what they have learnt to make predictions.

This process usually involves three main steps:

Data collection: the data used to train the model is collected and pre-processed to ensure it is in a suitable format.
Model training: the machine learning model analyses and finds meaningful patterns in the data.
Model deployment: once the model has learned these patterns, it is ready to make its own predictions based on what it has learned.
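The three steps above can be sketched in a few lines of Python. This is only an illustration: the sunshine and temperature records are made up, and the "model" is a simple straight-line fit rather than anything used in real climate science.

```python
# Step 1: data collection and pre-processing (made-up numbers for illustration).
# Each pair is (hours of sunshine, afternoon temperature in deg C).
raw_records = [(2.0, 15.0), (4.0, 19.0), (None, 21.0), (6.0, 23.0), (8.0, 27.0)]
# Drop incomplete records so the data is in a suitable format.
clean = [(x, y) for x, y in raw_records if x is not None and y is not None]


# Step 2: model training. Fit a straight line y = slope * x + intercept
# by ordinary least squares.
def train(pairs):
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


slope, intercept = train(clean)


# Step 3: model deployment. Use the learned pattern on an unseen input.
def predict(sunshine_hours):
    return slope * sunshine_hours + intercept


print(predict(5.0))
```

Real machine learning models follow the same collect, train and deploy pattern, but with far more data and far more flexible models.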

A short introduction to climate modelling

The most common way to explore how our climate will look in the future is by using global or regional climate models (Figure 1).

Figure 1: Representation of a GCM and an RCM. Black parallel lines represent the grid of a GCM. Source: F. Giorgi, WMO, World Meteorological Organisation Bulletin 52(2), April 2008.

Global climate models (GCMs) use mathematical equations to represent the processes and interactions of the Earth’s system. These equations involve a range of climate variables such as temperature, pressure, humidity and wind. By solving these equations, GCMs can predict how climate variables will change in the future based on different scenarios of greenhouse gas emissions. To achieve this, GCMs divide the Earth into a grid – with each grid cell representing a specific area – and solve equations for each grid cell (Figure 1). The size of the grid cells determines the resolution of the model: the smaller the grid cells, the higher the resolution. Conversely, the larger the grid cells, the coarser the resolution. The size of the grid cells also influences the need for computational power. Smaller grid cells mean more cells overall and more equations to solve. Therefore, models with fine resolution usually require more computational power than models with coarser resolution.
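To get a feel for how quickly the number of cells grows, here is a rough back-of-the-envelope sketch. It assumes square cells covering the Earth's surface (about 510 million square kilometres) on a single vertical level; real models also stack many vertical levels and need shorter time steps at finer resolution, so the true cost grows even faster. The function name is purely illustrative.

```python
EARTH_SURFACE_KM2 = 510e6  # approximate surface area of the Earth


def approx_cell_count(grid_spacing_km):
    """Rough number of square cells needed to cover the globe on one level."""
    return EARTH_SURFACE_KM2 / grid_spacing_km ** 2


for spacing in (250, 50, 1):
    print(f"{spacing:>4} km grid: ~{approx_cell_count(spacing):,.0f} cells")

# How many times more cells a 50 km grid needs than a 250 km grid.
ratio = approx_cell_count(50) / approx_cell_count(250)
print(ratio)
```

Under these assumptions, shrinking the grid spacing by a factor of 5 multiplies the number of cells by 25.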

With a resolution ranging from 50 to 250 km, GCMs are useful tools to provide a broad overview of how our climate will change in the future.

Regional climate models (RCMs) typically use a much finer resolution – ranging from 1 to 50 km – and provide information at a regional or local scale. An RCM incorporates detailed information about local features such as topography and coastlines. The finer resolution also has the potential to capture regional weather patterns well. These models can be tailored to a specific region, allowing for more detailed representations of features such as vegetation patterns and urban landscapes. This makes them valuable tools for exploring how the climate will change in specific regions.

An RCM takes information from a GCM – variables like wind, pressure and temperature – as inputs to start its own simulation. It then produces detailed, high-resolution climate projections for a specific region as outputs (Figure 2). Because an RCM relies on a GCM for its inputs, RCM simulations can only be reliable if the underlying global model performs well.

Figure 2: Schematic representation of how an RCM works. An RCM takes information from a GCM as inputs to start its own simulation and produces regional climate projections as outputs.

Accelerating climate simulations

While GCMs are currently limited to providing information at a 50-250 km scale, decision-makers are often interested in knowing how impacts will be experienced at a local level. GCMs are therefore commonly ‘downscaled’ to a finer spatial scale by running an RCM. Because of its fine resolution, an RCM requires a lot of computational power to run. This is where machine learning models known as RCM emulators can help.

An RCM emulator can mimic the behaviour of an RCM without having to perform all the mathematical calculations of an actual RCM, speeding up climate simulations and making them less energy-intensive. How is this done?

As mentioned in the previous section, an RCM takes input data from a GCM to produce its outputs (Figure 3a). Running an RCM therefore provides a large volume of input and output data. In the training phase, the RCM emulator analyses these data and learns the relationships between the GCM inputs and the RCM outputs. Once trained, the RCM emulator can quickly produce outputs based on new GCM input conditions, bypassing the mathematical calculations that an RCM would typically perform (Figure 3b).
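The idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_rcm` is a hypothetical stand-in for an expensive RCM (it simply cools the coarse temperature with height, using 6.5 degrees C per km as a convenient value), and the emulator is a basic least-squares fit. Real RCM emulators are usually far more flexible models, such as neural networks, trained on actual model output.

```python
import random


def toy_rcm(gcm_temp_c, elevation_km):
    """Hypothetical stand-in for an expensive RCM run."""
    return gcm_temp_c - 6.5 * elevation_km


# Training data: GCM inputs paired with the outputs of past "RCM" runs.
random.seed(0)
inputs = [(random.uniform(5, 35), random.uniform(0, 2)) for _ in range(200)]
outputs = [toy_rcm(t, z) for t, z in inputs]


def train_emulator(inputs, outputs):
    """Least-squares fit of output = a * temp + b * elevation,
    solved from the 2x2 normal equations with Cramer's rule."""
    s_tt = sum(t * t for t, _ in inputs)
    s_tz = sum(t * z for t, z in inputs)
    s_zz = sum(z * z for _, z in inputs)
    s_ty = sum(t * y for (t, _), y in zip(inputs, outputs))
    s_zy = sum(z * y for (_, z), y in zip(inputs, outputs))
    det = s_tt * s_zz - s_tz ** 2
    a = (s_ty * s_zz - s_zy * s_tz) / det
    b = (s_zy * s_tt - s_ty * s_tz) / det
    return a, b


a, b = train_emulator(inputs, outputs)


def emulator(gcm_temp_c, elevation_km):
    """Cheap approximation that never calls toy_rcm."""
    return a * gcm_temp_c + b * elevation_km


# Compare the emulator with the "RCM" on a new input.
print(emulator(20.0, 1.0), toy_rcm(20.0, 1.0))
```

In this toy case the emulator recovers the relationship almost exactly because the stand-in RCM is itself a simple formula. A real RCM is much more complex, so a real emulator can only approximate it.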

RCM emulators can be particularly useful when exploring various climate scenarios. Let’s imagine a team of researchers is interested in knowing the impact of different greenhouse gas emission scenarios on the climate in New South Wales (NSW) over the next 30 years. Running an RCM would be computationally intensive, as complex equations would have to be solved for each grid cell and for each greenhouse gas scenario over a 30-year period. In this case, using an RCM emulator, which approximates the complex behaviour of the RCM, would allow the researchers to bypass these complicated calculations and provide rapid insights into how NSW’s climate might change under different scenarios.

It is important to acknowledge that while RCM emulators can be valuable tools to speed up climate simulations, these models can only approximate the behaviour of RCMs and therefore might not capture all their complexities. This must be considered when running simulations with RCM emulators and interpreting their results.

Figure 3: a) An RCM typically takes GCM inputs and performs complicated mathematical calculations to produce its output, i.e. regional climate projections. b) A trained RCM emulator can quickly produce outputs based on input conditions, without having to crank through all the mathematical calculations.

Improving the modelling of small-scale processes

Small-scale processes such as cloud formation, precipitation, radiation and vegetation changes can influence the climate and are therefore important to accurate climate simulations. However, because these processes often occur on a scale that is smaller than the grid of most climate models, they are not always captured well by the models.

To address this, small-scale processes can be approximated using simplified equations or rules known as parametrisations. For example, instead of calculating every detail of cloud formation, a model can use a parametrisation which dictates cloud formation based on the following rule: if humidity reaches 90% and the temperature drops below 0 degrees, then a cloud will form. This relationship is represented by a simplified formula in the model. While parametrisations are necessary tools, oversimplifying complex processes such as cloud formation can lead to inaccuracies in models.
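Written as code, the illustrative cloud rule above is just a pair of thresholds. The function name is hypothetical, and real cloud parametrisations are far more elaborate than this.

```python
def cloud_forms(relative_humidity_pct, temperature_c):
    """Toy parametrisation: cloud forms if humidity reaches 90%
    and the temperature is below 0 degrees C."""
    return relative_humidity_pct >= 90 and temperature_c < 0


print(cloud_forms(95, -5))    # humid and cold
print(cloud_forms(95, 3))     # humid but too warm
print(cloud_forms(89.9, -5))  # cold but just under the humidity threshold
```

The last case shows the cost of oversimplifying: a humidity of 89.9% produces no cloud at all, while 90% does, even though the real atmosphere has no such sharp switch.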

This is where machine learning becomes useful. By analysing large volumes of data, machine learning models can identify complex relationships between climate variables and model small-scale processes with greater accuracy. Once trained, these models can be integrated into climate models to enhance or even replace parametrisations. Because some parametrisations can be computationally expensive, replacing them with machine learning models has the potential to accelerate climate simulations.

Figure 4: Machine learning models can replace parametrisations.

Enhancing the modelling of multifaceted relationships

In addition to enhancing the representation of small-scale processes, machine learning can be a powerful tool to represent complex relationships between climate variables.

It is particularly useful in scenarios where modelling complex relationships using traditional mathematical equations is challenging or even infeasible due to the lack of a clear understanding of how various factors interact.

For example, predicting the energy demand of a region based on weather attributes is a complex task because it involves numerous factors (or predictors) like temperature, humidity, season, population density, and even the day of the week. The exact relationship between these predictors and the energy demand is not easily defined by a straightforward mathematical formula. However, machine learning can overcome this challenge by training on large amounts of past data on weather conditions and energy demand. By analysing patterns in this data, machine learning models can learn the underlying relationships, even if they are nonlinear or involve intricate interactions between variables. This allows for more accurate forecasting of energy demand, helping to optimise resource allocation and manage energy supply more effectively.

Providing early warnings of extreme weather events

Another application of machine learning lies in the prediction of extreme weather events.

Extreme weather events, such as floods, droughts, cyclones and fires, can be particularly destructive and cost billions of dollars in damage. Records of past events provide valuable information, and real-time data on ongoing extreme events can be provided by devices such as satellites, buoys, radars or weather stations.

By analysing data from the past and the present, machine learning models can help forecast near-future extreme weather events. During the training phase, machine learning models identify patterns in the data that are linked to an extreme weather event. For instance, a model can recognise which weather conditions have often led to a drought. Once the model is trained, it can look at current weather conditions and predict if a drought is likely to happen or not, based on what it has learned. This can potentially improve early warning systems and help communities better prepare for extreme weather events.
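As a minimal sketch of this train-then-predict idea, the example below learns a single rainfall threshold from past records and then applies it to current conditions. The records are made up, and a one-threshold rule is a deliberately simple stand-in for the much richer models used in practice.

```python
# Made-up history: (three-month rainfall in mm, did a drought follow?)
history = [(210, False), (180, False), (150, False), (120, False),
           (90, True), (95, False), (70, True), (60, True), (40, True)]


def train_threshold(records):
    """Pick the rainfall threshold that misclassifies the fewest past records
    when predicting 'drought' for rainfall below the threshold."""
    candidates = sorted({rain for rain, _ in records})

    def errors(threshold):
        return sum((rain < threshold) != drought for rain, drought in records)

    return min(candidates, key=errors)


# Training phase: learn the pattern from the past.
threshold = train_threshold(history)


# Prediction phase: apply the learned pattern to current conditions.
def drought_likely(current_rainfall_mm):
    return current_rainfall_mm < threshold


print(threshold, drought_likely(50), drought_likely(130))
```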

It is important to note that this approach may have limitations in predicting rare and very extreme events. This is because these events may not have occurred frequently in the past, making it hard for machine learning models to learn from them.

Figure 5: Machine learning can help in the prediction of extreme weather events. Source: Stock

Providing insights into the drivers of extreme weather events

In addition to helping in the prediction of extreme weather events, machine learning can also help to understand their main drivers.

This process involves training the machine learning model on historical data. By analysing the data, the model can, for example, identify the most important factor(s) contributing to drought events. Once trained, the model can identify these factors for any new drought event.
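One very simple way to illustrate the idea of ranking candidate drivers is to measure how strongly each one is correlated with drought severity. This is a much simpler stand-in for the attribution methods used with real machine learning models, and every number and variable name below is made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up drought severity index and three hypothetical candidate drivers.
drought_severity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
candidate_drivers = {
    "driver_a": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "driver_b": [0.5, 1.5, 1.0, 2.5, 2.0, 3.5],
    "driver_c": [0.3, -0.2, 0.4, -0.1, 0.2, 0.0],
}

# Rank drivers from most to least strongly related to drought severity.
ranking = sorted(
    candidate_drivers,
    key=lambda name: abs(pearson(candidate_drivers[name], drought_severity)),
    reverse=True,
)
print(ranking)
```

Correlation only captures straight-line relationships between one driver and the outcome, whereas machine learning models can also pick up nonlinear effects and interactions between drivers.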

In a recent study, a team of researchers from the ARC Centre of Excellence for Climate Extremes used a machine learning model to understand the main drivers of the Tinderbox Drought, an exceptionally extreme event that occurred between 2017 and 2019 in Southeast Australia, and that eventually led to Australia’s Black Summer bushfires. The model revealed that El Niño – a climate driver usually associated with the occurrence of droughts in eastern Australia – was not the main factor contributing to the Tinderbox Drought. Instead, local climate features such as soil moisture and evaporation played a key role in driving the drought.

Challenges of machine learning in climate modelling

While using machine learning models in climate modelling presents several advantages, the approach also comes with several challenges.

Training machine learning models is usually performed on supercomputers, which requires a substantial amount of computational power. This can result in a high carbon footprint, especially if the electricity used comes from non-renewable energy sources. Moreover, to function well, supercomputers need to be cooled, and cooling systems often necessitate extensive amounts of water. These environmental impacts should be considered before using machine learning models.

Another significant challenge for machine learning is climate change itself. As the climate changes in response to greenhouse gases emitted by human activity, future conditions may diverge significantly from the past, and the patterns that machine learning models have learnt during training might change. As a result, machine learning models trained on past data may not accurately predict future climate conditions.

Finally, machine learning may not be as effective in predicting extremes, as these are often underrepresented in the data used for training, making it hard for machine learning models to learn from them. So, while machine learning models could work well to predict common conditions, they might be less effective at predicting the more extreme events that are of greater concern.
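A small sketch shows why underrepresentation matters. It uses made-up data in which only 10 days out of 1,000 are extreme, and a deliberately lazy "model" that never predicts an extreme day.

```python
# 1,000 made-up days, of which only 10 are extreme.
labels = [True] * 10 + [False] * 990

# A lazy "model" that never predicts an extreme day.
predictions = [False] * len(labels)

# Share of all days that the model gets right.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Share of the extreme days that the model actually detects.
detected = sum(p and y for p, y in zip(predictions, labels))
detection_rate = detected / sum(labels)

print(f"accuracy = {accuracy:.1%}, extremes detected = {detection_rate:.0%}")
```

The lazy model is right on 99% of days yet detects none of the extreme ones, which is why overall accuracy alone says little about how well a model handles the events of greatest concern.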

Future directions

Machine learning models have the potential to offer significant advancements in the field of climate modelling. Their ability to find patterns in data makes them valuable tools to speed up climate simulations and represent complex processes and relationships between climate variables. By learning from past and present data, these models also have the potential to provide early warnings of future extreme events such as droughts, fires, and floods.

The capabilities of machine learning models must not be overestimated, and the approach must be considered with caution. Ongoing research efforts are needed to address the challenges associated with this approach and ensure it can contribute positively to climate modelling.