
Introduction to Extreme Value Theory for Climate Risk Analysis: Calculating a 100-Year Heatwave with Python


During the heatwave of July 2022, the air conditioning systems in the data centers of two London hospitals failed, leaving over 23,000 medical staff without access to appointment calendars, patient records, or test results. The culprit? The outside temperature surpassed the maximum level for which the cooling systems were designed.


Determining the maximum temperature that can be encountered in a location is more complex than it might seem.

In this tutorial, we will explore why this is the case and introduce extreme value theory, the main method used today to assess the probability and severity of extreme events from limited data.


Analyzing Maximum Temperatures: A Naive Approach


At first glance, it seems that predicting the extremes one might encounter—whether it's storms, floods, or other weather events—is straightforward: you simply look at the most extreme events recorded in the past. For example, the maximum temperature at a specific location can be estimated by analyzing the highest temperatures recorded historically.


For this demonstration, we will use the maximum temperatures recorded at Orly, near Paris, France, over the last 30 years. This data is sourced from the Global Historical Climatology Network. To save time, we have prepared a dataset in CSV format that you can use to reproduce this tutorial.


Let's start by opening it and displaying the measurements:
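The loading step might look like the following sketch, assuming the prepared CSV is named orly_tmax.csv (a hypothetical file name) and contains a DATE column and a TMAX column with the daily maximum temperature in °C:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the prepared dataset (hypothetical file and column names)
df = pd.read_csv("orly_tmax.csv", parse_dates=["DATE"])
df = df.set_index("DATE").sort_index()

# Plot the daily maximum temperature over the 30-year period
df["TMAX"].plot(figsize=(10, 4), linewidth=0.5)
plt.ylabel("Daily maximum temperature (°C)")
plt.title("Maximum temperature recorded at Orly over the last 30 years")
plt.show()
```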



Here is the result:

Maximum temperature recorded at Orly over the last 30 years

A first approach could be to perform statistical analysis on this dataset. Let's isolate the summer months and examine the last decile and the last percentile of temperatures:
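A possible sketch, reusing the column names assumed above and taking June, July, and August as the summer months:

```python
# Keep only the summer months (June, July, August)
summer = df.loc[df.index.month.isin([6, 7, 8]), "TMAX"]

# Last decile (90th percentile) and last percentile (99th percentile)
print(f"Last decile:     {summer.quantile(0.90):.1f}°C")
print(f"Last percentile: {summer.quantile(0.99):.1f}°C")
```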




The last temperature decile is 30.9°C, and the last percentile is 36.6°C. If tomorrow's climate is comparable to that of the last 30 years, we can expect the first threshold to be exceeded about ten days per year on average, and the second about once per summer on average.


While these are interesting indications, they are only useful if you can accept that the temperature limit will be exceeded regularly. If you are designing a critical system—such as a hospital air conditioning system or an industrial cooling system where a shutdown can cost millions of euros—these values are practically useless.


Elusive by Nature, Extreme Events are Challenging to Predict


Why not take a higher quantile? For example, the 99.99th percentile, which was exceeded on only 0.01% of days over the period 1991–2020?


First, because this quantile will be calculated from only 2 or 3 days, depending on the method used. With such a small sample, there's no guarantee that the value obtained is representative of the local climate. The result may be entirely determined by a brief heat peak that has little chance of recurring, or your dataset may not contain examples of events corresponding to this probability.


The Orly series provides several examples of these limitations, which can be verified with the short sketch after this list:

  • If we calculate the 99.99th percentile using only data before 2003, we get 37.2°C. This value was far exceeded during the 2003 heatwave, which reached a maximum of 40°C.

  • If we recalculate it before 2019, we get 39.4°C. This value was again exceeded during the 2019 heatwave, which set a record of 41.9°C.
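For illustration, these two checks could be reproduced on the daily series along the following lines (same assumed column names as above):

```python
# 99.99th percentile of daily maxima, computed on the data available before each heatwave
before_2003 = df.loc[:"2002-12-31", "TMAX"]
before_2019 = df.loc[:"2018-12-31", "TMAX"]

print(f"99.99th percentile before 2003: {before_2003.quantile(0.9999):.1f}°C")
print(f"99.99th percentile before 2019: {before_2019.quantile(0.9999):.1f}°C")
```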


Moreover, even if by chance this value were representative, over a 30-year series, the 99.99th percentile corresponds approximately to a ten-year event—one that occurs on average every 10 years or has a 10% probability of occurring each year. This probability of exceedance is far from negligible and remains too high for many applications.


Critical systems are typically designed for centennial events (1% chance of occurring in any year), millennial events (0.1% yearly probability), or even decamillennial events (0.01%), as seen in the nuclear industry.

Extreme events, by definition, are rare. That's the problem: it's virtually impossible to obtain a data series long enough to accurately assess them.


For example, if you want to determine a centennial event, even a century of observations is insufficient. The probability that this period contains exactly one centennial event is only 37.0%, only marginally higher than the probability that it contains none (36.6%). There is also a 26.4% probability that the series contains two or more centennial events.
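These figures follow from a binomial distribution with 100 independent years and a 1% probability per year; a quick check with SciPy:

```python
from scipy.stats import binom

# Number of centennial events observed in a 100-year record:
# binomial distribution with n = 100 years and p = 0.01 per year
n, p = 100, 0.01
print(f"P(exactly one event): {binom.pmf(1, n, p):.3f}")  # ~0.370
print(f"P(no event at all):   {binom.pmf(0, n, p):.3f}")  # ~0.366
print(f"P(two or more):       {binom.sf(1, n, p):.3f}")   # ~0.264
```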


To calculate a centennial event with a reasonable level of confidence, we would need at least around 300 years of data. Without even mentioning the reliability of the measurements or the evolution of the climate over such a long period, the oldest series of daily temperatures starts in 1772—that is, 250 years ago.



What is Extreme Value Theory?


In reality, classical methods of statistics and probability are designed to study the probable: they are good at telling us what is likely to happen. But we want to study extremely rare and unlikely events; it is the improbable that we wish to quantify.


This is where a specialized branch of statistics comes into play: Extreme Value Theory (EVT), also known as Extreme Value Analysis or EVA.


Developed in the mid-20th century, extreme value theory was widely used in fields such as hydrology, engineering, and finance before finding new applications in climatology in the 2000s.

To summarize it in a few words, the general principle of EVT is to isolate the extreme values within a series of observations and use them to construct the tail of the probability distribution. This distribution is then used to calculate the probability of events that are too rare to be reliably represented in the initial sample.



Identification of Extremes and Theoretical Distribution


Let's take our temperature series and start by identifying its extreme values. This can be done with two different approaches:


  • Block Maxima Method: We divide the sample into blocks (often one year each) and take the maximum value from each block.

  • Peak Over Threshold (POT) Method: We select values above a certain threshold, ensuring a minimum distance between two values to maintain their independence. For example, if the threshold is exceeded multiple times during the same heatwave, we only consider the highest value.


In both cases, it is necessary to ensure that the selected extremes correspond to homogeneous phenomena. For example, in climates that have two warm seasons, it will be necessary to study each season separately because the extremes may have different distributions in the two cases. Similarly, if we study wind speed, we will have to separate winter storms and hurricanes.


The block method is simpler to implement. We will use it with a one-year block:
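A minimal sketch, assuming the daily series loaded earlier:

```python
import matplotlib.pyplot as plt

# Block maxima with one-year blocks: the annual maximum of the daily series
annual_max = df["TMAX"].resample("YS").max().dropna()

# Histogram of the annual maxima
plt.hist(annual_max, bins=10, density=True)
plt.xlabel("Annual maximum temperature (°C)")
plt.title("Block distribution of temperature extremes")
plt.show()
```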



We obtain the following distribution:


Block distribution of temperature extremes

According to the Fisher-Tippett-Gnedenko theorem, the distribution of these block maxima, suitably normalized, converges to the generalized extreme value (GEV) distribution. Other theoretical distributions are sometimes used, such as the Weibull, Gumbel, or Fréchet laws, but these are only special cases of the GEV.


We will fit this theoretical GEV distribution to our extreme temperature data.


There are several methods to fit a GEV distribution, and they can give slightly different results. For now, we will use SciPy's default method, which is maximum likelihood estimation.
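A sketch of the fit using scipy.stats.genextreme, whose fit() method performs maximum likelihood estimation by default (continuing from the annual maxima computed above):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import genextreme as gev

# Fit the GEV distribution by maximum likelihood (SciPy's default method)
shape, loc, scale = gev.fit(annual_max)
print(f"shape = {shape:.3f}, loc = {loc:.2f}, scale = {scale:.2f}")

# Compare the fitted density with the histogram of the annual maxima
x = np.linspace(annual_max.min() - 2, annual_max.max() + 4, 200)
plt.hist(annual_max, bins=10, density=True, alpha=0.5, label="Annual maxima")
plt.plot(x, gev.pdf(x, shape, loc=loc, scale=scale), label="Fitted GEV")
plt.xlabel("Annual maximum temperature (°C)")
plt.legend()
plt.show()
```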



We get the following result:


Fitting the generalized extreme value distribution with SciPy's default method

Calculation of Extreme Events and Return Times


Next, let's represent the cumulative histogram of the extremes and the distribution function of the GEV:
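Continuing from the previous snippet, the comparison could be sketched as follows:

```python
# Empirical cumulative histogram of the annual maxima vs. the fitted GEV CDF
plt.hist(annual_max, bins=20, density=True, cumulative=True,
         alpha=0.5, label="Annual maxima (cumulative)")
plt.plot(x, gev.cdf(x, shape, loc=loc, scale=scale), label="GEV CDF")
plt.xlabel("Annual maximum temperature (°C)")
plt.ylabel("Probability of not being reached in a given year")
plt.legend()
plt.show()
```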



Cumulative histogram of extremes and the GEV distribution function

This representation helps to better understand the interpretation of the distribution function: it indicates the annual probability that a certain temperature will not be reached. For example, the probability of not reaching 30°C is approximately zero, while the probability of not reaching 40°C is very high.


In other words, the average annual frequency of the event "temperature T is reached" is 1 - CDF(T), where CDF is the cumulative distribution function. Since the period is the inverse of the frequency, the return time of this event is 1 / (1 - CDF(T)).


As a result, we can represent the temperature reached as a function of the return time:
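One way to sketch this curve is to evaluate the fitted GEV quantiles over a range of return times, and to place the observed annual maxima at their empirical return times (here with Weibull plotting positions, one common choice):

```python
import numpy as np

# A return time T corresponds to an annual exceedance probability of 1/T
return_times = np.logspace(np.log10(2), np.log10(200), 100)
gev_temps = gev.isf(1 / return_times, shape, loc=loc, scale=scale)

# Empirical return times of the observed annual maxima (Weibull plotting positions)
n_years = len(annual_max)
sorted_max = np.sort(annual_max)[::-1]                  # descending order
empirical_T = (n_years + 1) / np.arange(1, n_years + 1)

plt.semilogx(empirical_T, sorted_max, "o", label="Observed annual maxima")
plt.semilogx(return_times, gev_temps, label="Fitted GEV")
plt.xlabel("Return time (years)")
plt.ylabel("Daily maximum temperature (°C)")
plt.legend()
plt.show()
```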




The resulting graph allows us to estimate the return time for different temperatures. For example, 40°C corresponds approximately to a return time of 20 years:


Maximum daily temperature at Orly - observations and GEV

We can now calculate precisely the temperature reached for different return times.


For this, we will use gev.isf(), the inverse of the survival function 1 - CDF: given an annual exceedance probability, in other words the inverse of the return time, it directly returns the corresponding temperature.
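A sketch of the calculation for the return periods listed below:

```python
# isf takes the annual exceedance probability 1/T and returns the temperature
for T in [5, 10, 20, 50, 100]:
    temp = gev.isf(1 / T, shape, loc=loc, scale=scale)
    print(f"{T}-year return period: {temp:.1f}°C")
```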




We obtain the following results:


  • temperature exceeded with a 5-year return period: 37.7°C

  • temperature exceeded with a 10-year return period: 38.9°C

  • temperature exceeded with a 20-year return period: 40.0°C

  • temperature exceeded with a 50-year return period: 41.2°C

  • temperature exceeded with a 100-year return period: 42.1°C


A Word of Caution About EVT and Extreme Value Statistics


The method detailed above is widely used for assessing physical climate risks in industries such as insurance and development. However, the results must be interpreted with caution.


First, it is generally accepted that this method can evaluate extremes with an acceptable level of confidence up to about three times the size of the initial sample in a stationary climate. In other words, a 30-year sample is just enough to calculate a centennial event. Beyond this, the quality of the results becomes highly speculative.


Ideally, with enough data, one could use a longer historical period. However, this would invalidate the stationarity hypothesis. For lack of better options—since many statistical analyses require long data series—we assume that the climate is more or less stable over 30 years. This assumption is debatable: due to greenhouse gas emissions, the climate is changing rapidly, and even over 30 years, this change can outweigh natural variability. Over 50 or 100 years, the impact is even more pronounced.


Another limitation of these extrapolations is that calculating a centennial and predicting what will happen in the next century are two entirely different things.

A return time does not mean the event will occur exactly once per period; it may not occur at all or occur multiple times. It is better to interpret return times as an annual probability. For example, a return time of 10 years means an annual probability of 10%.


Furthermore, this calculation is based on the climate of the last 30 years, which we consider representative of the current climate for lack of better data. The further into the future we go, the less valid this hypothesis becomes. It is clear that the centennial heatwave of 2100, or even 2050, will differ significantly from the one calculated for 1991-2020. To evaluate extreme events more than one or two decades ahead, it is preferable to rely on future climate simulations rather than observed weather records.


Finally, the extrapolation is done using a statistical model, not a physical one. It can sometimes produce values that exceed physical limits, if they exist. Therefore, expert review remains necessary in all cases.



 


👋 And while we're on the subject: do you need an expert? With more than 230,000 diagnoses provided in 2024, Callendar is the reference for accessible, local-scale climate risk assessment solutions.


Whether for an industrial asset or a portfolio with thousands of locations worldwide, for a one-off project or to set up tailor-made tools, to assess current risks or risks throughout the 21st century… We have the tools to help you. Contact us to discuss your project!
