Fast contrail estimation with OpenSky data

Junzi Sun; Esther Roosenbrand;
This web version is auto-generated from the LaTeX source and may not include some elements. Check the PDF version for more details.

Abstract

Contrails, formed under specific atmospheric conditions, have a noteworthy role in heat-trapping within the atmosphere. This study bridges the gap between theoretical contrail formation models and real-world data by employing flight information from OpenSky and meteorological data from the European Centre for Medium-Range Weather Forecasts. We introduce a computationally efficient contrail estimation module, leveraging a client-server architecture that allows on-demand weather data interpolation via an API, significantly reducing computational load and enhancing performance locally. The study also benchmarks the entire pipeline, from data acquisition to contrail prediction, offering a robust tool for future air traffic studies requiring interpolated weather data.

Introduction

Meteorological data for flight-based analysis

Aviation research has long been intertwined with meteorological data. Trajectory-based research often relies on weather data provided by meteorological institutes to ascertain the wind and temperature conditions of the flight, both in the strategic and tactical phases, as well as in post-analyses. For studies reliant on ADS-B data, where only ground speed is presented, wind becomes a crucial source of information for estimating the flight’s true airspeed. This, in turn, facilitates a better approximation of aircraft performance. In an earlier study [Sun et al. 2022], the wind was found to be a significant factor in assessing aviation emissions.

Wind information is also pivotal for trajectory optimization studies. Given a specific aircraft and take-off mass, meteorological conditions are one of the primary factors causing variations in the actual optimal flight trajectory. In the study [Sun 2022], the effect of wind on optimal trajectory generation was examined. One of the challenges in applying wind data in large-scale optimization has been the slow process of obtaining and processing the wind data.

Meteorology Information for Contrail Formation

The formation and persistence of contrails are highly contingent on the prevailing meteorological conditions. Understanding these atmospheric parameters is important for the analysis of contrail formation and its subsequent ramifications. These meteorological variables include temperature, humidity, and atmospheric pressure at a given flight altitude.

The temperature and saturation levels in the surrounding atmosphere play a pivotal role in contrail formation. When aircraft engines expel hot exhaust gases into the cooler ambient atmosphere, contrails are likely to form if the atmospheric saturation is surpassed. The lower the temperature, the higher the propensity for contrail formation and persistence. Moreover, the relative humidity with respect to ice is a significant parameter [Schumann 1996], as contrails tend to persist in higher humidity conditions, evolving into contrail cirrus clouds under favorable conditions.

Atmospheric pressure, albeit to a lesser extent compared to temperature and humidity, also influences contrail formation. The pressure dynamics affect the rate at which exhaust gases from aircraft engines mix with the ambient atmosphere, thereby influencing the initial formation and subsequent dispersion or persistence of contrails. Additionally, wind conditions at flight altitude can impact the dispersion and longevity of contrails. Wind shear, for instance, can either elongate or dissipate contrails, affecting their visual and radiative properties.

Challenges of Integrating Meteorological Data

The primary challenge lies in the computational burden of processing large volumes of trajectory data against the four-dimensional meteorological grids, encompassing time, latitude, longitude, and altitude.

There are often two bottlenecks in utilizing the meteorological data. First and foremost, the request and download of ERA5 data from ECMWF can be time-consuming. For large datasets, the queueing for resources can take up to hours, with additional minutes required for downloading the data once it is ready. Given that only a single request can be run at a time, requesting ERA5 data for each trajectory with its bounding box is highly inefficient.

The second challenge pertains to the interpolation model. Constructing such a model necessitates the grid weather data (in GRIB format or netCDF format) to be fully loaded into the memory before the selection of a sub-region for the construction of the interpolation model. This task is often both I/O intensive and RAM intensive, posing significant hurdles to efficient data processing and analysis.

In this paper, we propose a new approach that addresses these two main challenges, significantly improving the speed of incorporating ERA5 weather data into aviation research. A new Python library, FastMeteo, is developed and shared as an open-source tool. We use contrail formation estimation as the primary use case to demonstrate the exceptional speed of obtaining interpolated ERA5 data for any given flight trajectory.

Data

Trajectory and Weather Data

This study leverages a representative sample of flight data obtained from the OpenSky Network [Schäfer et al. 2014], which boasts an extensive network of receivers scattered globally. The utilization of OpenSky data serves to illustrate the efficacy and expediency with which contrail estimation can be performed when dovetailed with a comprehensive meteorological dataset.

The requisite meteorological variables, namely temperature and relative humidity, are gleaned from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 dataset [Hersbach et al. 2018]. The ERA5 dataset has a high-resolution grid, sporting a resolution of 0.25 degrees horizontally. This granularity extends across 37 vertical layers, encompassing a pressure range of 1 hPa to 975 hPa, thereby providing a rich vein of atmospheric data crucial for nuanced contrail analysis. The reanalysis dataset is provided at each hour, providing accurate information for many aviation use cases.

In our use case of contrail analysis, only the ERA5 levels pertinent to en-route flights are considered, spanning between 100 hPa (approximately 53,000 ft) and 700 hPa (approximately 10,000 ft). The excluded levels either surpass the typical airliner ceiling or fall beneath the common altitude range for contrail formation.

Analysis-Ready, Cloud Optimized ERA5 Data

Crowdsourcing platforms like Pangeo Forge accelerate scientific pursuits by offering frameworks and cloud infrastructure to transform provider archives into Analysis-Ready, Cloud Optimized (ARCO) data stores, underscoring a community-driven approach towards generating analysis-ready, cloud-optimized data, including ERA5 data [Stern et al. 2022].

The Google ARCO-ERA5 dataset facilitates accessible global climate history in a cloud-optimized format. The open ARCO-ERA5 data, used in this paper, aims to make global climate history highly accessible in the cloud by the Google Cloud Public Datasets. This version of ERA5 has converted grib data to Zarr which is oriented towards common research and machine learning workflows.

Zarr, a format for chunked, compressed, N-dimensional arrays, presents several advantages over the commonly used GRIB format [Gowan et al. 2022], particularly in cloud-based or parallel computing settings. It enables efficient chunking for better data access, compression for storage efficiency, concurrent read/write access beneficial in multi-threaded environments, and structured metadata storage for enhanced data interpretation and discovery. Transitioning from GRIB to Zarr within the ARCO-ERA5 significantly improves the accessibility and speed for loading a large amount of ERA5, making it a prerequisite for our approach.

System architecture of FastMeteo tool

In this research, we propose the FastMeteo python library to handle the acquisition and interpolation of the meteorology data for any given flight or set of locations. This architecture allows for on-demand data acquisition of ERA5 data from the Google ARCO datastore. We employ a local caching strategy to enhance computational performance.

The structure of the proposed FastMeteo architecture

Figure 1 shows the architecture of the FastMeteo system. It can be used as an imported common Python package if the data is stored on the local computer. It can also be deployed in a server-client model, where the ARCO ERA5 data is cached on a server, and the interpolation of weather data can be performed through API using only the client Python module.

There are several key processes involved in the design of this architecture, which are explained as follows:

  1. Firstly, a request can be made either from an API (python function) or a client. The data frame is sent to the FastMeteo service, which is either hosted locally or on a remote server.

  2. The FastMeteo server code checks whether the hours, based on timestamps from the data frame, are already downloaded in the local cache. If not, it will sync the hour of data with the Google ARCO-ERA5 public data store.

  3. Based on the latitude and longitude boundary of the flight, we only load the region of data that is associated with the requested data frame. Since the data is stored in Zarr format, where data are stored in chunks according to the four-dimensional indices (time, latitude, longitude, and altitude), the extraction and loading of data are fast.

  4. Next, the linear interpolation of relevant meteorological features is computed based using the xarray. This process is vectorized and also provides a signification improvement in computation efficiency.

  5. Finally, a new data frame, containing the original information and new columns with interpolated meteorological information is returned or sent to the client.

The use of the library is extremely simple. Once the library is installed, it takes only two lines of code to get the weather information for a given flight or position. In the following code snippet (Figure 2), we first define the basic information of time, latitude, longitude, and altitude, which are required for the input. After that, the two lines of the code define the location for the local Zarr cache and then obtain the weather information through the interpolate function.

import pandas as pd
from fastmeteo import Grid

flight = pd.DataFrame(
    {
        "timestamp": ["2021-10-12T01:10:00", "2021-10-12T01:20:00"],
        "latitude": [40.3, 42.5],
        "longitude": [4.2, 6.6],
        "altitude": [25_000, 30_000],
    }
)

# define the location for local store
fmg = Grid(local_store="/tmp/era5-zarr") 

# obtain weather information
flight_new = fmg.interpolate(flight)
Example code using FastMeteo library to obtain meteorology data and provide interpolation of relevant parameters for the given flight

For example, the following snippet shows the input dataframe:

timestamp,icao24,latitude,longitude,altitude
2021-03-07 15:38:40+00:00,a56661,41.73921,-87.96647,10050.0
2021-03-07 15:38:45+00:00,a56661,41.7328,-87.97053,10275.0
2021-03-07 15:38:50+00:00,a56661,41.72649,-87.97452,10475.0
2021-03-07 15:38:55+00:00,a56661,41.72021,-87.97845,10675.0
2021-03-07 15:39:00+00:00,a56661,41.71364,-87.9826,10900.0
2021-03-07 15:39:05+00:00,a56661,41.70808,-87.98619,11125.0
...

while the new data frame provide by FastMeteo contains the following data:

timestamp,icao24,latitude,longitude,altitude,u_component_of_wind,v_component_of_wind,temperature,specific_humidity
2021-03-07 15:38:40+00:00,a56661,41.73921,-87.96647,10050.0,4.93797,-7.83555,266.78062,0.00019
2021-03-07 15:38:45+00:00,a56661,41.7328,-87.97053,10275.0,4.90272,-8.05699,266.43374,0.00018
2021-03-07 15:38:50+00:00,a56661,41.72649,-87.97452,10475.0,4.87196,-8.2515,266.12855,0.00018
2021-03-07 15:38:55+00:00,a56661,41.72021,-87.97845,10675.0,4.84209,-8.44564,265.82602,0.00017
2021-03-07 15:39:00+00:00,a56661,41.71364,-87.9826,10900.0,4.80977,-8.66434,265.48861,0.00017
2021-03-07 15:39:05+00:00,a56661,41.70808,-87.98619,11125.0,4.77882,-8.88355,265.15389,0.00016
...

When running the tool in a server-client mode. The following script can be used to start a FastAPI service on the server, which handles the flight date request, obtaining Google ARCO data if the partition is not on the server, perform the interpolation of weather data, and return the final data to the client.

fastmeteo-serve --local-store /tmp/era5-zarr

On the client side, the following code (Figure 3) can be used to submit and get the process flight with meteorology data, given that the flight data structure similar to previous Figure 2 is defined.

from fastmeteo import Client

# define the input data
#...

# create the client object
client = Client()

# send the flight and receive the new DataFrame
flight_new = client.submit_flight(flight)
Example code using FastMeteo library in a client-server mode

Contrail Formation Model

In this section, we explain how ERA5 parameters, including temperature and humidity, can be used for the evaluation of contrail formation based on ADS-B flight data. The models used for determining contrail formation and persistence are based on existing literature [Schumann 1996].

Schmidt-Appleman Criterion

Contrails primarily form under air temperatures lower than -40 °C (233 K) and high relative humidity [Schumann 1996]. Their formation is dictated by the Schmidt-Appleman Criterion [Schumann 2005], a thermodynamic model accounting for ambient pressure, humidity, and the water-to-heat ratio in exhaust plumes.

When an aircraft traverses atmospheric conditions that satisfy the Schmidt-Appleman Criterion, saturation with respect to liquid water occurs, leading to contrail formation. The specific temperature threshold for contrail formation depends on the ambient relative humidity and the slope of the isobaric mixing line (G), defined as follows:

\[\label{eq:1} G = \frac{EI_{H_2O}~c_p~p}{\varepsilon~Q~(1-\eta)}\]

where constants are defined in Table 1. The isobaric mixing line, G, is influenced by the ambient pressure (ph) at the flight level and the overall propulsion efficiency (), which is assumed to be 0.4 according to literature [Dischl et al. 2022],

The threshold temperature (TLC) is the point where the saturation vapor pressure with respect to ice intersects with the mixing line. The saturation vapor pressure over water is depicted in blue in Figure 4 and is described by Murphy and Koop [Murphy and Koop 2005]:

\[\begin{split} \ln(e_{liq}) &= 54.842763 - \frac{6763.22}{T} - 4.210 \ln(T) + 0.000367T \\ &+ \tanh\Bigr[0.0415(T - 218.8)\Bigr] \Bigr[ 53.878 - \frac{1331.22}{T} - 9.44523 \ln(T) + 0.014025T \Bigr] \end{split}\]

where T represents the temperature, and satisfies \(123 K < T < 332 K\). The saturation vapor pressure over ice, shown in orange in Figure 4, is given by:

\[\begin{split} \log e_{ice}= 9.550426 - \frac{5723.265}{T} + 3.53068 \ln(T) - 0.00728332 T \end{split}\]

where the temperature (\(T\)) is higher than 110 K. Before determining the threshold value TLC, the temperature TLM must first be identified, which is given by:

\[\begin{aligned} T_{LM} = -46.46 + 9.43 \ln (G - 0.053) + 0.720 [\ln(G - 0.053)]^2 \end{aligned}\]

Subsequently, the mixing line is defined as the tangent line at the TLM point with the saturation water vapor pressure curve (e*). TLC represents the intersection of the mixing line and the saturation water vapor pressure curve over ice.

Constants used in Schmidt-Appleman Criterion
Symbol Constant Value Unit
EI\(_\mathrm{H_2O}\) Emission index 1.2232 \(- (kg_\mathsf{H_2O} / kg_\mathsf{fuel})\)
\(c_p\) Specific heat capacity air 1004 \(J~kg^{-1}~K^{-1}\)
\(\varepsilon\) Ratio molar mass of water vapour and air 0.622 -
Q Specific combustion heat \(43 \times 10^6\) \(J~kg^{-1}\)
\(p_{st}\) Pressure at steam point 101 325 Pa
\(p_{ice}\) Pressure at ice point 611.73 Pa
\(R_v\) Gas constant of water vapor 461.51 \(J~kg^{-1}~K^{-1}\)
\(R_d\) Gas constant of dry air 287.05 \(J~kg^{-1}~K^{-1}\)

Ice Supersaturation

Persistent contrail formation is closely tied to the atmospheric conditions in the ice supersaturation (ISS) regions. Upon exiting aircraft engines, the hot and humid exhaust gases mix with the colder ambient air, leading to condensation and ice crystal formation if the ambient air is supersaturated with respect to ice. Ice supersaturation occurs when the relative humidity with respect to ice (RHI) exceeds 100%, allowing for the possibility of ice crystal formation and persistence under suitable conditions. Ice supersaturation can endure for extended periods given the lack of efficient ice nucleating particles. This phenomenon is critical in understanding the life cycle of persistent contrails.

In this paper, we use the following equation to calculate RHI:

\[\mathrm{RH}_i = \frac{e}{e_{ice}} = \frac{q~p~R_v}{R_d~e_{ice}}\]

where \(e\) is the actual vapor pressure of water. \(q\) and \(p\) are specific humidity and pressure, which are provided by the ERA5 data. \(R_v\) and \(R_d\) are gas constants for water vapor and dry air (see Table 1).

Finally, the geometric approach of the Schmidt-Appleman Criterion to identifying TLC is visualized in Figure 4.

Contrail forming conditions

Results

Example trajectory

To showcase the capabilities of FastMeteo, we use an example flight trajectory sourced from OpenSky, depicted in Figure 5. On the left side of the figure, fundamental flight parameters from ADS-B are displayed, encompassing position, altitude, and ground speed. The right side of the figure presents the meteorological data we acquired, such as wind, temperature, and humidity.

image image image image

An example flight trajectory. On the left: the trajectory obtained from ADS-B data. On the right: enhanced trajectory data with meteorological information.

Based on the additional temperature and humidity information obtained with FastMeteo, the determination of persistent contrails can be performed using the previously mentioned Schmidt-Appleman Criterion and ice supersaturation conditions. Consequently, segments of the trajectory that satisfy these conditions can be visualized alongside the original trajectory. This is depicted in Figure 6.

In the first plot, the segment of the trajectory satisfying the Schmidt-Appleman Criterion is highlighted in light purple. This segment encompasses the majority of the cruise flight due to the relatively cold temperatures. In the second plot, trajectory segments passing through ice supersaturated regions are marked in light blue. This occurs at the beginning of the trajectory, including portions of the climb and cruise. In the third plot, trajectory segments with estimated persistent contrails are marked in red, representing the overlap of the first two conditions.

Persistent contrails (in red) determined from the example flight trajectory, based on the Schmidt-Appleman and ice supersaturation criteria.

Computational benchmarks

In this section, we provide a quick analysis of the computational speed of the FastMeteo library. For this test, we downloaded one day’s worth of flights that departed from EHAM from the OpenSky network, encompassing around 850 flights. To test the influence of the number of data points, we resampled the dataset at varying rates, ranging from 5 seconds to 600 seconds. This yielded a range of data points from approximately 860,000 to 8,000 for these scenarios. Table 2 displays the computational speeds for these different tests. Observations show that when operating on large geographic coverage, the number of data points doesn’t affect performance significantly. For this dataset, computing the meteorological conditions for all 860,000 data points took less than eight seconds.

Benchmarking the computational performance FastMeteo, with one data of all flights departed from EHAM, which covers a large geographic region.
sample time data points compute time
5 s 861,276 7.47 s
10 s 430,991 6.43 s
20 s 215,920 5.92 s
30 s 144,082 5.76 s
60 s 72,462 5.72 s
120 s 36,664 5.45 s
180 s 24,723 5.47 s
300 s 15,152 5.57 s
600 s 7,977 5.15 s

We also tested the library with individual flights in the dataset to show its performance when applied to single flights. The results are displayed in Figure 7. Most flights are processed in under 1 second.

The relationship between computation time and flight distance. The colors indicate the varying density of the data. We can see that most of the flight in this sample is around 500 nautical miles, with a computation time of around 0.8 seconds.

Based on these two tests, we can determine that the performance is not directly proportional to the number of data points processed by the FastMeteo library. It is more efficient to process a large number of data points simultaneously because the interpolation of the multi-dimensional grid is vectorized, utilizing the underlying xarray library’s interpolation functions.

Discussion

This research makes use of the Google ARCO-ERA5 public dataset, which is converted from the ECMWF ERA5 data. This implies that the data quality and errors remain consistent with the original ERA5 data. It is noteworthy that not all parameters from the ERA5 data are included in the ARCO dataset. However, for contrail analysis, critical information on temperature and specific humidity is available. Other commonly used meteorological data such as wind is also included, facilitating diverse aviation applications.

In addition to the standard grid presented in the original ERA5 dataset, ARCO also offers a reduced Gaussian grid [Hortal and Simmons 1991] for its parameters. This could potentially slightly decrease the storage requirements and enhance computation speed. However, due to the complexity of the conversion process, we opt for the standard grid representation. This could be a future enhancement if such a feature becomes necessary for the aviation research community.

The efficiency of FastMeteo is primarily attributed to the innovative Zarr storage. It enables grid data to be stored and retrieved efficiently, even on basic personal computers. It obviates the need to load the entire dataset to extract specific data. This method is also more effective than relational or non-relational databases, as Zarr is specifically optimized for multi-dimensional data structures.

The performance bottleneck is the synchronization of local data from Google Cloud Storage. With a high-speed connection, it can take approximately 10 seconds to synchronize one hour of global data from Google ARCO-ERA5. However, once the data is available locally, subsequent interpolations are rapid.

In this paper, we focus on one use case of estimating contrails from flight data using FastMeteo. The application of this library can be broadened to additional aviation scenarios. For instance, with more precise wind data, we can more accurately estimate fuel consumption and emissions from flights by using a velocity closer to the true airspeed. We are also developing new programming interfaces to access a field of meteorological parameters, aiding the trajectory optimization applications.

Conclusion

In this study, we try to tackle the computational complexities associated with the incorporation of meteorological data into aviation studies. Two challenging tasks are identified: the process of downloading the vast ERA5 datasets from ECMWF and the computational demands associated with data interpolation. Addressing these issues, the FastMeteo Python library is introduced as an innovative solution. This tool, which can be seamlessly integrated into current research workflows, simplifies the acquisition and interpolation of meteorological data, offering remarkable computational efficiency.

The FastMeteo library can function either as a standalone Python library or as a client-server module. This adaptability ensures that users can tailor their data processing based on the dataset size and computational infrastructure available.

The OpenSky flight data is used to examine the tool’s capability in contrail estimation, demonstrating the system’s proficiency in fetching and assimilating complex meteorological data. The transition to ARCO-ERA5 datasets stored in the Zarr format on cloud platforms significantly optimized the access and utilization of ERA5 data.

Our benchmarks also reveal the computational efficiency of FastMeteo. The tool exhibits non-linear performance scaling concerning the number of data points, with a significant increase in efficiencies when processing larger datasets. This is attributed to the vectorized interpolation capabilities embedded in the underlying xarray library.

  • Junzi Sun: Conceptualization, Methodology, Writing–Original draft, Data curation, Writing–Original draft, Writing – Review and Editing, Visualization

  • Esther Roosenbrand: Conceptualization, Methodology, Writing–Original draft, Writing – Review and Editing, Visualization, Software

This study received funds from the TU Delft Climate Action Seed Fund 2022.

Dischl, R., Kaufmann, S., and Voigt, C. 2022. Regional and seasonal dependence of the potential contrail cover and the potential contrail cirrus cover over europe. Aerospace 9, 9, 485.
Gowan, T.A., Horel, J.D., Jacques, A.A., and Kovac, A. 2022. Using cloud computing to analyze model output archived in zarr format. Journal of Atmospheric and Oceanic Technology 39, 4, 449–462.
Hersbach, H., Bell, B., Berrisford, P., et al. 2018. ERA5 hourly data on pressure levels from 1940 to present. Copernicus climate change service (C3S) climate data store (CDS).
Hortal, M. and Simmons, A. 1991. Use of reduced gaussian grids in spectral models. Monthly Weather Review 119, 4, 1057–1074.
Murphy, D.M. and Koop, T. 2005. Review of the vapour pressures of ice and supercooled water for atmospheric applications. Quarterly Journal of the Royal Meteorological Society: A journal of the atmospheric sciences, applied meteorology and physical oceanography 131, 608, 1539–1565.
Schäfer, M., Strohmeier, M., Lenders, V., Martinovic, I., and Wilhelm, M. 2014. Bringing up OpenSky: A large-scale ADS-b sensor network for research. IPSN-14 proceedings of the 13th international symposium on information processing in sensor networks, IEEE, 83–94.
Schumann, U. 1996. On conditions for contrail formation from aircraft exhausts. Meteorologische Zeitschrift 5, 4–23.
Schumann, U. 2005. Formation, properties and climatic effects of contrails. Comptes Rendus Physique 6, 4-5, 549–565.
Stern, C., Abernathey, R., Hamman, J., et al. 2022. Pangeo forge: Crowdsourcing analysis-ready, cloud optimized data production. Frontiers in Climate 3, 782909.
Sun, J. 2022. OpenAP. Top: Open flight trajectory optimization for air transport and sustainability research. Aerospace 9, 7, 383.
Sun, J., Basora, L., Olive, X., et al. 2022. OpenSky report 2022: Evaluating aviation emissions using crowdsourced open flight data. Proceedings of the 40th digital avionics systems conference.