EUROCONTROL’s Performance Review Commission launched the 2024 PRC Data Challenge in July 2024 with the aim of engaging with data scientists and aviation enthusiasts for the development of an open model to estimate an aircraft’s take-off weight. The dataset for the challenge represents a unique instance of otherwise difficult-to-obtain flight information and could be reused for educational purposes or to further improve the outcome of the challenge.
True to its values of openness, transparency, and reproducibility, the EUROCONTROL Performance Review Commission (PRC), established in 1998 by EUROCONTROL’s Permanent Commission, provides objective information and independent advice to EUROCONTROL’s governing bodies on the performance of the European Air Traffic Management (ATM) system. The insights are based on extensive research, data analysis, and consultation with stakeholders. In 2023, the PRC decided to promote a data challenge that could help tackle the emerging issue of quantifying the impact of aviation on climate.
The PRC decided to focus the challenge on predicting the Actual Take-Off Weight (ATOW). ATOW is an essential input parameter for modeling the amount of fuel burnt during a flight and the gaseous emissions produced, such as carbon dioxide (CO₂), nitrogen oxides (NOₓ), and sulfur dioxide (SO₂). Also important was the possibility of freely using the result of the challenge with openly available input data. The collaboration with the OpenSky Network (OSN) and fellow researchers from TU Delft and ONERA made it possible to design the challenge and the companion dataset described in the following sections.
During the design of the challenge, our initial hypothesis was that ATOW depends on the following factors:
Parameters related to the origin and destination:
The geographical distance between the two airports of a flight influences how much fuel an aircraft needs to carry.
Aerodrome of Departure (ADEP) or Aerodrome of Destination (ADES) may dictate Air Traffic Management (ATM) procedures such as the Standard Instrument Departure Route (SID)¹ and the Standard Arrival Route (STAR)² that influence the trajectory flown and hence the extra fuel required.
Both ADEP and ADES affect how an Aircraft Operator (AO) might plan and execute flights, for example, in selecting the potential airports for diversions, which can affect the decision on extra fuel to be carried on-board.
Information related to time:
Depending on the time of day or day of the week when flights are planned, the flights may experience longer taxi times or capacity-management measures such as re-routing, holding, and vectoring, all of which would affect the fuel decision.
Seasonal trends, such as the International Air Transport Association (IATA) season schedule³, as well as local time and flight duration, could also affect the take-off weight of the flight.
Information on the aircraft (airframe): the International Civil Aviation Organization (ICAO) type⁴ implies different aircraft performance profiles and hence different amounts of fuel needed.
Airline: Policies vary for different airlines, which can affect the take-off weight. For the same city-pair, airlines could select a different alternate aerodrome to be used in case of diversion due to technical issues. Airlines could also have different fuel tanking policies.
Operational data: The actual flown route length differs from the great-circle distance because of ATM constraints such as regularly allocated military areas; this parameter could refine the ATOW estimation. A similar effect, due to taxiway constraints, applies to taxi-out operations.
The 4D trajectory itself: The Automatic Dependent Surveillance–Broadcast (ADS-B) trajectory data contain a lot of information that helps to characterize how a flight has been flown. For example, the rate of climb and the maximum cruise flight level both depend on the aircraft’s weight.
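The hypotheses above contrast the great-circle distance between the two airports with the actual flown route length (the flown_distance column, in nautical miles). A minimal sketch of how the two could be compared, assuming a spherical Earth (haversine formula) and airport coordinates taken from an external source, since coordinates are not part of the flight list; the function names are hypothetical:

```python
import math

# Mean Earth radius expressed in nautical miles (spherical-Earth assumption).
EARTH_RADIUS_NM = 3440.065


def great_circle_nm(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two points, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def route_extension(flown_nm: float, gc_nm: float) -> float:
    """Ratio of flown route length to great-circle distance (>= 1 for real routes)."""
    return flown_nm / gc_nm


# Illustrative use with approximate coordinates of LFPG and EGLL.
gc = great_circle_nm(49.0097, 2.5479, 51.4700, -0.4543)
```

A ratio noticeably above 1 would indicate the kind of ATM-induced route extension described above; the ratio, rather than the raw distance, is one possible feature for a model.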
Based on these hypotheses and on the availability of data sources, we constructed the dataset for the PRC Data Challenge. It consists of:
Actual Take-off Weight (ATOW) data: Flight information from EUROCONTROL’s Network Manager (NM) augmented with derived Take-Off Weight (TOW) from airlines. The airline information is anonymized. We have extracted a total of flights that were flown throughout Europe in 2022. This represents 6.1% of the flights from the EUROCONTROL airspace.
Trajectory data: State vectors from the OpenSky Network [Schäfer et al. 2014] for the above flights, augmented with meteorological data from Copernicus ERA5 [Hersbach, H. et al.] via the fastmeteo library [Junzi Sun 2025].
Due to data disclosure constraints, we could not reveal the identity of the airline operators or of the airframes (ICAO transponder code or registration number); these identifiers are therefore not included in the open dataset.
The flight list used in the data challenge is derived from EUROCONTROL data containing scheduled and non-scheduled flights, from which we removed military, general aviation, sensitive, and state flights. The resulting bare flight list accounted for around 8,686,000 flights in 2022.
We further removed:
Flights with the same origin and destination airport
Flights with an unknown airport, i.e. where ADEP or ADES has the value ZZZZ or AFIL (Air Filed)⁵
Flights without a callsign or ICAO transponder address, both of which are required to match ADS-B trajectories
Flights with incomplete weight data, e.g. with the fuel weight missing or with only the fuel weight available
Flights from airlines that have not shared or agreed to share the take-off weight data
After filtering, 1,006,051 flights, containing take-off weight information, have been retained for the data challenge.
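The removal steps above can be read as a single predicate applied to each flight. A minimal sketch, assuming a hypothetical per-flight record: the field names, the category labels, and the interpretation of "complete weight data" as "both a take-off weight and a fuel weight are present" are illustrative assumptions, not the EUROCONTROL schema:

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical category labels; the actual EUROCONTROL flight-type codes differ.
EXCLUDED_CATEGORIES = {"military", "general_aviation", "sensitive", "state"}
# Placeholder values for an unknown aerodrome or an air-filed flight plan.
UNKNOWN_AERODROMES = {"ZZZZ", "AFIL"}


@dataclass
class CandidateFlight:
    """Hypothetical record for one flight; field names are illustrative only."""
    category: str
    adep: str
    ades: str
    callsign: Optional[str]
    icao24: Optional[str]
    airline: str
    tow_kg: Optional[float]   # take-off weight derived from airline data
    fuel_kg: Optional[float]  # fuel weight reported by the airline


def keep_flight(f: CandidateFlight, sharing_airlines: Set[str]) -> bool:
    """Return True if the flight survives all of the filtering rules."""
    if f.category in EXCLUDED_CATEGORIES:
        return False
    if f.adep == f.ades:  # same origin and destination
        return False
    if f.adep in UNKNOWN_AERODROMES or f.ades in UNKNOWN_AERODROMES:
        return False
    if not f.callsign or not f.icao24:  # both needed to match ADS-B trajectories
        return False
    if f.tow_kg is None or f.fuel_kg is None:  # incomplete weight data (assumed meaning)
        return False
    return f.airline in sharing_airlines  # airline must have agreed to share TOW data
```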
Based on this list of flights with take-off weight information, we extracted the relevant ADS-B trajectories from OpenSky’s historical data. The parameters used for extracting state vectors are:
icao24
callsign
date (the date of the Actual Off-Block Time (AOBT))
start (five minutes before AOBT)
stop (thirty minutes after the actual Arrival Time (ARVT))
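The query parameters listed above can be derived directly from the AOBT and ARVT of each flight. A minimal sketch with a hypothetical helper name; how the parameters are then passed to OpenSky’s historical database is not shown here:

```python
from datetime import datetime, timedelta


def state_vector_query(icao24: str, callsign: str,
                       aobt: datetime, arvt: datetime) -> dict:
    """Build the state-vector query parameters for one flight."""
    return {
        "icao24": icao24,
        "callsign": callsign,
        "date": aobt.date(),                   # date of the Actual Off-Block Time
        "start": aobt - timedelta(minutes=5),  # five minutes before AOBT
        "stop": arvt + timedelta(minutes=30),  # thirty minutes after ARVT
    }


# Illustrative flight leaving the gate just after midnight UTC.
q = state_vector_query("4ca123", "ABC123",
                       datetime(2022, 1, 1, 0, 2), datetime(2022, 1, 1, 2, 0))
```

Note that for such a flight the start of the window falls on the previous day while date still follows the AOBT, so a day-partitioned query may need to cover both days.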
The data extraction yielded 527,162 trajectories; together with the corresponding entries of the flight list, they form the final ground-truth dataset for the challenge.
For the purpose of automatic ranking, we split the dataset into training and testing sets; the proportions are shown in Figure 1. The split between training and testing is random. We checked the distribution of aircraft types to ensure consistency between the training and testing datasets.
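A random split followed by a check of the aircraft-type distribution could look like the sketch below. The test fraction, the seed, and the use of the largest per-type share difference as the consistency measure are illustrative assumptions; the actual proportions are those shown in Figure 1:

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def random_split(flight_ids: Sequence[Hashable], test_fraction: float,
                 seed: int = 42) -> Tuple[List[Hashable], List[Hashable]]:
    """Shuffle the flight IDs and return (train, test) lists."""
    rng = random.Random(seed)
    ids = list(flight_ids)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def type_shares(ids, aircraft_type: Dict) -> Dict[str, float]:
    """Share of each aircraft type among the given flights."""
    counts = Counter(aircraft_type[i] for i in ids)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def max_share_gap(train_ids, test_ids, aircraft_type: Dict) -> float:
    """Largest absolute difference in aircraft-type share between the two sets."""
    a = type_shares(train_ids, aircraft_type)
    b = type_shares(test_ids, aircraft_type)
    return max(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in set(a) | set(b))
```

A small gap indicates that both sets have a similar aircraft-type mix; a stratified split by aircraft type would be an alternative if the random split drifted too far.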
The differences between the datasets are:
Part A: The training dataset, train.csv. It was named challenge_set.csv in the 2024 PRC Data Challenge. It consists of rows of state vectors. This is the dataset from which to learn and build the machine learning model: it contains the tow column with the ATOW values.
Part B: The initial testing dataset, test.csv. It was named submission_set.csv in the 2024 PRC Data Challenge. It consists of rows. This dataset was used for submissions and ranking until around one week before the deadline. Participants submitted it with their predicted ATOW values in the tow column; the true values were not disclosed during the competition.
Part B + C: The final test dataset, test_final.csv. It was named final_submission_set.csv in the 2024 PRC Data Challenge. It consists of rows. This dataset was used for the final ranking in the last phase of the challenge. It added rows to the test dataset, test.csv.
After the end of the data challenge, we released the full ground-truth dataset as flight_list.csv. It consists of all the rows, i.e. A + (B + C), including the tow values.
The parameter names, descriptions, and units are listed as follows:
Flight identification:
flight_id: unique flight ID generated using the traffic library
callsign: obfuscated callsign of the flight
Origin and destination airports:
adep: departure airport ICAO code
ades: arrival airport ICAO code
name_adep: departure airport name
country_code_adep: departure country code
name_ades: arrival airport name
country_code_ades: arrival country code
Date and time:
date: date of flight (UTC)
actual_offblock_time: actual off-block time (UTC)
arrival_time: arrival time (UTC)
Aircraft information:
aircraft_type: ICAO aircraft type code
wtc: wake turbulence category, see footnote in Table [tbl-aircraft-types]
Airline information:
airline: obfuscated airline code
Operational parameters:
flight_duration: flight duration (in minutes)
taxiout_time: taxi-out time (in minutes)
flown_distance: route length (in nautical miles)
tow: estimated take-off weight (in kg)
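The column list above maps naturally onto a typed record that keeps the units explicit. A minimal sketch reading a subset of the columns with the standard csv module; the timestamp format (ISO 8601 without an offset, interpreted as UTC) is an assumption, and the sample row is made up for illustration:

```python
import csv
import datetime as dt
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flight:
    """Subset of the flight_list.csv columns, with units as documented."""
    flight_id: str
    adep: str
    ades: str
    date: dt.date
    actual_offblock_time: dt.datetime  # UTC
    arrival_time: dt.datetime          # UTC
    aircraft_type: str                 # ICAO aircraft type code
    flight_duration: float             # minutes
    taxiout_time: float                # minutes
    flown_distance: float              # nautical miles
    tow: Optional[float]               # kg; None when the value is withheld


def _utc(text: str) -> dt.datetime:
    # Assumption: timestamps carry no offset and are meant as UTC.
    return dt.datetime.fromisoformat(text).replace(tzinfo=dt.timezone.utc)


def read_flights(csv_text: str) -> List[Flight]:
    flights = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        flights.append(Flight(
            flight_id=row["flight_id"],
            adep=row["adep"],
            ades=row["ades"],
            date=dt.date.fromisoformat(row["date"]),
            actual_offblock_time=_utc(row["actual_offblock_time"]),
            arrival_time=_utc(row["arrival_time"]),
            aircraft_type=row["aircraft_type"],
            flight_duration=float(row["flight_duration"]),
            taxiout_time=float(row["taxiout_time"]),
            flown_distance=float(row["flown_distance"]),
            tow=float(row["tow"]) if row.get("tow") else None,  # empty in test files
        ))
    return flights


# Made-up sample row with an empty tow, as in the test files.
SAMPLE = (
    "flight_id,adep,ades,date,actual_offblock_time,arrival_time,aircraft_type,"
    "flight_duration,taxiout_time,flown_distance,tow\n"
    "1,LFPG,EGLL,2022-01-01,2022-01-01T10:00:00,2022-01-01T11:10:00,A320,60,10,200,\n"
)
flights = read_flights(SAMPLE)
```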
In terms of ICAO aircraft types, there are 30 distinct ones in the dataset; the top 10 account for around of the total flights, see Figure 2 and Table [tbl-aircraft-types].
In terms of city-pairs, there are 2836 (undirected) city-pairs in the dataset. The top 132 cover 50% of the traffic, see Figure 3.
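The coverage figures above (share of flights covered by the top 10 aircraft types, number of top city-pairs needed to reach 50% of the traffic) follow from simple frequency counts. A minimal sketch, assuming a city-pair is identified by its two aerodromes (ADEP, ADES) regardless of direction; the helper names are hypothetical:

```python
from collections import Counter
from typing import Tuple


def undirected_pair(adep: str, ades: str) -> Tuple[str, ...]:
    """City-pair key that ignores direction, so LFPG-EGLL equals EGLL-LFPG."""
    return tuple(sorted((adep, ades)))


def top_k_share(counts: Counter, k: int) -> float:
    """Share of all flights covered by the k most frequent keys."""
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(k)) / total


def keys_needed_for_share(counts: Counter, share: float) -> int:
    """Smallest number of most frequent keys whose flights reach the given share."""
    total = sum(counts.values())
    running = 0
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running >= share * total:
            return k
    return len(counts)
```

Applied to the flight list, top_k_share on aircraft_type counts with k = 10 and keys_needed_for_share on undirected city-pair counts with share = 0.5 would correspond to the two statistics quoted above.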
The dataset shows the typical seasonality, with a summer peak and a winter trough, although not for all aircraft types; see Figure 4.
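To group flights by IATA season when inspecting seasonality, the season boundaries given in the footnote on the IATA season schedule can be computed from the calendar. A minimal sketch that follows that footnote’s wording (summer from the last Sunday of March to the last Saturday of October); the S/W plus two-digit-year labels follow the common IATA convention and are an assumption here:

```python
from datetime import date, timedelta


def last_weekday_of_month(year: int, month: int, weekday: int) -> date:
    """Last given weekday (Monday=0 ... Sunday=6) of a month."""
    if month == 12:
        last_day = date(year + 1, 1, 1) - timedelta(days=1)
    else:
        last_day = date(year, month + 1, 1) - timedelta(days=1)
    return last_day - timedelta(days=(last_day.weekday() - weekday) % 7)


def iata_season(day: date) -> str:
    """Season label such as 'S22' or 'W21' for a given (UTC) date."""
    summer_start = last_weekday_of_month(day.year, 3, 6)  # last Sunday of March
    summer_end = last_weekday_of_month(day.year, 10, 5)   # last Saturday of October
    if summer_start <= day <= summer_end:
        return f"S{day.year % 100:02d}"
    # Winter spans the year change: it is labelled by the year in which it starts.
    winter_year = day.year if day > summer_end else day.year - 1
    return f"W{winter_year % 100:02d}"
```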
The dataset for the 2024 PRC Data Challenge is available at:
https://doi.org/10.4121/8cb8484b-dbe7-4750-8b87-a5b1dbc621b4
The overall size is around 286 GiB, mainly due to the trajectory files. The dataset is licensed under CC BY 4.0.
We are grateful for the support we received from the EUROCONTROL PRC and particularly the support from the Commissioner José Miguel de Pablo Guerrero.
Enrico Spinielli: Conceptualization, Data, Writing - Original draft
Junzi Sun: Conceptualization, Data curation, Writing - Original draft
Martin Strohmeier: Conceptualization
Xavier Olive: Conceptualization
Quinten Goens: Conceptualization
Rainer Koelle: Conceptualization
Allan Tart: Conceptualization
John Fitzgerald: Conceptualization
The open dataset can be downloaded from:
https://doi.org/10.4121/8cb8484b-dbe7-4750-8b87-a5b1dbc621b4
The source code of all competing teams can be found at:
https://github.com/PRC-Data-Challenge-2024/
A SID is a standard Air Traffic Service (ATS) route identified in an instrument departure procedure by which aircraft should proceed from the take-off phase to the en-route phase.↩︎
A STAR is a standard ATS route identified in an approach procedure by which aircraft should proceed from the en-route phase to an initial approach fix.↩︎
IATA Summer schedule for the year begins on the last Sunday of March and ends on the last Saturday of October of the same year. IATA Winter schedule for the year begins on the Sunday after the last Saturday of October and ends on the Saturday before the last Sunday of March of the next year.↩︎
and possibly the engine types and age, but these data points are not reliably or openly available and as such were not included in the Data for modeling dataset.↩︎
An AFIL is recorded by air traffic controllers and encodes a flight plan received from an aircraft already in flight.↩︎