This paper introduces a gradient-based Smart
Predict-then-Optimize (SPO) framework to solve the aircraft
arrival scheduling problem (ASP) in the terminal maneuvering area.
Traditional approaches to ASP typically separate arrival time
prediction from scheduling optimization, potentially leading to
incomplete solutions. We address this limitation by developing
an end-to-end learning framework that directly integrates
prediction with optimization objectives. Our methodology
introduces the concept of traffic instances for simultaneous
prediction of multiple aircraft arrival times, coupled with a
Mixed Integer Programming (MIP) model for scheduling
optimization. We evaluated our approach using real-world data
from London Gatwick Airport, analyzing arrival flights from June
to September 2024, organized into traffic instances. The
framework incorporates comprehensive weather data through the
ATMAP algorithm, considering factors such as wind, visibility,
precipitation, and dangerous phenomena. Experimental results
demonstrate that the MLP+SPO+ framework shows particular
effectiveness in adapting to adverse weather conditions,
strategically balancing transit times with operational
efficiency. While the minimum time interval is
required, the MLP+SPO+ will reach around
The Aircraft Arrival Scheduling Problem (ASP) is a crucial challenge in Air Traffic Management (ATM). As global air traffic continues to grow, optimizing the sequence and timing of aircraft landings within the Terminal Maneuvering Area (TMA) has become paramount. Efficient arrival scheduling not only reduces fuel consumption and carbon emissions but also improves overall air traffic flow, making it a key focus for both researchers and practitioners. The ASP is NP-hard, which has spurred the development of various approaches to tackle its complexity. Traditional methods like First Come First Served (FCFS) have laid the groundwork, while advanced techniques such as the Trombone [Sprong et al. 2005; Sáez et al. 2020] and the Point Merge System (PMS) [Boursier et al. 2007] leverage geometric principles to further enhance efficiency. These innovations underscore the ongoing importance of solving the ASP to maintain safety, minimize delays, and optimize airport operations in increasingly congested airspace.
Approaches to the ASP have changed significantly in recent years as a
result of increasing access to aeronautical data and rapid
advances in machine learning (ML). Researchers have successfully
applied diverse ML techniques to predict Estimated Time of Arrival
(ETA) and arrival transit times with unprecedented accuracy. These
advanced prediction models have not only enhanced our
understanding of arrival patterns and potential delays but have
also opened up new avenues for optimization. However, a
significant gap remains in the field: while ETA prediction has
seen substantial progress, the integration of these ML-driven
predictions into optimization algorithms for ASP has been
relatively unexplored, particularly in terms of optimization
performance. Traditional two-stage approaches focus on minimizing
prediction errors of certain parameters, typically using metrics
such as Mean Square Error (MSE) (
This study aims to address these limitations by applying the Smart Predict-then-Optimize (SPO) framework to the ASP within the TMA. This approach is particularly relevant for the ASP because, even with fixed Standard Terminal Arrival Routes (STARs) and observable weather conditions, aircraft arrival transit times within TMAs can vary significantly due to unexpected factors that may lead to decision errors during the landing process. Our work pioneers the application of the gradient-based SPO framework in the air transportation domain. Furthermore, we apply this framework to address a critical challenge in the ASP: the incorporation of adverse weather conditions.
The structure of this paper is as follows: Section 2 reviews related work, and Section 3 introduces our methodology. In Section 4, we briefly introduce our case study at London Gatwick Airport and the setup of our experiment. Section 5 presents the results and discussion, while Section 6 concludes this work.
Arrival scheduling is a critical factor in ensuring efficient operations within terminal maneuvering areas (TMAs). A central challenge involves assigning landing times to aircraft while adhering to separation criteria between successive arrivals.
Prior studies frame this as an aircraft landing scheduling problem (ASP), where each aircraft must land within a predetermined time window bounded by an earliest and latest time [Beasley et al. 2000]. These temporal constraints reflect operational realities:
The earliest landing time represents the soonest achievable arrival under ideal conditions (e.g., maximum permissible speed, direct routing), while
The latest landing time accounts for delay absorption capabilities via speed adjustments, path stretching, or holding patterns, constrained by fuel limits and airspace procedures.
This time window ensures efficient airspace utilization while accommodating uncertainties such as weather or traffic conflicts. Solutions aim to minimize deviations from target times and maintain safe separation, often derived from wake vortex categories or air traffic control (ATC) regulations. While early ASP formulations focused on single-runway allocation [Beasley et al. 2000], extensions to multi-runway systems have become increasingly relevant for high-density airports.
The literature offers different approaches to this problem. Some studies focus on exact algorithms and optimization models [Beasley et al. 2000; Pohl et al. 2021], while others use heuristic and meta-heuristic algorithms to reduce solution time [Beasley et al. 2001; Sama et al. 2015; Xu 2017; Prakash et al. 2018]. One study developed a heuristic algorithm to improve the scheduling efficiency of arrival aircraft at London Heathrow, showing that it could increase the efficiency of the decisions made by air traffic controllers [Beasley et al. 2001]. To reduce the workload of air traffic controllers and congestion at airports, a metaheuristic algorithm was applied to a good initial solution to exploit its short computing time, in a study carried out at two Italian airports [Sama et al. 2015]. An Ant Colony algorithm based on wake vortex modeling was investigated for the aircraft scheduling problem and was reported to produce better results than CPLEX, general ant colony algorithms, and an approximation algorithm [Xu 2017]. A data splitting algorithm was used to solve the aircraft sequencing problem with a 0-1 mixed integer programming model that incorporated many realistic constraints; its small run times enable real-time deployment of the concept [Prakash et al. 2018]. For more details on the aircraft scheduling problem, we refer the reader to two review studies [Messaoud 2021; Ikli et al. 2021].
In recent years, arrival management research has been transformed by the increasing availability of aviation data, leading to a surge in ML-based approaches for arrival time prediction. Predicting arrival flight times, and feeding those predictions into different ATM solutions, is important for more predictable, efficient, and greener operations in TMAs [Zhang et al. 2022]. ML therefore plays an important role in improving air traffic management. The existing literature contains various applications of ML algorithms to Estimated Time of Arrival (ETA) and arrival flight time prediction [Glina et al. 2012; Kern et al. 2015; Ayhan et al. 2018; Takacs 2014; Ma et al. 2022; Silvestre et al. 2024; Lui et al. 2025].
Quantile Regression Forests [Glina et al. 2012], a tree-based ensemble method, were employed to estimate landing times. A total of 4011 cases were split into 67% for training and 33% for testing. As stated in the research, the model was suitable for predicting landing times in real-time applications. Random Forest (RF) [Kern et al. 2015], a well-known tree-based method, was used to improve ETA prediction, with feature generation and selection as one of the main focus points. The study showed that the ML algorithm was more accurate than the Enhanced Traffic Management System in the US for 78% of all instances. Several regression models (linear, non-linear, and ensemble) and a Recurrent Neural Network [Ayhan et al. 2018] were tested for ETA prediction of commercial flights, with their results compared against EUROCONTROL ETA predictions. A main outcome of this study was higher accuracy with a smaller standard deviation, which made narrower ETA prediction windows possible. A Spatiotemporal Neural Network Model for ETA [Ma et al. 2022] was proposed with three main stages: trajectory pattern recognition, trajectory prediction, and arrival time prediction. One of its findings was that the MAE was typically lower for shorter travel times to the destination. A deep learning approach based on Long Short-Term Memory [Silvestre et al. 2024] was used to predict ETA from the 4D trajectory of the aircraft and weather data. Beyond the model results, this research is notable for its application airport, Madrid Barajas-Adolfo Suárez (Spain). The proposed model outperformed RF, Gradient Boosting Machines (GBM), and Adaptive Boosting, which were selected as baselines in the study.
Ridge Regression (RR) and GBM [Takacs 2014] were selected to predict the runway and gate arrival times of flights, based on historical, weather, air traffic control, and other data provided during the GE Flight Quest data science contest.
Despite these significant advances in both the optimization and prediction domains, several gaps remain in the current literature. Because most researchers handle these problems separately, there is a disconnect between arrival time prediction and scheduling optimization. While both areas have seen remarkable progress independently, the potential benefits of integrating prediction capabilities into optimization frameworks remain largely unexplored. A few studies have explored this area, but they mostly used the predicted values directly in the downstream optimization [Du et al. 2023; Pang et al. 2024]. The relationship between prediction accuracy and operational efficiency improvements needs more thorough investigation. Traditional methods also often fail to capture the dynamic nature of the airport environment, where predictions and scheduling decisions need to be made and updated continuously in response to changing conditions.
Recent developments in computational frameworks offer promising directions for addressing these limitations. The SPO framework [Elmachtoub and Grigas 2022] provides a structured approach to integrating prediction and optimization, potentially offering a more coherent solution to the arrival scheduling problem. Similarly, learning-to-optimize techniques [Li and Malik 2016], which directly learn optimization strategies from data, may offer more robust solutions than traditional two-stage approaches. However, while these frameworks show theoretical promise, their practical application in the aviation context remains limited. Key challenges include adapting these frameworks to handle the specific constraints and objectives of airport operations and validating their performance under real-world conditions. Given these challenges and opportunities, this research applies the SPO framework to the ASP within TMAs. The following section details our proposed approach and its implementation.
Figure 1 presents the general schematic
diagram of our proposed method. Starting from the raw flight data,
we generate an input dataset
Based on the dataset
The core function of this framework is the gradient computation
and the parameter updates through backpropagation. For each
training instance, the gradient
In this work, we formulate the ASP as a simple Mixed Integer Programming (MIP) model based on the classical single runway aircraft landing problem proposed by [Beasley et al. 2000]. We assume:
The decision variables in our models are:
The objective of this model is to minimize the sum of costs for
all delayed aircraft, where:
The model formulation is listed as follows:
Our ASP seeks to minimize delay-related costs. At its core, the
mathematical formulation employs a simple objective function that
sums the costs across all delayed aircraft. Three decision
variables drive the model: continuous variables
Constraint [cons::E] ensures that each aircraft
When aircraft
The constraint becomes:
When aircraft
The constraint becomes:
Meanwhile, the complementary constraint
This enforces the minimum separation time
Thus, the pair of constraints ensures proper separation
regardless of landing order, with
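As an illustration of this separation logic, the following Python sketch checks the pair of big-M constraints for one aircraft pair and solves a three-aircraft toy instance by enumerating landing orders. It is a minimal sketch under stated assumptions, not the exact MIP solved in this study: the function names, the binary order indicator `y_ij`, the constant `BIG_M`, the separation matrix, and the delay cost (landing time minus earliest time) are all hypothetical choices made for this example.

```python
from itertools import permutations


def bigm_pair_holds(t_i, t_j, y_ij, sep_ij, sep_ji, big_m):
    """Check both big-M separation constraints for one aircraft pair.

    y_ij = 1 means aircraft i lands before aircraft j (assumed convention).
    """
    # Binding when y_ij = 1, relaxed by big_m when y_ij = 0.
    first = t_j >= t_i + sep_ij - big_m * (1 - y_ij)
    # Binding when y_ij = 0, relaxed by big_m when y_ij = 1.
    second = t_i >= t_j + sep_ji - big_m * y_ij
    return first and second


def earliest_times(order, earliest, sep):
    """Earliest feasible landing times for a fixed landing order."""
    times = {}
    for k in order:
        t = earliest[k]
        for prev, t_prev in times.items():
            # Respect separation to every aircraft that has already landed.
            t = max(t, t_prev + sep[prev][k])
        times[k] = t
    return times


def solve_by_enumeration(earliest, sep):
    """Brute-force the landing order with minimum total delay (toy sizes only)."""
    best = None
    for order in permutations(range(len(earliest))):
        times = earliest_times(order, earliest, sep)
        cost = sum(times[k] - earliest[k] for k in order)
        if best is None or cost < best[0]:
            best = (cost, order, times)
    return best


# Hypothetical toy data in seconds: aircraft 0 needs a larger gap behind it.
earliest = [0, 30, 40]
sep = [[0, 120, 120], [60, 0, 60], [60, 60, 0]]
BIG_M = 10_000

cost, order, times = solve_by_enumeration(earliest, sep)
pairs_ok = all(
    bigm_pair_holds(times[i], times[j], 1 if times[i] < times[j] else 0,
                    sep[i][j], sep[j][i], BIG_M)
    for i in range(3) for j in range(i + 1, 3)
)
```

Enumeration is only practical for a handful of aircraft; for instances of realistic size, the same constraints would be passed to a MIP solver instead.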
The conventional delayed cost definition is
Traditional approaches to ETA prediction focus on individual
flights independently. For each flight
Flight sequence
[algo:instance] constructs
strictly non-overlapping traffic instances from temporally ordered
flights using a hybrid windowing strategy. For each candidate
group of
Based on the traffic instances, we can perform the prediction task
via ML. The prediction model in this framework has to be
differentiable, so we propose two simple models as our baselines:
Linear Regression (LR:
As mentioned in Section 3.1,
the output is the predicted transit times
The decision loss in our framework is based on the SPO loss
introduced by [Elmachtoub and Grigas
2022]. This loss measures how well our predicted costs lead
to optimal decisions compared to decisions made with true costs.
The rigorous unambiguous SPO loss is defined as:
The max operator accounts for multiple optimal solutions
that could arise from
However, numerical studies in [Tang and Khalil 2024] demonstrate
that this rigorous form yields similar results to a simplified
version known as “regret”:
This measures the gap between the true cost of decisions made
by predicted costs
Since the SPO loss is intractable to optimize directly, [Elmachtoub and
Grigas 2022] derived a convex surrogate upper bound on it,
called SPO+:
The computation of SPO+ involves solving a modified
optimization problem with costs (
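To make the loss definitions concrete, the following Python sketch computes the regret, the SPO+ loss, and an SPO+ subgradient with respect to the predicted cost vector, following the standard definitions in [Elmachtoub and Grigas 2022]. It is a minimal sketch under stated assumptions: the helper `solve` and the explicit list `feasible` are hypothetical stand-ins for the MIP solver and its feasible region, and the two-option toy problem is illustrative only.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def solve(cost, feasible):
    """Return the feasible decision that minimizes cost^T w.

    Hypothetical stand-in for a call to the scheduling MIP solver.
    """
    return min(feasible, key=lambda w: dot(cost, w))


def regret(pred_cost, true_cost, feasible):
    """True cost of the decision induced by the prediction, minus the optimum."""
    w_pred = solve(pred_cost, feasible)
    w_true = solve(true_cost, feasible)
    return dot(true_cost, w_pred) - dot(true_cost, w_true)


def spo_plus(pred_cost, true_cost, feasible):
    """SPO+ loss and one subgradient with respect to pred_cost."""
    # Modified problem solved with costs 2 * pred_cost - true_cost.
    shifted = [2 * p - c for p, c in zip(pred_cost, true_cost)]
    w_shift = solve(shifted, feasible)
    w_true = solve(true_cost, feasible)
    loss = (-dot(shifted, w_shift)
            + 2 * dot(pred_cost, w_true)
            - dot(true_cost, w_true))
    subgrad = [2 * (a - b) for a, b in zip(w_true, w_shift)]
    return loss, subgrad


# Toy problem: choose exactly one of two options.
feasible = [(1, 0), (0, 1)]
true_cost = (1, 2)
pred_cost = (3, 1)

toy_regret = regret(pred_cost, true_cost, feasible)
toy_loss, toy_subgrad = spo_plus(pred_cost, true_cost, feasible)
```

In a training loop, the subgradient would be propagated back through the prediction model by the chain rule, so each training step requires one extra solve of the modified problem.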
In this paper, we conduct our case study at London Gatwick Airport (ICAO: EGKK), a major international aviation hub in the United Kingdom. Operating with a single-runway system, which is unique among airports of its size and traffic volume, Gatwick is London’s second-busiest airport and the second-largest single-runway airport globally, located approximately 29.5 miles south of Central London. In 2024, up to October, it had already handled traffic including both arrivals and departures.
of arrival flights (ADS-B data) at EGKK from June 2024 to
September 2024 obtained from OpenSky Network
[Schäfer et al.
2014] are used in this study. For the local weather
information, we refer to the Meteorological Terminal Aviation
Routine Weather Report (METAR) of EGKK in 2024. METAR is a weather report
which contains the information for an area enclosed within a
Figure 2 illustrates sample flights in the scope of our study, capturing the terminal maneuvering area where arriving aircraft perform final approach sequences. The flight trajectories used in this study align with Gatwick Airport’s approach procedures. Figure 3 presents the weather score distribution at EGKK in 2024. As the figure illustrates, wind is the most significant weather component at EGKK, consistently showing the highest scores throughout the observed period. The wind scores frequently reach values of 2.5 on the weather score scale. Precipitation also contributes to the overall weather conditions, but to a lesser extent. Freezing conditions are more frequent in the winter period but less important during the summer season. Visibility issues appear relatively minor, showing lower scores and frequency than other weather components. Dangerous phenomena are occasionally recorded but remain relatively rare events in the dataset.
Table 1 summarizes the key parameters and configurations of our experimental setup. The study encompasses traffic instances from arrivals, with each instance involving 15 aircraft within a 45-minute time interval. The area of interest is confined to a 50 nautical mile radius around EGKK, providing comprehensive coverage of the TMA.
Parameter | Value |
---|---|
Period | June - September 2024 |
Number of aircraft | |
Number of traffic instances | |
Number of aircraft per instance | 15 |
Maximum time interval per instance | 45 minutes |
Area of interest | 50 nautical miles around EGKK |
Machine learning models | {Linear regression; Multi-layer perceptron} |
Typical scenarios | { |
Input features | {Latitude, longitude, velocity, heading angle, vertical rate} at entry state |
Output feature | Transit time |
Loss function | SPO+, Mean Square Error (Two-stage approach) |
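The instance definition in the setup above (15 aircraft within at most 45 minutes, with no flight shared between instances) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the exact instance construction algorithm of this study: the function name `build_instances`, the accept-then-jump versus slide-by-one rule, and the use of TMA entry times in seconds are hypothetical choices for illustration.

```python
def build_instances(entry_times, size=15, max_span=45 * 60):
    """Group temporally ordered flights into non-overlapping instances.

    entry_times: entry times in seconds, sorted in ascending order.
    Returns a list of instances, each a list of flight indices.
    """
    instances = []
    start = 0
    while start + size <= len(entry_times):
        first = entry_times[start]
        last = entry_times[start + size - 1]
        if last - first <= max_span:
            # Accept the group and jump past it so instances never overlap.
            instances.append(list(range(start, start + size)))
            start += size
        else:
            # Group is too spread out: slide the window by one flight.
            start += 1
    return instances


# Small illustrative call with 3 flights per instance and a 200 s window.
demo = build_instances([0, 60, 120, 5000, 5060, 5120, 5180],
                       size=3, max_span=200)
```

Flights that cannot be placed in any sufficiently dense group, such as the last flight in the illustrative call, are left out of the dataset under this rule.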
As mentioned in Section 3.2, we implement two ML
approaches for our analysis: LR and MLP. Because the ASP model
parameters need to be pre-set and static during the training
process, we select typical scenarios from the instances to define
the parameters, characterized by maximum weather parameters
including wind (OpenSky Network
[Schäfer et al.
2014]. The required separation time is then determined
based on the WTC of the preceding and succeeding aircraft.
The input feature space comprises five key aircraft parameters
at the entry state: latitude, longitude, velocity, heading angle,
and vertical rate. These parameters capture the essential initial
conditions of each aircraft’s trajectory. The models are trained
to predict the transit time as the output feature. For model
optimization, we employ two distinct loss functions: SPO+, and
Mean Square Error (MSE) in a two-stage approach. The ratio between
training sets and test sets are
In this section, we present the results and the corresponding discussion. First, Figure 5 illustrates the learning curves for both loss functions on the training sets using normalized loss values. The SPO+ and two-stage approaches exhibit distinctly different convergence behaviors during training. The SPO+ loss curves show a rapid initial decrease and stabilize at very low normalized loss values (below 0.1) across all scenarios by around iteration 250. This consistent convergence pattern appears similar for both the Linear Regression and MLP implementations.
The two-stage approach, however, demonstrates markedly different behavior. While the Linear Regression variants show quick initial convergence, MLP implementations maintain relatively high normalized loss values (fluctuating between 0.2 and 0.6) throughout training. The learning curves show considerable oscillation, particularly for the maximum danger scenario, suggesting potential stability issues in the optimization process.
This performance discrepancy suggests that for subsequent analyses, focus should be directed toward three specific configurations: LR + Two-Stage, MLP + SPO+, and LR + SPO+. The MLP + Two-Stage configuration can be reasonably excluded from further investigation due to its demonstrated inferior convergence properties.
Following the analysis of learning curves, Figure 6 presents the normalized regret distribution on the test sets during the training process. Our experimental results demonstrate the effectiveness of end-to-end decision-focused learning approaches, particularly when combined with more expressive model architectures. The MLP + SPO+ implementation consistently achieves superior performance across most typical scenarios, exhibiting lower normalized regret compared to both the LR + SPO+ and LR + Two-Stage approaches.
To rigorously assess the performance differences between
approaches, we employed the Mann-Whitney U test, a non-parametric
statistical test that evaluates whether two independent samples
come from the same distribution. This test is particularly
appropriate for our analysis as it makes no assumptions about the
normality of the data and is well-suited for comparing the regret
distributions. A lower
Statistical analysis reveals particularly significant
differences in the maximum wind scenario, where MLP + SPO+
significantly outperforms the two-stage approach (
Interestingly, while SPO+ generally shows favorable
performance, the statistical tests reveal no significant
differences between LR + SPO+ and LR + Two-Stage across most
scenarios (all
The variation in performance across different architectures and optimization frameworks provides valuable insights for practical implementations. The notable success of MLP + SPO+ not only demonstrates the advantage of end-to-end SPO learning but also highlights the importance of model expressiveness in capturing complex weather-related patterns. These findings suggest that while SPO+ generally provides stronger performance, the choice of underlying model architecture significantly influences the overall effectiveness of the optimization framework.
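The Mann-Whitney U comparison of two regret samples used above can be sketched in pure Python as follows. It is a minimal sketch, assuming a two-sided test with a normal approximation and no tie correction in the variance; the function name `mann_whitney_u` is hypothetical, and a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred, especially for small samples.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value via normal approximation)."""
    # Pool both samples, remembering the group of each value (0 = x, 1 = y).
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)

    # Assign 1-based ranks, averaging the ranks of tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Normal approximation of U under the null hypothesis.
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Illustrative samples only, not results from the experiments.
u_demo, p_demo = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

A small U statistic for the first sample means its values tend to rank below those of the second sample, which is the pattern expected when one method yields lower regret than the other.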
The next analysis compares the optimized costs of the SPO+ and two-stage approaches based on the trained ML models. We input the features of the test sets to predict the costs and use these predictions to optimize each instance via the ASP in the test sets (7). With a cost of when optimizing using true landing times (representing the minimum achievable average cost under the specific scheduling constraints we defined), both MLP+SPO+ and LR+Two-Stage methods significantly outperform the FCFS baseline of . Interestingly, the MLP+SPO+ shows particularly strong performance in scenarios optimized for minimum time interval, achieving a mean cost of compared to Two-Stage’s . This outperformance relative to the "optimal true cost" does not indicate a violation of optimization principles, but rather highlights a key insight: optimization using true landing times is not necessarily optimal for the complete operational context. The SPO+ approach can discover solutions that account for broader operational dynamics and uncertainty patterns that are not captured when directly optimizing with true landing times. The most significant insight emerges from examining performance across different weather conditions: while the Two-Stage approach maintains relatively uniform costs across all scenarios, the SPO+ method adapts to weather conditions, strategically accepting higher transit times under challenging conditions while finding better overall solutions.
This weather-responsive behavior of MLP+SPO+ represents a crucial advancement in arrival scheduling optimization. The systematically higher costs observed under extreme weather scenarios (ranging from to ) indicate that the model effectively incorporates weather-related risks into its decision-making process, making more conservative predictions when conditions are adverse. In contrast, the Two-Stage approach’s more uniform cost distribution suggests a limitation in capturing the complex interplay between weather conditions and optimal routing decisions. These findings indicate that while SPO+ might occasionally suggest higher transit times compared to the optimal true cost, these decisions reflect a trade-off between speed and safety, demonstrating the method’s capability to make more nuanced, context-aware aircraft arrival scheduling decisions.
In addition to the algorithmic analysis, we perform a delay assignment analysis to evaluate fairness in this model. We use the transit time difference for each aircraft and the number of position shifts in the maximum precipitation scenario, comparing MLP+SPO+ with optimization using the true cost. This scenario is selected because it has the largest total cost for MLP+SPO+.
[Table:delay] reveals that MLP+SPO+ demonstrates improved fairness compared to optimization with true cost, as evidenced by lower mean transit time differences (18.60s vs. 43.62s), reduced standard deviation (181.67s vs. 236.69s), and fewer position shifts per instance (13 vs. 17) in the maximum precipitation scenario. Given 15 aircraft per instance, MLP+SPO+ averages less than one position shift per aircraft. However, since neither MLP+SPO+ nor the baseline explicitly incorporates fairness parameters, both methods exhibit high variability in transit time differences, reflected in the large standard deviations. This suggests that while MLP+SPO+ achieves better fairness outcomes implicitly through its learning framework, the absence of fairness-aware optimization leads to inconsistent treatment of individual aircraft. The results highlight the potential for further improvements by integrating fairness constraints directly into the model to reduce disparity and stabilize outcomes.
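The two fairness indicators discussed above can be computed along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, position shifts are assumed to be the sum of absolute changes in landing position relative to a reference order such as FCFS, and the spread is assumed to be the sample standard deviation of per-aircraft transit time differences. The exact definitions used in the analysis may differ.

```python
from statistics import mean, stdev


def position_shifts(reference_order, new_order):
    """Sum of absolute landing-position changes across all aircraft."""
    new_position = {aircraft: k for k, aircraft in enumerate(new_order)}
    return sum(abs(k - new_position[aircraft])
               for k, aircraft in enumerate(reference_order))


def transit_time_stats(reference_transit, new_transit):
    """Mean and sample standard deviation of per-aircraft differences (s)."""
    diffs = [new - ref for ref, new in zip(reference_transit, new_transit)]
    return mean(diffs), stdev(diffs)


# Illustrative values only, not data from the experiments.
shifts_demo = position_shifts(["A", "B", "C"], ["B", "A", "C"])
mean_demo, std_demo = transit_time_stats([100, 200, 300], [110, 190, 330])
```

Under these assumed definitions, swapping two adjacent aircraft counts as two position shifts, since each of the two aircraft moves by one slot.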
This paper presents an application of the SPO framework to the Aircraft Arrival Scheduling Problem within the Terminal Maneuvering Area. We developed an end-to-end learning approach that integrates arrival flight time prediction with scheduling optimization, specifically focusing on London Gatwick Airport operations. Our methodology introduces the concept of traffic instances for simultaneous prediction of multiple aircraft arrival times, coupled with a Mixed Integer Programming model for optimal aircraft arrival scheduling decisions.
The experimental results demonstrate several key findings.
First, the MLP+SPO+ implementation consistently outperforms
traditional two-stage approaches across most scenarios,
particularly with complex weather conditions. The framework shows
sophisticated adaptation to varying weather conditions,
strategically accepting higher transit times under adverse
conditions while maintaining operational efficiency. When the
minimum time interval is required, the MLP+SPO+ will suggest
around
A critical consideration for practical implementation is balancing operational efficiency with ATC manageability and fairness to airlines. FCFS scheduling is conventionally favored for its simplicity and perceived fairness. Our proposed framework demonstrates that optimized sequences can achieve significant cost reductions without inherently compromising these priorities. Compared with benchmark optimization, MLP+SPO+ demonstrates enhanced fairness.
However, our study identifies important limitations and areas for refinement. Methodologically, our focus on isolating the SPO+ loss function’s impact led us to maintain consistency by using unnormalized inputs and Gradient Descent (GD) optimization across the compared methods (e.g., LR+SPO+ vs. LR+2S). While this consistency aids in evaluating the relative benefit of the SPO+ loss, it presents trade-offs. Using unnormalized inputs might not yield the absolute peak performance, particularly for MLP architectures known to benefit from normalization, although our results still confirmed the SPO+ advantage. Similarly, while GD (or other gradient-based methods) is inherent to optimizing the SPO+ loss, applying it to the LR+2S baseline (instead of standard OLS) ensures optimizer consistency for comparison but deviates from typical standalone LR practices. Furthermore, as our experiments suggested, optimal training, particularly concerning input normalization, appears sensitive to hyperparameter calibration, especially for LR models under GD where we encountered convergence challenges with normalization in our initial trials. Beyond these methodological considerations, a significant constraint remains the current SPO framework’s reliance on fixed optimization model structures (beyond objective costs), limiting adaptability to scenarios with varying constraints. Computational efficiency for larger instances and the lack of explicit fairness mechanisms, potentially leading to higher variation in delay assignment, are also key concerns.
Looking ahead, several promising research directions emerge. Extending the SPO framework itself, perhaps incorporating dynamic MIP parameter updates [Hu et al. 2023] and regret computations, is a key avenue. This could involve exploring diverse neural network architectures for traffic instance cost prediction. Crucially, a systematic investigation into the interplay between input normalization techniques, hyperparameter tuning, and model performance (both SPO+ and baselines) is warranted. This includes exploring individually optimized configurations, potentially using OLS for LR+2S baselines when comparing absolute achievable performance rather than isolating loss function effects. Improving computational efficiency, possibly through optimization problem relaxations, remains vital. The framework’s principles could also be extended to related scheduling or routing problems [Graham et al. 1979; Bianco et al. 1993], and transfer learning could enhance applicability across different airports. Lastly, systematically addressing fairness is essential. Future work should explicitly incorporate airline equity metrics (e.g., delay distribution thresholds) as constraints or weighted objectives in the optimization model, better aligning the framework with real-world ATC priorities while preserving its efficiency advantages.
These findings and identified future directions contribute to the growing body of research on ML applications in air traffic management, particularly in the critical area of arrival scheduling optimization. The demonstration of end-to-end SPO learning approaches suggests potential for further development and practical implementation in real-world airport operations.
Go Nam Lui: Conceptualization, methodology, formal analysis, data curation, software, resources, writing – original draft, writing – review & editing, visualization. Soner Demirel: Conceptualization, data curation, writing – original draft, writing – review & editing.
Go Nam Lui receives funding from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee [grant numbers 10086651 (Lancaster University)]. Opinions expressed in this work reflect the authors’ views only, and the SESAR 3 JU and UKRI are not responsible for any use that may be made of the information contained herein.
All data analyzed during this study are publicly available in https://zenodo.org/records/14014439.
The source code of this research is stored at https://github.com/harrylui1995/ASP_E2EPO.