The use of Hidden Markov Models (HMMs) in segmenting flight phases is a compelling approach with significant implications for aviation and aerospace research. It leverages the temporal sequences of flight data to delineate various phases of an aircraft’s journey, making it a valuable tool for enhancing the analysis of flight performance and safety. In this work, we implement a multivariate HMM to identify 6 flight phases: taxi, takeoff, climb, cruise, approach and rollout. We reach a median global accuracy of about 97% over a sample of several thousand flights with a very low number of decoded unlikely transitions. Regarding several performance metrics, our method is competitive with existing methods in the literature, such as fuzzy logic. Additionally, it provides, for each point of the flight, a probability of belonging to each phase. Even in situations where there are missing values in the data, HMMs remain effective, ensuring that no critical information is lost during the segmentation process. We show that HMMs work seamlessly with the fine granularity of Flight Data Recorder (FDR) data. HMMs offer remarkable flexibility and adaptability, proving particularly effective when the number or order of phases is unknown or not predetermined, as is often the case with complex flight scenarios such as helicopter flights. This adaptability is crucial for handling the diverse range of flight operations that differ from one aircraft to another. An example is given with the segmentation of an Automatic Dependent Surveillance–Broadcast (ADS-B) helicopter flight operated by the Swedish National Police.
From a conceptual point of view, defining flight phases, that is to say the different periods within a flight, poses no difficulty. Common taxonomies are, for instance, provided by the International Civil Aviation Organization (ICAO) [(CICTT) 2013] or by the International Air Transport Association (IATA) in Annex 1 of [Association 2015]. Given some trajectory data, flight phase identification aims at segmenting a flight into different phases. More precisely, a segmentation is a partition of the data points.
This task has been popularized with the increasing availability of large Automatic Dependent Surveillance–Broadcast (ADS-B) datasets, for which flight phases are not labeled. It would be tedious to annotate them manually. A famous example of this rising accessibility of ADS-B data is the development of the non-profit OpenSky Network that has grown to 5,000 registered receivers all around the world, providing a large historical database [Sun et al. 2022].
The segmentation of flights has several uses. As stated in [Sun et al. 2017b], flight phase segmentation is utilized to build aircraft performance models. In [Alligier et al. 2015], the mass estimation method for ground-based aircraft climb prediction involves a filtering of climb segments. In [Kuzmenko et al. 2022], flight phase identification is related to delay analysis and safety. As explained by [Zhang et al. 2022], estimating the duration of each flight phase is also believed to enhance the development of reliable noise or emissions models around airports.
To be entirely precise, flight phase identification has several meanings. For the majority of applications, the identification of flight phases is a vertical segmentation problem (say, the identification of the takeoff, climb, cruise, approach and so on). We naturally visualize the different phases by representing them on the altitude profile. However, there are applications for which horizontal flight phases can also be defined. As recently reviewed in [Kovarik et al. 2020], this is the case for conflict detection for which we are also interested in detecting turns. In this contribution, we will focus solely on providing a vertical segmentation. Our primary emphasis is on commercial aviation.
A key aspect of flight trajectories is the undefined number of segments to uncover due to different flight frequencies and operations. Even within the same phase, aircraft may climb at different rates or fly at different cruise altitudes. Another specificity is the strong correlation in time and space between two consecutive points of a trajectory. Additionally, trajectory data may be noisy and/or have missing values.
These characteristics, along with the variety of air operations, account for the wide diversity of approaches presented in the literature, whether threshold-based or probabilistic. The segmentation methods used in the literature only occasionally take into account the strong temporal correlation between the data points that make up a flight. For example, the widely used fuzzy logic method developed in [Sun et al. 2017a] would produce an identical segmentation if the observations were permuted in time: each point would still receive the same label, wherever it appears in the sequence.
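To make the permutation argument concrete, here is a minimal sketch of a pointwise labeling rule. The function `label_point`, its thresholds, and the sample values are hypothetical; they are not the fuzzy logic rules of [Sun et al. 2017a] and only stand in for any method that labels each observation on its own.

```python
import random


def label_point(rate_of_climb_fpm: float) -> str:
    """Hypothetical pointwise rule: the label depends on a single observation."""
    if rate_of_climb_fpm > 500.0:
        return "climb"
    if rate_of_climb_fpm < -500.0:
        return "approach"
    return "cruise"


# Illustrative rate-of-climb values (ft/min), not taken from the dataset.
rates = [1800.0, 1500.0, 900.0, 50.0, -20.0, 10.0, -700.0, -1200.0]
labels = [label_point(r) for r in rates]

# Permute the observations in time and label them again.
order = list(range(len(rates)))
random.Random(0).shuffle(order)
shuffled_labels = [label_point(rates[i]) for i in order]

# Each observation keeps its label wherever it lands in the sequence.
same_labels = shuffled_labels == [labels[i] for i in order]
```

Any rule of this form ignores the order of the points, which is exactly the information a model with transition probabilities is designed to exploit.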
To our knowledge, and despite their well-known flexibility, Hidden Markov Models (HMMs) have rarely been used to segment flight phases, even though they have very attractive properties for this problem. Unlike threshold-based methods or fuzzy logic, HMMs place the temporal aspect of the trajectory at the core of the segmentation by modeling the transition probabilities from one flight phase to another. This reduces the number of invalid transitions between flight phases. HMMs also allow for uncertainty quantification in the segmentation, providing, for each point, the probability of belonging to each class. Unlike supervised methods, HMMs require only a very limited number of inputs and do not need a training phase. HMMs have been used for at least three decades in signal-processing applications, especially in the context of automatic speech recognition, but interest in their theory and application has expanded to other fields (environment, biophysics, ecology, etc.) [Zucchini et al. 2016]. As a result, numerous packages are available for their implementation, such as [Visser and Speekenbrink 2010].
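The two properties highlighted above, transition probabilities and per-point class probabilities, can be illustrated with a small forward-backward computation. This is only a sketch under stated assumptions: the three states, the initial and transition probabilities, the Gaussian emission parameters, and the sample rates of climb are all hypothetical values fixed by hand, whereas the models discussed in this paper may be specified and fitted differently.

```python
from statistics import NormalDist

STATES = ["climb", "cruise", "approach"]

# Hypothetical parameters, fixed by hand for illustration only.
INITIAL = [0.98, 0.01, 0.01]
TRANSITION = [
    [0.90, 0.10, 0.00],  # from climb
    [0.05, 0.90, 0.05],  # from cruise
    [0.00, 0.10, 0.90],  # from approach
]
# One Gaussian per state for the rate of climb (ft/min).
EMISSION = [
    NormalDist(1500.0, 600.0),
    NormalDist(0.0, 200.0),
    NormalDist(-1200.0, 600.0),
]


def posteriors(observations):
    """Scaled forward-backward pass: P(state at time t | all observations)."""
    n, k = len(observations), len(STATES)
    dens = [[EMISSION[j].pdf(x) for j in range(k)] for x in observations]

    # Forward pass; each row is normalized to avoid numerical underflow.
    alpha, scales = [], []
    for t in range(n):
        if t == 0:
            row = [INITIAL[j] * dens[0][j] for j in range(k)]
        else:
            row = [
                sum(alpha[t - 1][i] * TRANSITION[i][j] for i in range(k)) * dens[t][j]
                for j in range(k)
            ]
        scale = sum(row)
        alpha.append([v / scale for v in row])
        scales.append(scale)

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(
                TRANSITION[i][j] * dens[t + 1][j] * beta[t + 1][j] for j in range(k)
            ) / scales[t + 1]

    # Combine both passes and renormalize each row.
    gamma = []
    for a_row, b_row in zip(alpha, beta):
        row = [a * b for a, b in zip(a_row, b_row)]
        total = sum(row)
        gamma.append([v / total for v in row])
    return gamma


# Illustrative rates of climb (ft/min), not taken from the dataset.
rates = [1800.0, 1600.0, 1400.0, 100.0, -50.0, 30.0, -900.0, -1300.0, -1100.0]
gamma = posteriors(rates)
# Pointwise most probable state at each time step.
decoded = [STATES[max(range(len(STATES)), key=row.__getitem__)] for row in gamma]
```

Each row of `gamma` sums to one and gives the probability of each phase at that point. The zeros in `TRANSITION` encode the assumption that a direct climb-to-approach jump is impossible, so such a jump receives zero probability under the model.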
The contributions of this paper are of various types:
The development of a univariate HMM for the detection of the three main flight phases (climb, cruise, and approach), as well as a multivariate model for the detection of the taxi, climb, cruise, approach, and rollout phases.
A comparison of segmentation performances with the fuzzy logic approach for the three main flight phases.
The calculation of several performance metrics, ranging from global accuracy to the number of invalid transitions on a sample comprising several thousand flights.
A discussion on the impact of data preprocessing on the quality of flight segmentation.
A discussion about the feasibility of adapting HMMs for the segmentation of a flight for which the phases to be identified are not specified in advance.
The paper is organized as follows. First, we provide a brief overview of existing methods as well as common performance metrics in Section 2. Second, the data we use is outlined in Section 3. Then, we present the theoretical framework of univariate HMMs in Section 4 as well as a model to detect the three main flight phases. The detection of additional flight phases is discussed in Section 5 and falls within the framework of multivariate HMMs. The topics of data preprocessing and adapting models when the phases to be identified are not known in advance are addressed in Section 6.
As noted in [Fala et al. 2023], two main approaches are employed to identify phases from flight data records: rule-based decision-making and probability-based decision-making.
Regarding rule-based approaches, several studies have focused on establishing thresholds to segment flight phases [Goblet et al. 2015; Paglione and Oaks 2006]. Given the challenge of specifying universal thresholds for flight phase segmentation, the fuzzy logic approach has established itself in the literature as a flexible, simple, and fast method. Early references on the subject include the work of [Kelly and Painter 2006]. Several publications [Sun et al. 2016; Sun et al. 2017a] and its implementation in OpenAP [Sun et al. 2020] have now made it a widespread method. It is worth noting that, for each point, fuzzy logic does not strictly return the probability of belonging to each class. Additionally, it does not consider the temporal nature of the trajectory, and data smoothing is often necessary to achieve good results in practice.
Recently, many contributions have framed the problem of flight phase detection as a machine learning task. The use of decision tree classifiers to segment flight phases has been explored in [Tian et al. 2017]. Some machine learning methods are compared in [Kovarik et al. 2020]. K-means clustering and LSTM neural networks have been combined in [Arts et al. 2021]. Gaussian Mixture Models have been used in [Liu et al. 2020]. To achieve good results, some methods require a large number of inputs that are often unavailable in ADS-B data. For instance, the engine fan speed is used in [Liu et al. 2020]. In any case, many steps seem necessary in the machine learning literature: selection of the parameters, implementation of a decision tree classifier and clustering of the results in [Tian et al. 2017], or transformation of trajectory data into fixed-length sequential data before using an LSTM neural network in [Arts et al. 2021]. The difficulty of obtaining a reliable training dataset leads some authors to use simulated data [Arts et al. 2021].
HMMs do not suffer from most of the mentioned limitations, as explained in the sequel.
The comparison of flight phase identification methods is complex on several levels. One initial challenge relates to the number and types of flight phases selected. These can vary greatly depending on whether one considers commercial aviation or general aviation. A second challenge lies in the lack of consensus on the choice of a performance metric. The metrics used in the literature appear to fall into three main categories:
The traditional metrics for classification problems such as the error rate, precision and recall [Goblet et al. 2015; Paglione and Oaks 2006; Tian et al. 2017; Arts et al. 2021; Liu et al. 2020]
Metrics that focus on the total duration of each phase [Zhang et al. 2022]
Metrics that examine the transitions that are incorrectly predicted between phases as well as the total number of transitions [Sun et al. 2017a]
In all contributions, the results are, of course, initially visualized. Because it is easy to find a degenerate segmentation that would reproduce the exact duration of each phase while switching between flight phases erratically, it seems reasonable to consider that at least two metrics should be used. The use of classification metrics for each flight phase reveals the model’s inability to segment some parts of the flight correctly, while global metrics provide an overview of the model’s average performance. Since certain flight phases last significantly longer than others, the overall accuracy metric must be interpreted with caution. Counting the number of improbable transitions as well as the total number of transitions seems to be crucial in measuring the realism of a segmentation. From an operational perspective, an aircraft does not spend its time rapidly transitioning between phases. In the following, we systematically consider multiple performance metrics.
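The following sketch illustrates why a duration-based metric alone is not enough. It compares a plausible segmentation with a degenerate one that has the same share of points in each phase but far more transitions. The set `ALLOWED` of plausible transitions, the helper names, and both label sequences are hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical set of transitions considered plausible between the three main phases.
ALLOWED = {
    ("climb", "cruise"),
    ("cruise", "climb"),
    ("cruise", "approach"),
    ("approach", "cruise"),
}


def transition_counts(labels):
    """Return (total number of phase changes, number of changes outside ALLOWED)."""
    changes = [(a, b) for a, b in zip(labels, labels[1:]) if a != b]
    unlikely = [pair for pair in changes if pair not in ALLOWED]
    return len(changes), len(unlikely)


def phase_shares(labels):
    """Fraction of points assigned to each phase (a proxy for phase duration)."""
    counts = Counter(labels)
    return {phase: count / len(labels) for phase, count in counts.items()}


realistic = ["climb"] * 3 + ["cruise"] * 6 + ["approach"] * 3
degenerate = ["climb", "approach", "cruise", "cruise"] * 3

same_durations = phase_shares(realistic) == phase_shares(degenerate)
realistic_counts = transition_counts(realistic)
degenerate_counts = transition_counts(degenerate)
```

Both sequences give identical phase shares, yet the realistic one contains 2 transitions, none of them unlikely, while the degenerate one contains 8 transitions, 3 of which fall outside `ALLOWED`.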
For each flight phase, we use the usual F-1 score, defined as the harmonic mean of precision and recall. For the cruise phase, precision is the proportion of correctly predicted cruise points among all the points the model predicted as belonging to the cruise phase. Recall is the proportion of cruise points correctly identified as such among all the cruise points in the reference trajectory. The F-1 score is a metric commonly used in binary classification tasks. It rewards models that achieve high precision and high recall simultaneously. Using the F-1 score avoids selecting a method that would label all points of the flight as belonging to a single phase (maximum recall for that phase but very poor precision), or one that would label very few points as belonging to that phase (high precision but poor recall for that phase).
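As a minimal sketch of these definitions, the function below computes precision, recall, and the F-1 score for one phase from a reference and a predicted label sequence. The function name `phase_scores` and the toy sequences are hypothetical; the two predictions reproduce the degenerate cases described above.

```python
def phase_scores(reference, predicted, phase):
    """Precision, recall, and F-1 score for one phase, treated as a binary problem."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == phase and p == phase for r, p in pairs)  # true positives
    fp = sum(r != phase and p == phase for r, p in pairs)  # false positives
    fn = sum(r == phase and p != phase for r, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


reference = ["climb"] * 3 + ["cruise"] * 5 + ["approach"] * 2

# Degenerate case 1: every point is labeled as cruise.
all_cruise = ["cruise"] * 10
# Degenerate case 2: only one point is labeled as cruise.
timid = ["climb"] * 3 + ["cruise"] + ["approach"] * 6

scores_all_cruise = phase_scores(reference, all_cruise, "cruise")
scores_timid = phase_scores(reference, timid, "cruise")
```

In the first case recall is 1 but precision is only 0.5; in the second, precision is 1 but recall is only 0.2. In both cases the F-1 score stays well below 1, which is the behavior the harmonic mean is meant to provide.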
Because ADS-B data do not provide a ground truth for the segmentation of flight phases, several other options are possible. Synthetic data have been used in [Zhang et al. 2022] to validate the model. Data from an aircraft simulator are employed in [Tian et al. 2017] and [Arts et al. 2021]. Flight Data Recorder (FDR) data are used in [Liu et al. 2020].
Likewise, we have chosen to use de-identified aggregate flight recorded data made available by NASA. As written on the corresponding DASHlink project page, the files contain actual data recorded onboard a single type of regional jet operating in commercial service over a three-year period. While the files contain detailed aircraft dynamics, system performance, and other engineering parameters, they do not provide any information that can be traced to a particular airline or manufacturer. Appropriate parties have allowed NASA to provide the data to the general public for the purpose of evaluating and advancing data mining capabilities that can be used to promote aviation safety.
In this dataset, flight phases are determined based on the Aircraft Condition Monitoring System (ACMS). It is a predictive maintenance tool consisting of a high-capacity flight data acquisition unit and the associated sensors that sample, monitor, and record information and flight parameters from significant aircraft systems and components. There are 8 possible flight phases in the dataset: unknown, preflight, taxi, takeoff, climb, cruise, approach, and rollout. Depending on the chosen nomenclature, these flight phases may be renamed. For example, if the approach is defined as the final part of the descent, flight phase labels must be modified. Note that the sampling frequency differs from one sensor to another, resulting in parameter series of unequal lengths. As an example, the sampling rate is 1 Hz for the longitude and 4 Hz for the pressure altitude.
We focus on data for tail 687, for which there are 5,376 flights. After a few basic data cleaning steps, we retain 2,868 flights. To be precise, only flights with a duration of more than thirty minutes, for which the main flight phases are documented, are kept for further analysis. Many flights are very short, which explains the final size of the sample. Each flight is resampled to 1,000 points by linear interpolation. For a given observation time, this ensures that a value is available for each sensor, which solves the sampling frequency problem. Note that linear interpolation also acts as a pre-smoothing of the data, reducing its erratic nature. If the interpolation grid is too coarse, information may be lost, resulting in a less accurate segmentation. Conversely, maintaining a high temporal granularity can lead to a non-negligible computational cost and outlier problems. Given our sample, 1,000 points appear to be a good compromise between these two pitfalls. Time is scaled so that each flight starts at \(t=0\) and ends at \(t=1\), since flights differ in duration. Each flight can be easily visualized, as shown in Figure 1.
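The resampling step can be sketched as follows. This is one possible implementation under stated assumptions, not the code used for this study: the function `resample_linear` is a hypothetical helper, timestamps are assumed to be strictly increasing, and the two series are synthetic signals sampled at 4 Hz and 1 Hz to mimic the pressure altitude and longitude sensors.

```python
from bisect import bisect_right


def resample_linear(times, values, n_points=1000):
    """Interpolate linearly on n_points evenly spaced instants of rescaled time [0, 1]."""
    t0, t1 = times[0], times[-1]
    scaled = [(t - t0) / (t1 - t0) for t in times]  # flight starts at 0, ends at 1
    grid = [i / (n_points - 1) for i in range(n_points)]
    resampled = []
    for g in grid:
        # Index of the right end of the bracketing interval, clamped to valid bounds.
        j = min(max(bisect_right(scaled, g), 1), len(scaled) - 1)
        left, right = scaled[j - 1], scaled[j]
        weight = (g - left) / (right - left)
        resampled.append(values[j - 1] + weight * (values[j] - values[j - 1]))
    return grid, resampled


# Synthetic 10-second example: altitude sampled at 4 Hz, longitude at 1 Hz.
alt_times = [0.25 * k for k in range(41)]
altitude = [100.0 * t for t in alt_times]
lon_times = [float(k) for k in range(11)]
longitude = [1.0 + 0.01 * t for t in lon_times]

grid, altitude_rs = resample_linear(alt_times, altitude)
_, longitude_rs = resample_linear(lon_times, longitude)
```

After this step both series share the same 1,000-point time grid, so every observation time has a value for each parameter regardless of the original sensor rate.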