Hidden Markov Models and Flight Phase Identification

Rémi Perrichon; Xavier Gendre; Thierry Klein;
This web version is auto-generated from the LaTeX source and may not include some elements. Check the PDF version for more details.

Abstract

The use of Hidden Markov Models (HMMs) in segmenting flight phases is a compelling approach with significant implications for aviation and aerospace research. It leverages the temporal sequences of flight data to delineate various phases of an aircraft’s journey, making it a valuable tool for enhancing the analysis of flight performance and safety. In this work, we implement a multivariate HMM to identify 6 flight phases: taxi, takeoff, climb, cruise, approach and rollout. We reach a median global accuracy of about 97% over a sample of several thousand flights with a very low number of decoded unlikely transitions. Regarding several performance metrics, our method is competitive with existing methods in the literature, such as fuzzy logic. Additionally, it provides, for each point of the flight, a probability of belonging to each phase. Even in situations where there are missing values in the data, HMMs remain effective, ensuring that no critical information is lost during the segmentation process. We show that HMMs work seamlessly with the fine granularity of Flight Data Recorder (FDR) data. HMMs offer remarkable flexibility and adaptability, proving particularly effective when the number or order of phases is unknown or not predetermined, as is often the case with complex flight scenarios such as helicopter flights. This adaptability is crucial for handling the diverse range of flight operations that differ from one aircraft to another. An example is given with the segmentation of an Automatic Dependent Surveillance–Broadcast (ADS-B) helicopter flight operated by the Swedish National Police.

Introduction

From a conceptual point of view, there is no trouble in defining flight phases, that is to say different periods within a flight. Common taxonomies are, for instance, provided by the International Civil Aviation Organization (ICAO) [(CICTT) 2013] or by the International Air Transport Association (IATA) in Annex 1 of [Association 2015]. Given some trajectory data, flight phase identification aims at segmenting a flight into different phases. More precisely, a segmentation is a partition of data points.

This task has been popularized with the increasing availability of large Automatic Dependent Surveillance–Broadcast (ADS-B) datasets, for which flight phases are not labeled. It would be tedious to annotate them manually. A famous example of this rising accessibility of ADS-B data is the development of the non-profit OpenSky Network that has grown to 5,000 registered receivers all around the world, providing a large historical database [Sun et al. 2022].

The segmentation of flights has several uses. As stated in [Sun et al. 2017b], flight phase segmentation is utilized to build aircraft performance models. In [Alligier et al. 2015], the mass estimation method for ground-based aircraft climb prediction involves a filtering of climb segments. In [Kuzmenko et al. 2022], flight phase identification is related to delay analysis and safety. As explained by [Zhang et al. 2022], estimating the duration of each flight phase is also believed to enhance the development of reliable noise or emissions models around airports.

To be entirely precise, flight phase identification has several meanings. For the majority of applications, the identification of flight phases is a vertical segmentation problem (say, the identification of the takeoff, climb, cruise, approach and so on). We naturally visualize the different phases by representing them on the altitude profile. However, there are applications for which horizontal flight phases can also be defined. As recently reviewed in [Kovarik et al. 2020], this is the case for conflict detection for which we are also interested in detecting turns. In this contribution, we will focus solely on providing a vertical segmentation. Our primary emphasis is on commercial aviation.

A key aspect of flight trajectories is the undefined number of segments to uncover due to different flight frequencies and operations. Even within the same phase, aircraft may climb at different rates or fly at different cruise altitudes. Another specificity is the strong correlation in time and space between two consecutive points of a trajectory. Additionally, trajectory data may be noisy and/or have missing values.

These characteristics, along with the variety of air operations, account for the wide diversity of approaches presented in the literature on the subject, whether it be on the side of thresholding methods or probabilistic ones. The segmentation methods used in the literature only occasionally take into account the strong temporal correlation that exists between the data points that make up the flight. For example, the widely popular fuzzy logic method developed in [Sun et al. 2017a] would produce an identical segmentation if the observations were permuted in time meaning that each point would have the same label.

Up to our knowledge and despite a well-known plasticity, Hidden Markov Models (HMMs) have not often been used to segment flight phases even though they exhibit very interesting characteristics for this problem. Unlike threshold-based methods or fuzzy logic, HMMs place the temporal aspect of the trajectory at the core of segmentation by modeling the transition probabilities from one flight phase to another. This reduces the number of invalid transitions from one flight phase to another. Using HMMs allows for uncertainty quantification in segmentation, providing the probability of belonging to each class for each point. Unlike supervised methods, HMMs require only a very limited number of inputs and do not need a training phase. HMMs have been used for at least three decades in signal-processing applications, especially in the context of automatic speech recognition, but interest in their theory and application has expanded to other fields (environment, biophysics, ecology etc.) [Zucchini et al. 2016]. As a result, numerous packages are available for their implementation such as [Visser and Speekenbrink 2010].

The contributions of this paper are of various types:

  • The development of a univariate HMM for the detection of the three main flight phases (climb, cruise, and approach), as well as a multivariate model for the detection of the taxi, climb, cruise, approach, and rollout phases.

  • A comparison of segmentation performances with the fuzzy logic approach for the three main flight phases.

  • The calculation of several performance metrics, ranging from global accuracy to the number of invalid transitions on a sample comprising several thousand flights.

  • A discussion on the impact of data preprocessing on the quality of flight segmentation.

  • A discussion about the feasibility of adapting HMMs for the segmentation of a flight for which the phases to be identified are not specified in advance.

The paper is organized as follows. First, we provide a brief overview of existing methods as well as common performance metrics in Section 2. Second, the data we use is outlined in Section 3. Then, we present the theoretical framework of univariate HMMs in Section 4 as well as a model to detect the three main flight phases. The detection of additional flight phases is discussed in Section 5 and falls within the framework of multivariate HMMs. The topics of data preprocessing and adapting models when the phases to be identified are not known in advance are addressed in Section 6.

Brief review of existing approaches

The two main approaches

As put in [Fala et al. 2023], two main approaches are employed to identify phases from flight data records: logical rule-based decision-making, and probabilistic-based decision-making.

Regarding rule-based approaches, several studies have focused on establishing thresholds to segment flight phases [Goblet et al. 2015; Paglione and Oaks 2006]. Given the challenge of specifying universal thresholds for flight phase segmentation, the fuzzy logic approach has established itself in the literature as a flexible, simple, and fast method. Early references on the subject include the work of [Kelly and Painter 2006]. Several publications [Sun et al. 2016; Sun et al. 2017a], and its implementation in OpenAP [Sun et al. 2020] have now made it a widespread method. For each point, it is worth noting that fuzzy logic does not strictly return the probability of belonging to each class. Additionally, it does not consider the temporal nature of the trajectory. Data smoothing is often necessary to achieve good results in practice.

Recently, many contributions have framed the problem of flight phase detection as a machine learning task. The use of decision trees classifiers to segment flight phases has been explored in [Tian et al. 2017]. Some machine learning methods are compared in [Kovarik et al. 2020]. Combined K-means clustering and LSTM neural networks have been combined in [Arts et al. 2021]. Gaussian Mixture Models have been used in [Liu et al. 2020]. To achieve good results, some methods often require a large number of inputs, often unavailable in ADS-B data. For instance, the engine fan speed is used in [Liu et al. 2020]. In any case, many steps seem necessary in the machine learning literature: selection of the parameters, implementation of a decision tree classifier and clustering of the results in [Tian et al. 2017], transformation of trajectory data into fixed length sequential data before using an LSTM neural network in [Arts et al. 2021]. The difficulty of obtaining a reliable training dataset leads some authors to use simulated data [Arts et al. 2021].

HMMs do not suffer from most of the mentioned limitations, as explained in the sequel.

Performance metrics

The comparison of flight phase identification methods is complex on several levels. One initial challenge relates to the number and types of flight phases selected. These can vary greatly depending on whether one considers commercial aviation or general aviation. A second challenge lies in the lack of consensus on the choice of a performance metric. It appears that the latter can be grouped into three main categories:

  • The traditional metrics for classification problems such as the error rate, precision and recall [Goblet et al. 2015; Paglione and Oaks 2006; Tian et al. 2017; Arts et al. 2021; Liu et al. 2020]

  • Metrics that focus on the total duration of each phase [Zhang et al. 2022]

  • Metrics that examine the transitions that are incorrectly predicted between phases as well as the total number of transitions [Sun et al. 2017a]

In all contributions, the results are, of course, initially visualized. Because it is easy to find a degenerate segmentation that would provide an exact value for the duration of each phase while alternating the flight phases very randomly, it seems reasonable to consider that at least two metrics should be used. The use of classification metrics for each flight phase allows for the detection of the model’s inability to segment some flight moments correctly, while global metrics provide an overview of the model’s average performance. Since certain flight phases last significantly longer than others, the overall accuracy metric must be interpreted with caution. Counting the number of improbable transitions as well as the total number of transitions seems to be crucial in measuring the realism of a segmentation. From an operational perspective, the aircraft does not spend its time rapidly transitioning between phases. In the following, we systematically consider multiple performance metrics.

For each flight phase, we typically define the usual F-1 score as the harmonic mean of precision and recall. If we consider the cruise phase, precision would be the amount of correctly predicted cruise points among all the points the model predicted as belonging to the cruise phase. Recall would be the number of cruise points are correctly identified as such among all the cruise points in the reference trajectory. The F-1 score is a metric commonly used in binary classification tasks. It rewards models that can achieve high precision and recall simultaneously. Using the F-1 score avoids to select a method that would label all points of the flight as belonging to a single phase (maximum recall for that phase but very poor precision), or another one that would consist of not labeling many points as belonging to that phase (poor recall but high precision for that phase).

Data

Because ADS-B data do not provide a ground-truth regarding the segmentation of flight phases, several other options are possible. Synthetic data have been used in [Zhang et al. 2022] to validate the model. Data from an aircraft simulator are employed in [Tian et al. 2017] and [Arts et al. 2021]. Flight Data Recorder (FDR) data are encountered in [Liu et al. 2020].

Likewise, we have chosen to use de-identified aggregate flight recorded data made available by NASA. As written on the corresponding DASHlink project page, the files contain actual data recorded onboard a single type of regional jet operating in commercial service over a three-year period. While the files contain detailed aircraft dynamics, system performance, and other engineering parameters, they do not provide any information that can be traced to a particular airline or manufacturer. Appropriate parties have allowed NASA to provide the data to the general public for the purpose of evaluating and advancing data mining capabilities that can be used to promote aviation safety.

In this dataset, flight phases are determined based on the Aircraft Condition Monitoring System (ACMS). It is predictive maintenance tool consisting of a high capacity flight data acquisition unit and the associated sensors that sample, monitor, and record, information and flight parameters from significant aircraft systems and components. There are 8 possible flight phases in the dataset: unknown, preflight, taxi, takeoff, climb, cruise, approach, rollout. Depending on a chosen nomenclature, these flight phases may be renamed. For example, if the approach is defined as the final part of the descent, flight phase labels must be modified. Note that the sampling frequency of each sensor is different, resulting in unequal data lengths of the parameters. As an example, the sampling rate for the longitude is 1 Hz while for pressure altitude, it is 4 Hz.

We focus on data for tail 687 for which there are 5,376 flights. After a few basic data cleaning steps, we are working with 2,868 flights. To be precise, only flights with a duration of more than thirty minutes, for which the main flight phases are documented, are retained for further analysis. Many flights are very short, thus explaining the final size of the sample. Each flight is resampled to 1,000 points (linear interpolation). For a given observation time, it ensures that each sensor value is available (it solves the sampling frequency problem). Note that linear interpolation also acts as a pre-smoothing of the data, simplifying its erratic nature. If the interpolation is too coarse, it is possible that information may be lost, resulting in a less accurate segmentation. Yet, maintaining a high temporal granularity can lead to a non-negligible computational cost and outlier problems. Given our sample, 1,000 points appear to be a good compromise between these two pitfalls. Time is scaled so that each flight starts at \(t=0\) and ends at \(t=1\) (each flight is of different duration). Each flight can be easily visualized, as shown in Figure 1.

Visualization of a randomly selected flight from the dataset. Altitude, longitude, and latitude profiles [left], flat view [center], and three-dimensional view [right].

Univariate Hidden Markov Models

To introduce some definitions and notations, the basic framework of univariate HMMs is presented in Subsection 4.1. A simple model used for flight phase identification is introduced in Subsection 4.2.

The basic univariate framework

Definition

A HMM consists of two parts:

  • We first consider an unobserved parameter process (or hidden state process) denoted \(\left \{ C_t:t=1,2,... \right \}\). It is a sequence of discrete random variables valued in \(\left \{ 1,...,m \right \}\). This process is assumed to be a discrete-time Markov chain. It then satisfies the so-called Markov property: \[\forall \ t \geq 2, \forall \ c_1,...,c_t,c_{t+1} \in \left \{ 1,...,m \right \}, \mathbb{P}(C_{t+1}=c_{t+1} \mid C_t=c_t,...,C_1=c_1)=\mathbb{P}(C_{t+1}=c_{t+1} \mid C_t=c_t).\] That is, conditioning on the history of the process up to time \(t\) is equivalent to conditioning only on the most recent value \(C_t\). The Markov property can be regarded as a first relaxation of the assumption of independence. The random variables are dependent in a specific way that is mathematically convenient.

  • We then consider a state-dependent process denoted \(\left \{ X_t:t=1,2,... \right \}\) (the observation process). It is a sequence of discrete random variables typically valued in \(\mathbb{N}\) or \(\mathbb{R}\). The distribution of this process is assumed to depend only on the current state \(C_t\) and not on previous states or observations. It is a conditional independence assumption, \(\forall \ t \geq 2, \forall \ c_1,...,c_t \in \left \{ 1,...,m \right \}\), \[\label{eq:indepass} \mathbb{P}(X_{t}=x_{t} \mid X_{t-1}=x_{t-1},...,X_{1}=x_{1}, C_t=c_t,...,C_1=c_1)=\mathbb{P}(X_{t}=x_{t} \mid C_t=c_t).\]

As the Markov chain \(\left \{ C_t \right \}\) has \(m\) states, \(\left \{ X_t \right \}\) is called an \(m\)-state HMM. An \(m\)-state HMM has \(m\) state-dependent distributions. Every observation is assumed to have been generated by one of \(m\) component distributions. The hidden state process selects which of the distributions is active at any time. The state-dependent distributions are defined as, for \(i = 1,2,...,m\), \(\forall \ t \geq 1\), \[p_i(x_t)=\mathbb{P}(X_t=x_t \mid C_t=i).\] That is, \(p_i\) is the probability mass or density function of \(X_t\) if the Markov chain is in state \(i\) at time \(t\). We use \(p\) as a general symbol for probability mass or density functions.

To summarize, an HMM is a special type of a dependent mixture model in which a Markov chain selects the component distributions.

Characterization, homogeneity, and stationarity

Due to the Markov property, \(\left \{ C_t \right \}\) is fully characterized by:

  • The initial state distribution \(\boldsymbol{u}(1)=\left ( \mathbb{P}(C_1=1),...,\mathbb{P}(C_1=m) \right )\)

  • The one-step state transition probabilities denoted \(\gamma_{ij}(t)=\mathbb{P}(C_{t+1}=j \mid C_t=i), \forall \ t \geq 1, \forall \ i,j \in \left \{ 1,...,m \right \}\).

Due to the conditional independence assumption, \(\left \{ X_t \right \}\) is fully characterised by the state-dependent distributions.

Unless stated otherwise, we will always assume that the Markov chains we use are homogeneous, that is \(\forall t \geq 1, \gamma_{ij}(t) = \gamma_{ij}\). This hypothesis is classic in the literature on HMMs, and its relaxation is not always necessary to improve the model’s performance. We denote the transition probability matrix as: \[\boldsymbol{\Gamma}=\begin{pmatrix} \gamma_{11} & \cdots & \gamma_{1m}\\ \vdots & \ddots & \vdots \\ \gamma_{m1} & \cdots & \gamma_{mm} \end{pmatrix}.\]

Note that it must be the case that:

  • \(\forall i, \forall j, \gamma_{ij} \in \left [ 0,1 \right ]\),

  • \(\forall i, \ \sum_{j=1}^{m}\gamma_{ij}=1\).

A Markov chain with transition probability matrix \(\boldsymbol{\Gamma}\) is said to have stationary distribution \(\boldsymbol{\delta}\) (a row vector with non-negative elements) if \(\boldsymbol{\delta}\boldsymbol{\Gamma} = \boldsymbol{\delta}\) and \(\sum_{k=1}^{m}\delta_k=1\). Homogeneity alone is not sufficient to render the Markov chain a stationary process. It is sometimes useful to assume that the homogeneous Markov chain starts from its stationary distribution (in this case, it is said to be a stationary Markov chain). In the following, we do not make any stationarity assumption.

Likelihood

Suppose that we have \(T\) consecutive observations \(x_1,...,x_T\), assumed to be generated by an \(m\)-state HMM. To select the HMM parameters for which the model has the highest chance of having generated the observed data, the likelihood should be defined and computed. To be precise, we seek the probability \(L_T\) of observing the sequence, as calculated under an \(m\)-state HMM which has initial distribution \(\boldsymbol{u}(1)\), a transition probability matrix \(\boldsymbol{\Gamma}\), and state-dependent probability functions \(p_i\).

The likelihood is given by: \[L_T=\boldsymbol{u}(1)\boldsymbol{P}(x_1)\boldsymbol{\Gamma}\boldsymbol{P}(x_2)\boldsymbol{\Gamma}\boldsymbol{P}(x_3)\cdots \boldsymbol{\Gamma}\boldsymbol{P}(x_T)\boldsymbol{1}^\top,\] where \(\boldsymbol{P}(x_t)\) is defined as the diagonal matrix with i-th diagonal element \(p_i(x_t)\) and \(\boldsymbol{1}\) is a row vector of ones. If \(\boldsymbol{u}(1)\) is the stationary distribution of the Markov chain, we have \(\boldsymbol{u}(1) = \boldsymbol{\delta}\) and \(\boldsymbol{\delta}\boldsymbol{P}(x_1)=\boldsymbol{\delta}\boldsymbol{\Gamma}\boldsymbol{P}(x_1)\) (which makes the likelihood easier to code in some cases).

The matrix expression of the likelihood is computationally very interesting. To evaluate the likelihood, it is much more efficient to use recursive computation rather than relying on brute-force summation over all possible state sequences (we would have a \(m^{T}\) summands). Indeed, the naive approach separately considerates all possible state sequences that might have given rise to the observations which makes many calculations redundant.

A so-called forward algorithm is used to compute the likelihood. We first define the vector \(\boldsymbol{\alpha}_t\), for \(t=1,2,...,T\), as: \[\boldsymbol{\alpha}_t=\boldsymbol{u}(1)\boldsymbol{P}(x_1)\boldsymbol{\Gamma}\boldsymbol{P}(x_2)\boldsymbol{\Gamma}\boldsymbol{P}(x_3)\cdots \boldsymbol{\Gamma}\boldsymbol{P}(x_T),\] with the convention that an empty product is the identity matrix. It follows immediately from this definition that: \[L_T = \boldsymbol{\alpha}_T \boldsymbol{1}^\top\] and \[\boldsymbol{\alpha}_t = \boldsymbol{\alpha}_{t-1} \boldsymbol{\Gamma}\boldsymbol{P}(x_t), \ t \geq 2.\]

The elements of the vector \(\boldsymbol{\alpha}_t\) are usually referred to as forward probabilities. Computations go: \[\begin{matrix} \boldsymbol{\alpha}_1=\boldsymbol{u}(1)\boldsymbol{P}(x_1);\\ \boldsymbol{\alpha}_t=\boldsymbol{\alpha}_{t-1}\boldsymbol{\Gamma}\boldsymbol{P}(x_t), \ t=2,...,T;\\ L_T=\boldsymbol{\alpha}_T \boldsymbol{1}^\top. \end{matrix}\] In practice, the maximization of the likelihood with respect to the parameters can be made numerically. It leads to well-known problems:

  • Numerical underflow (in the discrete case, the likelihood approaches 0 with probability \(1\) exponentially fast). In this case, it is enough to scale the likelihood computation.

  • Unbounded likelihood (when we consider continuous state-dependent distributions, it may happen that the likelihood is unbounded in the vicinity of certain parameter combinations). To tackle this issue, the discrete likelihood is maximized instead of the joint density.

  • Constraints on the parameters (for example, row sums of \(\Gamma\) are equal to \(1\)). A common practice is to reparametrize the model.

  • Multiple local maxima in the likelihood. A sensible strategy is to use a range of starting values for the maximization, and to see whether the same maximum is identified in each case.

Instead of performing a direct numerical optimization, another popular approach is to treat the states as missing and to employ the EM algorithm in order to find the maximum likelihood estimates of the parameters. Yet, direct optimization is advisable when some constraints are added to the model.

Local decoding

Given the HMM and the observations, one can deduce information about the states occupied by the underlying Markov chain. Such inference is known as decoding.

Local decoding of the state at time \(t\) refers to the determination of that state which is most likely at that time. To perform local decoding, analogous to forward probabilities, we should define backward probabilities as the elements of the vector \(\boldsymbol{\beta}_t\): \[\boldsymbol{\beta}_t^\top = \boldsymbol{\Gamma}\boldsymbol{P}(x_{t+1})\boldsymbol{\Gamma}\boldsymbol{P}(x_{t+2})\cdots \boldsymbol{\Gamma}\boldsymbol{P}(x_T)\boldsymbol{1}^\top,\] with the convention that an empty product is the identity matrix.

For each time \(t\in\left \{ 1,...,T \right \}\), one can therefore determine the distribution of the state \(C_t\), given the observations \(x_1,...,x_T\), which for \(m\) states is a discrete probability distribution with support \(\left \{ 1,...,m \right \}\). The conditional distribution of \(C_t\) given the observations can be obtained, for \(i=1,2,...,m\), as \[\begin{aligned} \label{eq:proba} \mathbb{P}(C_t=i \mid \boldsymbol{X}^{(T)} = \boldsymbol{x}^{(T)}) &= \frac{\mathbb{P}(C_t=i,\boldsymbol{X}^{(T)} = \boldsymbol{x}^{(T)})}{\mathbb{P}(\boldsymbol{X}^{(T)} = \boldsymbol{x}^{(T)})} \\ &= \frac{\alpha_t(i)\beta_t(i)}{L_T}. \end{aligned}\] For each time \(t \in \left \{ 1,...,T \right \}\), the most probable state given the observations, is defined as: \[\label{eq:local_decode} i^*_t=\underset{i=1,...,m}{\mathrm{argmax}} \ \mathbb{P}(C_t=i\mid \boldsymbol{X}^{(T)}=\boldsymbol{x}^{(T)}),\] where \(\boldsymbol{X}^{(T)}\) is the the history \(\left ( X_1,...,X_T \right )\) and \(\boldsymbol{x}^{(T)}\) is \(\left ( x_1,...,x_T \right )\). This approach determines the most likely state separately for each \(t\) by maximizing the conditional probability \(\mathbb{P}(C_t=i\mid \boldsymbol{X}^{(T)}=\boldsymbol{x}^{(T)})\).

Local decoding comes with one crucial advantage: an uncertainty quantification in the decoded state sequence. It is illustrated in Subsection 6.2.

Global decoding

Instead of the most likely state for each separate time \(t\), it is often the case that we are interested in the most likely sequence of hidden states. Instead of maximizing \(\mathbb{P}(C_t=i\mid \boldsymbol{X}^{(T)}=\boldsymbol{x}^{(T)})\) over \(i\) for each \(t\) (Equation [eq:local_decode]), one seeks that sequence of states \(c_1, c_2,..., c_T\) which maximizes the conditional probability \(\mathbb{P}(\boldsymbol{C}^{(T)}=\boldsymbol{c}^{(T)} \mid \boldsymbol{X}^{(T)}=\boldsymbol{x}^{(T)})\). This represents a subtly distinct maximization problem compared to local decoding and is referred to as global decoding. The outcomes of local and global decoding are frequently quite similar, although not identical. There is a highly efficient algorithm for determining this sequence of states, known as the Viterbi algorithm [Viterbi 1967]. Details may be found in [Zucchini et al. 2016] (Subsection 5.4.2) and in [Visser and Speekenbrink 2022] (Subsection 4.5.2).

A univariate model for flight phase identification

Suppose an aircraft is observed at integer times \(t = 1,2,...,T\). For the moment, we assume that there are no missing values (this assumption is relaxed in the following). For each time index, we observe \(q\) values: it could be the position of the aircraft, its speed, the vertical rate and so on.

We naturally consider \(q\) time series \(\left \{ \boldsymbol{X}_t=\left ( X_{t1}, X_{t2}, X_{t3}, ... ,X_{tq}\right ) :t=1,...,T\right \}\). Each series may represent observations of different type. For example, longitude and latitude are circular-valued whereas the altitude is positive or zero. In this section, we focus on the case \(q=1\). It means that only one time series is used for the segmentation.

For the identification of the three main flight phases (climb, cruise, approach), we begin by presenting an initial model based on the rate of climb (RoC) expressed in \(\text{ft}.\text{min}^{-1}\) in Subsection 4.2.1. We illustrate the case in which there are missing values in Subsection 4.2.2.

Identification of the three main phases with the rate of climb (RoC)

For the first model, we consider the rate of climb (RoC) to identify three flight phases: the climb, the cruise, and the approach. Flight phases may be seen as states. Transitions between the states are very constrained: we should go from the climb to the cruise and from the cruise to the approach. It is very unlikely to jump from the climb directly to the approach. It is impossible to go from the approach to the climb. We naturally specify a constrained 3-state univariate HMM for which the transition graph of the corresponding Markov chain is represented in Figure 2. The transition probability matrix of the first model is: \[\boldsymbol{\Gamma_{1}}=\begin{pmatrix} \gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & \gamma_{22} & \gamma_{23} \\ 0 & \gamma_{32} & \gamma_{33} \end{pmatrix}.\]

Transition graph of the constrained 3-state Markov chain.

The first state is a good candidate to represent the climb phase. To ensure this, the initial distribution is taken to be \(\boldsymbol{u}(1) = (1, 0, 0)\) (it is fixed). State 2 refers to the cruise and state 3 to the approach. We use 20 different starting values to increase the chances of finding the global maximum. The state-dependent density we consider for the RoC is the Gaussian one. Initial values are chosen as follows:

  • As the climb is known to last for some time, we draw \(\gamma_{11}\) from the uniform distribution \(\mathcal{U}_{\left [0.8, 0.95 \right ]}\) and we set \(\gamma_{12} = 1 - \gamma_{11}\). Likewise, we draw \(\gamma_{21}\) from \(\mathcal{U}_{\left [0.01, 0.04 \right ]}\), \(\gamma_{22}\) from \(\mathcal{U}_{\left [0.9, 0.95 \right ]}\) (the cruise lasts some time), we fix \(\gamma_{23} = 1 - \gamma_{22} + \gamma_{23}\) (after the cruise comes the approach), \(\gamma_{32}\) from \(\mathcal{U}_{\left [0.01, 0.04 \right ]}\) and \(\gamma_{33} = 1 - \gamma_{32}\).

  • The means of the normal distributions are drawn randomly as well as the standard deviations. Because there are 3 states, there is one mean and one standard deviation per state.

The choice of plausible starting values will avoid numerical instabilities.

Per se, HMM are unsupervised methods. The model does not return a segmentation involving the original data labels (climb, cruise, approach). Indeed, the states of the HMM are fully data-driven and do not have a predefined interpretation. Yet, the a priori meaning of the states has been integrated into the constraints such that there is no ambiguity in assigning the original labels. Results are very satisfactory. We compare them with a naive segmentation based on the following rules:

  • If the altitude rate is positive (with some tolerance \(\varepsilon\)), the phase is said to be the climb.

  • If the altitude rate is zero (with some tolerance \(\varepsilon\)), the phase is said to be the cruise.

  • If the altitude rate is negative (with some tolerance \(\varepsilon\)), the phase is said to be the approach.

The tolerance parameter \(\varepsilon\) is chosen through trial and error. We also consider a fuzzy logic segmentation with values provided in [Sun et al. 2017a]. A visual result for a typical flight is provided in Figure 3.

Identification results for a typical flight on the altitude profile.

With the naked eye, the obtained segmentations appear very satisfactory. On this particular flight, there is no striking difference.

We examine four performance metrics to assess the quality of the results. First, we use the global accuracy per flight (proportion of points that are correctly labeled). Second, we compute the F-1 score for each phase separately. We also consider the number of unlikely transitions per flight and the number of transitions per flight. In this section, we consider two unlikely transitions: going (directly) from climb to approach and from approach to climb. Note that in short flights, it is feasible to transition directly from climb to descent (approach) without a cruise phase. Yet, this situation is very uncommon in our sample as the minimum duration is at least thirty minutes. The empirical distributions of these performance metrics over a subsample of 2,823 flights are presented as several box plots in Figure 4. Among the 2,868 flights, the subsample corresponds to flights that have at least the 3 flight phases of interest. Details on how to calculate the performance metrics using the fuzzy logic of [Sun et al. 2017a] are provided in 8 (flight phases are not exactly the same).

Box plots of the global accuracy [left box plot] and F-1 scores per state. The crosses correspond to the averages.

Regarding the global accuracy, it appears that our model is very good. The lower performance of the fuzzy logic is surely explained by the absence of any data pre-processing. Discussion on data pre-processing is postponed to Subsection 6.1. Allowing a tolerance \(\varepsilon\) for the naive method explains its good results. The dependency of fuzzy logic on the erratic nature of FDR data logically results in a large number of transitions. While the median number of transitions is 6 for the entire set of reference flights, it is 21 for fuzzy logic, 14 for the naive method, and 6 for the HMM. Taking into account the temporal dependence of the points helps avoid too frequent alternation between the phases. The inflation in the number of transitions translates into some unlikely transitions. The median number of unlikely transitions across the entire sample is 0, the same as for the naive logic and the HMM. However, with fuzzy logic, if the data is not pre-processed, at least 50% of the flights have 2 unlikely transitions. Unlikely transitions are inherently quite uncommon with HMMs because small transition probabilities make certain sequences very rare. Crucially, it must be highlighted that there is a non-zero proportion of invalid transitions in the reference data. About 8% of the reference flights in the subsample have at least an invalid phase transition. With our method, 91% of the flights have no invalid transitions (74% for the naive method, 21% for the fuzzy logic if no pre-processing is done).

Missing values

In the case of HMM, a simple adjustment needs to be made to the likelihood computation if data are missing. Suppose that the value of the RoC is missing at integer time \(t=k\). Let \(L_T^{-k}\) be the likelihood of the observations. We have: \[\label{eq:likemissing} L_T^{-k}=\boldsymbol{u}(1)\boldsymbol{P}(x_1)\boldsymbol{\Gamma}\boldsymbol{P}(x_2)\boldsymbol{\Gamma}\boldsymbol{P}(x_3)\cdots \boldsymbol{\Gamma}\boldsymbol{P}(x_{k-1})\boldsymbol{\Gamma}^2\boldsymbol{P}(x_{k+1})\cdots \boldsymbol{\Gamma}\boldsymbol{P}(x_{T})\boldsymbol{1}^\top.\] The diagonal matrix \(\boldsymbol{P}(x_k)\) has been replaced by the identity matrix.

If one assumes that missingness is ignorable, the ignorable likelihood (Equation [eq:likemissing]) is a reasonable basis for estimating parameters. To be more precise, this likelihood may be used if one assumes that data are missing at random as defined by [Rubin 1976]. An important case in which this assumption does not hold is the state-dependent missingness case. Note that it is necessary to include time points with missing observations to allow the state probabilities to be computed properly (simply ignoring the missing time points is not valid). Missing points may be consecutive or not. Figure 5 shows that the quality of the final segmentation is minimally affected by missing values.

Identification results for a given flight. The initial segmentation [left] can be compared to our one [right]. 500 points are missing in this example (drawn uniformly at random).

Multivariate Hidden Markov Models

In order to detect more flight phases, multiple variables are required, and it is necessary to introduce multivariate HMM. The basic framework is presented in Subsection 5.1. The multivariate model for flight phase identification is detailed in Subsection 5.2.

The multivariate framework

In this section, \(s \in \left \{2,...,q \right \}\) series are used for the segmentation (multivariate case). We assume that, conditional on \(\boldsymbol{C}^{(T)}=\left \{ C_T,...,C_1 \right \}\), the random vectors \(\boldsymbol{X}_1,...,\boldsymbol{X}_T\) are mutually independent. This assumption is called longitudinal conditional independence and is the multivariate counterpart of the univariate conditional independence assumption (Equation [eq:indepass]).

Regarding the state-dependent distributions, there are two main things to specify:

  • The distribution of the random vector \(\boldsymbol{X}_t=\left ( X_{t1}, X_{