Uncertainty Assessment of Fuel Consumption Based on Open Data

Antoine Chevrot; Luis Basora

Original paper

DOI for the original paper: https://doi.org/10.59490/joas.2023.7233

Review - round 1

Reviewer 1

In this paper, the authors perform a sensitivity-based analysis to show the impact of the uncertainty in certain variables (such as average weight per person, cruise altitude, and load factor, among others) on the uncertainty of total fuel consumption. The motivation and methodology are sound and properly aligned with the PRC research challenge on environment. Furthermore, the consideration of uncertainty provides a reasonable degree of novelty to the piece of research.

Notwithstanding the above, there are several comments the authors must address to increase the scientific quality of the manuscript:

1) There are several errors and inconsistencies throughout the definitions of the First-Order Sobol Index \(S_i\) and the Total Sobol Index \(S_{T_i}\), on page 6, lines 194 to 209, as indicated below:

1.1) In line 199, the authors state that \(X_i\) stands for the subset of X that includes all input factors except the i-th factor. This is clearly not correct and is inconsistent with the definitions of \(X_i\) and \(X_{\sim i}\) in lines 205 and 206 (which are correct). Furthermore, the authors are advised to move these correct definitions to an earlier line (for instance, to line 199).

1.2) In line 200, it reads "variance of the expected value of f(X) conditional on all input factors except \(X_i\)", but it should read "variance of the expected value of f(X) conditional on the input factor \(X_i\)". Note that some other authors prefer to make explicit the fact that the expected value is taken over all input factors except \(X_i\), that is, over \(X_{\sim i}\). This can be done by using E with a subscript \(X_{\sim i}\). Analogously, the authors could have made explicit that the variance is taken over \(X_i\).

1.3) I am not quite sure that Eq. 2 is correct; I would dare to say it is not. It does not help, on the one hand, that the equation is not stated in its customary form (see https://en.wikipedia.org/wiki/Variance-based_sensitivity_analysis) and, on the other hand, that the authors do not clearly make explicit the variables with respect to which variances and expected values are computed. I would recommend that the authors carefully check the expression and correct it if necessary.
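
For reference, the customary variance-based definitions (the form used in the Wikipedia article linked in 1.3) can be written as follows, with \(Y = f(X)\). This is the standard textbook form with the subscripts made explicit, not a reconstruction of the paper's Eq. 2.

\[
S_i = \frac{\operatorname{Var}_{X_i}\left(\mathbb{E}_{X_{\sim i}}\left[Y \mid X_i\right]\right)}{\operatorname{Var}(Y)},
\qquad
S_{T_i} = \frac{\mathbb{E}_{X_{\sim i}}\left[\operatorname{Var}_{X_i}\left(Y \mid X_{\sim i}\right)\right]}{\operatorname{Var}(Y)}
= 1 - \frac{\operatorname{Var}_{X_{\sim i}}\left(\mathbb{E}_{X_i}\left[Y \mid X_{\sim i}\right]\right)}{\operatorname{Var}(Y)}
\]

Here \(X_i\) is the i-th input factor and \(X_{\sim i}\) denotes all input factors except the i-th. Both indices are ratios of non-negative quantities, so they are non-negative by definition.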

2) As for the variables chosen for the experiment (page 7), the authors provide mean values for the first three of them, but not standard deviations. As a result, the reader has no notion of the variance of, for instance, AWP. One would have expected the variance of AWP to be small, since it is the sum of around 160 (e.g., as in A320) identically distributed independent normal variables. Hence, one would not have expected the AWP to be one of the factors contributing most to the fuel consumption uncertainty.

3) Sobol factors provided in Figure 2 (and, analogously, in Figures 4 and 5) raise some questions about the validity of the analysis, as indicated below:

3.1) First, they are provided in what seems to be a single value along with a confidence interval. Nevertheless, nothing is said about how the authors constructed the confidence interval (a reference would have been needed).

3.2) There are plenty of cases where the computed Sobol’ indices are negative, which, according to their definition, should be impossible. The authors must check the results for correctness and, if they are correct, must include an explanation of how this could happen and how the reader should interpret such a result.

4) On page 9, lines 277 and 278, the sentence is not complete. It reads "It can be observed that in the sensitivity analysis of the longer-haul, the cruising altitude is", and nothing else is said.

All in all, because there is an important concern about the validity of the results, my main recommendation is that the authors perform a thorough revision of their manuscript and resubmit it for another round of review before a final decision is made on its publication.

Reviewer 2

In this paper, the authors present a preliminary analysis focusing on the effect that a series of variables have on fuel consumption. The contribution is valuable and the topic is very relevant, as fuel consumption computation in aviation is still a hot research topic with considerable room for improvement.

Although I think the work presented by the authors deserves to be published, I would first like to ask the authors to address some doubts that arose after reading the article:

a) I would like to ask the authors to review the manuscript specifically for English usage, grammar, and overall linguistic coherence. I highlight some of the issues encountered in the list below. Note the list is not exhaustive and that the authors should check the entire text. Sometimes the paper was very hard to follow because of these issues. I strongly advise the authors to revise the whole paper before resubmitting. I am just highlighting elements that are easy to correct, but in some parts of the text there are entire paragraphs which are confusing (the text becomes confusing and harder to follow from section 4.2, before that it is well written).

  • Line 107: these variables (missing the "s")

  • Line 220-222: grammatically incorrect sentence. It should be something like "For this experiment, the default type of engine set by OpenAP was chosen; however, the type of engine could have also been..."

  • Line 248: known in advance (the "d" should be removed)

  • Line 249: instead of writing the "length of the great circle" I would say it is better to simply write great-circle distance, it sounds more natural (actually, the authors use this term in Line 252)

  • Line 250: collision avoidance manoeuvres, etc. (missing the comma as the authors are presenting a series of elements)

  • Line 252: great circle distance for each scenario (without the "’s", right? or am I missing something?)

  • Line 256: the environmental impact of aviation ("the" before aviation not needed).

  • Line 256-257: Indeed, one of the main ways to prevent condensation trails, one of the main driving contributors to global warming during cruising, is the change of flight level to avoid high humidity zones (comma placement!)

  • Line 270-271: I would rephrase this sentence to "In future and more exhaustive experiments, other variables could be added, such as the age of the aircraft or the additional freight." The current sentence is a bit hard to read.

  • Line 287: please, rephrase, as the sentence has some grammatical issues, especially when mentioning the variation. A better version would be the following: "For instance, during the climb phase of the A320, the CAS varied by only around 10 knots".

  • Line 296: there is an isolated "s" in the middle of the sentence, please check.

  • Line 297: this is the first occurrence of RPK, please write the acronym meaning (actually, the first occurrence is in Figure 3, I would say it is a good idea to write the acronym meaning in both places, although this might depend on the journal policies)

  • Line 312-313: please, rephrase "as high of an impact as the creation of condensation trails" (missing the "s")

  • Line 317: "As shown in Figure 1" (no need to write "the" before "Figure")

  • Line 320 "to avoid a higher calculation time" or "to avoid higher calculation times" sounds more natural.

  • Line 323: "but it is not" ("it" missing)

  • Line 324-326: I recommend that the authors rephrase this sentence; it is not grammatically correct and is hard to understand. The main issue is with the word "however". When used as a conjunctive adverb to introduce contrast, it is either used at the beginning of the sentence followed by a comma, or in the middle of the sentence preceded by a semicolon and followed by a comma. In general, the only situation in which "however" is written in the middle of the sentence is when its meaning equals "by whatever means possible" (e.g., Do that however you want). Furthermore, the authors are using "but" and then "however", which is redundant. Anyway, a better version of this sentence could be "The exploration of other methods to conduct the sensitivity analysis could be introduced in future works to reduce the computation time, thus exploring a higher number of scenarios. However, the interdependencies of the input will still remain a challenge."

  • Line 330: "to find a dataset" or "to find datasets"

  • Line 332: remove the comma after "trajectories"

  • Line 337: "open data" (you just need to add the hyphen in this case if you use these two words as a compound adjective, e.g., open-data structures)

  • Line 342: "Results have shown that" (not "has", it is plural)

  • Line 343: "The two most impactful input variables" sounds more natural (or "the two input variables that impacted the most the fuel consumption...")

  • Line 351: One of the challenges (missing the "s")

b) Instead of mentioning climbing, cruising and descending phases, I recommend the authors to just write climb, cruise and descent phases. It sounds more natural and it is usually the preferred way to refer to the different flight phases. There are different occurrences throughout the text. Actually, in Line 306, I see the authors wrote "cruise altitude", but then in other parts of the text it is written "cruising altitude". It would be better if the authors were more consistent with the terminology, and I recommend to use the term "cruise" instead of "cruising" in this case.

c) In Line 147, the authors mention the fact that they do not have access to data over the Atlantic Ocean. What about satellite-based ADS-B: did the authors consider this data, or is its coverage still very limited? Or do only other services like FlightRadar24 have access to such information? Whatever the answer is, I recommend that the authors add a short discussion of this topic to the paper.

d) In line 278, at the end of the sentence, the cruise altitude value is missing. Plus, in that same paragraph, the authors refer to the "longer-haul" case.

I recommend the authors to, first, modify Figure 2 and add a caption for each subfigure (e.g., Figure 2a: A320 - 580 km, Figure 2b: A320 - 2300 km, etc).

In addition, the authors could actually remove the term "Sobol indices" from each subfigure caption, as it is already mentioned in the Figure 2 caption (related to that, I ask the authors to please be consistent in the paper and either refer to "Sobol indices" or "Sobol’s indices". I am not aware which is the better term, though).

After these modifications are done, every time the authors refer to one of the cases (like the "longer-haul" case in Line 278), the corresponding Figure could be referenced in parentheses, e.g., longer-haul (Figure 2d).

I think all these modifications would make the text easier to follow.

e) I have some doubts regarding the data used to conduct the experiments. As far as I understood, the authors used the OpenSky Network trajectories to obtain several parameters, such as the flight type, cruise altitude, speeds, etc. Then, they used these parameters to generate the trajectories with OpenAP, right? And actually, some of these parameters are the variables described in Section 4.1, which change for each run in the Monte Carlo simulation. The explanations of the variables are clear to me except for the cruise altitude.

What values are the authors using in this case? Are the authors using some kind of distribution like in some of the other variables or is the approach different? I did not find this information in the text, so I would appreciate if the authors could include it.

f) The explanation of the Sobol’s Indices in Section 3.2 is clear. However, I would like to see more details (both qualitative and quantitative) in the discussion presented in Section 4.2 regarding the obtained indices for each of the variables. The current analysis is OK but rather short, so I think more details could be explained from all the plots presented in Figure 2.

g) The authors claim that using directly OpenSky network data to conduct this research might be difficult, but it seems the only reason they mention is the fact that no data can be obtained over oceanic regions (in this case, the authors highlight traffic data over the Atlantic). However, if this data was available (related to comment C), would it be possible to conduct the research with OpenSky data? Would that lead to more accurate results than with simulated trajectories generated with OpenAP?

I understand that the authors could apply the fuel consumption model of either OpenAP or BADA to the trajectory data obtained from the OpenSky network. However, the Monte Carlo simulation might be harder to do, right? It might be hard to find trajectories in which just a parameter changes and the others remain constant, and probably not all the required data is available from just OpenSky...

Could the authors add a more in-depth discussion about this topic in the paper?

Reviewer 3

The paper is clear and structured.

I mainly have a few typos to report and some phrases to clarify or reword.

  • line 33: "open database" –> "open databases"

  • line 70: "...The Base of Aircraft ..." –> "...the Base of Aircraft..."

  • line 71: "...originally developed by Eurocontrol." –> "...developed and maintained by Eurocontrol."

  • line 79: clarify (or drop) "through incentives"

  • line 107: "...within these variable can" –> "...within these variables can"

  • lines 121 - 125: it is not clear what "They" in "They consider that" refers to. Are they the authors of "Casado et al." or all the previously cited works [13, 14, 15]? Similarly for "it" in "As a result, it simplifies" and later "It showed that".

  • line 133: "the main contributing factors" –> suggestion –> "the main influencing factors"

  • line 134: "and advancing aviation sustainability" –> suggestion –> "and supporting aviation sustainability assessment"

  • line 139: In my (non-native) English I would remove the "upon"...it just doesn’t sound right

  • lines 152-153 to be rephrased IMHO, something like: "Therefore in order to have complete trajectories even in areas not covered by OSN, we decided to create synthetic flight trajectories using openAP,..."

  • line 155: "This makes of" –> "This make"

  • sentence in lines 155-156: seems redundant if 10. is applied. So 152-156 could benefit from rephrasing.

  • lines 251-252: "we used a Normal distribution centered on the great circle distance..." Why "centered"? In fact the great circle distance is the minimum possible, so you need a skewed distribution, don’t you? Or you state its use for ease of calculations/theory/...

  • line 261 (missing plural): "Most of the performance model default to idle descent..." –> "Most of the performance models default to idle descent..."

  • lines 264-265: rephrase to something like "For the sake of completeness, we also included five additional variables in our sensitivity analysis. These are the CAS and ..."

  • lines 270-271: rephrase to something like "A future refined study could consider the addition of other variables such as the age of the aircraft, freight..."

  • line 275 rephrase: "The main contributor to the variation of fuel consumption for these scenarios is the cruise altitude"

  • lines 278-279: it looks like there is missing text

  • lines 280-281: be more precise, say what it is, on "On certain scenarios, ..."

  • line 287: "varying of more or less 10 knots" –> "varying around plus or minus 10 knots"

  • line 296: "aircraft s a secondary result" –> "aircraft as a secondary result" ?

  • line 297: I haven’t found "RPK" defined before/anywhere in the text

  • line 308: "rarely change" –> "rarely changes"

  • line 309: "As for the sensitivity analysis, the Sobol’s indices. Figure 2 shows..." –> "As for the sensitivity analysis, the Sobol’s indices in Figure 2 shows..."

  • line 323: "...[22] but is not" –> "...[22] but it is not"

  • line 332: clarify/explain "the addition of uncontrolled variables"? what are these? what is meant with "uncontrollable"?

  • line 334: "some data are rarely known like the engine unsed" –> "some required parameters are rarely known such as the engine model"

  • line 335: "...on the default paramentes set in OpenAP as they are often the most used" –> "...on the OpenAP defaults"

  • line 342: "Results has " –> "Results have"

  • lines 348-349: firstly/secondly –> first/second

  • line 393: incomplete reference information "In: (2016)"

  • line 431: "was one of the fuel consumption of of OpenAP" –> "was due to the fuel consumption or to OpenAP"

  • github repo: great to have the code, well done. An additional notebook could have made the explanation of how to use/run the analysis more appealing.

Response - round 1

Response to reviewer 1

1) There are several errors and inconsistencies throughout the definitions of the First-Order Sobol Index \(S_i\) and the Total Sobol Index \(S_{T_i}\), on page 6, lines 194 to 209, as indicated below:

1.1) In line 199, the authors state that \(X_i\) stands for the subset of X that includes all input factors except the i-th factor. This is clearly not correct and is inconsistent with the definitions of \(X_i\) and \(X_{\sim i}\) in lines 205 and 206 (which are correct). Furthermore, the authors are advised to move these correct definitions to an earlier line (for instance, to line 199).

1.2) In line 200, it reads "variance of the expected value of f(X) conditional on all input factors except \(X_i\)", but it should read "variance of the expected value of \(f(X)\) conditional on the input factor \(X_i\)". Note that some other authors prefer to make explicit the fact that the expected value is taken over all input factors except \(X_i\), that is, over \(X_{\sim i}\). This can be done by using E with a subscript \(X_{\sim i}\). Analogously, the authors could have made explicit that the variance is taken over \(X_i\).

1.3) I am not quite sure that Eq. 2 is correct; I would dare to say it is not. It does not help, on the one hand, that the equation is not stated in its customary form (https://en.wikipedia.org/wiki/Variance-based_sensitivity_analysis) and, on the other hand, that the authors do not clearly make explicit the variables with respect to which variances and expected values are computed. I would recommend that the authors carefully check the expression and correct it if necessary.

We completely revamped the section. The previous iteration was indeed full of inaccuracies and lacked clarity. We also added a short subsection on the Saltelli estimator that was used for the experiments of the paper. We feel that this important part of the paper is now in a better state.

2) As for the variables chosen for the experiment (page 7), the authors provide mean values for the first three of them, but not standard deviations. As a result, the reader has no notion of the variance of, for instance, AWP. One would have expected the variance of AWP to be small, since it is the sum of around 160 (e.g., as in A320) identically distributed independent normal variables. Hence, one would not have expected the AWP to be one of the factors contributing most to the fuel consumption uncertainty.

We added the standard deviations used for the experiments and also mentioned that we mostly used truncated distributions to avoid impossible values. We also added a sentence to the AWP definition stating that individual passenger weights are not drawn from the distribution; instead, a single average weight is drawn for a given flight and then multiplied by the number of passengers (e.g. 160). This was done to differentiate flights to holiday destinations from more regular flights. We also made sure that the Maximum Take-off Weight was never reached in our experiments. This was done similarly in other works.
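
The difference between the reviewer's reading (each passenger drawn independently) and the scheme described in this response (one average drawn per flight) can be illustrated with a short sketch. The 15 kg standard deviation is a placeholder, not a value from the paper, and using the same standard deviation in both schemes is an assumption made only for comparison.

```python
import math


def std_total_independent(sigma_passenger_kg, n_passengers):
    # Sum of n independent draws: variances add, so the std grows as sqrt(n).
    return sigma_passenger_kg * math.sqrt(n_passengers)


def std_total_shared_average(sigma_average_kg, n_passengers):
    # One average drawn per flight and multiplied by n: the std scales with n.
    return sigma_average_kg * n_passengers


SIGMA_KG = 15.0     # hypothetical standard deviation, not taken from the paper
N_PASSENGERS = 160  # e.g. an A320, as in the reviewer's comment

independent = std_total_independent(SIGMA_KG, N_PASSENGERS)
shared = std_total_shared_average(SIGMA_KG, N_PASSENGERS)
print(f"independent passengers: {independent:.1f} kg")
print(f"shared flight average:  {shared:.1f} kg")
```

Under these placeholder values, the standard deviation of the total passenger weight is about 190 kg when passengers are drawn independently and 2400 kg when a single per-flight average is drawn, which suggests why AWP can carry much more weight in the sensitivity analysis than the reviewer expected.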

3) Sobol factors provided in Figure 2 (and, analogously, in Figures 4 and 5) raise some questions about the validity of the analysis, as indicated below:

3.1) First, they are provided in what seems to be a single value along with a confidence interval. Nevertheless, nothing is said about how the authors constructed the confidence interval (a reference would have been needed).

3.2) There are plenty of cases where the computed Sobol’ indices are negative, which, according to their definition, should be impossible. The authors must check the results for correctness and, if they are correct, must include an explanation of how this could happen and how the reader should interpret such a result.

We added a sentence to specify what the confidence interval on the figure represents. After rerunning the experiments with a higher number of samples, the negative values disappeared, except within the confidence intervals. We therefore added a sentence in the discussion section to explain why this can happen with the estimator used.
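
To illustrate why a sampling-based estimator can return slightly negative first-order indices, the following is a minimal sketch of the Saltelli (2010) first-order estimator together with the Jansen total-order estimator, using only the Python standard library. The toy model, its coefficients, the uniform inputs, and the sample sizes are all hypothetical, and it is an assumption that this variant matches the estimator used in the paper.

```python
import random
from statistics import fmean


def sobol_indices(model, dim, n_samples, seed=0):
    """Estimate first-order and total Sobol indices for inputs uniform on [0, 1]."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    y_a = [model(row) for row in a]
    y_b = [model(row) for row in b]
    y_all = y_a + y_b
    mean_y = fmean(y_all)
    var_y = fmean((y - mean_y) ** 2 for y in y_all)
    first, total = [], []
    for i in range(dim):
        # Rows of A with column i replaced by the matching column of B.
        y_ab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        # Saltelli (2010) first-order estimator: a mean of signed products,
        # so sampling noise can push it below zero when the true index is tiny.
        first.append(
            fmean(yb * (yab - ya) for ya, yb, yab in zip(y_a, y_b, y_ab)) / var_y
        )
        # Jansen total-order estimator: a mean of squares, never negative.
        total.append(
            0.5 * fmean((ya - yab) ** 2 for ya, yab in zip(y_a, y_ab)) / var_y
        )
    return first, total


def toy_model(x):
    # Hypothetical additive model; the third input has almost no influence.
    return x[0] + 2.0 * x[1] + 0.01 * x[2]


small_first, _ = sobol_indices(toy_model, dim=3, n_samples=64)
large_first, large_total = sobol_indices(toy_model, dim=3, n_samples=100_000)
print("first-order, N=64:     ", [round(s, 4) for s in small_first])
print("first-order, N=100000: ", [round(s, 4) for s in large_first])
```

For this toy model the analytic first-order indices are approximately 0.2, 0.8 and 0. With few samples, the estimate for the near-inert third input fluctuates around zero and may come out negative, whereas a larger sample shrinks that fluctuation.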

4) On page 9, lines 277 and 278, the sentence is not complete. It reads "It can be observed that in the sensitivity analysis of the longer-haul, the cruising altitude is", and nothing else is said.

Done

Response to reviewer 2

a) I would like to ask the authors to review the manuscript specifically for English usage, grammar, and overall linguistic coherence. I highlight some of the issues encountered in the list below. Note the list is not exhaustive and that the authors should check the entire text. Sometimes the paper was very hard to follow because of these issues. I strongly advise the authors to revise the whole paper before resubmitting. I am just highlighting elements that are easy to correct, but in some parts of the text there are entire paragraphs which are confusing (the text becomes confusing and harder to follow from section 4.2, before that it is well written).

We corrected all the elements highlighted by the reviewer here.

b) Instead of mentioning climbing, cruising and descending phases, I recommend the authors to just write climb, cruise and descent phases. It sounds more natural and it is usually the preferred way to refer to the different flight phases. There are different occurrences throughout the text. Actually, in Line 306, I see the authors wrote "cruise altitude", but then in other parts of the text it is written "cruising altitude". It would be better if the authors were more consistent with the terminology, and I recommend to use the term "cruise" instead of "cruising" in this case.

We corrected all instances to be more consistent

c) In Line 147, the authors mention the fact that they do not have access to data over the Atlantic Ocean. What about satellite-based ADS-B: did the authors consider this data, or is its coverage still very limited? Or do only other services like FlightRadar24 have access to such information? Whatever the answer is, I recommend that the authors add a short discussion of this topic to the paper.

We added some sentences regarding this problem. We indeed limited these experiments to open data, and as of today it is not possible to find a reliable open dataset of satellite-based ADS-B data.

d) In line 278, at the end of the sentence, the cruise altitude value is missing. Plus, in that same paragraph, the authors refer to the "longer-haul" case.

Corrected

I recommend the authors to, first, modify Figure 2 and add a caption for each subfigure (e.g., Figure 2a: A320 - 580 km, Figure 2b: A320 - 2300 km, etc).

In addition, the authors could actually remove the term "Sobol indices" from each subfigure caption, as it is already mentioned in the Figure 2 caption (related to that, I ask the authors to please be consistent in the paper and either refer to "Sobol indices" or "Sobol’s indices". I am not aware which is the better term, though).

After these modifications are done, every time the authors refer to one of the cases (like the "longer-haul" case in Line 278), the corresponding Figure could be referenced in parentheses, e.g., longer-haul (Figure 2d).

We modified according to what the reviewer proposed to improve the readability.

e) I have some doubts regarding the data used to conduct the experiments. As far as I understood, the authors used the OpenSky Network trajectories to obtain several parameters, such as the flight type, cruise altitude, speeds, etc. Then, they used these parameters to generate the trajectories with OpenAP, right? And actually, some of these parameters are the variables described in Section 4.1, which change for each run in the Monte Carlo simulation. The explanations of the variables are clear to me except for the cruise altitude.

What values are the authors using in this case? Are the authors using some kind of distribution like in some of the other variables or is the approach different? I did not find this information in the text, so I would appreciate if the authors could include it.

We added some explanation concerning the cruise altitude

f) The explanation of the Sobol’s Indices in Section 3.2 is clear. However, I would like to see more details (both qualitative and quantitative) in the discussion presented in Section 4.2 regarding the obtained indices for each of the variables. The current analysis is OK but rather short, so I think more details could be explained from all the plots presented in Figure 2.

We added further explanations in the section so that it does not feel as short.

g) The authors claim that using directly OpenSky network data to conduct this research might be difficult, but it seems the only reason they mention is the fact that no data can be obtained over oceanic regions (in this case, the authors highlight traffic data over the Atlantic). However, if this data was available (related to comment C), would it be possible to conduct the research with OpenSky data? Would that lead to more accurate results than with simulated trajectories generated with OpenAP?

I understand that the authors could apply the fuel consumption model of either OpenAP or BADA to the trajectory data obtained from the OpenSky network. However, the Monte Carlo simulation might be harder to do, right? It might be hard to find trajectories in which just a parameter changes and the others remain constant, and probably not all the required data is available from just OpenSky...

I think the reviewer correctly understood the challenges of using real data directly for a variance-based sensitivity analysis. We added sentences in the discussion section to make these challenges easier to understand for future readers.

Response to reviewer 3

I mainly have a few typos to report and some phrases to clarify or reword.

We modified the text according to what was proposed by the reviewer. Here are some detailed answers for some questions.

13. lines 251-252: "we used a Normal distribution centered on the great circle distance..." Why "centered"? In fact the great circle distance is the minimum possible, so you need a skewed distribution, don’t you? Or you state its use for ease of calculations/theory/...

Indeed, using a truncated distribution with a minimum set to the great circle distance is only logical. Since the experiments had to be re-run with a higher number of samples, we changed the distribution to address the reviewer’s remark. We also added information about this in the text.
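
As a rough illustration of the change described here, the sketch below draws flown distances from a normal distribution truncated below at the great-circle distance, using simple rejection sampling. The 580 km great-circle distance echoes the example figure label suggested by Reviewer 2; the mean, standard deviation, and function name are hypothetical and not taken from the paper.

```python
import random


def sample_flown_distance(great_circle_km, mean_km, std_km, rng):
    """Draw from a normal distribution truncated below at the great-circle distance."""
    while True:
        draw = rng.gauss(mean_km, std_km)
        # Reject any draw shorter than the great-circle distance,
        # since no flown route can be shorter than that.
        if draw >= great_circle_km:
            return draw


rng = random.Random(42)
# Hypothetical parameters: 580 km great-circle distance, mean 600 km, std 30 km.
distances = [sample_flown_distance(580.0, 600.0, 30.0, rng) for _ in range(1000)]
print(f"shortest sampled distance: {min(distances):.1f} km")
```

Rejection sampling keeps the example dependency-free; `scipy.stats.truncnorm` provides the same distribution without the explicit loop.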

33. github repo: great to have the code, well done. An additional notebook could have made the explanation of how to use/run the analysis more appealing.

We added a notebook to the GitHub repository to display the results of the runs. We chose not to include the actual execution loop in the notebook, as it is already computationally demanding.