What happens when an airline launches a route that has never been flown before? Common questions it might ask itself include: Can the route be flown daily? How should the Revenue Management system be set up initially? What should marketing budgets be optimised for?

Typically, when an airline plans for these eventualities, a lot of "gut feeling" comes into the mix. And when things don't work, it’s easy to put it down to competition, macroeconomics, or various other factors. Airlines tend to plan new routes by country, often using existing destinations in the same country to predict how a new route will behave. This method falls short, however, for pairs of cities in the same country grouping, such as Moscow and St Petersburg, or Lisbon and Faro, which, though geographically close, cannot be treated alike when it comes to route planning.

But what if the industry could apply a more scientific approach to predicting how new O&D routes are likely to behave? This would help Marketers decide when and where to concentrate their efforts, especially for new routes; assist Revenue Managers in developing initial pricing; and support Schedulers in deciding the time and day on which to schedule flights.

Machine Learning can provide a more scientific way to address these questions, and we'll demonstrate how it can be applied to complex scenarios like the above, where multiple variables feed into decision-making. In many industries, Artificial Intelligence (AI) is being rolled out to improve efficiency in various business areas, yet its use is not commonplace in airline strategy. In this piece, we demonstrate how AI, in this case a fairly simple technique called the K-means algorithm*, can be applied to flight search data to help airlines gain a significant advantage in how they evaluate routes and make better-informed decisions.

In this study, we use unsupervised Machine Learning to compare thousands of Origins & Destinations and sort them into groups.

For the study we used datasets** available in Travel Insight – Skyscanner’s data platform, analysing the search patterns of over 50,000 origins and destinations during 2016.

The Search Data Analysis

In the study, each destination is described by 30 different parameters. These cover elements such as the month of travel, when travel is booked, how long travellers stay at the destination and the type of group travelling. We chose which parameters to use, and the machine then uses these descriptors to create the groups.

For the search data analysis we did not pre-select any geographical parameter for the algorithm. Nor did we bias the grouping or clustering process by attaching attributes such as distance or country of origin to any of the routes, or by supplying the machine with any prior indication of services such as price or availability of flights. The results are based purely on travellers’ unconstrained searches.

The aim is to find similarities in passenger demand across the world.
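To make the grouping step concrete, here is a minimal sketch of K-means in plain Python. It is not the code used in the study: the route labels, the three behavioural features (standing in for the 30 parameters), their values and the choice of two clusters are all invented for illustration. Note that, as in the study, no geographical attribute is passed to the algorithm.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means (Lloyd's algorithm). Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

# Hypothetical routes described only by search behaviour (made-up values):
# [share of summer travel, share of weekend departures, share with children]
routes = {
    "ROUTE_A": [0.80, 0.70, 0.60],
    "ROUTE_B": [0.75, 0.65, 0.55],
    "ROUTE_C": [0.85, 0.75, 0.50],
    "ROUTE_D": [0.20, 0.30, 0.05],
    "ROUTE_E": [0.15, 0.25, 0.10],
    "ROUTE_F": [0.25, 0.20, 0.05],
}

labels, centroids = kmeans(list(routes.values()), k=2)
cluster_of = dict(zip(routes, labels))

for name, cluster in cluster_of.items():
    print(name, "-> cluster", cluster)
```

The cluster numbers themselves carry no meaning; only which routes end up together matters. With real data, attributes measured on different scales would normally be standardised first so that no single parameter dominates the distance.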



The full results of the analysis can be seen in our dashboard below. Here, we run through some high-level observations from the data:

Unsurprisingly, long summer family holidays represent the largest grouping

  • The data suggests that this group contains more origins and destinations than any other. Typically these routes are for travel during the summer months, with weekend departures and at least one child on the booking.

Distance over country

  • The study shows that while geography matters, proximity to the origin country dictates behaviour more than a preference for the destination country. For example, Faro (FAO) behaves very similarly to Algiers (ALG) and is therefore grouped in the same cluster; both are typically winter destinations with mid-week departures. They are at a relatively similar distance, yet would perhaps not be the first two places to be grouped together under traditional assumptions.

Traditionally ‘romantic’ couples' city break destinations are equally popular with single travellers

  • The data reveals that the ‘share of solo trips’ and ‘share of couples’ attributes move together: destinations with a high (or low) share of one also have a high (or low) share of the other, so they are always allocated to the same clusters. This effectively means there is no "typical" couples' destination. Venice, for example, behaves similarly to other destinations with a high share of couples, but also has a similarly high share of travellers visiting as part of a solo trip.


How to read the results

  • On the dashboard, select an origin city to see the grouping of destinations

  • The colours highlight the groupings, which are also numbered 0 to 9

  • The bar charts at the bottom of the dashboard show the characteristics of each group compared to other groups (the value of each attribute is meaningless in isolation, so the values should only be used to compare groupings)

  • The bubble chart on the left indicates the relative size of each grouping
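One way to picture why the bar values only make sense in comparison is to express each group's average for an attribute relative to the average across all routes. The sketch below does this with z-scores; the scaling actually used in the dashboard is not stated in this article, so this is an assumption, as are the attribute names, cluster labels and values.

```python
from statistics import mean, pstdev

# Hypothetical routes: (assumed cluster label, share_summer, share_weekend)
rows = [
    (0, 0.80, 0.70),
    (0, 0.75, 0.65),
    (0, 0.85, 0.75),
    (1, 0.20, 0.30),
    (1, 0.15, 0.25),
    (1, 0.25, 0.20),
]
attributes = ["share_summer", "share_weekend"]

def relative_profile(rows, n_attributes):
    """Per cluster, return each attribute's mean as a z-score against all routes."""
    profiles = {}
    clusters = sorted({r[0] for r in rows})
    for j in range(n_attributes):
        column = [r[1 + j] for r in rows]
        overall_mean, overall_sd = mean(column), pstdev(column)
        for cluster in clusters:
            members = [r[1 + j] for r in rows if r[0] == cluster]
            # Positive: the group sits above the all-route average; negative: below.
            score = (mean(members) - overall_mean) / overall_sd if overall_sd else 0.0
            profiles.setdefault(cluster, []).append(score)
    return profiles

profiles = relative_profile(rows, len(attributes))

for cluster, scores in profiles.items():
    print(cluster, dict(zip(attributes, (round(s, 2) for s in scores))))
```

A score on its own says nothing about the absolute share of summer travel; it only says whether one grouping leans more towards summer than the others.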


How can you use the results?

  • As a Marketer:

    • You could use it as a key component of a "You may also like" algorithm to suggest destinations to undecided users.

    • You could plan marketing spend by destination, especially for new routes.


  • As a Pricing / Revenue Manager:

    • You could develop the initial pricing strategy for a new route.

    • You could reorganise route allocations within the RM department.


  • As a Network Planner:

    • You could use this type of data to decide on what day of the week, or time of the day to schedule flights.

    • You could use it to calibrate the network planning tool.
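As an illustration of the "You may also like" idea in the list above, the sketch below suggests destinations that share a cluster with the one a user is viewing. Only the pairing of Faro (FAO) and Algiers (ALG) comes from this study; the cluster numbers, the placeholder destinations DEST_X, DEST_Y and DEST_Z, and the function names are hypothetical.

```python
from collections import defaultdict

# Cluster labels per destination. FAO and ALG share a cluster per the study;
# the numbers and the DEST_* placeholders are invented for illustration.
cluster_by_destination = {
    "FAO": 3,
    "ALG": 3,
    "DEST_X": 3,
    "DEST_Y": 7,
    "DEST_Z": 7,
}

def build_index(cluster_by_destination):
    """Invert the mapping: cluster label -> list of destinations."""
    index = defaultdict(list)
    for destination, cluster in cluster_by_destination.items():
        index[cluster].append(destination)
    return index

def also_like(destination, cluster_by_destination, limit=3):
    """Return up to `limit` other destinations from the same cluster."""
    index = build_index(cluster_by_destination)
    cluster = cluster_by_destination[destination]
    peers = [d for d in index[cluster] if d != destination]
    return sorted(peers)[:limit]

suggestions = also_like("FAO", cluster_by_destination)
print(suggestions)
```

The same lookup could serve a Revenue Manager or Network Planner: the existing routes that share a cluster with a new route are natural reference points for initial pricing or scheduling assumptions.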

The airline industry has always used quantitative analysis to predict the consequences and outcome of launching new routes. Machine Learning, however, is a new, powerful and effective way of analysing complex data. It can deliver answers that allow the industry to avoid many of the ‘instinct’ assumptions that carriers have relied upon for decades when deciding on new routes.

Because Skyscanner's data suite, Travel Insight, is able to capture interest and intent data, we are able to help airlines conduct and comprehend these advanced analyses.

Want to learn more about how this type of data analysis could assist your aviation business?

Get in touch

*Our approach to the study is based on a Python implementation of K-means, an unsupervised machine learning algorithm. If you want to understand the ins and outs of the K-means algorithm, you can watch this video here.

**The data set is based on global searches taking place on Skyscanner throughout 2016, including over 50,000 Origins and Destinations.