This is a guest blog post by data scientist Ruth Garcia. Drawing on her experience in data analysis, Ruth aims to understand how online display advertising affects the usability and quality of the user experience at Skyscanner. She focuses on building machine learning models for online advertising, tackling problems that involve ranking and bidding.
Interested in finding out more about how we do data science in advertising at Skyscanner? Contact Ruth via LinkedIn.
Increasing advertising inventory and revenue is very much on Skyscanner’s radar. At the same time, we must always keep user experience at the heart of what we do.
To better meet the needs of our partners and users, we have built our own ads management platform: Skyscanner Ads Manager (SAM). It is designed to optimize for user experience, to serve the best ads to the user (e.g. those with the most accurate pricing, better UX, or video), and to provide data that helps us better understand how users behave around ads.
As with any ads manager, it is essential that SAM shows the most relevant ads to users based on the context of their search. More relevant ads generally translate into more clicks, and more traffic to our partners’ sites. Part of our revenue depends on this conversion, so estimating the probability that an ad impression will lead to a click is critical to our marketplace.
To make this estimate, we decided to build a click-prediction model within SAM, which will help us decide which ads to show based on our own data. Using our own data is particularly beneficial because we collect rich user data that is not shared with other ad delivery systems.
Take our inline ad, the native cost-per-click (CPC) ad that appears at the top of the flight search results. Here, the main role of the click-prediction model would be to rank all candidate ads for a specific inline placement by their likelihood of being clicked. An example of this is illustrated below.
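Once each candidate has a predicted click probability, ranking is just a sort. The sketch below assumes a hypothetical `predict_ctr` callable that returns the model's click probability for an ad ID; the ad IDs and probabilities are made up for illustration.

```python
from typing import Callable, List


def rank_candidates(candidates: List[str], predict_ctr: Callable[[str], float]) -> List[str]:
    """Return candidate ad IDs ordered from most to least likely to be clicked."""
    # sorted() stays stable with reverse=True, so ties keep their input order.
    return sorted(candidates, key=predict_ctr, reverse=True)


# Hypothetical predicted click probabilities for three inline ads.
predicted = {"ad_a": 0.012, "ad_b": 0.031, "ad_c": 0.007}
print(rank_candidates(list(predicted), predicted.__getitem__))  # → ['ad_b', 'ad_a', 'ad_c']
```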
How do we predict clicks?
As with every new product, SAM and its click-prediction model will go through several iterations. The first iteration uses ten features to train the model, which include:
- User-based features, such as the browser, device and operating system used when the ad was displayed.
- Query-based features, such as the route, market and currency used in the search.
- Ad features, such as unique identifiers for the advertiser and the ad creative.
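One way to picture a single training example is as a record of categorical values grouped as above. The sketch below covers only the eight features the post names (not all ten), and every field name and value in it is hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class Impression:
    """One ad impression described by categorical features (hypothetical field names)."""
    # User-based features
    browser: str
    device: str
    os: str
    # Query-based features
    route: str
    market: str
    currency: str
    # Ad features
    advertiser_id: str
    creative_id: str

    def to_features(self) -> Dict[str, str]:
        # Every value is treated as a category (a string), never as a number.
        return {name: str(value) for name, value in asdict(self).items()}


imp = Impression("chrome", "mobile", "android", "LHR-BCN", "UK", "GBP", "adv_42", "cr_7")
print(imp.to_features()["route"])  # → LHR-BCN
```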
To train the model, we use a simple logistic regression from the well-known Python machine learning (ML) library scikit-learn. Logistic regression estimates the probability of the binary outcome (click or no click) from the values of the ten features. Once the model is trained, it is saved to a file that the SAM delivery algorithm reads. Loading the model from a file is practical because of the type of algorithm used: a trained logistic regression is fully described by its learned weights and intercept.
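For reference, this is the standard logistic regression model written out; the notation is ours, not the post's. Here $\mathbf{x}$ is the numeric encoding of an impression's features, $y_i \in \{0, 1\}$ marks whether impression $i$ was clicked, and training minimizes the log loss (scikit-learn adds a regularization term). Because prediction needs nothing but $\mathbf{w}$ and $b$, the saved model is small and cheap to evaluate at delivery time.

```latex
\hat{p}(\text{click} \mid \mathbf{x}) = \sigma\!\left(\mathbf{w}^\top \mathbf{x} + b\right)
  = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}

\mathcal{L}(\mathbf{w}, b) = -\sum_{i=1}^{n} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log\big(1 - \hat{p}_i\big) \Big],
  \qquad \hat{p}_i = \hat{p}(\text{click} \mid \mathbf{x}_i)
```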
Although this first iteration provides a great starting point for machine-learning based advertising optimization, it presents some limitations. Most importantly:
- Not all the features that could influence a click (e.g. ad colours or the text shown) are tracked and included in the click-prediction model.
- A REST API was not integrated because of the delay it could add to the time taken to display each ad. This limited the machine learning tools we could use.
How was the model built?
The first iteration was built with the following requirements in mind:
- Incremental learning: to train the current model, we use scikit-learn's SGDClassifier because it can learn incrementally. The whole data set (including information on advertisers, campaigns and categorical features) does not need to be loaded at once, which avoids memory errors.
- Categorical features: many machine learning tools only accept numbers as input. To represent the categorical features used in SAM (e.g. browser type, route or advertiser ID) as numbers, we investigated two transformation methods: one-hot encoding and feature hashing. In our tests the two methods performed about the same, because SAM does not yet deliver a large quantity of ads, so the dimensionality of the encoded features is not yet an issue. One-hot encoding adds a column for every distinct value, whereas feature hashing maps all values into a fixed number of columns. To future-proof the model, feature hashing is therefore the better option: it keeps the feature space bounded as new routes and advertisers appear, so it is the best way to ensure we don't crash the model.
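The difference between the two encodings shows up as soon as a new category appears. The sketch below uses scikit-learn's `OneHotEncoder` and `FeatureHasher` on made-up route values; `n_features=16` is deliberately tiny for display, and real deployments would use a much larger power of two (e.g. `2**18`) to keep hash collisions rare.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import OneHotEncoder

# Hypothetical route values seen on two consecutive days.
day1 = [["LHR-BCN"], ["LHR-JFK"], ["EDI-AMS"]]
day2 = [["LHR-BCN"], ["SIN-SYD"]]  # SIN-SYD never appeared on day 1

# One-hot encoding: one column per category seen during fit, so width grows with the data.
ohe = OneHotEncoder(handle_unknown="ignore").fit(day1)
onehot_day2 = ohe.transform(day2)
print(onehot_day2.shape)  # (2, 3); the unseen route becomes an all-zero row

# Feature hashing: width fixed up front, no fit needed, new values still get a column.
hasher = FeatureHasher(n_features=16, input_type="string")
hashed_day2 = hasher.transform([[f"route={r}" for r in row] for row in day2])
print(hashed_day2.shape)  # (2, 16)
```

With one-hot encoding, picking up new routes means refitting the encoder and widening the model; with hashing, the model's input size never changes, at the cost of occasional collisions.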
Finally, to keep loading the model simple, the first iteration is trained with logistic regression (an SGDClassifier with a logistic loss is a logistic regression trained by stochastic gradient descent). We chose it largely because it has been the method of choice in many advertising industry use cases. In the future, we will explore whether more complex algorithms and combinations of them lead to more accurate predictions.
How is the model evaluated?
To evaluate the model, we compare its performance against randomly selecting which ads to display in a placement. Offline, we began by evaluating a simplified model with two well-known metrics: Area Under the ROC Curve (AUC) and Precision at 1 (P1). The AUC equals the probability that a randomly chosen positive example (a click) is ranked above a randomly chosen negative example (no click). P1 measures how precisely the model places the best-performing ad, by CTR, in the top position. As SAM does not yet handle much data, we used data on ads delivered by external systems, mapping our internal features to those deliveries. To ensure the evaluation is reliable, we used several different training time windows and sampling methods.
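Both metrics are easy to compute directly. The sketch below implements AUC exactly as defined above (the share of click/no-click pairs the model orders correctly) and checks it against scikit-learn's `roc_auc_score`. The post does not give a formal definition of P1, so `precision_at_1` uses an assumed one: the share of placements where the model's top-ranked ad is also the ad with the highest observed CTR. All labels, scores and CTRs are hypothetical.

```python
from itertools import product

from sklearn.metrics import roc_auc_score


def pairwise_auc(labels, scores):
    """Share of (click, no-click) pairs where the click gets the higher score; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def precision_at_1(groups):
    """Assumed P1: share of placements whose top-predicted ad also has the top observed CTR.

    groups is a list of (predicted_scores, observed_ctrs) pairs, one per placement.
    """
    hits = 0
    for predicted, observed in groups:
        top_predicted = max(range(len(predicted)), key=predicted.__getitem__)
        top_observed = max(range(len(observed)), key=observed.__getitem__)
        hits += int(top_predicted == top_observed)
    return hits / len(groups)


# Hypothetical labels (1 = click) and model scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.4, 0.5, 0.1]
print(pairwise_auc(labels, scores), roc_auc_score(labels, scores))  # both 5/6 ≈ 0.833

groups = [([0.2, 0.7, 0.1], [0.01, 0.03, 0.02]), ([0.5, 0.4], [0.01, 0.02])]
print(precision_at_1(groups))  # 0.5: right in the first placement, wrong in the second
```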
The results of the evaluation are as follows:
- The AUC is better than random, meaning the model performs better than randomly selecting which ads to show in the placement. Having said this, random selection scores an AUC of 0.5 and a perfect classifier scores 1, so we still have some work to do.
- Similarly, P1 shows that the click model ranks ads better than random ranking, with a potential uplift of 32.5% compared with random ranking.
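An uplift figure like this is usually a relative improvement over the baseline. The sketch below assumes that reading, plus a random baseline in which exactly one ad per placement is the best, so a uniformly random ranking puts it first with probability 1/k for k candidates. The numbers are hypothetical and only show the arithmetic; they are not SAM's actual P1 values.

```python
def relative_uplift(model_metric, baseline_metric):
    """Relative improvement of a metric over a baseline, e.g. 0.325 for a 32.5% uplift."""
    return (model_metric - baseline_metric) / baseline_metric


def random_precision_at_1(num_candidates):
    """Expected P1 of a uniformly random ranking when exactly one ad is the best."""
    return 1.0 / num_candidates


# Hypothetical values: 4 candidate ads and a model P1 of 0.33125.
baseline = random_precision_at_1(4)  # 0.25
print(round(relative_uplift(0.33125, baseline), 3))  # → 0.325
```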
To deploy the model, we replicate this process within SAM and choose the model with the best AUC. We save that model, make it accessible to SAM, and repeat the process every day to update the model and monitor the results. When SAM starts delivering more ads, the model will be ready to learn from the new data every day.
The figure below gives a general overview of the model. Its key purpose is to show how the model (red section) would influence ad delivery and how the model is updated daily as new user feedback (clicks and non-clicks) becomes available.
Conclusions and next steps
Even a simple machine learning model with limited conditions has the potential to improve ad delivery performance. More importantly, we are now starting to gain control over our ad delivery algorithms, which gives us the freedom to tune them according to the demands of users and partners. However, there are still a couple of steps to complete before switching on the click-prediction model for ad delivery in SAM:
- Delivering only the ads with the highest predicted probability of being clicked would prevent us from exploring alternatives that may have been underestimated during training. For this reason, and to meet the needs of our partners' campaigns, we need to combine the click-prediction model with other criteria that let us choose when to exploit the model and when to explore other alternatives.
- Running experiments that evaluate the value of the click-prediction model on real-time online data is a priority, because approaches that work well in offline evaluations are not necessarily effective in real-world scenarios.
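The exploit/explore trade-off in the first step can be illustrated with epsilon-greedy, one simple, well-known strategy. The post does not say which strategy SAM will use, and this sketch does not model partner campaign constraints; the ad IDs, predicted CTRs and the 10% exploration rate are all hypothetical.

```python
import random
from typing import Dict, Optional


def choose_ad(predicted_ctr: Dict[str, float], epsilon: float,
              rng: Optional[random.Random] = None) -> str:
    """Epsilon-greedy choice between exploiting the click model and exploring.

    With probability epsilon a uniformly random candidate is shown (explore);
    otherwise the candidate with the highest predicted CTR is shown (exploit).
    """
    rng = rng or random.Random()
    ads = sorted(predicted_ctr)  # fixed order so a seeded rng is reproducible
    if rng.random() < epsilon:
        return rng.choice(ads)
    return max(ads, key=predicted_ctr.__getitem__)


# Hypothetical predicted CTRs for three candidate ads, with 10% exploration.
predicted = {"ad_a": 0.012, "ad_b": 0.031, "ad_c": 0.007}
rng = random.Random(42)
shown = [choose_ad(predicted, epsilon=0.1, rng=rng) for _ in range(1000)]
print(shown.count("ad_b") / len(shown))  # mostly ad_b, with occasional exploration
```

Exploration keeps collecting clicks on ads the model currently underrates, so the daily retraining has data to correct them; tuning epsilon sets how much short-term CTR is traded for that information.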
Note: An overview of the lessons learned was presented at the Artificial Intelligence Conference in Romania on June 7th, 2018, and the slides can be found here.
Interested in learning more about advertising with Skyscanner?