How I used big data and machine learning to find good deals on cruises.

Cruise prices change every day. If you book at the right time you can save 40% or more. After 20+ cruises of my own, it became obvious to me that technology, employed correctly, could save me big $$$s on cruises. That’s why I hacked cruisewatch together, a website that uses big data and machine learning to identify price drops and good deals on cruises.

Here are some things that I found out:

  • There are huge differences in cruise pricing over time. You can save an average of 40% when booking at the right time.
  • The “stars” of a ship don’t necessarily reflect its quality. There are great ships with loyal fans that are not 5-star+.
  • You can find great deals when comparing price against quality.

We have been receiving some queries regarding the technologies used on this site. Here’s an overview of what we are using.

Frontend

Our frontend is built with Bootstrap and a commercial Bootstrap theme. We also use WordPress for some articles (such as this one) and bbPress for forum functionality inside WordPress.

Data Acquisition

We use our own processes to aggregate data from across the web. We also use Grepsr for some specific data items. Grepsr is a simple data scraping service that turns any website into a feed. They can perform regular scans and deliver only updated or new data items to us.
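To illustrate what "only updated or new data items" means in practice, here is a minimal sketch of change detection between scans. It is not how Grepsr or our own processes work internally; the field names (sailing_id, price) and the helper names are made up for the example. The idea is simply to fingerprint each item and keep only those whose fingerprint differs from the previous scan.

```python
import hashlib
import json


def fingerprint(item: dict) -> str:
    """Stable hash of an item's content, independent of key order."""
    payload = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_items(scan: list, seen: dict, key: str = "sailing_id") -> list:
    """Return the items in `scan` that are new or updated.

    `seen` maps item key -> fingerprint from earlier scans and is
    updated in place. `sailing_id` is a hypothetical identifier field.
    """
    delta = []
    for item in scan:
        digest = fingerprint(item)
        if seen.get(item[key]) != digest:
            delta.append(item)
            seen[item[key]] = digest
    return delta


seen = {}
first = changed_items(
    [{"sailing_id": "A1", "price": 999}, {"sailing_id": "B2", "price": 1499}],
    seen,
)
second = changed_items(
    [{"sailing_id": "A1", "price": 899}, {"sailing_id": "B2", "price": 1499}],
    seen,
)
print(len(first), second)
```

On the first scan everything counts as new; on the second scan only the sailing whose price changed comes back.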

We are also using Flickr and their API to display cruise pictures.

Data Storage

Our historical data takes up quite a lot of space, so we have a layered storage scheme. Some of the data items reside in cloud storage (with both AWS and Google). Data rendered on the pages is stored on the webserver (MySQL, really). Some of our processes need in-memory storage. We are experimenting with MemSQL for this.

Statistics and Analysis

We clean the incoming data and apply several data transformation techniques to it. We also use Holt-Winters algorithms to detect pricing anomalies (we wrote all of this code ourselves).
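For readers who have not met Holt-Winters before, here is a minimal sketch of how it can be used to flag pricing anomalies. This is not our production code. It assumes an additive model, a weekly season of 7 daily prices, and hand-picked smoothing parameters; the function name and the deviation band (a smoothed absolute forecast error, multiplied by k) are illustrative choices. Each point is compared against the one-step-ahead forecast and flagged when it falls outside the band.

```python
def holt_winters_anomalies(series, season_len=7, alpha=0.3, beta=0.05,
                           gamma=0.2, k=3.0, min_dev=1e-6):
    """Return (index, actual, forecast) for points outside the expected band.

    Additive Holt-Winters in error-correction form. The first season
    initialises the model, the second season is a warm-up that estimates
    the typical forecast error, and flagging starts with the third season.
    All default parameter values are assumptions for illustration.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [x - level for x in first]

    deviation = 0.0
    anomalies = []
    for t in range(season_len, len(series)):
        i = t % season_len
        forecast = level + trend + seasonal[i]
        error = series[t] - forecast

        if t < 2 * season_len:
            # Warm-up: accumulate the mean absolute forecast error.
            deviation += abs(error) / season_len
        else:
            # min_dev keeps the band from collapsing to zero width.
            if abs(error) > k * max(deviation, min_dev):
                anomalies.append((t, series[t], forecast))
            deviation = gamma * abs(error) + (1 - gamma) * deviation

        # The model keeps learning from every point, flagged or not.
        level, trend = (level + trend + alpha * error,
                        trend + alpha * beta * error)
        seasonal[i] += gamma * (1 - alpha) * error

    return anomalies


# Six weeks of made-up daily prices with one sudden drop on day 30.
pattern = [100, 102, 105, 110, 120, 125, 108]
prices = pattern * 6
prices[30] -= 40
anomalies = holt_winters_anomalies(prices)
print(anomalies)
```

Because the model keeps adapting after a flagged point, a price drop that persists is absorbed into the level after a few days, while a one-off drop is reported once.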

For sentiment and text analysis we took a broad look at the APIs out there. We are currently using the Cogito API.

Forecasting

It’s not live yet, but we have a working prototype using machine learning via the Google Prediction API service.

Alerting

Our newsletter uses Mailchimp; our personalized emails use Mandrill.

 

So what’s working so far?

Data acquisition and transformation are working fine, and have been for a long time now (we have data reaching back 5 years). We are really proud of our first Net Promoter Score analysis, which compares professional and customer reviews and yields interesting insights, such as Ships with best Recommendation Ratio or our derived Quality Score Rating. We also made an initial connection between quality and price and can derive the “value” a ship is offering from this. I will post an article about this in the next few days.
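As a rough illustration of the two ideas in this paragraph, here is a small sketch. The Net Promoter Score part uses the standard definition (share of ratings of 9 or 10 minus share of ratings of 0 to 6, on a 0 to 10 scale). The "value" part is only an assumption for the example: it divides a quality score by a price per night, which is not necessarily how our own Quality Score Rating or value figure is derived. Ship names and numbers are invented.

```python
def net_promoter_score(ratings):
    """Standard NPS on a 0-10 scale: % promoters minus % detractors."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


def value_score(quality, price_per_night):
    """Hypothetical value measure: quality points per unit of price."""
    return quality / price_per_night


ratings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 2]
nps = net_promoter_score(ratings)

# Invented ships: (quality score, price per night).
ships = {"Ship A": (8.5, 170.0), "Ship B": (9.0, 300.0)}
ranking = sorted(ships, key=lambda name: value_score(*ships[name]), reverse=True)
print(nps, ranking)
```

In this made-up example the slightly lower-rated ship ranks first on value because it is much cheaper per night.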

Machine learning of cruise price movements is working fine with the Google Prediction API. In fact, I also tried running it on AWS, which was even simpler to use, but the cost on AWS seems to be much higher than with Google. So watch out for our prediction visualization, which is coming this month.

The APIs and webservices out there are plentiful and relatively easy to use. This frees up time to focus on the really important question: which data do we correlate, and how can we visualize it so that users can understand it? For this we need feedback. Please use our forum to talk to us.