Spotify, Big Data, And The Future Of Music Streaming

Online music streaming services like Spotify have never been more popular, but as the market becomes saturated with competitors like Tidal and Apple Music, how do you stay on top? The answer, it seems, is “Big Data”.

What is Big Data?

Big Data is a term used for data sets so large that they exceed the capabilities of traditional data analysis. In many cases, companies will have to build their own systems to store, analyse and present this large amount of information in a manner that makes sense.

There are many advantages to collecting and analysing Big Data. Companies can accurately monitor customer engagement and trends, and can run thorough tests whose results are used to vastly improve user experience. On top of that, business decisions can be supported with statistical evidence, making operations more objective.

The Big Data business model 

Earlier this year, Spotify announced that it has over 75 million global users. A user base of this size gives Spotify a wealth of user information that has the power to drastically change the service it provides.

The key to user engagement is a great user experience. If Spotify is intuitive and personalised, feeling in tune with a user’s taste and listening profile, those users are more likely to engage with the artists suggested to them.  This is where Big Data comes into the picture: to create a service that feels personal for millions of people, you have to analyse each and every one of those people.

Rochelle King, senior designer at Spotify, explains the relationship between user experience and data:



Spotify collects a wide range of user data, but the most important is the listening data it collects from each user. Each listen is collected in a user log, which is then used to target new music at the listeners based on their past interactions. Spotify can also use the data it collects to analyse how its users react to certain changes: if they add a new feature and nobody uses it, they can get rid of it based on statistical evidence. All of this leads to a great user experience.

Most importantly, however, user listening data allows Spotify to feature particular artists in the recommended section of users who are statistically likely to be interested in them. As a result, an artist’s songs are streamed more often by users who will make repeated listens.

The results of this statistical targeting also contribute to the perfect user experience: if a listener likes Taylor Swift, they won’t end up with a dashboard full of Metallica – unless it’s statistically proven they like both.
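
To make the idea of targeting from listening logs concrete, here is a minimal sketch in Python. It is not Spotify's method, which the article does not describe; it assumes a simple co-listening count, where an artist is suggested to a user if other people who share that user's artists also listen to them. The function names and the sample logs are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_listens(logs):
    """Count, for each artist, how many users also listened to each other artist.

    logs maps a user name to the list of artists in that user's listening log.
    """
    co_listens = defaultdict(Counter)
    for artists in logs.values():
        # set() so that repeated listens by one user count once per artist pair
        for first, second in permutations(set(artists), 2):
            co_listens[first][second] += 1
    return co_listens


def recommend(user, logs, top_n=3):
    """Suggest artists the user has not heard, ranked by co-listening counts."""
    co_listens = build_co_listens(logs)
    heard = set(logs[user])
    scores = Counter()
    for artist in heard:
        for other, count in co_listens[artist].items():
            if other not in heard:
                scores[other] += count
    return [artist for artist, _ in scores.most_common(top_n)]


# Hypothetical listening logs, for illustration only.
sample_logs = {
    "ana": ["Taylor Swift", "Lorde"],
    "ben": ["Taylor Swift", "Lorde", "Haim"],
    "cat": ["Taylor Swift", "Lorde"],
    "dan": ["Metallica", "Slayer"],
    "eve": ["Taylor Swift"],
}
print(recommend("eve", sample_logs))
```

In this toy data nobody who listens to Taylor Swift also listens to Metallica, so Metallica never appears in the suggestions for "eve". It would only show up if the logs contained listeners who like both.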

The complexity of data collection

On paper, this collection of data seems simple, but it’s actually a complicated process that relies on many different systems as well as a huge amount of physical storage space. Each day, Spotify users create over 600GB of listening data, while Spotify creates more than 4TB of data storing new music and other assets. In total, Spotify has over 28 petabytes of storage spread over four global data centres.

Big Data really is the right term for such monumental amounts of information, and building a picture from such large quantities of data requires a massive amount of analysis and many complex algorithms. To handle this, Spotify acquired The Echo Nest, a “music intelligence company”, for a whopping $100 million, highlighting how important Spotify considers the data analysis side of its business.

How does Spotify analyse data?

To answer this question, there’s no better person than Spotify’s Jason Palmer, who explains how Spotify analyses the data collected from its users:

“Most of our recurring data is added to our analytics pipeline by a set of daemons that constantly parse the syslog on production machines looking for messages we have defined along with the associated data for each message. Matching data is compressed and periodically synced to HDFS.  Typically data is available in our Data Warehouse and Dashboards within 24 hours, but in some cases data is available within a few hours or even instantly through tools like Storm.

So all this sounds… complicated. And I assure you, to build a pipeline and infrastructure like we have, it is. But to make use of it is actually really easy.  Engineers can easily add data to our analytics pipeline by adding a new message to our log parser and simply logging information to syslog using the correct format.”
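
The flow Palmer describes (scan syslog lines, keep the ones that match a defined message, compress the matches for a later sync) can be sketched roughly as follows. This is an illustration under stated assumptions, not Spotify's code: the `ANALYTICS <name> <json>` line format, the message names and the function names are all hypothetical, and the sync to HDFS is left out entirely.

```python
import gzip
import json
import re

# Hypothetical log format: "... ANALYTICS <message_name> <JSON payload>".
# The article does not give the real format.
EVENT_PATTERN = re.compile(r"ANALYTICS (?P<name>\w+) (?P<payload>\{.*\})$")

# Messages the parser knows about. Adding a new one is a one-line change,
# which mirrors the "add a new message to our log parser" step in the quote.
DEFINED_MESSAGES = {"track_played", "feature_used"}


def parse_line(line):
    """Return (name, payload) for a defined message, or None for anything else."""
    match = EVENT_PATTERN.search(line.strip())
    if match is None or match.group("name") not in DEFINED_MESSAGES:
        return None
    try:
        payload = json.loads(match.group("payload"))
    except json.JSONDecodeError:
        return None  # malformed payloads are skipped rather than crashing the daemon
    return match.group("name"), payload


def collect(lines):
    """Keep only the lines that carry a defined message."""
    return [event for event in map(parse_line, lines) if event is not None]


def compress(events):
    """Serialise events as JSON lines and gzip them, ready to be shipped elsewhere."""
    body = "\n".join(json.dumps({"name": n, "data": d}) for n, d in events)
    return gzip.compress(body.encode("utf-8"))


# Hypothetical syslog lines, for illustration only.
sample_lines = [
    'Oct  1 12:00:00 host app[1]: ANALYTICS track_played {"user": "u1", "track": "t9"}',
    "Oct  1 12:00:01 host app[1]: something unrelated",
    'Oct  1 12:00:02 host app[1]: ANALYTICS unknown_event {"x": 1}',
]
batch = compress(collect(sample_lines))
```

Only the first sample line ends up in the batch: the second is not an analytics message, and the third uses a name the parser has not been told about.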

Moving forward

At an estimated cost of £1.8 billion per year, analytics in the music industry has become a big money game. However, many companies have not adopted a Big Data approach: only 10%-15% of UK businesses are taking advantage of a big data strategy. According to Campbell Williams, Group Strategy and Marketing Director at Six Degrees Group:

“We have smart devices that are constantly connected…most businesses with any level of maturity will have some form of systems of record: CRM, ERP, IT service management packages, etc. It is important to have proper tools to help you to manage that data, build a data warehouse, look at data mining tools and business intelligence tools etc.”

As for Spotify, its focus is to become increasingly dependent on Big Data, as Jason Palmer explains:

“Spotify strives to be entirely data driven. We are a company full of ambitious, highly intelligent, and highly opinionated people and yet as often as possible decisions are made using data. Decisions that cannot be made by data alone are meticulously tracked and fed back into the system so future decisions can be based off of it.

How fantastic is that?  Sounds robotic, but humans cannot be trusted so it’s cool.

So the conclusion is to rely on data whenever possible.  Don’t have enough data?  Get more.  Make data the most important asset you have because it is the only reliable decision maker that can scale your company.”

Guest post by Matthew Langham from Search Laboratory