I’ve spent a good amount of time recently exploring my musical history via different analytical tools and visualizations. For those who haven’t seen my previous posts, here’s a short recap:
- First, I discovered Paul Lamere’s “Sort Your Music” tool, which lets you pull Echo Nest data for any playlist in your Spotify library, and used it to analyze 11 years’ worth of seasonal playlists that I’ve made.
- I then highlighted a number of useful Last.FM-based visualization and data tools that let you dig into your “scrobbles” with ease, so that anyone with a Last.FM account can visualize their listening history.
- Just a few days ago, I published a post focused on exporting your full Last.FM scrobbling data to a CSV file, which I then threw into Tableau to explore my date-based listening trends across nearly 14,000 scrobbles in 2017.
All three of these exercises were extremely valuable to me and fed my curiosity about how I listen to music. But we can take it one step further by combining the data set from the Sort Your Music tool with the CSV of Last.FM scrobbles.
Data Wrangling + The Tedious, Really Not Fun Stuff
First, let’s run down the data that each of these tools (Sort Your Music and the Export Function of the Mainstream Factor Tool) spits out.
Sort Your Music:
Input: A Spotify Playlist of any length
Output: Order of Track (#), Title, Artist, Release Date, BPM, Energy, Dance, Loudness, Valence, Song Length, Acoustic, Popularity
Last.FM Export Tool (via Mainstream Factor):
Input: Your Last.FM Username
Output: UTC DateTime of Playtime, Artist, Album, Title
This is a lot of great data to dig into – the only problem is that if I wanted to dig into my full listening history, I would have to find a way to get the CSV of nearly 14,000 Scrobbles into the Sort Your Music tool somehow. Unfortunately, Sort Your Music only supports a Spotify playlist as an input, so the first step in this data journey was to make the CSV that came out of the Export Tool into a Spotify Playlist.
There are a number of outdated, janky-looking online tools and scripts out there that claim to do exactly that. Typically, you would use one of these if you had playlists on another music platform that you wanted to transfer over or, if you’re really old school, a music library or playlist exported as a CSV or XML file somewhere on your computer that you wanted to move into an online music platform.
Either way, I was glad I found this online playlist converter, which took my Scrobbles CSV and turned it into a Spotify playlist. It was the only one that ended up working for me at all, but it did have some drawbacks. When I tried uploading the entire file at once, it would hang and then freeze, so I was forced to split my CSV into 12 upload files, with my scrobbles separated by month. When everything was done, some tracks had been matched to a karaoke version or a remix, and a few weren’t matched at all: I got 97.5% of the scrobbles from the CSV into 12 Spotify playlists (13,462/13,805), which was good enough for me since I wasn’t going to go looking for the missing 343 scrobbles. I then threw those playlists into the Sort Your Music tool and copied the results into another Excel file.
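For anyone who runs into the same upload limit, the monthly split can be scripted rather than done by hand. Here is a minimal sketch in Python, assuming the export has a header row with columns named `utc_time`, `artist`, `album`, and `title`, and timestamps formatted like `2017-03-14 18:05:00`. Both the column names and the timestamp format are my assumptions, so adjust them to match your own file.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed column name and timestamp format; adjust to match your export.
TIME_COLUMN = "utc_time"
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_by_month(rows):
    """Group scrobble rows (dicts) by 'YYYY-MM', preserving their order."""
    months = defaultdict(list)
    for row in rows:
        played = datetime.strptime(row[TIME_COLUMN], TIME_FORMAT)
        months[played.strftime("%Y-%m")].append(row)
    return dict(months)


def write_month_files(months, fieldnames, prefix="scrobbles_"):
    """Write one CSV per month, e.g. scrobbles_2017-03.csv."""
    for month, rows in months.items():
        path = f"{prefix}{month}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)


# Tiny made-up sample standing in for the real export.
sample = io.StringIO(
    "utc_time,artist,album,title\n"
    "2017-01-03 09:15:00,Artist A,Album A,Song 1\n"
    "2017-01-20 21:40:00,Artist B,Album B,Song 2\n"
    "2017-02-01 08:00:00,Artist A,Album A,Song 3\n"
)
reader = csv.DictReader(sample)
by_month = split_by_month(reader)
print({month: len(rows) for month, rows in by_month.items()})
```

To produce the actual upload files, open the real CSV instead of the sample and call `write_month_files(by_month, reader.fieldnames)`.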
The biggest hurdle with this project was that Sort Your Music doesn’t give you anything tied to the date you listened to a specific song, so mapping the two data sets together by date proved difficult. In lieu of a date for the Spotify-based data, I kept the order of the scrobbles intact in the Spotify playlists (for example, #1 would be the first scrobble of the year, #10,000 would be the 10,000th, etc.), so I could use that # field as a proxy for the date, as shown in many of the visualizations below.
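If you would rather recover real timestamps than rely on the # proxy, one option is to walk both lists in play order and pair each playlist track with the next scrobble that has the same artist and title. The sketch below is an alternative I’m offering for illustration, not the method behind the charts in this post. The field names (`utc_time`, `artist`, `title`, `#`) and the `window` look-ahead limit are assumptions.

```python
def normalize(text):
    """Lowercase and collapse whitespace so small formatting differences match."""
    return " ".join(text.lower().split())


def align_in_order(scrobbles, playlist, window=50):
    """Pair each playlist track with the next matching scrobble, in play order.

    Scrobbles the converter dropped are skipped. Playlist tracks with no match
    inside the look-ahead window (e.g. a karaoke version with a different
    title) get a timestamp of None.
    """
    aligned = []
    pos = 0  # index of the first scrobble not yet consumed
    for track in playlist:
        key = (normalize(track["artist"]), normalize(track["title"]))
        match = None
        for i in range(pos, min(len(scrobbles), pos + window)):
            s = scrobbles[i]
            if (normalize(s["artist"]), normalize(s["title"])) == key:
                match = i
                break
        if match is None:
            aligned.append({**track, "utc_time": None})
        else:
            aligned.append({**track, "utc_time": scrobbles[match]["utc_time"]})
            pos = match + 1
    return aligned


# Made-up sample: the second scrobble never made it into the playlist.
scrobbles = [
    {"utc_time": "2017-01-01 10:00:00", "artist": "Artist A", "title": "Song 1"},
    {"utc_time": "2017-01-01 10:04:00", "artist": "Artist B", "title": "Song 2"},
    {"utc_time": "2017-01-01 10:08:00", "artist": "Artist C", "title": "Song 3"},
]
playlist = [
    {"#": 1, "artist": "artist a", "title": "Song 1", "bpm": 120},
    {"#": 2, "artist": "Artist C", "title": "song 3", "bpm": 98},
]
aligned = align_in_order(scrobbles, playlist)
for row in aligned:
    print(row["#"], row["title"], row["utc_time"])
```

The look-ahead window keeps a song that you replayed months later from swallowing everything in between. Exact matching on artist and title will still miss tracks that Spotify titled differently, so expect some `None` timestamps.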
I broke the data that came out of the Sort Your Music tool into monthly chunks that I tried lining up with the Scrobbles data as best as I could. As you can see, all of the Echo Nest data is extremely consistent over the course of an entire year’s worth of listening. This goes against what I was expecting to see: some sort of noticeable uptick in the BPM or Energy or Valence (Mood) at some point during the year, but it’s clear that I didn’t stray from my typical listening habits enough to have any sort of overarching effect. Perhaps it’s because I listen to mostly rock/alternative/indie music, or perhaps there’s flat out too much noise in the data so that everything evens out. I also included the distinct count of the number of artists I listened to during that period, and also found that was relatively consistent with the number of plays in a certain month.
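The monthly roll-up described above can also be computed in a few lines of Python. This is a rough sketch, assuming each play has already been tagged with a `month` label and carries numeric `bpm`, `energy`, and `valence` fields. Those field names are placeholders of mine and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean


def monthly_summary(rows, metrics=("bpm", "energy", "valence")):
    """Average each metric per month, plus play and distinct-artist counts."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["month"]].append(row)

    summary = {}
    for month, plays in sorted(grouped.items()):
        stats = {m: round(mean(p[m] for p in plays), 1) for m in metrics}
        stats["plays"] = len(plays)
        stats["distinct_artists"] = len({p["artist"] for p in plays})
        summary[month] = stats
    return summary


# Made-up sample rows.
rows = [
    {"month": "2017-01", "artist": "Artist A", "bpm": 120, "energy": 70, "valence": 50},
    {"month": "2017-01", "artist": "Artist B", "bpm": 124, "energy": 80, "valence": 60},
    {"month": "2017-02", "artist": "Artist A", "bpm": 118, "energy": 66, "valence": 40},
]
summary = monthly_summary(rows)
for month, stats in summary.items():
    print(month, stats)
```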
According to some research, 120 beats per minute (BPM) turns out to be the average tempo of many popular hit songs, especially for rock and pop songs. So it’s no surprise that the average BPM of my 2017 listening history hovered around 120 BPM.
The diagram above is a box and whisker plot, where each blue dot represents a play. You can see the heavier amount of blue dots along and around the median (the line that goes through the center of the box) falling at about 122 BPM. Here’s another way to look at that same data.
For this many data points, I found that bar graphs were the easiest way to show the trends in the data. In this bar graph, the X-axis is BPM and the Y-axis is the number of plays of songs with that BPM. It clearly shows that the vast majority of the songs I listen to fall within the range of about 110 to 130, and that the distribution itself is fairly symmetric.
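The counts behind a bar graph like this are simple to compute yourself. Here is a small sketch that bins BPM values into 10-BPM buckets and measures the share of plays between 110 and 130. The bucket width and the sample values are illustrative assumptions, not my actual data.

```python
from collections import Counter


def bpm_histogram(bpms, bin_width=10):
    """Count plays per BPM bucket, keyed by the bucket's lower edge."""
    counts = Counter((int(b) // bin_width) * bin_width for b in bpms)
    return dict(sorted(counts.items()))


def share_in_range(bpms, low=110, high=130):
    """Fraction of plays whose BPM falls within [low, high]."""
    inside = sum(1 for b in bpms if low <= b <= high)
    return inside / len(bpms)


# Made-up sample values.
bpms = [96, 112, 118, 121, 122, 125, 129, 140]
histogram = bpm_histogram(bpms)
share = share_in_range(bpms)
print(histogram)
print(f"{share:.0%} of plays between 110 and 130 BPM")
```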
Keeping consistent with the fact that I listen to a lot of mainstream rock and pop music, the majority of the music I listen to has a length of 3-4 minutes. In terms of overall percentage, nearly 71% of my plays fell between 2m30s and 4m30s, with 43% of the total falling in the 3-4 minute range. Only 11% of the plays had lengths above five minutes.
I grew up on a steady diet of my parents’ record collection, which was centered around music from the 60s and 70s. I still listen to those oldies/classic rock bands quite often, so it’s a bit of a surprise to me that only about 19% of my plays fall within those two decades. In comparison, 60% of the plays fall in the years 2000-17 (with about 10% of the plays in 2017 being music that was released that year).
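The decade breakdown boils down to truncating each release date to its decade and counting. Here is a minimal sketch, assuming the release date arrives as a `YYYY` or `YYYY-MM-DD` string (my assumption about the format). The dates below are made up for illustration.

```python
from collections import Counter


def decade_of(release_date):
    """Return the decade (e.g. 1970) for a 'YYYY' or 'YYYY-MM-DD' string."""
    return (int(release_date[:4]) // 10) * 10


def decade_shares(release_dates):
    """Fraction of plays per release decade, in chronological order."""
    counts = Counter(decade_of(d) for d in release_dates)
    total = sum(counts.values())
    return {decade: count / total for decade, count in sorted(counts.items())}


# Made-up sample: one entry per play.
release_dates = ["1967-05-26", "1973", "1986-06-16", "2005-01-01", "2017-03-03"]
shares = decade_shares(release_dates)
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%}")
```

Keep in mind that the date used here is whatever Spotify lists for the release, so a reissue counts toward the decade of the reissue rather than the original recording.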
There is some skewing of the data here towards newer releases. For example, if I listen to a remastered or “deluxe” edition of an older album (e.g. The Smiths’ The Queen Is Dead), Spotify classifies the reissue as being released in 2017, even though the album itself came out in 1986. The same thing happens when I listen to compilation albums like Greatest Hits or Essential, which typically come out later than the original recordings. Without going through all the scrobbles data track by track, I would estimate that anywhere between 5% and 10% of the music in the 2000-17 range is probably reissues, deluxe editions, or compilations of music that came out decades earlier. I’d imagine that if you don’t listen to reissues that often, this will be less of an issue for you.
With that fairly significant caveat in mind, we can then look at the breakdown of the plays by month, which again remains incredibly consistent.
Other Echo Nest Data
The above image shows (from top left, clockwise): Valence/Mood, Popularity, Energy, and Acousticness of all my scrobbles in 2017, with the bottom multicolored chart being for Danceability. Notice the inverse relationship between the Energy and Acoustic charts: while the majority of the plays had Energy levels above 65, the Acoustic levels stayed low (the majority below about 17). Also take note of the normal distribution curve of the Popularity metric, while Valence skewed more towards a happier overall mood. Finally, Dance is yet another normally distributed curve.
The big thing that I wanted to do with this data set was merge it with a daily weather data set to see if changes in things like barometric pressure, temperature, or precipitation had a significant effect on the type of music that I listened to. Unfortunately, no matter how I sliced the data, there was no consistent or reasonable conclusion to be found between the two sets. This is most likely because 1) there was so much data (on average, about 35 scrobbles/day) that any coincidental correlation was erased, and 2) my listening trends, as stated above, were remarkably consistent over time, regardless of the metric used.
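If you want to try the weather comparison on your own data, one simple approach is to average a metric per day, line those averages up with the daily weather readings, and compute a Pearson correlation. The sketch below assumes each play carries a `date` string and that the weather data is a dictionary keyed by the same date strings. The field names (`valence`, `temp_c`) are placeholders of mine, and the sample is made up to be perfectly linear so the result is easy to check by hand.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def daily_means(plays, metric):
    """Average one metric over all plays on each day."""
    by_day = defaultdict(list)
    for play in plays:
        by_day[play["date"]].append(play[metric])
    return {day: mean(values) for day, values in by_day.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; undefined if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_with_weather(plays, weather, metric, weather_field):
    """Correlate a daily-averaged metric with a weather field on shared days."""
    daily = daily_means(plays, metric)
    days = sorted(set(daily) & set(weather))
    xs = [daily[d] for d in days]
    ys = [weather[d][weather_field] for d in days]
    return pearson(xs, ys)


# Made-up toy data.
plays = [
    {"date": "2017-06-01", "valence": 30},
    {"date": "2017-06-01", "valence": 50},
    {"date": "2017-06-02", "valence": 50},
    {"date": "2017-06-03", "valence": 55},
    {"date": "2017-06-03", "valence": 65},
]
weather = {
    "2017-06-01": {"temp_c": 10},
    "2017-06-02": {"temp_c": 15},
    "2017-06-03": {"temp_c": 20},
}
r = correlate_with_weather(plays, weather, "valence", "temp_c")
print(f"r = {r:.2f}")
```

With real listening data, expect values much closer to zero. Averaging dozens of plays per day smooths out most of the day-to-day variation, which fits what I saw.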
- Contrary to the belief that I listen to a wide variety of music, my taste is extremely consistent in terms of BPM, the ratio of song release dates, and individual song length.
- Other Echo Nest metrics (like dance and popularity) fall around a normally distributed curve.
- There were no discernible conclusions from merging weather data and Echo Nest data together, probably as a result of too much noise in the data itself, and my consistent listening habits.
Thanks for reading! If you have any feedback, comments, or suggestions, please let me know!