2019

Predicting Phish Setlists with Deep Learning

Creative

Deep Learning, LSTM, Sequential Modeling, Word2Vec, Embeddings, Time Series, Neural Networks, Transfer Learning

Built a deep learning model to predict which songs the band Phish will play next, treating the problem as a sequential multi-class classification task similar to neural language modeling. The goal was to predict the next song, given a sequence of previously played songs, from the band's catalog of 876 unique songs.

Collected data from the Phish.net API covering all 1,752 shows dating back to 1983, then built a training dataset of ~37,000 samples by concatenating setlists chronologically and encoding songs as integers. Tested sequence lengths of 25 to 250 songs to find the optimal context window for prediction.
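The data preparation described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names `encode_songs` and `make_samples` are hypothetical, the two toy shows are made up, and the real pipeline may have built its samples differently (for example with a stride or with set-break markers).

```python
def encode_songs(setlists):
    """Concatenate shows in order and map each song title to an integer id."""
    song_to_id = {}
    encoded = []
    for show in setlists:  # shows are assumed to be in chronological order
        for song in show:
            if song not in song_to_id:
                # Ids are assigned in order of first appearance.
                song_to_id[song] = len(song_to_id)
            encoded.append(song_to_id[song])
    return encoded, song_to_id


def make_samples(encoded, seq_len):
    """Slide a window over the song stream: seq_len songs in, the next song out."""
    inputs, targets = [], []
    for i in range(len(encoded) - seq_len):
        inputs.append(encoded[i:i + seq_len])
        targets.append(encoded[i + seq_len])
    return inputs, targets


# Toy data (hypothetical shows, not real setlists).
shows = [
    ["Chalk Dust Torture", "Mike's Song", "I Am Hydrogen", "Weekapaug Groove"],
    ["Mike's Song", "I Am Hydrogen", "Weekapaug Groove", "Chalk Dust Torture"],
]
encoded, song_to_id = encode_songs(shows)
X, y = make_samples(encoded, seq_len=3)
```

Because the setlists are concatenated before windowing, a window can span the end of one show and the start of the next, which is what lets the model see openers and closers in context.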

Developed a recurrent neural network with LSTM cells, a song embedding layer, and dropout regularization. The embedding layer learns latent factors for each song, much as word embeddings do in NLP, which lets the model capture contextual relationships between songs.
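A network of this shape might look like the sketch below. The framework (Keras via TensorFlow) is an assumption, since the summary does not name one, and every hyperparameter except the 876-song catalog size is a placeholder rather than the tuned value.

```python
import tensorflow as tf

NUM_SONGS = 876   # catalog size stated in the summary
SEQ_LEN = 50      # assumed; the project tested 25 to 250
EMBED_DIM = 50    # assumed embedding size
LSTM_UNITS = 100  # assumed hidden size
DROPOUT = 0.5     # assumed dropout rate


def build_model():
    """Embedding -> LSTM -> dropout -> softmax over the whole song catalog."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQ_LEN,)),
        # One learned vector per song id.
        tf.keras.layers.Embedding(NUM_SONGS, EMBED_DIM),
        # Only the final LSTM state is kept, summarizing the prior songs.
        tf.keras.layers.LSTM(LSTM_UNITS),
        tf.keras.layers.Dropout(DROPOUT),
        # One probability per candidate next song.
        tf.keras.layers.Dense(NUM_SONGS, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # targets are integer song ids
        metrics=["accuracy"],
    )
    return model


model = build_model()
```

Sparse categorical cross-entropy fits here because each target is a single integer song id, so no one-hot encoding of the 876 classes is needed.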

Achieved 21.8% next-song prediction accuracy through extensive hyperparameter optimization and transfer learning; for scale, uniform random guessing over 876 songs would be right about 0.1% of the time. Used the Word2Vec CBOW algorithm to create improved song embeddings from bidirectional context, then used these to initialize the weights of the network's embedding layer.
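To make "bidirectional context" concrete: CBOW learns a song's vector by predicting that song from its neighbors on both sides, whereas the LSTM only sees the songs that came before. The sketch below builds those (context, target) pairs by hand. It is illustrative only; `cbow_pairs` is a hypothetical helper, and in practice a Word2Vec library would handle this internally before its vectors are copied into the embedding layer.

```python
def cbow_pairs(sequence, window):
    """Build (context, target) pairs using up to `window` songs on each side."""
    pairs = []
    for i, target in enumerate(sequence):
        left = sequence[max(0, i - window):i]    # songs played before the target
        right = sequence[i + 1:i + 1 + window]   # songs played after the target
        pairs.append((left + right, target))
    return pairs


setlist = ["Mike's Song", "I Am Hydrogen", "Weekapaug Groove", "Chalk Dust Torture"]
pairs = cbow_pairs(setlist, window=1)
```

With a window of 1, 'I Am Hydrogen' is predicted from both 'Mike's Song' and 'Weekapaug Groove', so its embedding reflects what precedes it as well as what follows it.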

The model learned nuanced patterns in Phish's setlist construction, including common segues (like 'Mike's Song' > 'I Am Hydrogen' > 'Weekapaug Groove'), set openers and closers, and encore selections. Created a setlist generation tool named 'TrAI' that recursively predicts future setlists from recent song history.
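Recursive generation of this kind can be sketched as a loop that feeds each prediction back into the context. This is not TrAI's code: to stay self-contained, a simple bigram counter stands in for the trained LSTM, and `fit_bigram`, `predict_next`, and `generate` are hypothetical names.

```python
from collections import Counter, defaultdict


def fit_bigram(history):
    """Count which song follows which (a stand-in for the trained network)."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        follows[prev][nxt] += 1
    return follows


def predict_next(follows, context):
    """Return the most frequent follower of the last song in the context."""
    candidates = follows.get(context[-1])
    if not candidates:
        return context[0]  # arbitrary fallback for unseen songs
    return candidates.most_common(1)[0][0]


def generate(follows, seed, n_songs, seq_len):
    """Predict a song, append it to the context, and repeat."""
    context = list(seed)
    generated = []
    for _ in range(n_songs):
        nxt = predict_next(follows, context[-seq_len:])
        generated.append(nxt)
        context.append(nxt)  # the prediction becomes part of the next input
    return generated


history = [
    "Mike's Song", "I Am Hydrogen", "Weekapaug Groove", "Chalk Dust Torture",
    "Mike's Song", "I Am Hydrogen", "Weekapaug Groove",
]
follows = fit_bigram(history)
setlist = generate(follows, seed=["Chalk Dust Torture"], n_songs=3, seq_len=25)
```

Always taking the single most likely song makes the output deterministic and prone to loops; sampling from the predicted distribution instead is a common way to get more varied setlists.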

Artifacts

Blog Post, GitHub Repo