Music Sampling Exploratory Analysis

Complete project site with process, code, and visualizations available here

While at the University of Michigan School of Information, I carried out an extensive exploratory data analysis project on sampling behavior in hip-hop and rap music, called Project In The Mix, for SI 618: Exploratory Data Analysis. Music sampling is a current hot-button issue in the legal definitions of Fair Use and copyright, so data explorations of sampling behavior are especially relevant. Because uncleared sampling is dubiously legal, it is difficult to get definite information about the samples present in a song, especially in hip-hop and rap; however, publicly contributed databases of this information have developed online in an attempt to bridge the gap. I located a publicly available website of hip-hop and rap song sampling information and used Perl scripts to scrub and format the data into a usable database. Using R and the graphing and exploration package ggplot2, I ran a number of exploratory analyses that relied on layering and factorization of the data, as well as statistical methods such as density charting. These analyses led to several conclusions about sampling behavior, which I developed into publication-ready graphics, again using ggplot2 and R. Throughout the process, I published all my findings and work on a public website, and I also outlined some possible future areas of research into song sampling using my database and others.

This project taught me several new programming and statistical languages, as well as the overall power of visualization and of exploration through visualization. In addition, because this was my first major solo undertaking, I had to be completely self-reliant in learning all of the necessary skills and languages for the project, which, while very difficult and time-consuming, gave me a broad and useful basis for future data analysis.