Paralysis by Analysis

Did not do much to continue my chain yesterday, because the day was filled with Father’s Day activities. I know, Father’s Day is tomorrow, but my youngest son is heading to AWANA camp tomorrow and he did not want to miss My Day. We started by seeing Toy Story 3, then had lunch at Five Guys, followed by gift unwrapping and Friday night homemade pizza and a movie.

While at Five Guys, we heard a song by Queen, but one that none of us recognized. My daughter pulled out her iPhone and Shazam’d it. If you don’t know about Shazam, it is one of our favorite apps. You use your phone to listen to a song clip and it identifies the artist, title, album and other info.

I have been curious as to how their app works, but I had never searched for an answer before. So, I did. The top Google result was an article at Slate. In it, Shazam co-founder and chief scientist, Avery Wang, describes the basics. The article also provided a link to his published academic paper at Columbia University describing the process. Here is the abstract:

We have developed and commercially deployed a flexible audio search engine. The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, and through voice codec compression, out of a database of over a million tracks. The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified. Furthermore, for applications such as radio monitoring, search times on the order of a few milliseconds per query are attained, even on a massive music database.

Wang’s description is that they throw away most of the noise and only use a few peak moments to identify a song. Sounds simple, right? Actually, I think it does. They fingerprint each song in their database, and when a user records a clip, a fingerprint is sent to Shazam’s servers for identification. If found, song info is sent back to the cell phone. If not found, Shazam tells you the sample was unrecognized and to try again. Maybe there was too much background noise or you just happened to miss a peak moment in your sample.
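Here is a toy sketch of that fingerprint-and-lookup idea in Python. It is not Shazam's code, only an illustration under loud assumptions: the "peak moments" are handed in as ready-made (time, frequency) pairs instead of being extracted from audio, and the names fingerprint, build_index, identify, fan_out and min_votes are all made up for this example. Each peak is paired with a few of the peaks that follow it, the pair becomes a hash, and a clip is matched to whichever track has the most hashes agreeing on a single time offset.

```python
from collections import Counter, defaultdict


def fingerprint(peaks, fan_out=3):
    """Yield (hash, anchor_time) for each peak paired with the next few peaks.

    peaks is a list of (time, frequency) tuples; fan_out is a hypothetical
    knob for how many following peaks each anchor is paired with.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            # The hash keeps the two frequencies and the time gap between
            # them, so it does not depend on where the clip starts.
            yield (f1, f2, t2 - t1), t1


def build_index(tracks):
    """Map every hash to the (track name, time) places it occurs."""
    index = defaultdict(list)
    for name, peaks in tracks.items():
        for h, t in fingerprint(peaks):
            index[h].append((name, t))
    return index


def identify(index, clip_peaks, min_votes=2):
    """Return the best matching track name, or None if nothing lines up."""
    votes = Counter()
    for h, t_clip in fingerprint(clip_peaks):
        for name, t_track in index.get(h, ()):
            # Hashes from a true match all agree on the same offset.
            votes[(name, t_track - t_clip)] += 1
    if not votes:
        return None
    (name, _offset), count = votes.most_common(1)[0]
    return name if count >= min_votes else None


# Made-up peak data, purely for illustration.
tracks = {
    "Track A": [(0, 40), (1, 55), (2, 30), (3, 70), (4, 45), (5, 60)],
    "Track B": [(0, 10), (1, 20), (2, 15), (3, 25), (4, 35), (5, 5)],
}
index = build_index(tracks)

# A clip that starts two time steps into Track A.
clip = [(0, 30), (1, 70), (2, 45), (3, 60)]
print(identify(index, clip))
```

A clip whose peaks match nothing in the index comes back as None, which is the toy version of Shazam asking you to try again.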

I am sure that the algorithm used is not simple, but the concept is simple. Simple concepts are usually the most successful. They are also the hardest to find. It reminds me of the story of the truck that was too tall to go under an overpass and promptly got stuck. While the engineers were trying to determine how to remove it, a little girl in a car stuck in traffic suggested letting air out of the tires. I am not sure of the authenticity of the story, but the solution was simple, and obvious after somebody else suggested it.

We, all of us and not just the engineers, are not trained in simplicity. We like to use the breadth and depth of our training to make or fix stuff. Our professors do not hand out simple tests. Although my wife tells me that she had a test in high school in which the teacher gave very simple but detailed instructions before handing out the test: “Read all of the questions before answering any.” You might see where this is headed. The last test question read, “Do not answer any questions, just sign your test and turn in.” From Wikipedia:

The principle most likely finds its origins in similar concepts, such as Occam’s razor, and Albert Einstein’s maxim that “everything should be made as simple as possible, but no simpler”. Leonardo Da Vinci’s “Simplicity is the ultimate sophistication”, or Antoine de Saint Exupéry’s “It seems that perfection is reached not when there is nothing left to add, but when there is nothing left to take away”.

So, why all of this talk of simplicity and paralysis by analysis? Because, in designing my app, I have been too focused on how I want the final release to look and behave. When I become too focused on the end result, I forget the journey. Jason Fried and David Heinemeier Hansson, in their book Rework, tell us to “ignore the details early on.” Start with the big picture and drill down to the details when you need to. An architect does not select floor tiles or kitchen appliances until the floor plan is finalized. I became paralyzed in worrying/studying/analyzing how I want to present the rank and merit badge requirements in the final product instead of getting something working now. I need to focus more on the big picture now, not the details.

Accomplishments for yesterday were updating the XML files for each Scout rank and merit badge and starting the XML parsing code for those files. Not as much done as I would have liked, but enough, given all of the Father’s Day activities we fit in. Oh, by the way, the Queen song was It’s Late from News of the World, which just happened to be the first Queen album I bought as a teen.

Thank you, Shazam. Now let’s update my Queen collection from iTunes.