Monday, April 16, 2012

MP3 Spectrograms

Since MP3s are encoded by frequency content, shouldn't it be *extremely fast* to convert an MP3 to a spectrogram? The utilities I've seen so far (such as sox) seem to decode it to samples first, then transform those back to frequency content.

---

I've been looking into it a bit. I think the reason this isn't very effective is that the frequency content actually encoded in an MP3 bitstream only covers something like 32 (or 16?) samples at a time. So even if a DCT can be visualized much like an FFT spectrogram (I have yet to find an example of that on the 'net), for an MP3 it would only be 32 frequencies tall (including DC) and *really* wide. I can't quite wrap my head around all this, but I think that's the gist. So I guess lower frequencies just look like DC within these small snippets.
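To get a feel for what such a picture would look like, here's a rough sketch. It is not MP3 decoding at all: it just chops ordinary samples into 32-sample blocks (my guess from above) and runs a plain DCT-II on each one. The function names, the 44.1 kHz sample rate, and the test tones are all made up for illustration.

```python
import math

def dct2(block):
    """Unnormalized DCT-II of one block; coefficient 0 is the DC term."""
    n = len(block)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(block))
        for k in range(n)
    ]

def block_dct_spectrogram(samples, block_size=32):
    """One column of |DCT| magnitudes per non-overlapping block of samples."""
    columns = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        coeffs = dct2(samples[start:start + block_size])
        columns.append([abs(c) for c in coeffs])
    return columns

def tone(freq_hz, sample_rate=44100, count=32 * 64):
    """Hypothetical test signal: a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(count)]

def peak_bin(columns):
    """Row index holding the most total magnitude across all columns."""
    totals = [sum(col[k] for col in columns) for k in range(len(columns[0]))]
    return totals.index(max(totals))

if __name__ == "__main__":
    for freq in (100, 5000):
        spec = block_dct_spectrogram(tone(freq))
        print(freq, "Hz:", len(spec), "columns x", len(spec[0]), "rows,",
              "strongest row", peak_bin(spec))
```

With 32-sample blocks at 44.1 kHz, each row spans roughly 44100 / 64 ≈ 689 Hz, so a 100 Hz tone should pile up almost entirely in row 0 (the DC row), while a 5000 Hz tone should land somewhere around row 7. That's the "short and wide" picture in miniature.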

Also, there's something about frequency subbands and long vs. short blocks that I can't wrap my head around either. From what I can piece together from the libmad code, the inverse DCT seems to be performed directly on small chunks, with the result written straight into an array of samples. I can't figure out where the subbands come into play, nor whether the chunks somehow overlap (which would, I presume, imply "sampleArray[i]+=" instead of "sampleArray[i]=").
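For what it's worth, here's a sketch of why overlap would force the "+=" form. It assumes the chunks overlap by 50% the way textbook MDCT codecs do it (sine window on both sides, each block's aliasing cancelled by its neighbour). This is the generic scheme, not code taken from libmad, and every name and the block size of 18 coefficients are my own choices for illustration.

```python
import math

def sine_window(n):
    """2n-point sine window; w[i]**2 + w[i + n]**2 == 1."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]

def _basis(i, k, n):
    """Shared MDCT/IMDCT cosine term."""
    return math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))

def mdct(block, n):
    """2n windowed samples -> n coefficients."""
    return [sum(block[i] * _basis(i, k, n) for i in range(2 * n)) for k in range(n)]

def imdct(coeffs, n):
    """n coefficients -> 2n time-aliased samples (2/n scaling for the windowed case)."""
    return [2.0 / n * sum(coeffs[k] * _basis(i, k, n) for k in range(n))
            for i in range(2 * n)]

def encode(samples, n):
    """Window and transform blocks of 2n samples, hopping by n (50% overlap)."""
    win = sine_window(n)
    return [
        mdct([win[i] * samples[start + i] for i in range(2 * n)], n)
        for start in range(0, len(samples) - 2 * n + 1, n)
    ]

def decode(frames, n, overlap_add=True):
    """Inverse-transform each frame and write it into one shared sample array."""
    win = sine_window(n)
    out = [0.0] * (n * (len(frames) + 1))
    for f, coeffs in enumerate(frames):
        chunk = imdct(coeffs, n)
        for i in range(2 * n):
            value = win[i] * chunk[i]
            if overlap_add:
                out[f * n + i] += value   # the "sampleArray[i]+=" case
            else:
                out[f * n + i] = value    # the "sampleArray[i]=" case
    return out

def max_interior_error(samples, n, overlap_add):
    """Largest error over samples covered by two blocks (skips the first and last n).

    Assumes len(samples) is a multiple of n.
    """
    decoded = decode(encode(samples, n), n, overlap_add)
    return max(abs(decoded[i] - samples[i]) for i in range(n, len(samples) - n))

if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * 5 * t / 108) for t in range(108)]
    print("with +=:", max_interior_error(signal, 18, True))
    print("with  =:", max_interior_error(signal, 18, False))
```

Under these assumptions, each inverse-transformed chunk is only half of the answer: on its own it contains time-domain aliasing, and it's the sum with the neighbouring chunk that cancels it. Plain assignment throws one half away, so the error should be obviously large in that case and down at rounding noise with "+=".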

This wasn't my original intent, but my original project takes literally 24 hours to process 7GB worth of music... so I guess it was a worthy venture to look into.
