Now that Apple has announced their iCloud service, I thought I'd jot down some thoughts on music storage in the cloud -- that is, on some company's central server instead of on your PC, MP3 player, and other devices you own.

tl;dr - It's a great shortcut for the music industry's new business model of suing consumers en masse.

If you're not familiar with this topic, see these (most aren't available in Canada yet, but for reference...):

Amazon Cloud Player (damn, Amazon has crappy URLs)
Google Music Beta
Apple iTunes Cloud

First, let's look at some underlying technology. Stay with me, it gets interesting in a bit. ;)

One central principle of any system where many people are storing the same thing is deduplication -- that is, when you upload a file and someone else uploads the same file, only one copy of the file itself is actually stored, and the tracking system has two records pointing to the same file. Any other means of setting up a system like this would require ridiculous amounts of storage space, and whatever else Amazon, Google, and Apple are, they are not stupid.

One way to do this is through the storage layer itself, using something like Opendedup or various similar commercial solutions. But if you have everyone uploading all their files through a common interface, as all these cloud systems do, it'd be simple enough to just do a quick checksum on the file (preferably before upload, to save time and bandwidth) and match it against the checksums of the files already in the system. If the file exists already, just add another pointer to it. If it doesn't, upload the new one. This is almost certainly what Amazon and Google are doing. In the case of a file matched with iTunes Match, you don't even upload a file; they just point to Apple's official AAC version of the same track.
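To make the checksum idea concrete, here's a minimal sketch in Python. It's purely illustrative and not how any of these services actually work: the `DedupStore` class, its method names, and the choice of SHA-256 are all my own assumptions.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical): one blob per unique checksum."""

    def __init__(self):
        self.blobs = {}      # checksum -> file bytes, stored only once
        self.libraries = {}  # user -> set of checksums (the "pointers")

    @staticmethod
    def checksum(data: bytes) -> str:
        # SHA-256 is an assumption; any strong hash would do for this sketch.
        return hashlib.sha256(data).hexdigest()

    def upload(self, user: str, data: bytes) -> bool:
        """Add a file to a user's library.

        Returns True only if the bytes actually had to be stored.
        """
        digest = self.checksum(data)
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data  # first uploader pays the storage cost
        # Everyone, first or not, just gets a pointer to the shared blob.
        self.libraries.setdefault(user, set()).add(digest)
        return is_new


store = DedupStore()
track = b"pretend these bytes are an MP3 file"
first = store.upload("alice", track)
second = store.upload("bob", track)
print(first, second, len(store.blobs))  # True False 1
```

In a real client the checksum would be computed locally and sent first, so the second uploader never has to push the bytes over the wire at all.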

This means that any cloud-based system that's charging you based on the "size" of similar files that many people upload is a scam. After the first person uploads a file there is no reason for additional storage to be used when a second, third, or one millionth person uploads the same file, aside from the space used for the file tracking system.

It also means that any cloud system can quickly and simply generate a list of every user who has uploaded the same file. More on that later.
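Here's a rough sketch of how little work that list takes, assuming the service keeps a simple record of (user, file) uploads. The upload data, the user names, and the `users_by_checksum` helper are all made up for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical upload log: (user, file bytes). A real service would log checksums.
uploads = [
    ("alice", b"track A bytes"),
    ("bob", b"track B bytes"),
    ("carol", b"track A bytes"),
]


def users_by_checksum(upload_log):
    """Build a reverse index: checksum -> set of users who uploaded that file."""
    index = defaultdict(set)
    for user, data in upload_log:
        index[hashlib.sha256(data).hexdigest()].add(user)
    return index


index = users_by_checksum(uploads)
target = hashlib.sha256(b"track A bytes").hexdigest()
print(sorted(index[target]))  # ['alice', 'carol']
```

Given the checksum of one known file, the lookup is a single dictionary access.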

Thinking about iTunes Match, I'm not entirely sure how that system would match tracks with random MP3 files (let's assume you ripped them from a CD using something besides iTunes, and didn't 'acquire' them using other means). I would think ripped MP3 files would not come out byte-for-byte identical on every PC, so they would not be easily identifiable as a specific audio track by checksum, and Apple certainly can't trust the easily manipulated metadata within the MP3 file for identification. It seems likely that iTunes Match may only match music ripped using iTunes itself, which associates additional metadata from the physical CD with the music files, or music that the service can identify by checksum as an authorized download. Going back to Apple's information, at no point does the iTunes Match announcement discuss matching "random MP3 files you acquired from somewhere", just "songs you've already ripped yourself" or "bought from another online store".
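A quick way to see why checksums are fragile for this: any difference at all in the file's bytes gives a completely different hash, even if the audio is the same. The sketch below uses made-up byte strings as stand-ins for two rips of the same track that differ only in a tag; they are not real MP3 data.

```python
import hashlib

# Stand-in for identical encoded audio (not a real MP3 stream).
audio = b"\x00\x01\x02" * 1000

# Two hypothetical rips: same audio, different tag written by the ripper.
rip_a = b"TAG:encoded by ripper A" + audio
rip_b = b"TAG:encoded by ripper B" + audio

digest_a = hashlib.sha256(rip_a).hexdigest()
digest_b = hashlib.sha256(rip_b).hexdigest()

same = digest_a == digest_b
print(same)  # False
```

In practice different encoders and settings would change the audio bytes too, so the mismatch would be even more thorough than this.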

It's possible that the technology Apple acquired from Lala includes some kind of magical audio scanning system that looks at the actual contents of an MP3 file and compares it to audio tracks in a central database, but that seems like a lot of work. Far easier for Apple to just restrict sources of files to ones they can easily identify.

Now, let's move on to tracking and privacy.

If you have a bunch of MP3 files on your PC and audio playing device of choice, there are very limited ways that anyone can authoritatively find out what they are, for example if they wanted to initiate legal action alleging you stole music (whether or not you really did). If you're using iTunes to manage the files on your device then Apple likely has some data collection options, but there are still privacy issues with them scanning your library and collecting specific details about the contents (though I have to admit I haven't read the terms & conditions on iTunes lately, so maybe they're working around that). If you are actively sharing files, that's different: there are means of collecting data across the network about what you are doing. But if you just have a bunch of files sitting on your machine and are not sharing them, you're not likely to be hit with a lawsuit.

However, if you are uploading files to a centralized storage service (for music or any other files), you have effectively handed over any and all expectations of privacy about your files to whoever is running that service. This is especially the case when the storage service is specifically intended for content that is illegal when duplicated without authorization and licensing -- like, say, music.

In and of itself this doesn't seem particularly bad -- I'm sure you wouldn't be actively torrenting music and then uploading it to iTunes Cloud. That would just be dumb. But it's entirely possible that a friend could give you an MP3 of a music track that he likes, and you have no idea where it came from or who else has a copy, and you upload it with all the rest of your ripped files. Or maybe you rip a track to MP3 from your own CD, give a copy to a friend, and unknown to you their kid gets a copy and puts it out on a torrent.

Then you upload the file to the cloud, and suddenly you're added to a list of everyone else using the system who has uploaded that file, tracked by its unique checksum fingerprint. It would be foolish to think that Apple and the others haven't made agreements with the music labels to review those lists, perhaps even as they're trending, and hand them over for legal action if the file is found to be an unauthorized copy (i.e., not one that the music labels provided to an authorized reseller). I'm sure there will be details in the terms and conditions of use allowing them to do this; there are usually clauses about illegal activities rendering all privacy protections void.

I wouldn't be surprised if the music labels even released versions of new tracks via the illegal torrent scene that are specifically intended to generate these lists, allowing them to take action much more quickly, because they will already have proof of the origin of the file. This is already possible, but there's a lot of legwork involved now that makes it prohibitive -- tracking a downloader, forcing their ISP to divulge their contact info, collecting all the downloaded files after getting a court order to seize their PC(s), then doing detailed computer forensics work to compare the files to the original, all while preserving the chain of evidence. The cloud service removes all of that legwork -- the list of users who have uploaded an unauthorized file is readily available.

So... yeah. Cloud based music services providing quick and accurate evidence for music industry lawsuits.

But remember, it's all about convenience for users!