Annual Digital Cleanup

Posted by JD 01/01/2013 at 15:00

Whenever a new year arrives, it is time to do some digital cleanup. Going through old files, new files, old emails, new emails, archiving important things and deleting as much as possible. Heck, if I didn’t respond to that email yet, it probably wasn’t all that important.

Anyway, here’s how I do it, and more importantly, how you can set up your systems to make it easier next year and every year after that.

I won’t pretend to know all about your digital files, so it is unlikely these techniques will work for every situation. Still, I think there is some value with just a little organization.

Digital Collections

Just like a business, people at home have a tendency to acquire digital things during the year that should be filed, categorized, processed, stored and backed up. But only for a certain period of time.

  • Photos
  • Music
  • Videos
  • Bank Statements
  • Brokerage Statements
  • Electronic Bills and other statements
  • Assorted ebooks
  • Passwords
  • Other random files, such as my blog articles and other written articles

How do we organize all these things so we can

  1. find them later
  2. know what they are quickly
  3. understand how recent (or old) they are
  4. most importantly know when they can and should be deleted

Date Sensitive or Not?

Out of all those items, for some the date matters. For others, I plan to keep them forever and pass them onto my family after my death. It really comes down to things that can and should be deleted later that need just a slight amount of organization.

What does not need a date-based organization? Simple.

  • Music, Videos (TV, DVDs, etc), eBooks

What does need a date-based organization?

  • any business statement, letters, emails, home movies and photographs

Huh? Photographs and home movies? Why do those need dates? In 30 years, the people in those photographs may fade, but if the date is known, that will help place the media properly. Trust me, I’ve been going through my parent’s old videos, slides and photographs for years now. Having an approximate date really does help set the time and place of the photos.

Organization Techniques

I’ve written about this before, so I won’t bore you again. Basically, I perform the minimal organization needed, just like I do with papers.

Keep reading for the other techniques.

Organizing Emails

Email is one of those things that we all have and few people handle well. For the last decade, I’ve received between 100-300 emails every weekday for work. Most would be project related, so they really could not just be deleted. After working a few hundred different projects, it became clear that I needed a better organization technique for email. Here is what I do now.

First, emails are filed by Project. That could mean into an email folder or just with a tag that the email system supports. I treat both the same.

Second, emails are filed by Year. Project folders fit into a Year folder. Huh? Well, that project from 12 years ago will not be very important today. Trust me. Without having the year in the archive structure, you’d vaguely recall how long ago it was and may need to look inside a message. With the year – just the year – you have a tremendous amount of extra information. Without the year, you are left to figure it out.

Trust me, use the year/ folder method.

Just for clarity, I will not create a 2013 folder today. Today, I’m making a 2012/ folder and moving all non-archived emails and project folders into it.

For projects that are active for multiple years, I keep those together and do not archive the emails until the year that the project ends OR when the project changes into long-term maintenance mode. I hope that makes sense. Deployment is a different project from maintenance.
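A minimal sketch of that year-end move, assuming the mail store is a plain directory tree with one folder per project. The function name `archive_year`, its `active` parameter, and the folder layout are illustrative assumptions and do not belong to any particular mail client.

```python
import shutil
from pathlib import Path


def archive_year(mail_root, year, active=()):
    """Move finished project folders under mail_root into mail_root/<year>/.

    Folders that are already year archives (four digits, e.g. "2011") and
    any project named in `active` (multi-year work still running) stay put.
    Returns the names of the folders that were moved, in sorted order.
    """
    root = Path(mail_root)
    year_dir = root / str(year)
    year_dir.mkdir(exist_ok=True)

    moved = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        if entry.name.isdigit() and len(entry.name) == 4:
            continue  # an existing year archive, including the new one
        if entry.name in active:
            continue  # project is not finished yet
        shutil.move(str(entry), str(year_dir / entry.name))
        moved.append(entry.name)
    return moved
```

Calling `archive_year("/path/to/mail", 2012, active=("long-project",))` on January 1st would file everything except the still-running project under 2012/. The path and project name here are placeholders.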

Video/DVD Organization

I have two different organization methods for online and offline media.

For online media, I follow the XBMC requirements. Those basically come down to these directory structures:
Music/{genre}/{Artist}/{Album}/{track-number}-{Title}.{ext}
Movies/{Title}
TV/{Title}/{season}/{Title}-{episode number}-{Show Title}
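As an illustration, the layouts above can be expressed as small path-building helpers. The function names are hypothetical, and the two-digit zero padding of the track number is an assumption; the patterns themselves do not specify it.

```python
from pathlib import PurePosixPath


def music_path(genre, artist, album, track_number, title, ext):
    """Music/{genre}/{Artist}/{Album}/{track-number}-{Title}.{ext}"""
    # Zero padding keeps tracks in order when sorted by name (an assumption).
    filename = f"{track_number:02d}-{title}.{ext}"
    return PurePosixPath("Music", genre, artist, album, filename)


def movie_path(title):
    """Movies/{Title}"""
    return PurePosixPath("Movies", title)


def tv_path(title, season, episode_number, show_title):
    """TV/{Title}/{season}/{Title}-{episode number}-{Show Title}"""
    filename = f"{title}-{episode_number}-{show_title}"
    return PurePosixPath("TV", title, str(season), filename)
```

Building paths in one place like this makes it harder for a stray file to land outside the structure the media player expects.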

For offline media, I use a numbered media method and store a list of file names in a catalog directory. This catalog directory can be searched quickly to find which specific offline disc, by number, holds a particular file. Basically, I use an `ls -lR` listing, a method that has been around for 30+ years and was common in the DVD and CDROM archival days. Rereading that, it isn’t clear.

I have a directory. It holds hundreds of files with names that are numbers.
001.txt, 002.txt, … 452.txt Clear?
Inside each of those files is the output of ls -lR for one specific DVD, created with ls -lR > 342.txt. It is a list of everything on that disc. In PowerShell, I would use dir -R > 342.txt or on cmd.exe, I would use dir /s > 342.txt to create the file. When I burn the DVD, I write “342” on the disc and put it into a DVD organizer in order. The order is critical. We can all count, so finding DVD 269 is extremely easy later. It also means that mixing contents is not a big deal, though I do try to place a complete season of recordings onto 1 or 2 DVDs if I can.

I hope that is clear.
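The same numbered-catalog idea can be sketched in a cross-platform way. This is a rough equivalent of the ls -lR listing, not a replacement for it: the function names `write_catalog` and `find_in_catalog` and the "size, tab, path" line format are assumptions made for the example.

```python
import os
from pathlib import Path


def write_catalog(disc_root, catalog_dir, disc_number):
    """Record every file under disc_root in catalog_dir/<NNN>.txt."""
    entries = []
    for folder, _subdirs, files in os.walk(disc_root):
        for name in files:
            full = os.path.join(folder, name)
            rel = os.path.relpath(full, disc_root)
            entries.append((rel, os.path.getsize(full)))
    # One line per file: size in bytes, a tab, then the path on the disc.
    lines = [f"{size}\t{rel}" for rel, size in sorted(entries)]
    catalog_file = Path(catalog_dir) / f"{disc_number:03d}.txt"
    catalog_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return catalog_file


def find_in_catalog(catalog_dir, needle):
    """Return (disc_number, listing_line) pairs whose line contains needle."""
    hits = []
    for catalog_file in sorted(Path(catalog_dir).glob("*.txt")):
        text = catalog_file.read_text(encoding="utf-8")
        for line in text.splitlines():
            if needle.lower() in line.lower():
                hits.append((catalog_file.stem, line))
    return hits
```

Searching the catalog directory for part of a file name returns the disc number to pull from the organizer, which is the whole point of writing the number on the disc.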

I remember getting a Linux CD from Walnut Creek every 6 months. In the root directory of that CDROM, there was a file named ls_lR.txt which contained a list of every file on the disc.

The use of numbered media really is very freeing. It works for DVDs, CDROMs, Data BluRay and even HDDs. A properly maintained HDD should hold data for years between refreshing, but that refresh is critical from time to time. Bit rot on HDDs is real, just like it is real on optical media.

eBooks

eBooks are tiny, so it is easy to forget to back them up in a few different locations. I intend to give my ebook collection to my family when I die. After all, I bought the books, right?

The first thing I do is to remove any DRM on a purchased ebook. I don’t want to be stuck not having access to it later just because I switch hardware or operating systems or the publisher or distributor have a legal disagreement.

Some people have had purchased ebooks removed from their eReaders just by traveling to a different country. Crazy. Avoid all that by removing the DRM immediately. I won’t tell you how, but Google will explain all.

calibre is the easy-to-use eBook organization software that I use. It is cross-platform and supports all types of ebook formats. It uses an {Author}/{Book Title}/ directory structure and converts between lots of different ebook formats. I use the PDF-to-epub conversion a bunch.
For example, the Modern Perl book by chromatic is released in PDF form. calibre stored it here: /Library/chromatic/Modern Perl/ with the PDF and epub document versions stored in the same folder.

Almost all of my ebooks are 100% free versions, so I don’t need to worry about DRM too often.

The main thing is to have your ebook collection in at least 2 different places and remove the DRM so you aren’t prevented access later.

If you don’t know why DRM is evil – read these articles.

I should also be clear. I do not share these non-free ebooks with others. Removing the DRM is just for personal use so I can use the content on unsupported devices.

Preventing File Corruption

Some files are really important. Important enough that losing them is just, well, scary. For small files, that means having them backed up in multiple places, on multiple media – disks, CDs, DVDs, and remote disks at Mom’s house or in a safe deposit box or with a trusted friend across the country. For example, I keep my KeePassX database in about 10 different places. Losing it would be catastrophic. Terrible. Unthinkable. Every night, an automatic tiny script pushes the current version of the file to all those places for me, at least it tries. Sometimes it fails – like the push to the Android Tablet. It isn’t always on and isn’t always running the sshd process. Having the absolute latest version of the DB isn’t critical to me unless a really, really, really important password was changed. That doesn’t happen very often at all.
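A nightly push like that can be sketched as a loop that tries every destination and keeps going when one is unreachable. This is an assumption-laden sketch, not the actual script: it only copies to local or mounted directories (no ssh), and the name `push_copies` is made up for the example.

```python
import shutil
from pathlib import Path


def push_copies(source, destinations):
    """Copy source into each destination directory, continuing past failures.

    Returns (succeeded, failed). `failed` holds (destination, error text)
    pairs so a later run, or a human, can see what was missed.
    """
    source = Path(source)
    succeeded, failed = [], []
    for dest in destinations:
        try:
            shutil.copy2(source, Path(dest) / source.name)
            succeeded.append(str(dest))
        except OSError as err:
            # For example, a tablet or remote mount that is offline tonight.
            failed.append((str(dest), str(err)))
    return succeeded, failed
```

Collecting the failures instead of stopping at the first one matches the "at least it tries" behavior: one sleeping device should not block the other copies.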

For larger files that are not practical to store in more than 1 location, like home videos, using parity to help recover any lost bits is important. I hope that link was clear. More and more, I need those recovery techniques to recover time-shifted TV recordings on 8-year-old DVD media.
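To show why parity helps, here is the simplest possible form of it: a single XOR parity block computed across equally sized data blocks. It can rebuild one lost block whose position is known. This is only a teaching sketch with made-up function names; real recovery tools such as par2 use Reed-Solomon codes, which tolerate more damage.

```python
def xor_parity(blocks):
    """Return a parity block: the byte-wise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must be the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus the parity."""
    # XOR-ing the survivors with the parity cancels them out,
    # leaving only the bytes of the block that was lost.
    return xor_parity(list(surviving_blocks) + [parity])
```

The cost is one extra block of storage for the whole set, which is far cheaper than keeping a second full copy of a large video.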

Backups

All of this organization doesn’t mean anything if we don’t have good backups. Most people don’t bother with backups until they experience a data loss themselves. Anyway, this is another reminder to back up your data so you are not sorry later.

RAID is not a backup. That had to be said. ;) Files stored on RAID disks still need to be stored on backup media too. Best Practices for Backups.

Games?

I do not game enough to worry about backing up game save states. Sorry. Hopefully a commenter will have a good solution.

I do own a PS2, but it has not been powered on since sometime in 2007.

What Is Missing Here?

I must have missed something. What is it?
If you have a technique that works for you, please share.

Happy 2013 to you all! I’m off to do my digital cleanup now. It should take just a few hours at most.

  1. INTPJavaGirl 01/07/2013 at 00:04

    Your list seems pretty comprehensive.
    I have projects that span multiple years, so I tend to do project/year instead of the other way when archiving.

    I’d like to complain about the music layout standards though. The industry seems to want to force the artist/album structure, or at least most apps I’ve used have. 75% of my music is soundtracks from musicals. Different artists or mixes of artists sing the songs on each track. When I scan my CDs or move them into a new library, they invariably get put into artist/album structure, putting each song into its own folder. Not good. I’ve had to manually edit the metadata to wipe the artist info to get it to go under “various” as the artist and then have the album soundtrack under that. I hate that I have to lose the info on who sang the song in order to keep the album together and the tracks in a playable order by album. iTunes (yeah- I know) used to support a “soundtrack” setting that would let songs be grouped together, but that stopped working a few upgrades ago. Other SW has not let me do it at all without the manual edits. Do you or any of your readers have a suggestion for ditching the “Artist” folder layer automatically?

  2. JD 01/07/2013 at 14:40

    @JG

    I don’t use iTunes. My music organization is 100% manual.
    I have both Compilations/ and Themes/ folders. Themes/ are for movies, TV shows, etc…. Compilations are for “Best of the 60s” types of things.

    This is just on disk. The artists should still be listed inside the MP3 file metadata, so that information isn’t lost.

    After I converted 1000+ audio CDs into MP3s years ago, I really haven’t bothered to be too concerned about organization.

    Seems to me that using iTunes is the biggest issue for many people who don’t like Apple’s implementation of organization. Why not just use a different program to manage this all for you? Certainly Apple isn’t the only people making software to work with all their hardware devices …. or are they?

    If I wanted to remove the artist level from my hierarchy, I’d just write a script to go into each {Genre}, build a list of directories, and move every file below those directories into the current one. That wouldn’t be a good idea for me, since below artist, I have album and it is probable that multiple artists have the same album title. Collisions are bad.
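A cautious sketch of such a script, assuming a {genre}/{artist}/{album} layout on disk. The name `flatten_artist_level` and the dry-run default are choices made for this example. It moves whole album folders up one level and refuses to move anything if two artists share an album title.

```python
import shutil
from pathlib import Path


def flatten_artist_level(genre_dir, dry_run=True):
    """Move {genre}/{artist}/{album} up to {genre}/{album}.

    Returns (planned_moves, collisions). Nothing is moved when any album
    name collides, or when dry_run is True.
    """
    genre = Path(genre_dir)
    planned, seen, collisions = [], set(), []
    artists = sorted(p for p in genre.iterdir() if p.is_dir())
    for artist in artists:
        for album in sorted(artist.iterdir()):
            target = genre / album.name
            # A clash with another album, or with an existing folder name.
            if album.name in seen or target.exists():
                collisions.append(album.name)
            else:
                seen.add(album.name)
                planned.append((album, target))
    if not dry_run and not collisions:
        for album, target in planned:
            shutil.move(str(album), str(target))
    return planned, collisions
```

Running it first with the default `dry_run=True` shows the planned moves and any collisions before a single file changes place. Emptied artist folders are left behind for manual cleanup.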

    Also, since having too many entries inside a single directory seems to lead to directory corruption, I like the idea of having only 10-30 files in any directory at most. I’ve seen people with 1000+ songs in a single directory be hit with a corrupt entry so all their files were lost. Of course, they didn’t have backups. Most people do not.

    Ok, so hopefully someone with more iTunes experience (or any sympathy towards fellow iTunes users) will comment with a solution. You know how I feel. You bought the Apple stuff, do what they like or dump it used on eBay or Craigslist to some other sucker, then buy something that is more open where more competition is possible.

    I should also admit that my portable music players are

    • Android (SDHC memory)
    • Creative Vision:M (from 2005-ish)
    • Nokia N800 (SDHC memory)

    I had a Dell DJ before the Creative, but 15G just wasn’t enough storage. All these let me organize by my manual folders on my server. What the playback software does on each in terms of organization depends. Under the N800 and Android, I tend to use file organization tools to select playlists, not the music players. The Creative device reads all the metadata as files are loaded and has an internal DB with all the usual fields – genre, artist, album, track … It allows my folder structure, but playback ignores it completely.

    On Linux when I want to play back something, I’ll usually whip up a quick list of files into a randomized m3u file, then use xmms2 or audacious for playback. Without folders, doing this would be hard.
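A randomized playlist like that takes only a few lines, since a basic m3u file is just one path per line. The function name, the list of audio extensions, and the optional `seed` parameter are assumptions for this sketch.

```python
import random
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".ogg", ".flac"}  # assumed set; adjust to taste


def write_random_playlist(folders, playlist_path, seed=None):
    """Write a shuffled m3u playlist of the audio files found under folders."""
    tracks = []
    for folder in folders:
        for path in sorted(Path(folder).rglob("*")):
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                tracks.append(str(path.resolve()))
    # A fixed seed gives a repeatable order; None shuffles differently each run.
    random.Random(seed).shuffle(tracks)
    Path(playlist_path).write_text("\n".join(tracks) + "\n", encoding="utf-8")
    return tracks
```

Because the selection is by folder, picking a few genre or theme directories is enough to define the mix, and the resulting file can be handed to any player that reads m3u.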

  3. JD 01/08/2013 at 13:45

    I had another thought, but don’t know if this works on newer Windows OSes.

    Use hard links to place the files in multiple directories. This adds another directory entry that points at the same inode, so it uses no additional disk space for the file data. I do this all the time for TV movie recordings. I place a movie into the directory structure that XBMC expects AND I also place it into a holding area until it is burned to optical media. The caveat is that both places must be on the same file system, but with those new HDDs that I added in the fall, I have plenty of room … for now.

    $ cp -rl dir1 dir2

    is the command that I use. XBMC wants movies inside their own directory.

    Anyway, just a thought for how you can have a single file in multiple places without wasting 2x the storage.
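For a single file, the same hard-link trick can be sketched with the standard library. The helper names are hypothetical. `os.link` fails if the two locations are on different file systems, which is the caveat already mentioned.

```python
import os


def link_into(source_file, target_dir):
    """Hard-link source_file into target_dir and return the new path."""
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, os.path.basename(source_file))
    os.link(source_file, target)  # raises OSError across file systems
    return target


def same_file_on_disk(path_a, path_b):
    """True when both names refer to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Deleting one name, for example after the holding-area copy has been burned to disc, leaves the data in place for as long as the other name still exists.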

    I do not know if file links as implemented in Windows work at the file system level or just in the Explorer program. I do know that cmd.exe and PowerShell do not show these links as expected.

    I don’t know if the OSX file system supports hard links either.

  4. INTPJavaGirl 01/11/2013 at 17:34

    I have never been able to get the Windows links to work as expected, but I tend to want them to behave like Unix links.

    Unfortunately even when I have the file dir the way I want it, the music tool I use (evil iTunes) moves it all around again after I import. Previous versions would at least preserve the folder structure if I did it that way.
    Really, I’d like to be able to rip the CDs in the format with no artist in the folder name. Once I get it that way, it imports nicely into my Android MP3 player at least. And as far as I know, there is no 3rd party alternative to iTunes. It manages the syncing of devices and the mobile library in addition to being a music player.