Optical Data Recovery Technique with ddrescue and par2

Posted by JD 06/12/2011 at 07:00

Many of us back up important data to optical disks like CD-ROM or DVD media. Over time, that media is known to fail. This means that a plan to migrate all the critical data to newer media every 5-10 years needs to be part of the backup strategy. It also means that when data is stored on this type of media, steps should be taken to protect it. Recently, I needed to pull some data, old family movies, from a DVD. The movies were stored as xvid/mp3 data inside an AVI container. After I loaded the disk into a network drive, the movie began playing, then abruptly stopped about 2 minutes into the hour-long movie. I have other copies on other media … somewhere, but this was a good opportunity to try a contingency plan that I’ve been using for at least 10 years.


Use Parity Files When Storing Data

All data that I place onto optical media gets parity files with it. Sure, it takes a little time and a little more storage, but when there’s a failure, there is also a chance to get the data back. The par2 program provides this for us.

Here’s a tiny script that creates parity files like we need:

#!/bin/sh

for filename in "$@"; do
    # Create 10% recovery data with a block size of 300 KB
    nice par2 create -s307200 -r10 "$filename"
done

The parity files are placed in the same directory as the source file if you use that script. Here’s what we have on the DVD media:

-rw-r--r-- 1 ui ui 948894968 2003-01-28 19:32 Family_Movie_1972.avi
-rw-r--r-- 1 ui ui     62204 2003-01-28 19:18 Family_Movie_1972.avi.par2
-rw-r--r-- 1 ui ui    369472 2003-01-28 19:18 Family_Movie_1972.avi.vol000+001.par2
-rw-r--r-- 1 ui ui    738844 2003-01-28 19:18 Family_Movie_1972.avi.vol001+002.par2
-rw-r--r-- 1 ui ui   1415484 2003-01-28 19:18 Family_Movie_1972.avi.vol003+004.par2
-rw-r--r-- 1 ui ui   2706660 2003-01-28 19:18 Family_Movie_1972.avi.vol007+008.par2
-rw-r--r-- 1 ui ui   5226908 2003-01-28 19:18 Family_Movie_1972.avi.vol015+016.par2
-rw-r--r-- 1 ui ui  10205300 2003-01-28 19:18 Family_Movie_1972.avi.vol031+032.par2
-rw-r--r-- 1 ui ui  20099980 2003-01-28 19:18 Family_Movie_1972.avi.vol063+064.par2
-rw-r--r-- 1 ui ui  39827236 2003-01-28 19:18 Family_Movie_1972.avi.vol127+128.par2
-rw-r--r-- 1 ui ui  16965196 2003-01-28 19:18 Family_Movie_1972.avi.vol255+054.par2
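
The volume names in that listing say how much recovery data each file holds: in volNNN+MMM, the number after the plus sign is the count of recovery blocks in that file. As a rough sketch (the function name count_recovery_blocks is made up for this example and is not part of par2), the total can be added up from the names alone:

```shell
#!/bin/sh
# Hypothetical helper: sum the recovery-block counts encoded in par2
# volume names of the form NAME.volSSS+CCC.par2 (CCC = blocks in that file).
count_recovery_blocks() {
    total=0
    for name in "$@"; do
        case "$name" in
            *.vol*+*.par2)
                count=${name##*+}        # keep what follows the last '+', e.g. "054.par2"
                count=${count%.par2}     # drop the extension, e.g. "054"
                # Strip leading zeros so the shell does not read the number as octal.
                while [ "${#count}" -gt 1 ] && [ "${count#0}" != "$count" ]; do
                    count=${count#0}
                done
                total=$((total + count))
                ;;
        esac
    done
    echo "$total"
}
```

Called with the ten file names above, it prints 309: the nine volume files hold 1+2+4+8+16+32+64+128+54 recovery blocks, and the plain .par2 index file holds none.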

Copy the Data to a Hard Disk

We know that watching the movie off the DVD media doesn’t work, so something is wrong. We assume it isn’t the DVD drive or the computer or the network or the media player. That leaves the optical media, which is usually the issue. Sometimes I wish I’d spent more than $20 per 100 disks. ;)

Initially, I simply use the standard Linux copy command.

$ cp /cdrom/Fami* .

That reported a few I/O errors, although the copy kept going. In the target directory, everything seemed fine: the files had the correct sizes. With the parity files, I can check whether the file data truly is the same as what was written.

$ par2 r Fami*par2

That runs the parity repair. In this case, the file did not match AND the amount of corrupted data was too great for the parity files to fix. Ouch. A lesser person may have given up here. Not I. The entire file was 360 blocks and I had 309 blocks; it was just a little short of the 90% needed so that the parity tool could fill in the gaps automatically.
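
Before attempting a repair, it can help to ask par2 only whether the copy is intact. Here is a minimal sketch, assuming par2 exits with a non-zero status when verification finds damage or when a repair fails. The function name and the PAR2 override variable are hypothetical, added only for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: verify first, repair only if verification fails.
# Assumption: par2 returns non-zero when files are damaged or repair fails.
# PAR2 is an illustrative override for the par2 binary, not a par2 feature.
check_then_repair() {
    par2_index=$1
    # par2's own output goes to stderr so stdout carries just the verdict.
    if "${PAR2:-par2}" verify "$par2_index" >&2; then
        echo "intact"
    elif "${PAR2:-par2}" repair "$par2_index" >&2; then
        echo "repaired"
    else
        echo "unrecoverable"
        return 1
    fi
}
```

Something like check_then_repair Family_Movie_1972.avi.par2 would then report one of the three outcomes, and the "unrecoverable" case is the cue to reach for another tool.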

ddrescue To The Rescue

On UNIX-like operating systems, there is a tool called dd. It copies data from a source to a destination. The source and destination can be devices like hard disks and tape drives, or files like movie.avi and movie.mkv. There’s a small issue with the stock dd: it doesn’t handle problems very well. Ok, by default it doesn’t handle problems at all. When dd hits a read error, it stops. For me, that would have been about 2 minutes into the file. Not very useful.

That’s where ddrescue comes in. Ddrescue was designed to handle problems. It skips over problem areas and keeps reading from the input and writing to the output. You end up with an output file that is the same size as reported by the input media file system. Obviously, the contents won’t match since something is wrong, but if I can just recover a few more blocks, then the parity files could fill in the remaining data and that file would be recovered 100% to the original content. SWEET!

Install the gddrescue Package

I’m on Ubuntu. Life is easy:

$ sudo apt-get install gddrescue

Done.

Learn the ddrescue Options

$ ddrescue -h

shows that this tool is a little different from the stock dd command. There’s no need to specify if= or of=; those are assumed from the order of the arguments. There’s a note that recommends running with a logfile specified, but I decided there was really nothing to lose. I could always get the data off the DVD again later and try again from the beginning. The help is less than clear about why the logfile is useful. Anyway,

$ ddrescue /cdrom/Family_Movie_1972.avi Family_Movie_1972.avi

and let it run. There’s a nice status display that shows which pass of the operation the tool is currently in and some other stats. It was slower than a copy and slower than the stock dd would have been, but this tool first copied all the data it could, then went back and tried to get the missing data again. It seemed to simply fill in the target with default data when nothing could be retrieved from the source. Fine. But was it enough? If it wasn’t, I’d make use of the logfile that ddrescue recommends and let it try to read the source data many more times.
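
If the first run had not been enough, the logfile is what lets ddrescue go back to only the unread areas instead of starting over. Here is a sketch of that two-stage approach, assuming GNU ddrescue's -n option (skip the slowest recovery phase) and -r option (number of retry passes). The function name, the default logfile name, and the DDRESCUE override variable are made up for this example:

```shell
#!/bin/sh
# Hypothetical two-stage rescue using a logfile so the second run only
# revisits areas the first run could not read.
# DDRESCUE is an illustrative override for the binary, not a ddrescue feature.
rescue_file() {
    src=$1
    dst=$2
    log=${3:-$dst.log}    # assumed default: logfile next to the output file

    # Stage 1: copy everything that reads cleanly, skipping the slow phase.
    "${DDRESCUE:-ddrescue}" -n "$src" "$dst" "$log" || return 1

    # Stage 2: same logfile, so only the bad areas are retried (up to 3 passes).
    "${DDRESCUE:-ddrescue}" -r3 "$src" "$dst" "$log"
}
```

For the movie above that would be rescue_file /cdrom/Family_Movie_1972.avi Family_Movie_1972.avi, followed by another par2 repair attempt.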

I attempt the parity recovery again

$ par2 r Fa*2

Do you like how I’m really lazy with typing? That shell glob matches all the par2 files in the directory. There was enough data filled in and the file was repaired. Par2 rocks!

Summary

None of this recovery would have been possible without the foresight to use parity files on the media. This isn’t the first time that those parity files have helped recover data around here, but it is the first time that ddrescue was needed to get enough of the data off the disk to make recovery-by-parity validation possible.

Watch out for lazy bits on your systems.

Another Corrupted File

This time, I captured all the steps into a log so I could add them to this article.

Step 1 – Try to Copy the File

Seems the next DVD with a TV recording on it also had some corruption. Just like with the previous file, the error was a simple IO Error.

Step 2 – Try to Use par2 Files to Recover the Damaged Part

$ par2 r T*2

par2cmdline version 0.4, Copyright (C) 2003 Peter Brian Clements.

par2cmdline comes with ABSOLUTELY NO WARRANTY.

This is free software, and you are welcome to redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version. See COPYING for details.

Loading "TV-Recording.avi.par2".
Loaded 4 new packets
Loading "TV-Recording.avi.vol015+016.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "TV-Recording.avi.vol003+004.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "TV-Recording.avi.vol000+001.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "TV-Recording.avi.vol063+064.par2".
Loading: 39.8%
Loading: 77.9%
Loaded 64 new packets including 64 recovery blocks
Loading "TV-Recording.avi.vol031+032.par2".
Loading: 71.6%
Loaded 32 new packets including 32 recovery blocks
Loading "TV-Recording.avi.vol007+008.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "TV-Recording.avi.vol127+128.par2".
Loading: 20.9%
Loading: 41.5%
Loading: 60.9%
Loading: 81.1%
Loaded 128 new packets including 128 recovery blocks
Loading "TV-Recording.avi.vol255+040.par2".
Loading: 69.7%
Loaded 40 new packets including 40 recovery blocks
Loading "TV-Recording.avi.vol001+002.par2".
Loaded 2 new packets including 2 recovery blocks

There are 1 recoverable files and 0 other files.
The block size used was 307200 bytes.
There are a total of 2953 data blocks.
The total size of the data files is 906923666 bytes.

Verifying source files:

Scanning: "TV-Recording.avi": 15.0%
Scanning: "TV-Recording.avi": 31.4%
Scanning: "TV-Recording.avi": 46.4%
Scanning: "TV-Recording.avi": 62.8%
Scanning: "TV-Recording.avi": 77.9%
Scanning: "TV-Recording.avi": 92.9%
Target: "TV-Recording.avi" - damaged. Found 73 of 2953 data blocks.

Scanning extra files:


Repair is required.
1 file(s) exist but are damaged.
You have 73 out of 2953 data blocks available.
You have 295 recovery blocks available.
Repair is not possible.
You need 2585 more recovery blocks to be able to repair.

Ok, so there is no chance that par2 can recover the data when only 73 of 2953 blocks are available and there are just 295 recovery blocks to fill the gaps.
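
The numbers in that output follow from simple arithmetic: every missing data block has to be replaced by one recovery block. Here is a small sketch that reproduces them (the function names are made up for this example):

```shell
#!/bin/sh
# Number of par2 data blocks for a file: size / block size, rounded up.
data_blocks() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block ))
}

# Recovery blocks still missing: each damaged data block needs one
# recovery block. Prints 0 when a repair is already possible.
recovery_shortfall() {
    total=$1
    found=$2
    recovery=$3
    short=$(( total - found - recovery ))
    if [ "$short" -lt 0 ]; then
        short=0
    fi
    echo "$short"
}
```

Here data_blocks 906923666 307200 prints 2953 and recovery_shortfall 2953 73 295 prints 2585, matching what par2 reported.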

Step 3 – ddrescue

This time I use the log file option.

$ ddrescue /cdrom/TV-Recording.avi ./TV-Recording.avi logfile

Press Ctrl-C to interrupt
Initial status (read from logfile)
rescued: 0 B, errsize: 0 B, errors: 0
Current status
rescued: 906923 kB, errsize: 0 B, current rate: 13238 kB/s
ipos: 906887 kB, errors: 0, average rate: 7812 kB/s
opos: 906887 kB, time from last successful read: 0 s
Finished

Ok, so did that work as well as it seemed? There weren’t any errors in the ddrescue log.

$ par2 r T*2
par2cmdline version 0.4, Copyright (C) 2003 Peter Brian Clements.

par2cmdline comes with ABSOLUTELY NO WARRANTY.

This is free software, and you are welcome to redistribute it and/or modify
it under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version. See COPYING for details.

Loading "TV-Recording.avi.par2".
Loaded 4 new packets
Loading "TV-Recording.avi.vol015+016.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "TV-Recording.avi.vol003+004.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "TV-Recording.avi.vol000+001.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "TV-Recording.avi.vol063+064.par2".
Loading: 39.8%
Loading: 77.9%
Loaded 64 new packets including 64 recovery blocks
Loading "TV-Recording.avi.vol031+032.par2".
Loading: 71.6%
Loaded 32 new packets including 32 recovery blocks
Loading "TV-Recording.avi.vol007+008.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "TV-Recording.avi.vol127+128.par2".
Loading: 20.9%
Loading: 41.5%
Loading: 60.9%
Loading: 81.1%
Loaded 128 new packets including 128 recovery blocks
Loading "TV-Recording.avi.vol255+040.par2".
Loading: 69.7%
Loaded 40 new packets including 40 recovery blocks
Loading "TV-Recording.avi.vol001+002.par2".
Loaded 2 new packets including 2 recovery blocks

There are 1 recoverable files and 0 other files.
The block size used was 307200 bytes.
There are a total of 2953 data blocks.
The total size of the data files is 906923666 bytes.
Verifying source files:

Scanning: "TV-Recording.avi": 1.1%
Scanning: "TV-Recording.avi": 2.3%
.
.
.
Scanning: "TV-Recording.avi": 97.5%
Scanning: "TV-Recording.avi": 98.7%
Scanning: "TV-Recording.avi": 99.8%
Target: "TV-Recording.avi" - found.

All files are correct, repair is not required.

So ddrescue really did get all the data. Without the par2 files, I wouldn’t have been able to verify the content.

Just so everyone knows, there were file permission issues along the way. When you copy files from DVD media, those files will be read-only on the target too. It is easy to correct that by giving yourself write permission as needed:

$ chmod u+w *

and all will be good. I had to run that command after the initial copy so ddrescue could write to the file, and again when trying to repair the .avi file with par2. Read-only files can’t be repaired, after all.