25
May

I have discussed TestDisk, a similar product from the same company, before. Someone recently ran a test comparing different photo recovery tools. His conclusion is "scalpel is utterly useless when discovering digital photos, while other 4 tools (foremost, photorec, magicrescue, recoverjpeg) achieve similar degree of recovery, with photorec edging out the other 3."

Here is the test I mentioned above.

Some of the photos were secretly deleted from my digital camera by a certain guy a few weeks ago. Surely he had a valid reason: he didn’t like some of my photos. :D Though I have backups of some of them, the newer photos are not backed up, so only data recovery can come to my rescue. As an Ubuntu partition is available, I decided to give the tools mentioned on the Ubuntu Data Discovery wiki page some testing, all of which are readily installable with apt-get.

Though there was an existing test between scalpel and foremost, it only compares their speed and gives no comment on whether the software works or not. Data discovery software must try to discover as much evidence as possible; speed is just a marketing gimmick when the tools don’t work at all.

  • Goal: Try to recover as many JPEG images as possible
  • Initial condition: 1GB Memory Stick used with a Sony T7 camera
  • Content on memory stick: 224 images, 14 thumbnail images, 3 mpeg movies
  • Tools tested: foremost, scalpel, recoverjpeg, photorec, magicrescue

The first thing to do is to dump the memory stick content into a dd image and compute a checksum (I just use MD5) to verify that the image is an exact copy of the original. After that, the various tools are run to extract files from the image. Here are the necessary configuration changes and commands to invoke:

foremost

Config: No change
Command:
# foremost -t jpeg -i ~/camera.dd -o ~/recovered/foremost

scalpel

Config: Modify /etc/scalpel/scalpel.conf, locate this line:

#jpg     y    200000000    \xff\xd8\xff\xe0\x00\x10     \xff\xd9

Change to:

jpg     y    200000000    \xff\xd8\xff\xe0\x00\x10     \xff\xd9
jpg     y    200000000    \xff\xd8\xff\xe1             \xff\xd9

Command:
# scalpel -o ~/recovered/scalpel ~/camera.dd

recoverjpeg

Config: No change
Command:
# cd ~/recovered/recoverjpeg
# recoverjpeg ~/camera.dd

photorec

Config: No change
Command:
# photorec /d ~/recovered/photorec ~/camera.dd

magicrescue

Config: Use this command to install libjpeg-progs:

# apt-get install libjpeg-progs

Command:
# magicrescue -d ~/recovered/magicrescue \
   -r /usr/share/magicrescue/recipes/jpeg-exif \
   -r /usr/share/magicrescue/recipes/jpeg-jfif \
   ~/camera.dd
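
The imaging step that comes before all of the commands above is not shown in the original test, so here is a minimal sketch of how it might look. The function name image_and_verify is made up for illustration, and the block size is an arbitrary choice; it assumes GNU dd and md5sum as found on Ubuntu.

```shell
#!/bin/sh
# Copy a source (block device or plain file) into an image file, then
# compare the MD5 checksums of both. Returns 0 only when they match.
# NOTE: image_and_verify is a hypothetical helper, not part of any tool.
image_and_verify() {
	src=$1
	img=$2
	# Raw byte-for-byte copy; dd's transfer statistics are silenced
	dd if="$src" of="$img" bs=65536 2>/dev/null || return 1
	# Hash both sides from stdin so only the checksum itself is compared
	sum_src=$(md5sum < "$src" | cut -d' ' -f1)
	sum_img=$(md5sum < "$img" | cut -d' ' -f1)
	[ "$sum_src" = "$sum_img" ]
}
```

It could be invoked as root with something like image_and_verify /dev/sdX1 ~/camera.dd, where /dev/sdX1 is a placeholder for whatever device node the memory stick shows up as. The stick should not be mounted read-write while this runs, otherwise the two checksums may legitimately differ.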

All generated images are verified with identify (from ImageMagick) to see if they are really valid jpeg files. To simplify testing, I was using the following shell script to verify them automatically:

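#!/bin/sh
# Count valid, broken and thumbnail-sized JPEG files in the current directory.
count_total=0
count_broken=0
count_ok=0
count_thumb=0
broken=
ok=
for i in *.jpg *.jpeg; do
	# Skip unmatched glob patterns and anything that is not a regular file
	[ -f "$i" ] || continue
	count_total=$((count_total + 1))
	# identify (ImageMagick) exits non-zero when it cannot parse the file
	if identify "$i" >/dev/null 2>&1; then
		count_ok=$((count_ok + 1))
		ok="$ok $i"
		# Valid files smaller than 50000 bytes are counted as thumbnails
		if [ "$(stat --format="%s" "$i")" -lt 50000 ]; then
			count_thumb=$((count_thumb + 1))
		fi
	else
		count_broken=$((count_broken + 1))
		broken="$broken $i"
	fi
done
echo "Total $count_total images, $count_broken broken, $count_thumb thumbnail"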

After running the above script on each directory of recovered photos, I used an image viewer to independently verify that good images contain no artifacts and bad images are indeed not viewable. Here is the result:

Tool         Total  Broken  Thumbnail  Full-sized
foremost       248      10         10         228
scalpel        254     238         16           0
recoverjpeg    244       0         16         228
photorec       251       0         16         235
magicrescue    244       0         16         228

The single most confident conclusion is that scalpel is completely useless for recovering digital images. Most thumbnails are in the 3-8KB size range, and all full-sized photos are larger than 1.5MB. Most files recovered with scalpel range from 10-25KB instead. The simplest explanation is that scalpel looks for header and footer sequentially: once it finds the footer byte sequence the file is cut off, without checking whether the extracted data is a correct file or not. This 10-25KB range corresponds to the footer of the thumbnail embedded in the Exif header rather than the real image footer, thus the disaster. Anyway, from this point on, scalpel is completely discarded from further testing.
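
The suspected failure mode can be reproduced on synthetic data. The sketch below only illustrates the explanation above and is not scalpel's actual code. It builds a fake "photo" in which a thumbnail with its own header and footer markers sits inside a larger image, then compares where a cut at the first footer lands with where the last footer is. All function names, the filler strings and the 1000-byte payload are invented for the demonstration, and it assumes GNU grep.

```shell
#!/bin/sh
# Build a synthetic photo: outer header, embedded thumbnail (own header and
# footer), filler standing in for image data, then the real footer.
make_fake_photo() {
	{
		printf '\377\330\377\341'            # outer Exif-style header ff d8 ff e1
		printf 'EXIF'
		printf '\377\330\377\340'            # embedded thumbnail header
		printf 'thumbnail-data'
		printf '\377\331'                    # thumbnail footer ff d9
		head -c 1000 /dev/zero | tr '\0' 'x' # stand-in for the real image data
		printf '\377\331'                    # real image footer
	} > "$1"
}

# Print the byte offset just past every footer marker (ff d9) in a file.
footer_ends() {
	footer=$(printf '\377\331')
	LC_ALL=C grep -obaF "$footer" "$1" | LC_ALL=C cut -d: -f1 |
		while read -r off; do echo $((off + 2)); done
}

# Size produced by a carver that stops at the first footer it meets.
naive_carve_size() { footer_ends "$1" | head -n 1; }

# Size of the complete file, i.e. up to and including the last footer.
full_size() { footer_ends "$1" | tail -n 1; }
```

On a file written by make_fake_photo, naive_carve_size reports a cut just after the thumbnail footer, a small fraction of what full_size reports. That is the same pattern as the 10-25KB files scalpel produced from photos larger than 1.5MB.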

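The next comparison is among the remaining 4 tools: how many images are uncovered by each tool. To do this, the jpeg file names are normalized first, so that the same photo gets the same name no matter which tool recovered it. This is achieved with jhead, which renames each file after its Exif timestamp, using the following command: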

jhead -n"%Y-%m-%d-%H:%M:%S" *.jpg

(One of the files, f463.jpg, could not be renamed, even though other exif readers can interpret its exif header perfectly. No idea why.)

In total, 256 images were recovered across all of the tools (or 253, since 3 of them are claimed to be recovered by foremost but are broken). 237 images are found by all tools (most of which were not deleted or only deleted recently), which means the other 19 images are only extracted by some of the tools. Here is the list:

Images discovered only with some tools
(Yes = image found; Corrupt = image found but corrupt; - = not found)

File Name                 recoverjpeg/magicrescue  foremost  photorec  Total
2006-03-14-04:58:37.jpg   Yes                      Yes       -         3
2007-05-25-00:23:10.jpg   Yes                      -         Yes       3
2007-05-25-00:26:14.jpg   -                        Corrupt   Yes       1
2008-02-07-11:15:38.jpg   Yes                      -         Yes       3
2008-02-08-12:45:02.jpg   Yes                      -         Yes       3
2008-02-09-16:50:57.jpg   Yes                      -         Yes       3
2008-02-09-16:53:46.jpg   Yes                      -         Yes       3
2008-02-09-16:55:57.jpg   Yes                      -         Yes       3
2008-05-04-09:00:48.jpg   -                        Corrupt   Yes       1
2008-05-05-00:36:01.jpg   -                        Corrupt   Yes       1
2008-05-05-05:48:36.jpg   -                        Corrupt   -         0
2008-05-05-05:59:25.jpg   -                        Corrupt   -         0
2008-05-07-08:28:25.jpg   -                        Corrupt   Yes       1
2008-05-07-08:28:48.jpg   -                        Corrupt   -         0
2008-05-07-10:25:20.jpg   -                        Corrupt   Yes       1
2008-05-07-10:29:59.jpg   -                        Corrupt   Yes       1
2008-05-07-10:38:57.jpg   -                        Corrupt   Yes       1
f463.jpg                  -                        -         Yes       1
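
A list like the one above can be produced mechanically once the file names are normalized. The following is a rough sketch rather than the procedure actually used for this test: found_by_some is a made-up helper, it assumes one sub-directory per tool under a common root (as with ~/recovered/foremost and friends), and it counts only whether a name is present. Corrupt files would have to be removed beforehand to match the Total column, which does not count them.

```shell
#!/bin/sh
# For every .jpg name under the given tool directories, print
# "<number of tools that have it> <file name>", leaving out names that
# every tool recovered. NOTE: found_by_some is a hypothetical helper.
found_by_some() {
	root=$1
	shift
	ntools=$#
	for tool in "$@"; do
		for f in "$root/$tool"/*.jpg; do
			# Skip the literal pattern when a directory has no .jpg files
			[ -f "$f" ] || continue
			basename "$f"
		done
	done | sort | uniq -c | awk -v n="$ntools" '$1 < n { print $1, $2 }'
}
```

For this test it would be called as found_by_some ~/recovered recoverjpeg magicrescue foremost photorec. File names containing spaces are not handled, which is fine for the timestamp-style names jhead generates.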

Some interesting observations:

  • recoverjpeg and magicrescue give completely identical results. Not only in terms of the number of images: the images found are the same! Wonder if they use the same algorithm? (Haven’t checked yet)
  • Some of the extra images found by photorec are actually NOT deleted! That means recoverjpeg, magicrescue and foremost surely don’t rely on filesystem knowledge at all.
  • Most corrupted images discovered by foremost are around 250KB-1MB in size. Foremost is probably not rigorous enough when determining the file fragmentation point.
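
The first observation, that two tools recovered exactly the same images, can also be checked on file contents rather than file names. Here is a minimal sketch, assuming md5sum is available; the helper names checksums and same_content are invented for this example and are not part of any of the tools tested.

```shell
#!/bin/sh
# Print the sorted MD5 checksums of all .jpg files in a directory.
checksums() {
	for f in "$1"/*.jpg; do
		# Skip the literal pattern when the directory has no .jpg files
		[ -f "$f" ] || continue
		md5sum < "$f" | cut -d' ' -f1
	done | sort
}

# Succeed when both directories hold the same set of file contents,
# regardless of how the individual files are named.
same_content() {
	a=$(checksums "$1")
	b=$(checksums "$2")
	[ "$a" = "$b" ]
}
```

Something like same_content ~/recovered/recoverjpeg ~/recovered/magicrescue && echo identical would then confirm the match byte for byte. Note that two empty directories also compare as identical, and that a tool which post-processes its output could yield different bytes for the same picture, so a mismatch here does not by itself prove the images differ visually.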
