As much as RE hates news media for being morons, I hate tech media for the same reasons. Today’s case in point: the Lytro camera, as summarized by optics.org:
Rarely does a purely optical innovation grab the attention of the world’s media in the way that Lytro’s did last week. The Stanford start-up’s combination of plenoptic technology and slick image processing software will allow refocusing of images after they have been taken – however poorly focused the original image – and that prospect has got many amateur photographers salivating, even if some of the professionals remain skeptical.
I want to be clear about this right now: You cannot refocus an image after it has been taken. You just can’t. You can sharpen and blur parts of the image to give the illusion of refocusing, but these are just mathematical processes that give estimates. I cannot say this enough: You cannot refocus an image after it is recorded. A breakdown of what the Lytro is doing after the jump. Spoiler alert, the important phrase is “slick image processing.”
I’ll go ahead and give the big reveal now, because what Lytro is doing is actually pretty cool. Let’s say you took several pictures in a row at different focal lengths (focusing on different parts of the scene, for example). After the fact, you could make a pretty good estimate of any part of that scene that was within the range you sampled. If I focused at one foot, two feet, and three feet from the lens, I could give a decent estimate of what it would have looked like focused at 1.5 feet by interpolating between the shots I did take and mathematically averaging them out. The problem, of course, is that it takes time to take three shots with lens adjustments, and the scene may change between shots. I could do the same thing with three cameras firing simultaneously, but they have to be in different places because of, you know, mass and physics and stuff. That means the estimates will be worse, because each camera sees a slightly different view even if they’re only an inch apart. This is an important point: that effect is minimal if you’re shooting something farther away, so the technique works fine if your target is, say, 30 feet away, but not if it’s 10 feet away. The closer the cameras are to each other, the better their estimates are for close-up photos.
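The estimation idea above can be sketched in a few lines. This is my own illustrative toy, not Lytro’s actual algorithm: it just linearly blends two captures focused at known distances to guess what an in-between focus distance would have looked like. All the names here are made up for the example.

```python
def interpolate_focus(shot_a, shot_b, d_a, d_b, d_target):
    """Estimate pixel values at focus distance d_target by linearly
    blending two shots focused at d_a and d_b. Note this is only an
    estimate -- it averages blur, it does not recover true focus."""
    t = (d_target - d_a) / (d_b - d_a)  # 0.0 at d_a, 1.0 at d_b
    return [[(1 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(shot_a, shot_b)]

# Two tiny 2x2 "images" (brightness values), focused at 1 ft and 2 ft:
shot_1ft = [[10, 20], [30, 40]]
shot_2ft = [[20, 40], [60, 80]]

# Estimate the shot we never took, focused at 1.5 ft:
est_1_5ft = interpolate_focus(shot_1ft, shot_2ft, 1.0, 2.0, 1.5)
# est_1_5ft == [[15.0, 30.0], [45.0, 60.0]]
```

The result is exactly the “mathematical averaging” the post describes: plausible in-between values, not a genuine refocus.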
What Lytro has done is basically take a normal photosensor (the chip in a camera that records the light coming in) and turn it into several simultaneous cameras. Photosensor chips have teeny tiny lenses on them to focus light on each pixel, and Lytro is adjusting those lenses so that, say, columns 1, 5, 9, etc. are all capturing an image at one focal length, columns 2, 6, 10, etc. are at a second, and so on. Thus, you’re getting several different images at once (one image composed of columns 1, 5, 9, etc., a second composed of columns 2, 6, 10, etc., and so on). By using alternating columns, Lytro is basically doing the “multiple camera” approach above, but with the sensors smooshed together as tightly as possible, which means you don’t have the distance problem I described above.
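The column-interleaving scheme is easy to demonstrate. Here’s a sketch (again my own toy, assuming the simple columns-1-5-9 layout described above) that pulls the interleaved sub-images back out of a single capture:

```python
def split_columns(image, n):
    """Split one interleaved capture into n sub-images: sub-image k
    keeps columns k, k+n, k+2n, ... so each is 1/n as wide."""
    return [[row[k::n] for row in image] for k in range(n)]

# One row of eight pixels, with four focal lengths interleaved by column:
frame = [[1, 2, 3, 4, 5, 6, 7, 8]]
subs = split_columns(frame, 4)
# subs == [[[1, 5]], [[2, 6]], [[3, 7]], [[4, 8]]]
# i.e. four quarter-width images, one per focal length.
```

Note what the example makes obvious: you get n images for the price of one exposure, but each image has only 1/n of the sensor’s columns.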
So, at this point it should be clear why the coverage above is wrong: You cannot refocus an image after you take it. Lytro is taking several images at once and using estimation to average out values in between. It’s not perfect, it’s not a substitute for actual focusing, and it never will be. It’s just the same estimation I wrote about above.
Lytro’s images are small so far, 512×512 (that’s about 262,000 pixels, or roughly a quarter of a megapixel, which is close to what the very first cameraphones could do). This is 512×512:
Now, it should be clear that the more images you capture at different focal lengths, the better your estimates will be. For something like what Lytro is doing, I’m guessing they need at least four. Your standard point-and-shoot camera nowadays is sporting a 10 megapixel sensor, which works out to about 2673×3741 pixels. Once you cut it into four each way (four pixel groups along each dimension), you’ve got 668×935, which is 624,580 pixels, or 0.6 megapixels. So, using this technique with a 10 megapixel sensor gives you a “refocusable” image of 0.6 megapixels. (Four was my first guess, but five each way gives you 534×748, which is pretty damn close to Lytro’s numbers.)
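The resolution arithmetic above is worth making concrete. A two-line sketch (my own helper, using the post’s assumed 2673×3741 layout for a 10 MP sensor):

```python
def effective_pixels(width, height, n):
    """Pixels left per sub-image when the sensor is divided n ways
    along each dimension (integer division: leftover rows/columns
    are discarded)."""
    return (width // n) * (height // n)

# The post's 10 megapixel sensor, cut four ways each direction:
four_way = effective_pixels(2673, 3741, 4)   # 668 * 935 = 624,580
# And five ways each direction:
five_way = effective_pixels(2673, 3741, 5)   # 534 * 748 = 399,432
```

The point the numbers make: dividing each dimension by n costs you a factor of n² in pixel count, which is why a 10 MP sensor yields only a sub-megapixel “refocusable” image.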
As Ken Rockwell points out at his awesome site:
One needs at least a doubling of linear resolution or film size to make an obvious improvement. This is the same as quadrupling the megapixels. A simple doubling of megapixels, even if all else remained the same, is very subtle. The factors that matter, like color and sharpening algorithms, are far more significant.
So, as long as camera sensors don’t suddenly explode in pixel density (which, for physical reasons, won’t happen), Lytro’s technology is only going to produce small pictures. It’s cool, but it’s not going to replace a real camera.
So, whenever you hear someone say “captures the light field” or “light field camera” or “refocus in post”, I hope you punch them very, very hard.
UPDATE: Reading the optics piece again, I see that they explicitly say Lytro captures four focal lengths, so 512×512 means the camera captures 2048×2048, or about 4 megapixels. But as shown above, to get to a 0.6 megapixel image they need to go up to a 10 megapixel sensor, and so on.
UPDATE 2: I completely forgot my second point, which will now get a tiny amount of space. A lot of sites are mentioning Lytro in combination with HDR. They are completely different things: HDR has nothing to do with focal length; it’s about varying exposure. It’s possible that Lytro does this too, but without a shutter mechanism it would have to be done by selectively reading out certain pixel sets at different times. Doable, but not straightforward.