| Much like humans, computers are able to infer 3-D positions of objects using sight alone. Replacing our eyes with electronic cameras and our brain with soul-less, heartless algorithms, it is possible for a computer to reconstruct a 3-D scene from simple photos. In order to understand the relative positions of objects in the world, we can take two photos of the same scene (from different viewpoints) and look at how positions of objects change from photo to photo. To illustrate this, try extending your arm and holding it in front of your eye. Look at your thumb with only your right eye, then switch eyes. Your thumb seems to jump a great deal between those two view points, whereas the background stays in mostly the same spot. This change of relative positions allows us to understand the 3-D structure of the things we see. The major difficulty with simulating this on a computer is the fact that its very hard to know where an object is from photo to photo (e.g. a computer doesn't know what a thumb is). The common way to identify the 'correspondences' between photos is to look at the image one small area at a time, and then trying to find a similar patch in the other image. This process is very time consuming and can result in inaccurate correspondences.
The strategy taken by this thesis involves using structured light (projecting a checker board on to the scene while photographing it.) This strategy allows the computer see the image as a binary collection of pixels rather than a multi-valued grey (or color) image. The computer can then give a unique index to each part of the scene and quickly figure out where things are in both pictures. It can then recover the 3-D structure of the scene and paste the surfaces with the original objects' textures.
|