We quickly acquire colored 3D models of objects and indoor scenes with a hand-held Kinect camera. We extract SURF features from the camera image and localize them in 3D space using the depth image. We match these features between pairs of acquired images and use RANSAC to robustly estimate the 3D transformation between them. To achieve real-time processing, we match the current image only against a subset of the previous images, sampled with decreasing frequency. Subsequently, we construct a graph whose nodes c