This paper presents a data acquisition system consisting of multiple RGB-D sensors and digital single-lens reflex (DSLR) cameras. A systematic data processing procedure for integrating these two kinds of devices to generate three-dimensional point clouds of indoor environments is also developed and described. In the developed system, DSLR cameras are used to bridge the Kinects and provide a more accurate ray intersection condition, which takes advantage of the higher resolution and image quality of the DSLR cameras. Structure from Motion (SFM) reconstruction is used to link and merge multiple Kinect point clouds and dense point clouds (from DSLR color images) to generate initial integrated point clouds. Then, bundle adjustment is used to resolve the exterior orientation (EO) of all images. Those exterior orientations are used as the initial values to combine these point clouds at each frame into the same coordinate system using Helmert (seven-parameter) transformation. Experimental results demonstrate that the design of the data acquisition system and the data processing procedure can generate dense and fully colored point clouds of indoor environments successfully even in featureless areas. The accuracy of the generated point clouds were evaluated by comparing the widths and heights of identified objects as well as coordinates of pre-set independent check points against in situ measurements. Based on the generated point clouds, complete and accurate three-dimensional models of indoor environments can be constructed effectively.