Warning: this page is located in the archive.

Cameras and Scene Structure

This phase assumes that calibrated epipolar geometry has been established between pairs of images, i.e., the relative translations and rotations are known. For every pair, we can place the coordinate-system origin at the centre of the first camera of the pair. The pose of the second camera is then expressed in this particular coordinate system, but only up to an unknown scale. Now we need to find the positions and orientations of the cameras in a single coordinate system common to all of them.

Stepwise Camera Gluing

There are several different approaches to this problem. We will use a very simple greedy algorithm. The method is recapitulated in the lectures (slide Stepwise Gluing). The procedure consists of initialisation of the set of cameras and of the point cloud, followed by repeated appending of cameras one by one.

Initialisation of the point cloud and cameras

  1. The set of selected cameras is empty.
  2. Choose a pair of images I1 and I2.
  3. Find the relative rotation and translation (R,t) and choose a (nonzero) scale, e.g., set the length of the baseline to 1.
  4. Choose the global coordinate system such that it is equal to the coordinate system of the first camera and construct the cameras P1 and P2. Put these cameras into the set of selected cameras.
  5. Reconstruct the 3D point cloud from the inlier correspondences between images I1 and I2 using the cameras P1 and P2 (the points must lie in front of both cameras).
  6. Refine the camera set {P1, P2} together with the point cloud using bundle adjustment.
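
Steps 3-5 above can be sketched as follows. This is a minimal illustration assuming numpy; the relative pose (R,t) is a toy value, the `triangulate` helper is a hypothetical name, and the camera convention P = [R | -R*C] is one common choice, not necessarily the one used in the course code.

```python
import numpy as np

def triangulate(P1, P2, u1, u2):
    """DLT triangulation of one correspondence (u1, u2: homogeneous image points)."""
    A = np.vstack([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / X[3]                        # homogeneous 3D point

# Global frame = frame of the first camera; baseline scaled to 1 (step 3-4).
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])              # illustrative relative pose
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([R, -R @ t[:, None]])       # convention P2 = [R | -R*C2]

# Triangulate one toy correspondence (step 5) and check it is in front
# of both cameras (positive depth).
X = triangulate(P1, P2, np.array([0.0, 0.0, 1.0]),
                np.array([-0.5, 0.0, 1.0]))
in_front = (P1 @ X)[2] > 0 and (P2 @ X)[2] > 0
```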

Selection of the Initial Pair

The initial pair is chosen according to some (manually designed) heuristics: the pair should contain a sufficient number of inlier correspondences (compared to the other pairs), and it should be near the centre of the captured set.
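
As a hypothetical illustration of one such heuristic (the counts below are toy values, not from any real data), the pair with the most inlier correspondences can be picked like this:

```python
# inlier_counts maps an image pair to its number of inlier correspondences
inlier_counts = {(1, 2): 310, (1, 3): 120, (2, 3): 95}   # toy numbers
best_pair = max(inlier_counts, key=inlier_counts.get)    # pair with most inliers
```

In practice this score would be combined with the centrality criterion mentioned above.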

Appending of a Single Camera

  1. Select an image Ij that does not have a camera estimated yet.
  2. Find the image points in Ij that correspond to already reconstructed 3D points in the cloud (the correspondences need not be 1:1).
  3. Estimate the global pose and orientation of the camera Pj using the P3P algorithm in a RANSAC scheme. An implementation of P3P is available (code repository, p3p package) (Notes.pdf).
  4. Insert the camera Pj into the set of selected cameras.
  5. Refine the camera Pj by numeric minimisation of the reprojection errors in Ij (updates Pj only).
  6. Find correspondences between Ij and the images of the selected cameras that do not have a 3D point yet, reconstruct new 3D points, and add them to the point cloud.
  7. Refine all the selected cameras and the point cloud using full bundle adjustment.
  8. Repeat from the beginning, or terminate if all cameras have been computed.
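
The inlier test inside the RANSAC loop of step 3 can be sketched as follows. This is an illustrative fragment only: the camera, points, and threshold are toy values, and the P3P solver itself is assumed to come from the provided p3p package.

```python
import numpy as np

def reprojection_errors(P, X, u):
    """P: 3x4 camera, X: 4xn homogeneous 3D points, u: 2xn image points."""
    x = P @ X
    proj = x[:2] / x[2]                       # perspective division
    return np.linalg.norm(proj - u, axis=0)   # per-point reprojection error

# Toy candidate camera (as returned by one P3P sample) and two 2D-3D matches;
# the second image point is deliberately off by 0.05.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
X = np.array([[0.0, 1.0], [0.0, 0.0], [2.0, 4.0], [1.0, 1.0]])
u = np.array([[0.0, 0.3], [0.0, 0.0]])

err = reprojection_errors(P, X, u)
inliers = err < 0.01                          # threshold is illustrative
```

The candidate pose with the largest inlier support is kept and then refined as in step 5.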

Selection of the Appended Camera

Again, the camera is selected according to a manually chosen criterion. The image being appended should contain a sufficient number of correspondences to the cloud of already reconstructed 3D points; e.g., it is suitable to select the image with the highest number of such correspondences.
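
The suggested criterion amounts to an argmax over the not-yet-placed images (a sketch with toy counts; the image ids and numbers are purely illustrative):

```python
# corr_to_cloud maps an unplaced image id to its number of tentative
# correspondences with already reconstructed 3D points
corr_to_cloud = {3: 220, 4: 340, 5: 180}                 # toy counts
next_image = max(corr_to_cloud, key=corr_to_cloud.get)   # image to append next
```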

Correspondence Manipulation for Gluing of Cameras

During the step-wise gluing, the pair-wise image-to-image correspondences must be properly maintained and transformed into image-to-3D-cloud correspondences. An implementation of the correspondence-manipulation algorithm is available (code repository, corresp package). The algorithm (including the API of the package) is described here.
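
As a hypothetical minimal illustration of the bookkeeping involved (this is not the actual corresp API), a tentative image-to-image match is promoted to two image-to-3D correspondences once its 3D point is reconstructed:

```python
# Tentative matches between images, as (feature index in i1, feature index in i2)
img_img = {('i1', 'i2'): [(10, 55), (11, 60)]}
# Confirmed correspondences: image id -> list of (feature index, 3D point id)
img_X = {}

# When the match (10, 55) is triangulated as 3D point 0:
X_id = 0
img_X.setdefault('i1', []).append((10, X_id))
img_X.setdefault('i2', []).append((55, X_id))
img_img[('i1', 'i2')].remove((10, 55))        # no longer a tentative match
```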

Export of Cameras and Points into PLY

A simple Matlab/Python class is available in the code repository (geom_export package). Example of use:

import ge

g = ge.GePly('out.ply')
# Xall: Euclidean points (3xn matrix); ColorAll: RGB colours (3xn or 3x1, optional)
g.points(Xall, ColorAll)
g.close()

Task 3

  1. Compute cameras and sparse point cloud using step-wise gluing with proper numeric optimization.
  2. Show a 3D plot of the cameras (centres with viewing directions) and the 3D points. Emphasise the camera pair used to initialise the gluing. Label the cameras with numbers according to the order of gluing. See the example in figure 1.
  3. Export the cameras and the point cloud into a PLY file.

Fig. 1: Example of a reconstructed set of cameras. Black needles show the viewing directions; the initial pair is emphasised in red.

courses/tdv/labs/3_cameras_and_scene_structure.txt · Last modified: 2023/12/19 15:31 by moravj34