The role of priors in visual perception and their applications in computer vision
Three-dimensional (3D) vision is an ill-posed inverse problem. The formation of the 2D image of a 3D shape or scene is the forward problem, and inferring the 3D shape or scene from the image is the inverse problem. The ill-posedness arises because any given 2D image is consistent with infinitely many 3D interpretations. To produce a unique and ideally correct interpretation, one has to impose constraints (also known as priors) on the family of possible interpretations. Symmetry, compactness, minimal surface area, and planarity are some of the priors used by the visual system to deal with the ill-posedness of the problem. The first part of this dissertation discusses a psychophysical experiment conducted to better understand how these priors operate in the visual system. The second part describes a method that uses some of these priors to recover 3D shapes from a single image. The last part presents a translational-symmetry-based algorithm that extracts curve skeletons from 3D point clouds by decomposing each point cloud into its parts.
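The ill-posedness described above can be illustrated with a minimal sketch. It assumes an idealized pinhole camera with its center at the origin; the function names `project` and `backproject` and the focal length are hypothetical choices made for illustration and are not taken from the dissertation.

```python
def project(point, focal_length=1.0):
    """Forward problem: pinhole projection of a 3D point (x, y, z) to the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def backproject(image_point, depth, focal_length=1.0):
    """Inverse problem: one 3D point on the viewing ray of image_point, at a chosen depth."""
    u, v = image_point
    return (u * depth / focal_length, v * depth / focal_length, depth)


# A single 3D point produces exactly one image point ...
image_point = project((1.0, 2.0, 4.0))

# ... but every depth along the viewing ray yields a different 3D point
# that reprojects to the same image point.
candidates = [backproject(image_point, depth) for depth in (1.0, 2.0, 4.0, 8.0)]
reprojected = [project(p) for p in candidates]
```

Every entry of `candidates` is a distinct 3D interpretation of the same image point, so the image alone cannot decide between them. A prior such as symmetry or planarity is what restricts this one-parameter family to a single interpretation.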
Prior studies have found that the perception of symmetric abstract polyhedral shapes can be well modeled using the above-mentioned priors and binocular depth-order information. This study shows that these priors can also be used to model the perception of asymmetrical shapes obtained from affine distortions of symmetric shapes. The experiment shows that the perception of symmetrical shapes is closer to veridical than the perception of asymmetrical shapes. Metrics to measure the asymmetry of abstract polyhedral shapes and the dissimilarity between two polyhedral shapes are introduced. A control experiment that verifies the goodness of the model is also presented. A website was developed that contains all the shapes used in the experiment, along with the user-reconstructed shapes and the model-reconstructed shapes.
To recover 3D shapes from a single view, symmetry and planarity constraints are used. Long smooth curves are extracted from the edge map of an image by solving the shortest (least-cost) path problem, where the cost function penalizes large interpolations and large turning angles. Optimal curve matches that minimize the number of planes required to approximate the final 3D reconstruction are then found. This optimization problem is framed as a binary integer program.
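The curve-extraction step can be sketched as a least-cost path search over edge points. The sketch below is only an assumption about the general form of such a cost function: the squared-gap penalty, the weights, the `max_gap` cutoff, and the function names are hypothetical and do not reproduce the dissertation's actual formulation. Dijkstra's algorithm runs over (previous point, current point) states so that the turning angle at each step is well defined.

```python
import heapq
import math


def turning_angle(a, b, c):
    """Absolute change in heading (radians) when moving a -> b -> c."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(h2 - h1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def smoothest_path(points, start, goal, gap_weight=1.0, turn_weight=2.0, max_gap=3.0):
    """Least-cost path between two edge points (given by index).

    Assumed cost of a step: gap_weight * gap**2 (penalizes large interpolations)
    plus turn_weight * turning angle (penalizes sharp turns).
    """
    # Heap entries: (cost so far, previous index or -1, current index, path so far).
    heap = [(0.0, -1, start, [start])]
    settled = set()
    while heap:
        cost, prev, cur, path = heapq.heappop(heap)
        if cur == goal:
            return cost, path
        if (prev, cur) in settled:
            continue
        settled.add((prev, cur))
        for nxt in range(len(points)):
            if nxt == cur:
                continue
            gap = math.dist(points[cur], points[nxt])
            if gap > max_gap:
                continue  # refuse to interpolate across very large gaps
            step = gap_weight * gap ** 2
            if prev >= 0:
                step += turn_weight * turning_angle(points[prev], points[cur], points[nxt])
            heapq.heappush(heap, (cost + step, cur, nxt, path + [nxt]))
    return math.inf, []


# Four collinear edge points plus two off-curve distractors.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (1, 1), (2, 2)]
cost, path = smoothest_path(points, start=0, goal=3)
```

In this toy example the search follows the collinear points in order rather than jumping directly from the first point to the last, because the single long interpolation costs more than three short ones. The distractor points are avoided because reaching them requires both longer gaps and turns.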
To extract curve skeletons from 3D point clouds, the cloud is decomposed into its parts. Generalized cylinders (GCs) are used to represent the parts. Since the axis of a GC is an integral part of its definition, the parts have natural skeletal representations. Cross-sections of parts are detected first, and parts are then grown starting from these initial cross-sections. Translational symmetry, the fundamental property of GCs, is employed to grow the parts. A large number of such candidate parts are grown starting from different positions in the point cloud. Each part is assigned a score based on how well it can be represented as a GC. An optimization algorithm is then employed to select, from among the candidate parts, the best subset to represent the decomposition of the object.
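The final selection step can be illustrated with a small sketch. The abstract does not say which optimization algorithm is used, so the brute-force search below is purely an assumption for illustration: it picks the subset of candidate parts with the highest total score such that no two selected parts share a point. The function name `select_parts`, the non-overlap rule, and the example scores are all hypothetical.

```python
from itertools import combinations


def select_parts(candidates):
    """Pick non-overlapping candidate parts with the highest total score.

    candidates: list of (score, set of point indices covered by the part).
    Exhaustive search, so only practical for a handful of candidates.
    """
    best_score, best_subset = 0.0, ()
    for size in range(1, len(candidates) + 1):
        for subset in combinations(range(len(candidates)), size):
            covered = set()
            total = 0.0
            valid = True
            for i in subset:
                score, pts = candidates[i]
                if covered & pts:  # assumed rule: a point belongs to at most one part
                    valid = False
                    break
                covered |= pts
                total += score
            if valid and total > best_score:
                best_score, best_subset = total, subset
    return best_score, best_subset


# Four hypothetical candidate parts grown from different seed positions.
candidates = [
    (0.9, {0, 1, 2, 3}),
    (0.6, {2, 3, 4}),
    (0.5, {4, 5, 6}),
    (0.7, {5, 6, 7}),
]
best_score, best_subset = select_parts(candidates)
```

Here the first and last candidates are selected: they do not overlap, and together they score higher than any other compatible combination. With many candidates the exhaustive search becomes infeasible, which is one reason such selection problems are typically posed as a formal optimization problem.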