Selected research projects:

  • Boosting 3D LBP-Based Face Recognition by Fusing Shape and Texture Descriptors on the Mesh
    In this paper, we present a novel approach for fusing shape and texture local binary patterns (LBPs) on a mesh for 3D face recognition. Using a recently proposed framework, we compute LBP directly on the face mesh surface, then we construct a grid of the regions on the facial surface that can accommodate global and partial descriptions. Compared with its depth-image counterpart, our approach is distinguished by the following features: 1) inherits the intrinsic advantages of mesh surface (e.g., preservation of the full geometry); 2) does not require normalization; and 3) can accommodate partial matching. In addition, it allows early level fusion of texture and shape modalities. Through experiments conducted on the BU-3DFE and Bosphorus databases, we assess different variants of our approach with regard to facial expressions and missing data, also in comparison to the state-of-the-art solutions.
    N. Werghi, C. Tortorici, S. Berretti, A. Del Bimbo. "Boosting 3D LBP-based Face Recognition by Fusing Shape and Texture Descriptors on the Mesh," IEEE Transactions on Information Forensics and Security, vol.11, no.5, pp.964-979, May 2016. [doi]
  • 3-D Human Action Recognition by Shape Analysis of Motion Trajectories on Riemannian Manifold
    Recognizing human actions in 3-D video sequences is an important open problem that is currently at the heart of many research domains including surveillance, natural interfaces and rehabilitation. However, the design and development of models for action recognition that are both accurate and efficient is a challenging task due to the variability of the human pose, clothing and appearance. In this paper, we propose a new framework to extract a compact representation of a human action captured through a depth sensor, and enable accurate action recognition. The proposed solution builds on fitting a human skeleton model to acquired data so as to represent the 3-D coordinates of the joints and their change over time as a trajectory in a suitable action space. Thanks to such a 3-D joint-based framework, the proposed solution is capable of capturing both the shape and the dynamics of the human body simultaneously. The action recognition problem is then formulated as the problem of computing the similarity between the shape of trajectories in a Riemannian manifold. Classification using k-nearest neighbors is finally performed on this manifold taking advantage of Riemannian geometry in the open curve shape space. Experiments are carried out on four representative benchmarks to demonstrate the potential of the proposed solution in terms of accuracy/latency for low-latency action recognition. Comparative results with state-of-the-art methods are reported.
    M. Devanne, H. Wannous, S. Berretti, P. Pala, M. Daoudi, A. Del Bimbo. "3D Human Action Recognition by Shape Analysis of Motion Trajectories on Riemannian Manifold," IEEE Transactions on Cybernetics, vol.45, no.7, pp.1340-1352, July 2015. [doi]
  • The Mesh-LBP: A Framework for Extracting Local Binary Patterns From Discrete Manifolds
    In this paper, we present a novel framework, which we dubbed mesh-local binary pattern (LBP), for computing local binary-like patterns on a triangular-mesh manifold. This framework can be adapted to all the LBP variants employed in 2D image analysis. As such, it allows extending the related techniques to mesh surfaces. After describing the foundations, the construction and the main features of the mesh-LBP, we derive its possible variants and show how they can extend most of the 2D-LBP variants to the mesh manifold. In the experiments, we give evidence of the presence of the uniformity aspect in the mesh-LBP, similar to the one noticed in the 2D-LBP. We also report repeatability experiments that confirm, in particular, the rotation-invariance of mesh-LBP descriptors. Furthermore, we analyze the potential of mesh-LBP for the task of 3D texture classification of triangular-mesh surfaces collected from public data sets. Comparison with state-of-the-art surface descriptors, as well as with 2D-LBP counterparts applied on depth images, also evidences the effectiveness of the proposed framework. Finally, we illustrate the robustness of the mesh-LBP with respect to the class of mesh irregularity typical of 3D surface-digitizer scans.
    N. Werghi, S. Berretti, A. Del Bimbo. "The mesh-LBP: a Framework for Extracting Local Binary Patterns from Discrete Manifolds," IEEE Transactions on Image Processing, vol.24, no.1, pp.220-235, January 2015. [doi]
  • Face Recognition by Super-Resolved 3D Models From Consumer Depth Cameras
    Face recognition based on the analysis of 3D scans has been an active research subject over the last few years. However, the impact of the resolution of 3D scans on the recognition process has not been addressed explicitly, although it is of primary importance for enabling the use of the new generation of consumer depth cameras for biometric purposes. In fact, these devices perform depth/color acquisition over time at standard frame-rate, but with a low resolution compared to the 3D scanners typically used for acquiring 3D faces in recognition applications. Motivated by these considerations, in this paper, we define a super-resolution approach for 3D faces by which a sequence of low-resolution 3D face scans is processed to extract a higher resolution 3D face model. The proposed solution relies on the scaled iterative closest point procedure to align the low-resolution scans with each other, and estimates the value of the high-resolution 3D model through an approximation based on 2D box-spline functions. To evaluate the approach, we built, and made publicly available, the Florence Superface dataset that collects high-resolution and low-resolution data for about 50 different persons. Qualitative and quantitative results are reported to demonstrate the accuracy of the proposed solution, also in comparison with alternative techniques.
    S. Berretti, P. Pala, A. Del Bimbo. "Face Recognition by Super-resolved 3D Models from Consumer Depth Cameras," IEEE Transactions on Information Forensics and Security, vol.9, no.9, pp.1436-1449, September 2014. [doi]
  • 3D face recognition in the presence of large pose variations
    In this work, we propose and evaluate a 3-D face recognition approach capable of performing accurate face matching also in the case where just parts of probe scans are available. This is obtained through an original face representation and matching solution that first extracts keypoints of the 3-D depth image of the face and then measures how the face depth changes along facial curves connecting pairs of keypoints. Face similarity is evaluated by sparse comparison of facial curves defined across inlier pairs of matching keypoints between probe and gallery scans. In doing so, a statistical model is also proposed to associate facial curves of the gallery scans with a saliency measure, so that curves that model characterizing traits of some subjects are distinguished from curves that are frequently observed in the faces of many different subjects. Following recent related work, the recognition accuracy of the approach is evaluated using two datasets, both comprising scans with missing parts: the Face Recognition Grand Challenge v2.0 dataset combined with the University of Notre Dame probes, and the Gavab dataset.
    S. Berretti, A. Del Bimbo, P. Pala. "Sparse Matching of Salient Facial Curves for Recognition of 3D Faces with Missing Parts," IEEE Transactions on Information Forensics and Security, vol.8, no.2, pp.374-389, February 2013 [pdf].
  • Ethnicity based 3D face recognition
    Among different approaches for 3D face recognition, solutions based on local facial characteristics are very promising mainly because they can manage facial expression variations by assigning different weights to different parts of the face. However, so far few works have investigated the individual relevance of local features in 3D face recognition, and only very simple solutions have been applied in practice. In this paper, a local approach to 3D face recognition is combined with a feature selection model to study the relative relevance of different regions of the face for the purpose of discriminating between different subjects. The proposed solution is evaluated using facial scans of the Face Recognition Grand Challenge dataset. Results of the experimentation are twofold, in that they quantitatively demonstrate the assumption that different regions of the face have different relevance for face discrimination, and also show that the relevance of facial regions changes for different ethnic groups.
    S. Berretti, A. Del Bimbo, P. Pala. "Distinguishing Facial Features for Ethnicity based 3D Face Recognition," ACM Transactions on Intelligent Systems and Technology, special section on Intelligent Multimedia Systems and Technology, Part II, ACM, vol.3, no.3, article 45, 20 pages, May 2012. [pdf]
  • 3D face recognition
    In this research, we present a novel approach to 3D face matching that shows high effectiveness in distinguishing facial differences between distinct individuals from differences induced by non-neutral expressions within the same individual. The approach takes into account geometrical information of the 3D face and encodes the relevant information into a compact representation in the form of a graph. Nodes of the graph represent equal width iso-geodesic facial stripes. Arcs between pairs of nodes are labeled with descriptors, referred to as 3D Weighted Walkthroughs (3DWWs), that capture the mutual relative spatial displacement between all the pairs of points of the corresponding stripes. Face partitioning into iso-geodesic stripes and 3DWWs together provide an approximate representation of local morphology of faces that exhibits smooth variations for changes induced by facial expressions. The graph-based representation permits very efficient matching for face recognition and is also suited to be employed for face identification in very large datasets with the support of appropriate index structures. The method obtained the best ranking at the SHREC 2008 contest for 3D face recognition. We present an extensive comparative evaluation of performance with the FRGC v2.0 dataset and the SHREC08 dataset.
    S. Berretti, A. Del Bimbo, P. Pala. "3D Face Recognition using iso-Geodesic Stripes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32, no.12, pp.2162-2177, December 2010. [pdf]
  • 3D mesh partitioning
    Partitioning of complex 3D objects into simpler sub-parts is a challenging research subject with relevant outcomes for several application contexts.
    In this research, a model is proposed for decomposition of 3D objects based on Reeb-graphs. The model is motivated by perceptual principles and supports identification of salient object protrusions. Experimental results have demonstrated the effectiveness of the proposed approach with respect to different solutions that have appeared in the literature, and with reference to ground-truth data obtained by manually decomposing 3D objects. This work was partially done as an activity of the DELOS Network of Excellence.
    S. Berretti, A. Del Bimbo, P. Pala. "3D Mesh Decomposition using Reeb Graphs," Image and Vision Computing, Elsevier, vol.27, no.10, pp.1540-1554, September 2009. [pdf]
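
    As a point of reference for the LBP-based entries above, the sketch below shows the classic binary-pattern computation on an ordered ring of scalar values around a central element, together with the usual "uniform" test (at most two circular 0/1 transitions). It is not the authors' mesh implementation: the function names, and the assumption that the ring values are already ordered around the central facet, are purely illustrative.

```python
from typing import Sequence


def lbp_code(center: float, ring: Sequence[float]) -> int:
    """Threshold each ring value against the center and pack the bits into an integer.

    Assumption: `ring` is already ordered around the central element.
    """
    code = 0
    for k, value in enumerate(ring):
        if value >= center:
            code |= 1 << k
    return code


def is_uniform(code: int, n_bits: int) -> bool:
    """Return True if the circular bit string has at most two 0/1 transitions."""
    transitions = 0
    for k in range(n_bits):
        bit = (code >> k) & 1
        nxt = (code >> ((k + 1) % n_bits)) & 1
        transitions += int(bit != nxt)
    return transitions <= 2


# Hypothetical scalar values (e.g. a curvature measure) on a ring of 8 neighbors.
ring = [0.2, 0.9, 0.8, 0.7, 0.1, 0.0, 0.3, 0.4]
code = lbp_code(0.5, ring)
uniform = is_uniform(code, len(ring))
```

    Histograms of such codes, computed per surface region, are the kind of descriptor that can then be compared or concatenated across shape and texture channels.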

Some past research projects:

  • Distributed image retrieval
    Searching information through the Internet often requires users to separately contact several digital libraries, use each library interface to author the query, analyze retrieval results and merge them with results returned by other libraries. Such a solution could be simplified by using a centralized server that acts as a gateway between the user and several distributed repositories: The centralized server receives the user query, forwards the user query to federated repositories—possibly translating the query in the specific format required by each repository—and fuses retrieved documents for presentation to the user. To accomplish these tasks efficiently, the centralized server should perform some major operations such as: resource selection, query transformation and data fusion.
    In this work, we report on some aspects of MIND, a system for managing distributed, heterogeneous multimedia libraries (MIND project, 2001). In particular, this paper focusses on the issue of fusing results returned by different image repositories. The proposed approach is based on normalization of matching scores assigned to retrieved images by individual libraries. Experimental results on a prototype system show the potential of the proposed approach with respect to traditional solutions.
    S. Berretti, A. Del Bimbo, P. Pala. "Merging Results for Distributed Content Based Image Retrieval," Multimedia Tools and Applications, vol.24, no.3, pp.215-232, December 2004. [pdf]
  • Modeling 2D spatial relationships
    In the access to image databases, queries based on the appearing visual features of searched data reduce the gap between the user and the engineering representation. To support this access modality, image content can be modeled in terms of different types of features such as shape, texture, color, and spatial arrangement.
    In this research, an original framework is proposed which supports quantitative nonsymbolic representation and comparison of the mutual positioning of extended nonrectangular spatial entities. Properties of the model are expounded to develop an efficient computation technique and to motivate and assess a metric of similarity for quantitative comparison of spatial relationships. Representation and comparison of binary relationships between entities is then embedded into a graph-theoretical framework supporting representation and comparison of the spatial arrangements of a picture. Two prototype applications are described.
    S. Berretti, A. Del Bimbo, E. Vicario. "Weighted Walkthroughs between Extended Entities for Retrieval by Spatial Arrangement," IEEE Transactions on Multimedia, vol.5, no.1, pp.52-70, March 2003. [pdf]
  • Graph-based Image content representation, matching and indexing
    In retrieval from image databases, evaluation of similarity, based both on the appearance of spatial entities and on their mutual relationships, depends on content representation based on Attributed Relational Graphs. This kind of modeling entails complex matching and indexing, which presently prevents its usage within comprehensive applications.
    In this work, we provide a graph-theoretical formulation for the problem of retrieval based on the joint similarity of individual entities and of their mutual relationships and we expound its implications on indexing and matching. In particular, we propose the usage of metric indexing to organize large archives of graph models, and we propose an original look-ahead method which represents an efficient solution for the (sub)graph error correcting isomorphism problem needed to compute object distances. Analytic comparison and experimental results show that the proposed look-ahead improves the state-of-the-art in state-space search methods and that the combined use of the proposed matching and indexing scheme makes it possible to manage the complexity of a typical application of retrieval by spatial arrangement.
    S. Berretti, A. Del Bimbo, E. Vicario. "Efficient Matching and Indexing of Graph Models in Content-Based Retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.23, no.10, pp.1089-1105, October 2001. [pdf]
  • Shape representation and indexing
    An important problem in accessing and retrieving visual information is to provide efficient similarity matching in large databases. Though much work is being done on the investigation of suitable perceptual models and the automatic extraction of features, little attention is given to the combination of useful representations and similarity models with efficient index structures.
    In this work, we propose retrieval by shape similarity using local descriptors and effective indexing. Shapes are partitioned into tokens in correspondence with their protrusions, and each token is modeled according to a set of perceptually salient attributes. Shape indexing is obtained by arranging shape tokens into a suitably modified tree index structure. Two distinct distance functions model, respectively, token and shape perceptual similarity. Examples from a prototype system and computational experiences are reported for both retrieval accuracy and indexing efficiency. Shape retrieval has been tested under shape scaling, orientation changes, and partial shape occlusions. A comparative analysis of different indexing structures for shape retrieval is presented.
    S. Berretti, A. Del Bimbo, P. Pala. "Retrieval by Shape Similarity with Perceptual Distance and Effective Indexing," IEEE Transactions on Multimedia, vol.2, no.4, pp.225-239, December 2000. [pdf]
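
    The distributed image retrieval entry above describes fusing results by normalizing the matching scores that each library assigns to its retrieved images. A minimal sketch of that idea follows. It assumes simple per-library min-max normalization, which may differ from the normalization actually used in the paper; the library names, image identifiers and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one library's scores to [0, 1] (assumed normalization, not the paper's)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every result as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_results(libraries: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Normalize each library's scores, then rank all results in a single list."""
    merged: List[Tuple[str, float]] = []
    for lib, scores in libraries.items():
        for doc, s in min_max_normalize(scores).items():
            merged.append((f"{lib}/{doc}", s))
    # Python's sort is stable, so ties keep the order in which libraries were listed.
    return sorted(merged, key=lambda item: item[1], reverse=True)


# Hypothetical libraries whose raw scores live on different scales.
libraries = {
    "libA": {"img1": 0.9, "img2": 0.5},
    "libB": {"img7": 120.0, "img8": 40.0, "img9": 80.0},
}
ranked = merge_results(libraries)
```

    Normalizing per library before sorting keeps a repository with large raw scores from dominating the fused ranking.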