Applications of Pattern Recognition

 

Over the past 40 years, Anil Jain has been working on the design and application of pattern recognition systems. Currently, Jain and his students are focusing on three challenging problems: automatic fingerprint recognition, automatic face recognition, and large-scale data clustering. Specific projects are summarized below:

Latent Fingerprint Identification
Latent fingerprints are partial impressions of fingers found at crime scenes, and they serve as crucial evidence for apprehending and convicting suspects. Hence, automatic and accurate comparison of latent prints to rolled fingerprints (exemplars) in law enforcement databases is critical in forensics. In this research [1], they incorporate feedback from the exemplar to refine the features extracted from a latent fingerprint and thereby improve identification accuracy. Experiments for this research involved comparing 700 latent prints to 100,000 rolled prints. The HPCC resources allowed them to run their matcher [2] in parallel on 144 single-core machines, reducing the comparison time from about 250 days to about 20 days, a speedup of roughly 12x.
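The arithmetic behind these figures, and one way such a job could be divided, can be sketched as follows. This is an illustration only: the source does not describe how the comparisons were actually distributed, so the helper `split_evenly` and the choice to split by latent print are assumptions.

```python
def split_evenly(n_items, n_workers):
    """Return (start, stop) index ranges that cover range(n_items).

    Chunk sizes differ by at most one item. Hypothetical helper: the
    source does not say how the work was actually partitioned.
    """
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def speedup(serial_time, parallel_time):
    """Ratio of serial to parallel wall-clock time (same units for both)."""
    return serial_time / parallel_time


# Figures quoted in the text: 700 latent prints, 144 single-core machines.
chunks = split_evenly(700, 144)
print(chunks[0], chunks[-1])  # (0, 5) (696, 700)

# About 250 days on one machine versus about 20 days in parallel.
print(speedup(250, 20))  # 12.5
```

Under this split, each machine would compare four or five latent prints against the full set of 100,000 rolled prints, and no communication between machines is needed until the results are collected.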

Longitudinal Study of Face Recognition
Determining the persistence of face recognition over time is an important yet challenging problem. The shape and texture of a human face naturally change with age, which leads to recognition errors. In Jain's research [3], they are conducting a large-scale longitudinal study on how facial aging affects the performance of state-of-the-art recognition systems. Their study uses statistical models to analyze the variation in face comparison scores with respect to covariates such as elapsed time, age, gender, and race. The goal is to determine the trend in face recognition accuracy over time. To obtain reliable parameter estimates for the models, they rely on bootstrapping. Because of the large size of the study (~148K face images of 18K subjects), bootstrapping involves fitting a statistical model to 1,000 random samples (drawn with replacement) of 18K subjects. Fitting each model can take more than 1 hour, so being able to run the 1,000 bootstraps in parallel on HPCC is extremely helpful.
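The bootstrap procedure described here can be sketched in a few lines. This is a minimal illustration on synthetic data, not the study's code: the ordinary least-squares slope stands in for the (unspecified) statistical model, and the names `fit_slope`, `bootstrap_slopes`, and `percentile_interval` are hypothetical.

```python
import random


def fit_slope(records):
    """Least-squares slope of comparison score on elapsed time.

    Stand-in for the study's statistical model, which the text does not specify.
    """
    n = len(records)
    mean_t = sum(t for t, _ in records) / n
    mean_s = sum(s for _, s in records) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in records)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in records)
    return sxy / sxx


def bootstrap_slopes(by_subject, n_boot, seed=0):
    """Resample whole subjects with replacement and refit the model each time."""
    rng = random.Random(seed)
    subjects = list(by_subject)
    slopes = []
    for _ in range(n_boot):
        sample = rng.choices(subjects, k=len(subjects))
        records = [rec for subject in sample for rec in by_subject[subject]]
        slopes.append(fit_slope(records))
    return slopes


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval taken from the sorted bootstrap estimates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Synthetic data (assumption): 50 subjects, 5 images each, scores that
# decline by 0.02 per unit of elapsed time plus subject and image noise.
data_rng = random.Random(1)
by_subject = {}
for subject_id in range(50):
    offset = data_rng.gauss(0, 0.05)
    by_subject[subject_id] = [
        (t, 0.9 + offset - 0.02 * t + data_rng.gauss(0, 0.01)) for t in range(5)
    ]

slopes = bootstrap_slopes(by_subject, n_boot=200)
low, high = percentile_interval(slopes)
print(f"slope interval: [{low:.4f}, {high:.4f}]")
```

Each pass through the loop in `bootstrap_slopes` is independent of the others, which is why the 1,000 replicates in the study can be run as separate jobs on a cluster.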

Face Image Clustering
Investigations that require the exploitation of large volumes of face imagery are increasingly common in current forensic scenarios due to the prevalence of surveillance video, as well as the video and image recording capabilities of cell phones. Effective solutions for triaging such imagery (i.e., sorting it into low importance, moderate importance, and critical interest) are not available in the literature. Investigators in these scenarios typically face two issues: a lack of systems that can scale to large volumes of images, say 100M, and a lack of established methods for clustering face images into an unknown number of identities. As such, they investigate the problem of clustering large databases of face images, attempting to group individuals together by identity. The computational requirements for handling large databases are substantial; simply extracting descriptive features from 1 million face images could take ~20 hours on a single machine [4]. Aside from feature extraction, computing lists of the most similar individuals for every image in a large database (a prerequisite for some clustering methods) is costly. Typically, a single machine may take on the order of a week to process a single database, but by leveraging HPCC resources, this task can be accomplished in less than a day.
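To see why computing these similarity lists is costly, consider a brute-force version. The sketch below is an illustration under assumptions: the source does not name the similarity measure or the search method, so cosine similarity, exhaustive search, and the function names `cosine` and `top_k_neighbors` are all hypothetical choices.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def top_k_neighbors(features, k):
    """For every image, list the indices of its k most similar other images.

    Exhaustive search: n * (n - 1) similarity evaluations for n images.
    """
    neighbors = []
    for i, query in enumerate(features):
        scored = (
            (cosine(query, other), j)
            for j, other in enumerate(features)
            if j != i
        )
        best = heapq.nlargest(k, scored)
        neighbors.append([j for _, j in best])
    return neighbors


# Toy 2-D "face features" (assumption): images 0 and 1 point one way,
# images 2 and 3 another, as if they belonged to two identities.
features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
print(top_k_neighbors(features, k=1))  # [[1], [0], [3], [2]]
```

The number of comparisons grows quadratically with the database size, but each image's list is computed independently of the others, so the outer loop can be divided among many machines.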

Large-Scale Kernel-Based Clustering
Every day, massive amounts of data are generated through sensor-equipped devices, websites, social networks, and financial transactions. Analysis of such large quantities of data can lead to useful insights and important decisions. Clustering is an exploratory learning technique that can be used to analyze data in an unsupervised manner. In Jain's research, they focus on developing efficient and accurate clustering algorithms that can cluster tens of millions of high-dimensional data points. While kernel-based clustering algorithms can achieve high clustering accuracy by using non-linear inter-point similarity, they have high runtime and memory complexity. Clustering data sets containing billions of points using these algorithms would take many weeks and would need petabytes of memory. They have designed efficient approximate variants of these kernel-based clustering algorithms that can cluster such large data sets in a few hours [5]. These algorithms employ random sampling and matrix approximation techniques to reduce the runtime complexity of kernel-based clustering to linear time and to reduce memory requirements. "Using the HPCC clusters, we have been able to parallelize our algorithms and further reduce their running time," Jain shares. "For instance, we were able to reduce the time taken by our approximate kernel clustering algorithm to cluster 80 million images from the Tiny image data set [6] from about 9 hours on a single core to just about two minutes, using 100 cores in HPCC. This task would have taken several weeks using the classical kernel-based clustering algorithms."
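The memory side of this argument can be checked with simple arithmetic. The sketch below assumes 8-byte floating-point entries and a hypothetical sample of 1,000 points; neither figure comes from the source. It compares a full kernel matrix over all pairs of points with a matrix that relates every point only to the sampled points, which is one common way that sampling-based approximations reduce storage.

```python
def kernel_matrix_bytes(n_rows, n_cols, bytes_per_entry=8):
    """Bytes needed to hold a dense n_rows x n_cols kernel matrix."""
    return n_rows * n_cols * bytes_per_entry


PETABYTE = 10**15
GIGABYTE = 10**9

n_points = 80_000_000  # size of the Tiny Images set quoted in the text
n_sampled = 1_000      # hypothetical sample size, not from the source

full = kernel_matrix_bytes(n_points, n_points)
sampled = kernel_matrix_bytes(n_points, n_sampled)

print(full / PETABYTE)     # 51.2
print(sampled / GIGABYTE)  # 640.0
```

Under these assumptions the full matrix grows with the square of the number of points, while the sampled version grows linearly, which is consistent with the reduction in memory requirements that the text describes.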

References:
[1] S. S. Arora, E. Liu, K. Cao, and A. K. Jain, "Latent Fingerprint Matching: Performance Gain via Feedback from Exemplar Prints", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, No. 12, pp. 2425-2465, December 2014.
[2] A. A. Paulino, J. Feng, and A. K. Jain, "Latent Fingerprint Matching Using Descriptor-Based Hough Transform", IEEE Transactions on Information Forensics and Security, Vol. 8, pp. 31-45, January 2013.
[3] L. Best-Rowden and A. K. Jain, "A Longitudinal Study of Automatic Face Recognition", 2015. (In Submission).
[4] C. Otto, A. K. Jain, and B. Klare, "An Efficient Approach for Clustering Face Images", 2015. (In Submission).
[5] R. Chitta, R. Jin, T. C. Havens, and A. K. Jain, "Scalable Kernel Clustering: Approximate Kernel k-means", arXiv preprint arXiv:1402.3849, 2014.
[6] A. Torralba, R. Fergus, and W. T. Freeman, "80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 11, pp. 1958-1970, 2008.