COMPUTER VISION (PART-2): Build a person-profiler application based on gender and complexion (Learn the concept of pretrained models)
GOAL: We will build a simple computer vision application to detect the gender of a person from a detected face using a pre-trained caffemodel. We will also briefly explain other pre-trained models and related concepts: haar-cascades, the tensorflow model zoo, VGG16, VGG19, ResNet50, InceptionV3, DenseNet, NASNet, and MobileNetV2.
LINK TO PART-1(Explains the complexion model training and AutoML): https://juniorboyboy2.medium.com/computer-vision-part-1-build-a-person-profiler-application-based-on-gender-and-complexion-using-738f1b631c82
OUTLINE:
- Concepts of the pre-trained caffemodel.
- haar-cascades
- tensorflow-zoo
- VGG16 and VGG19
- ResNet50
- InceptionV3
- DenseNet
- NASNet
- MobileNetV2
1. Concepts of the pre-trained caffemodel.
Caffe (Convolutional Architecture for Fast Feature Embedding) is a deep learning framework that allows users to create image classification and image segmentation models. Initially, users create and save their models as plain-text PROTOTXT files. After a user trains and refines their model using Caffe, the program saves the user’s trained model as a CAFFEMODEL file. You can then load a caffemodel in Python, for example with OpenCV’s dnn module.
For example, loading the gender caffemodel used in the person profiler application:
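Below is a minimal sketch using OpenCV’s dnn module; the file names, mean values, and label order are assumptions for illustration and should match the prototxt/caffemodel pair you actually use.

```python
# A minimal sketch of loading a gender-classification caffemodel with OpenCV's dnn module.
# File names, mean values and label order are assumptions for illustration.
import cv2
import numpy as np

GENDER_PROTO = "gender_deploy.prototxt"   # network architecture (plain-text PROTOTXT)
GENDER_MODEL = "gender_net.caffemodel"    # trained weights (CAFFEMODEL)
GENDER_LABELS = ["Male", "Female"]        # assumed output order of the model

# Load the pretrained network
gender_net = cv2.dnn.readNetFromCaffe(GENDER_PROTO, GENDER_MODEL)

# Prepare a face crop: this family of gender caffemodels commonly expects 227x227 BGR input
face = cv2.imread("face.jpg")             # assumed path to a cropped face image
blob = cv2.dnn.blobFromImage(face, scalefactor=1.0, size=(227, 227),
                             mean=(78.4263377603, 87.7689143744, 114.895847746))

gender_net.setInput(blob)
preds = gender_net.forward()
print("Predicted gender:", GENDER_LABELS[int(np.argmax(preds))])
```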
2. haar-cascades
OpenCV provides a training method (see Cascade Classifier Training) as well as pretrained models, which can be read using the cv::CascadeClassifier::load method (cv2.CascadeClassifier in Python). The pretrained models are located in the data folder of the OpenCV installation, or can be found here: https://github.com/opencv/opencv/tree/master/data/haarcascades
Here is an example of how to use the face and eye haar cascades in Python.
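A short sketch using the cascade XML files that ship with OpenCV (the test image path is an assumption).

```python
# Face and eye detection with OpenCV's pretrained haar cascades.
# cv2.data.haarcascades points at the XML files bundled with the OpenCV install.
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

img = cv2.imread("person.jpg")                 # assumed test image containing a face
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # cascades work on grayscale images

# Detect faces, then look for eyes inside each detected face region
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    roi_gray = gray[y:y + h, x:x + w]
    roi_color = img[y:y + h, x:x + w]
    for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(roi_gray):
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

cv2.imwrite("detected.jpg", img)
```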
3. tensorflow-zoo
A model zoo is a collection of open-source deep learning code and pretrained models. The TensorFlow 2 Detection Model Zoo, for example, is a collection of detection models pre-trained on the COCO 2017 dataset. These models can be useful for out-of-the-box inference if you are interested in categories already present in that dataset.
Many such models and zoos are catalogued here: https://modelzoo.co/
Some examples are CenterNet HourGlass104 512x512, Mask R-CNN, FastPhotoStyle, ResNet-50, etc.
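As a rough illustration, out-of-the-box inference with a detection model downloaded from the TensorFlow 2 Detection Model Zoo might look like the sketch below; the extracted directory name and the test image are assumptions.

```python
# Sketch of inference with a SavedModel from the TensorFlow 2 Detection Model Zoo.
# The model directory below is an assumption (whatever archive you downloaded and extracted).
import tensorflow as tf

model_dir = "centernet_hg104_512x512_coco17_tpu-8/saved_model"  # assumed extracted path
detect_fn = tf.saved_model.load(model_dir)

# The zoo's SavedModels expect a batched uint8 image tensor of shape [1, H, W, 3]
image = tf.io.decode_jpeg(tf.io.read_file("street.jpg"))         # assumed test image
input_tensor = tf.expand_dims(image, axis=0)

detections = detect_fn(input_tensor)
scores = detections["detection_scores"][0].numpy()
classes = detections["detection_classes"][0].numpy()
print("Top detection: COCO class", int(classes[0]), "score", float(scores[0]))
```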
4. VGG16 and VGG19
In 2014, the 16- and 19-layer VGG networks were considered very deep (although we now have the ResNet architecture, which can be successfully trained at depths of 50–200 for ImageNet and over 1,000 for CIFAR-10).
Simonyan and Zisserman found training VGG16 and VGG19 challenging (specifically regarding convergence of the deeper networks), so in order to make training easier, they first trained smaller versions of VGG with fewer weight layers (columns A and C).
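Both networks are available with ImageNet weights in Keras; a quick sketch of loading them (this just downloads the weights and prints the architecture):

```python
# Loading the ImageNet-pretrained VGG16 and VGG19 networks from Keras applications.
from tensorflow.keras.applications import VGG16, VGG19

vgg16 = VGG16(weights="imagenet")   # 16 weight layers
vgg19 = VGG19(weights="imagenet")   # 19 weight layers

vgg16.summary()                     # prints the stacked 3x3 convolution blocks
```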
5. ResNet50
ResNet-50 is a convolutional neural network that is 50 layers deep. You can load a pretrained version of the network trained on more than a million images from the ImageNet database [1]. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224. For more pretrained networks in MATLAB®, see Pretrained Deep Neural Networks.
You can use the classify function in MATLAB to classify new images using the ResNet-50 model.
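The classify call above is MATLAB; a rough Keras equivalent for classifying a new image with the ImageNet-pretrained ResNet-50 might look like this (the image path is an assumption).

```python
# Classifying a new image with ImageNet-pretrained ResNet-50 in Keras.
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights="imagenet")

# ResNet-50 expects 224x224 RGB input
img = image.load_img("elephant.jpg", target_size=(224, 224))   # assumed test image
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])   # top-3 (class id, label, probability) tuples
```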
6. InceptionV3
Inception v3 is a convolutional neural network for assisting in image analysis and object detection, and got its start as a module for GoogLeNet. It is the third edition of Google’s Inception convolutional neural network, originally introduced during the ImageNet Recognition Challenge. Just as ImageNet can be thought of as a database of classified visual objects, Inception helps with the classification of objects in the world of computer vision.
7. DenseNet
A DenseNet is a type of convolutional neural network that utilises dense connections between layers, through Dense Blocks, where we connect all layers (with matching feature-map sizes) directly with each other. TensorFlow implementations of DenseNet with ImageNet-pretrained weights are available, as sketched below.
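As a toy illustration of the dense-connection idea (not the full architecture), each layer in the small block below receives the concatenation of all earlier feature maps; the pretrained DenseNet121 can also be loaded directly from Keras. Layer counts and sizes here are illustrative assumptions.

```python
# Toy dense block: every layer sees the concatenation of all previous feature maps.
from tensorflow.keras import layers, Input, Model
from tensorflow.keras.applications import DenseNet121

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])   # dense connection: concatenate with earlier maps
    return x

inputs = Input(shape=(32, 32, 16))
toy = Model(inputs, dense_block(inputs))

# The full ImageNet-pretrained DenseNet from Keras applications
densenet = DenseNet121(weights="imagenet")
```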
8. NASNet
NASNet is a type of convolutional neural network discovered through neural architecture search. The building blocks consist of normal and reduction cells.
9. MobileNetV2
MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation similar to how other popular large scale models, such as Inception, are used. MobileNets can be run efficiently on mobile devices with TensorFlow Lite.
MobileNetV2 can be loaded with weights pre-trained on ImageNet. It is very similar to the original MobileNet, except that it uses inverted residual blocks with bottlenecking features, and it has a drastically lower parameter count than the original MobileNet. MobileNets support any input size greater than 32 x 32, with larger image sizes offering better performance.
MobileNetV2 was used to train the complexion model for the person profiler application (see PART-1); a simplified sketch of that kind of transfer-learning setup is shown below.
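This sketch uses MobileNetV2 as a frozen feature extractor with a small classification head; the number of complexion classes and the input size are assumptions for illustration, not the exact configuration from PART-1.

```python
# Transfer learning sketch: MobileNetV2 as a frozen feature extractor plus a small head.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import MobileNetV2

NUM_CLASSES = 3            # assumed number of complexion categories
INPUT_SHAPE = (224, 224, 3)

base = MobileNetV2(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)
base.trainable = False     # keep the ImageNet features frozen

inputs = tf.keras.Input(shape=INPUT_SHAPE)
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```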
CONCLUSION
This write-up explained the concept of pretrained models, and how MobileNetV2 and a caffemodel were used in the person profiler application.
WRITER: OLUYEDE SEGUN . A(jr)
Resources used (References) and further reading:
View the code on my GitHub:
For the MobileNetV2 used in the complexion model:
For the caffemodel used in the gender model:
linkedin profile: https://www.linkedin.com/in/oluyede-segun-adedeji-jr-a5550b167/
twitter profile: https://twitter.com/oluyedejun1
TAGS: #COMPUTERVISION #OPENCV #PYTHON #AUTOML #PRETRAINED