Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Shama Shilpi, Shwetank Aryan
DOI Link: https://doi.org/10.22214/ijraset.2023.55806
Age group classification, the task of assigning facial photos or videos to distinct age groups, is important in a variety of domains, including recruitment, security, healthcare, and intelligent social robots. This article presents a complete methodology for classifying age groups. The methodology first preprocesses the input images to prepare them for feature extraction with a deep convolutional neural network (DCNN). The DCNN takes the source face image and extracts D-dimensional features from it. A hybrid particle swarm optimization (HPSO) technique is then used to select facial features in order to increase their distinctiveness and recognizability. A Support Vector Machine (SVM) is then used to classify the data by age and gender. The use of these age and gender categories in a dietary recommendation system illustrates how this research can be applied in real-world settings. The system's performance is evaluated on real-world photos, and the results indicate high prediction accuracy and computational efficiency. Evaluation measures such as classification rate, precision, and recall, applied to datasets such as Adience and UTKFace, further support the effectiveness of the proposed method.
I. INTRODUCTION
Predicting age and gender from images is an active area of computer vision and artificial intelligence. It applies machine learning algorithms, in particular deep learning techniques, to analyze photographs and determine the age and gender of the people portrayed in them. This technology has attracted considerable interest in a number of industries, including marketing, healthcare, security, and facial identification. Capturing an image that contains a face is usually the first step in the age and gender prediction procedure. The facial data in this image is then analysed using machine learning models and neural networks to extract characteristics and patterns. These models estimate the subject's age and gender using a mix of facial landmarks, skin tone, hair, and other visual cues [1, 2].
Age prediction is a challenging task because it entails determining a person's chronological age based solely on their facial features. Factors such as ethnicity, lifestyle, and genetic variation make this extremely difficult. Gender prediction, on the other hand, is comparatively simpler because it concentrates on determining whether the subject is male or female based on visual features such as facial hair, jawline, and eyebrow shape [3].
There are numerous methods for determining gender based on people's biological features, mannerisms, and behaviors. A person's face can reveal specific details about them, such as their age, gender, expression, mood, and ethnicity. Gender identification from a face image is a challenging application in computer vision, image analysis, and artificial intelligence. It is treated as a binary classification problem in which each individual is assigned one of two gender categories. Gender identification is one aspect of facial analysis [4, 5] and has often focused on categorizing photos captured in a controlled context. Gender classification is also required in uncontrolled settings, as suggested in [6]. Although this is a difficult task for computers, whereas humans perform it quickly and accurately by inspection, the gender of a person provides useful supplemental information.
There are two approaches to predicting facial characteristics: (1) single-attribute learning (SAL), also called single-task learning (STL), and (2) multi-attribute learning (MAL), also called multi-task learning (MTL). In the SAL/STL-based method, each attribute or class (such as gender or age) is learned and predicted independently, with no links between the different attributes. In contrast, the MAL/MTL technique learns several attributes (for example, gender and age) jointly in a common model. A person's gender can be inferred from their face, voice, gait (running, jogging, etc.), fingerprints, hand skin, and handwriting. Age can be estimated by anthropological examination of the face or bones. The face is the most appropriate trait because of its easy visibility (it is usually not covered by clothing), collectability, acceptability, and universality. State-of-the-art methods for identifying facial age and gender fall into two types: the classic hand-crafted feature engineering methodology and the deep learning-based approach [7–10].
The suggested method places a strong emphasis on age categorization after extracting age-specific features from facial photos. One can determine a person's age by looking for ageing signs in a face image. The age of an adult can also be estimated from skin changes.
Accurately determining an individual's age [8] is challenging because a person's appearance is influenced by gender, race, ethnicity, lifestyle, physical traits, and other external factors. Accurate facial age prediction remains difficult because a person's real age and apparent age often differ. Several open-source age recognition databases cover the categories of children, teenagers, young adults, adults, and seniors [11].
II. LITERATURE REVIEW
Face detection is a crucial component of any face recognition system, and it should operate quickly and accurately. Face detection algorithms are mainly inspired by object detection methods. In region-based object detection, object proposals are generated and then classified: a classifier labels each proposal as face or non-face. HyperFace [12], a hierarchical multitask architecture, performs face detection, landmark localization, pose estimation, and gender recognition. Region-based processing is faster. Faster R-CNN [13] uses the region proposal network (RPN), a small CNN that slides over the final feature map and predicts, at each location, whether an object is present as well as the bounding box of that object. The RPN helps reduce the number of unnecessary face proposals and improve their quality. Sliding-window methods instead produce face detections at every location of a feature map at a given scale. They are built on a feed-forward convolutional network that uses small filters to perform detection at multiple scales and to predict object classes. The identification and labeling of facial landmarks is necessary for a number of facial tasks, including facial attribute inference [14], face verification [15], and face recognition [16].
S. D. Sapkal and M. D. Malkauthekar [11] reported an experimental analysis of facial image categorization. Face images from two classes and three classes, each covering a range of emotions and viewpoints, are used for categorization. The Fisher Discriminant method is used to compare the results for two and three classes, and Euclidean distance is used for matching. G. Mallikarjuna Rao et al. [12] introduced a neural network-based upright invariant frontal face detection system that identifies gender from facial features. Its accuracy depends on geometric and pixel-based face features. The categorization is robust thanks to cyclic shift invariance techniques and the pi-sigma neural network.
Golomb et al. [13] proposed a neural network-based approach to human gender detection. Neural networks [14] have frequently been used for feature extraction and categorization in gender detection, and [15] uses backpropagation neural networks to identify gender. More recently, CNNs have been found to be successful at extracting discriminative features and differentiating genders [16]. Classification techniques used in visual gender detection include SVM, LDA, and AdaBoost. Kwon and Lobo [17] proposed a method for categorizing images into discrete age categories by calculating ratios of various facial measurements. This approach, however, might not be suitable for images with large variations in pose, lighting, expression, or occlusion. Feature extraction is an essential stage in estimating human age. Feature extraction techniques that have been developed include the active appearance model (AAM) [18], local binary patterns (LBP) [19], anthropometric features [20], and biologically inspired features (BIF) [21].
Anil Kumar Sao and B. Yegnanarayana [22] proposed an analytic phase-based representation for face identification, which uses trigonometric functions to address the issue of illumination variation. The weights applied to the projected coefficients are chosen based on the eigenvalues used in template matching. Jing Wu et al. [23] proposed Shape from Shading (SFS) as a method for gender classification. Linear discriminant analysis (LDA), applied to the Principal Geodesic Analysis parameters, is used to distinguish between the genders of the test faces. The SFS technique improves categorization performance for grayscale face pictures.
A. Krizhevsky et al. [24] published research that used a deep convolutional neural network to classify 1.2 million (12 lakh) images into 1,000 different categories. The findings indicated that supervised learning can produce highly accurate results. Annotations on face photos can be found in some datasets; however, they are not thought to be useful for face recognition. Although RNNs have been employed in certain articles, they are not appropriate for our research because our input is an image rather than sequential data such as text or speech, for which RNNs are typically used. Thus, CNN is preferred over RNN in this study [25]. Unsupervised CNNs are also recommended in other works; however, the approach proposed in this research uses supervised learning and the UTKFace dataset [26].
Young H. Kwon and Niels Da Vitoria Lobo [27] demonstrated age classification from facial images. Ratios computed from the primary facial features (eyes, nose, mouth, chin, virtual top of the head, and sides of the face) are used to discriminate between young people and elderly people. In secondary feature analysis, a wrinkle index is computed to distinguish elderly persons from children and young adults. A combination of primary and secondary features distinguishes the three categories of infants, adolescents, and seniors.
Wen Bing Horng et al. [28] proposed a method to classify age groups. The Sobel edge operator is used to extract facial features, and back-propagation neural networks are used to categorize facial photos into those of infants, children, young adults, and seniors. The first network examines the geometric characteristics of the face, without wrinkle features, to detect images of infants. In the second network, the remaining images are divided into three categories according to the wrinkle characteristics of the image.
S.T. Gandhe et al. presented Face Recognition Using Contour Matching in [29]. The contour of the face is taken into consideration when matching images for face recognition. Erno Makinen and Roope Raisamo presented a study on gender classification using automatically detected and aligned faces [30]. The gender categorization algorithms considered are SVM with image pixels as input, Multilayer Neural Network with image pixels as input, Discrete AdaBoost with Haar-like features as input, SVM with LBP features as input, and SVM with Multilayer Neural Networks. SVM achieved the highest classification rate when given image pixels as input.
III. METHODOLOGY
The proposed system takes its input from a live camera or a dataset. Preprocessing prepares the image for further processing. The preprocessed image is passed to the DCNN to extract the essential features. In the following step, features are selected using hybrid particle swarm optimization (HPSO). A support vector machine (SVM) is used to classify a person's age into six categories: "0-5," "6-15," "16-25," "26-45," "46-60," and "60+". Gender has two classes: male and female. The DCNN performs feature extraction, learning the distinguishing characteristics of the face, and HPSO selects the most informative of these features. Integrating DCNN and HPSO increases both accuracy and computation speed. The SVM then classifies age and gender. According to experiments on the Adience dataset and real-world photos, the model is effective and outperforms the conventional scheme in terms of classification rate, precision, and recall.
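The six age categories listed above can be made concrete with a small lookup. The following is a minimal sketch, not the authors' code; the function name and the handling of age 60 (assigned to "46-60" because the labels "46-60" and "60+" overlap at 60) are assumptions made for illustration.

```python
# Age-group labels exactly as listed in the text.
AGE_GROUPS = ["0-5", "6-15", "16-25", "26-45", "46-60", "60+"]
# Inclusive upper bound of each bounded group; older ages fall in "60+".
# Assumption: age 60 is assigned to "46-60", since the labels overlap at 60.
UPPER_BOUNDS = [5, 15, 25, 45, 60]


def age_to_group(age: int) -> str:
    """Map an age in whole years to one of the six age-group labels."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for bound, label in zip(UPPER_BOUNDS, AGE_GROUPS):
        if age <= bound:
            return label
    return AGE_GROUPS[-1]
```

A mapping of this kind would typically be used to turn ground-truth ages into class labels before training the classifier.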
The flow of the methodology is depicted in Figure 1.
Image preprocessing is the initial stage of the present work. It entails a series of procedures applied to raw images in order to prepare them for additional analysis or machine learning tasks. To improve the performance and accuracy of downstream algorithms, image preprocessing aims to improve image quality, eliminate noise, correct distortions, and extract pertinent information.
The next step is noise removal using a mean filter. The mean filter, also known as a box filter or averaging filter, is a simple technique for reducing noise in digital images. It works by replacing each pixel's value with the average value of its neighbors within a predetermined kernel or window; a 3x3 square kernel is used here. The mean filter is useful for attenuating high-frequency noise in photographs, although for impulsive salt-and-pepper noise a median filter is generally more effective. To remove noise with a mean filter, one must first choose an appropriate kernel size, which determines the neighborhood of pixels used for each calculation. This choice is crucial because it is a compromise between noise reduction and the preservation of image detail. The kernel is then placed over each pixel in the image so that its center lines up with the pixel being processed. Next, the mean intensity is computed by adding up the pixel intensities within the kernel and dividing by the number of pixels in that neighborhood. Finally, the computed mean replaces the original pixel value. This procedure is applied to every pixel in the image, producing a smoother version of the image with less noise [31].
The next stage in the methodology is face detection and alignment using landmark localization, a fundamental task in computer vision that is essential for a variety of applications, from facial recognition to augmented reality.
The first step in this operation is to locate faces in the input image using face detection algorithms such as Haar cascades or more sophisticated methods such as SSD and Faster R-CNN. After faces are found, the next step is to identify distinctive facial landmarks such as the eyes, nose, and mouth. This is accomplished using landmark localization algorithms or deep learning models, frequently based on Convolutional Neural Networks (CNNs) and shape regression techniques [9]. Once these landmarks have been precisely located, the alignment step applies geometric transformations to the face to guarantee a consistent position and orientation. This alignment corrects for differences in pose, scale, and rotation and produces an aligned face image that is ready for further analysis. Accurate landmark localization and alignment improve the reliability and accuracy of subsequent facial analysis tasks, which makes them important in a variety of fields, including human-computer interaction, security, entertainment, and healthcare.
A deep convolutional neural network (DCNN) is a deep neural network architecture that extracts discriminative characteristics from input data and reduces the original image to a more compact but still informative representation. Despite the smaller dimensions, the DCNN captures visual features across multiple layers, as shown in an earlier study [45], and preserves the essential elements of the source image with high fidelity. DCNNs excel at learning image features and have a wide range of uses in face recognition. In our work, we used a DCNN to extract age and gender characteristics from facial data, and the results were encouraging.
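The averaging procedure described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the implementation used in the paper; the function name and the border handling (averaging only over neighbours that exist) are assumptions, since the text does not say how image borders are treated.

```python
def mean_filter_3x3(image):
    """Return a copy of a 2-D grayscale image smoothed with a 3x3 mean filter.

    `image` is a list of equal-length rows of numbers. Border pixels are
    averaged over the neighbours that actually exist (an assumption).
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            total, count = 0.0, 0
            # Visit the 3x3 neighbourhood centred on (row, col).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    r, c = row + dr, col + dc
                    if 0 <= r < height and 0 <= c < width:
                        total += image[r][c]
                        count += 1
            # Replace the pixel with the mean of its neighbourhood.
            output[row][col] = total / count
    return output


# A single bright pixel is spread out and attenuated by the filter.
noisy = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = mean_filter_3x3(noisy)
```

In the toy example the isolated spike is reduced but also smeared into its neighbours, which is the trade-off between noise reduction and detail preservation mentioned in the text.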
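One common way to realise the alignment step is to rotate the face so that the line between the two eye landmarks becomes horizontal. The sketch below illustrates this on landmark coordinates only; it is an assumption about how alignment could be done, not the authors' procedure, and the function names are hypothetical.

```python
import math


def eye_alignment_angle(left_eye, right_eye):
    """Angle in radians by which the eye line is tilted from horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx)


def align_landmarks(landmarks, left_eye, right_eye):
    """Rotate (x, y) landmarks about the eye midpoint so the eyes are level."""
    angle = eye_alignment_angle(left_eye, right_eye)
    cx = (left_eye[0] + right_eye[0]) / 2.0
    cy = (left_eye[1] + right_eye[1]) / 2.0
    # Rotating by -angle cancels the tilt of the eye line.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    aligned = []
    for x, y in landmarks:
        tx, ty = x - cx, y - cy
        aligned.append((cx + tx * cos_a - ty * sin_a,
                        cy + tx * sin_a + ty * cos_a))
    return aligned
```

In a full pipeline the same rotation would be applied to the image pixels, not just to the landmark coordinates, and scale normalisation would usually follow.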
Our specialized DCNN architecture, designed for age and gender detection, consists of seven layers: five convolutional layers and two fully connected layers. This network performs both feature extraction and classification. Based on the detected landmarks, the input image is preprocessed and cropped to a size of 110x110. Zero-padding changes the input dimension to 5x112x112. Each of the five convolutional layers integrates a ReLU activation function, batch normalization (BN), max-pooling, and a dropout layer. An output of 512 features is produced after the fifth convolutional layer, which is followed by a sequence consisting of the first fully connected layer, ReLU activation, BN, dropout, and the second fully connected layer. The dropout layers use a ratio of 0.5. All max-pool layers use a 3x3 filter with a 2x2 stride, whereas the Conv1 layer uses a 7x7 filter with a 4x4 stride. The Conv2 layer uses a 5x5 filter, and the final convolution layer uses a 3x3 filter. The softmax function is then used to normalize these 512 features. Figure 2 shows a condensed illustration of our proposed DCNN design.
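The effect of the stated filter sizes and strides on the spatial size of the feature maps can be checked with the standard output-size formula for convolution and pooling layers. The sketch below applies it to the 112x112 padded input, the 7x7 stride-4 Conv1 layer, and the 3x3 stride-2 max-pool layer; it assumes no additional padding inside these layers, which the text does not specify.

```python
def conv_output_size(input_size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (input_size + 2 * padding - kernel) // stride + 1


# Padded input side length from the text.
size = 112
# Conv1: 7x7 filter, stride 4 (assumption: no extra padding in the layer).
after_conv1 = conv_output_size(size, kernel=7, stride=4)
# First max-pool: 3x3 filter, stride 2.
after_pool1 = conv_output_size(after_conv1, kernel=3, stride=2)
```

Under these assumptions the first convolution and pooling stage reduces each side of the feature map from 112 to 27 and then to 13; different padding choices would give different sizes.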
Particle swarm optimization (PSO) is a computational optimization method modeled on the social behavior of fish schools and flocks of birds. It is a population-based stochastic optimization technique that iteratively searches a search space for the best solution to a problem, known as the global optimum. Each particle in the swarm represents a potential solution and maintains its own position and velocity while searching for the best one. In PSO, the population of candidate solutions is called the "swarm," and the individual candidate solutions are called "particles."
Every particle is initialized with a random position and velocity. The fitness value f(x) of each particle is calculated using equation (1). It is then compared with the particle's personal best (pbest), the best fitness value the particle has achieved so far. The global best (gbest) value is derived from the personal bests of all particles. The process repeats until the stopping criterion is met.
The velocity is updated in equation (2) using the previous velocity, pbest, and gbest. Pgbest is the best location found by the swarm, and Pibest is the best-known location of particle i. The rand terms are random variables drawn from the same distribution. The particle's position is then updated using equation (3). The particles change their positions based on their current position (pi), their current velocity (vi), the distance between the current position (pi) and pbest (Pibest), and the distance between the current position (pi) and gbest (Pgbest). The features extracted from the face image by the CNN serve as the training particles. Using PSO during the training phase improves the quality of the solution vector and shortens the execution time. The fundamental flaw of PSO is premature convergence, which is mitigated by hybrid PSO. Table 1 shows the abbreviations and their meanings.
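The update cycle described above (evaluate fitness, update pbest and gbest, update velocity, update position) can be sketched with the widely used global-best form of PSO. This is a generic illustration, not the paper's equations (1) to (3): the inertia weight `w`, the coefficients `c1` and `c2`, the toy `sphere` fitness, and the choice to minimise rather than maximise are all assumptions.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, iterations=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` with a basic global-best PSO.

    w, c1, c2 (inertia, cognitive and social coefficients) are illustrative
    values, not settings taken from the text.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull towards pbest + pull towards gbest.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update: move the particle by its new velocity.
                pos[i][d] += vel[i][d]
            value = fitness(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val


def sphere(x):
    """Toy fitness: sum of squares, minimised at the origin."""
    return sum(v * v for v in x)
```

For feature selection, the fitness function would instead score a candidate feature subset, for example by the classification accuracy it yields; that scoring function is not specified in the text.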
In order to improve feature selection, hybrid PSO combines PSO and the genetic algorithm (GA) [9–12]. PSO's capacity to explore the global optimal region is constrained by how rapidly it converges to local optima in the search space. PSO and GA are combined to share information among particles, improving exploration, in order to alleviate this restriction. The hybrid PSO uses the best PSO particles in a crossover operation. The success of stochastic techniques like PSO, however, might vary depending on the task, needing a variety of parameter settings. The best PSO particles are used in hybrid PSO to reduce this problem. Through crossover operations, the locations and velocities of these top-performing particles are updated in order to produce better results. These top-performing particles are identified through fitness calculations.
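The crossover step that distinguishes hybrid PSO from plain PSO can be sketched as follows. The text does not specify the crossover operator or which particles are replaced, so this example assumes single-point crossover between the best particles, replaces the worst particles with the offspring, and treats lower fitness as better; the function name is hypothetical.

```python
import random


def crossover_top_particles(positions, fitness_values, n_top=2, rng=None):
    """Replace the worst particles with offspring of the best ones.

    Uses single-point crossover and treats lower fitness as better; both
    choices are assumptions made for this illustration. Requires n_top >= 2
    and position vectors with at least two components.
    """
    rng = rng or random.Random(0)
    # Rank particle indices from best (lowest fitness) to worst.
    order = sorted(range(len(positions)), key=lambda i: fitness_values[i])
    parents = [positions[i] for i in order[:n_top]]
    new_positions = [p[:] for p in positions]
    for worst_index in order[-n_top:]:
        a, b = rng.sample(parents, 2)
        cut = rng.randint(1, len(a) - 1)      # crossover point inside the vector
        new_positions[worst_index] = a[:cut] + b[cut:]
    return new_positions
```

Calling this once per PSO iteration, after the fitness evaluation, would inject information from the best particles into the rest of the swarm, which is the exploration benefit the text attributes to the hybrid scheme.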
Age and gender classification follows. Identity describes the characteristics that distinguish one face from another. Age, gender, facial landmarks, and expressions can all be deciding factors. The proposed approach considers identity in addition to age and gender categories. It uses classification to determine a person's age and gender from the input image, with an SVM performing the classification [15]. SVM makes it simpler to classify images and understand the attributes present in them. The SVM constructs optimal hyperplanes in a multidimensional space to divide images into two classes for gender categorization and six classes for age categorization. The features selected by HPSO are represented as points in this multidimensional space. The classes are separated by the maximum marginal hyperplane (MMH).
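A single hyperplane separates only two classes, so a multi-class age decision needs several of them. The sketch below shows how already-trained linear hyperplanes could be used at prediction time: one hyperplane for the binary gender decision and a one-vs-rest scheme for age groups. The one-vs-rest strategy, the sign convention for gender, and the toy weights are assumptions; the text does not state how the SVM is extended to multiple classes.

```python
def decision_value(weights, bias, features):
    """Signed score w.x + b of a feature vector relative to a hyperplane."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_gender(hyperplane, features):
    """Binary decision: which side of a single hyperplane the sample lies on.

    Mapping the non-negative side to "male" is an arbitrary convention.
    """
    weights, bias = hyperplane
    return "male" if decision_value(weights, bias, features) >= 0 else "female"


def predict_age_group(hyperplanes, features):
    """One-vs-rest decision: pick the class whose hyperplane scores highest."""
    return max(hyperplanes,
               key=lambda label: decision_value(*hyperplanes[label], features))


# Hypothetical hyperplanes (weights, bias) for a 2-D toy feature space.
toy_gender_plane = ([1.0, -1.0], 0.0)
toy_age_planes = {"0-5": ([1.0, 0.0], 0.0), "60+": ([0.0, 1.0], 0.0)}
```

In practice the weights and biases would come from training an SVM on the HPSO-selected features, and a kernel SVM would replace the plain dot product with a kernel evaluation.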
IV. RESULTS AND DISCUSSION
The proposed system is implemented in Python using the TensorFlow framework. The input photos are loaded with OpenCV, and the dataset is then split into training and test sets. The first step is to preprocess each image in the dataset to produce an image of size 111×111. Facial landmarks are located and extracted using dlib and OpenCV; dlib estimates 68 coordinate points that map to the structures of the face. The next step deals with keypoint alignment and localization. The deep convolutional neural network (DCNN) is built and run in Python using TensorFlow. The number of filters doubles with each convolutional layer, starting at 32 and reaching 512. The max-pooling layers use 2×2 filters with a stride of 2. A dropout rate of 0.45 is used. The effectiveness of the proposed method is assessed using real-time photographs and publicly available datasets such as Adience.
The Adience dataset [1] consists of images that were automatically uploaded to Flickr from smartphones. It serves as a benchmark of face photos and is mostly used for age and gender identification.
The collection includes photographs with varying appearance, noise, pose, and lighting, as well as shots that were not carefully planned or posed. The images in this collection cover the following six age groups: "0-5," "6-15," "16-25," "26-45," "46-60," and "60+". Pictures of both sexes are included. Table 2 shows the distribution of faces in the Adience dataset by age group, with the total number of photos in each category for both men and women. Additionally, real-world photographs are used, such as live webcam images of people and images that can be accessed online.
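The evaluation measures named in the methodology (classification rate, precision, and recall) can be computed from true and predicted labels as shown below. This is a minimal sketch with hypothetical function names and toy labels; it does not reproduce any result from the paper.

```python
def classification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels for illustration only.
toy_true = ["0-5", "0-5", "60+", "60+"]
toy_pred = ["0-5", "60+", "60+", "60+"]
toy_rate = classification_rate(toy_true, toy_pred)
toy_precision, toy_recall = precision_recall(toy_true, toy_pred, positive="60+")
```

For a multi-class problem such as age grouping, precision and recall are usually computed per class in this way and then averaged across classes.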
V. CONCLUSION
Age and gender are crucial factors in a wide range of applications, which has drawn the scientific community's interest to identifying these characteristics from facial photographs. In this regard, the research presents a nutritional advice system that makes use of facial image-based age and gender identification. The method described in this work is an automated recommender system that can quickly and effectively identify gender, age, and facial traits without any physical interaction. By combining classification methods, Hybrid Particle Swarm Optimization (HPSO), and Deep Convolutional Neural Networks (DCNN), the proposed methodology achieves high accuracy and processing efficiency, as supported by the experimental evaluations. Future work includes the creation of a group recommendation system designed for people who frequent public spaces. This would further extend the practical applications of age and gender recognition technologies across a wide range of sectors and domains.
[1] Zhang Y, Liu L, Li C and Loy C C 2017 Quantifying Facial Age by Posterior of Age Comparisons The British Machine Vision Conference
[2] Salihbašic A and Orehovacki T 2019 Development of Android Application for Gender, Age and Face Recognition using OpenCV 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics pp 1635–40
[3] Choi S E, Jo J, Lee S, Choi H, Kim I J and Kim J 2017 Age face simulation using aging functions on global and local features with residual images Expert Syst. Appl. 80 pp 107–25
[4] Tian Q and Chen S 2018 Joint gender classification and age estimation by nearly orthogonalizing their semantic spaces Image Vis. Comput. 69 pp 9–21
[5] Duan M, Li K, Yang C and Li K 2018 A hybrid deep learning CNN–ELM for age and gender classification Neurocomputing 275 pp 448–61
[6] Rafique I, Asad M, Hamid A, Awais M, Naseer S and Yasir T 2019 Age and Gender Prediction using Deep Convolutional Neural Networks International Conference on Innovative Computing (IEEE)
[7] Boussaad L and Boucetta A 2020 An effective component-based age-invariant face recognition using Discriminant Correlation Analysis J. King Saud Univ. - Comput. Inf. Sci.
[8] Zhang K E, Gao C E, Guo L, Sun M, Yuan X, X. Han T, Zhao Z and Li B 2017 Age Group and Gender Estimation in the Wild With Deep RoR Architecture The Chinese Conference on Computer Vision vol 5 (IEEE Access) pp 22492–503
[9] Zhang H, Geng X, Zhang Y and Cheng F 2019 Recurrent age estimation Pattern Recognit. Lett.
[10] M.S. Shakeel and K.-M. Lam 2019 Deep-feature encoding-based discriminative model for age-invariant face recognition Pattern Recognit. 93 pp 442–57
[11] Taheri S and Toygar Ö 2019 On the use of DAG-CNN architecture for age estimation with multi stage features fusion Neurocomputing 329 pp 300–10
[12] Liu N A, Zhang F A N and Duan F 2020 Facial Age Estimation Using a Multi-Task Network Combining Classification and Regression IEEE Access vol 8 (IEEE) pp 92441–51
[13] Chen S, Zhang C, Dong M, Le J and Rao M 2017 Using Ranking-CNN for Age Estimation. Conference on Computer Vision and Pattern Recognition (IEEE) pp 742–51
[14] Huang Y, Wang Y, Tai Y, Liu X, Shen P, Li S, Li J and Huang F 2020 CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition Conference on Computer Vision and Pattern Recognition.
[15] Unsang Park, Yiying Tong and Anil K. Jain, “Face Recognition with Temporal Invariance: A 3D Aging Model,” Eighth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 1-7, September 2008.
[16] Hussein Rady (2011), “Face Recognition using Principle Component Analysis with Different Distance Classifiers”, IJCSNS International Journal of Computer Science and Network Security, Vol. 11, No. 10, Pp. 134–144.
[17] S. Sankarakumar, Dr.A. Kumaravel & Dr.S.R. Suresh (2013), “Face Detection through Fuzzy Grammar”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, No. 2.
[18] Lanitis, C. J. Taylor, and T. F. Cootes, “Toward automatic simulation of aging effects on face images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 4, pp. 442–455, Apr. 2002.
[19] X. Geng, Z.-H. Zhou, and K. Smith-Miles, “Automatic age estimation based on facial aging patterns,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 12, pp. 2234–2240, Dec. 2007.
[20] X. Geng, Z.-H. Zhou, Y. Zhang, G. Li, and H. Dai, “Learning from facial aging patterns for automatic age estimation,” in Proc. 14th Annu.
[21] Ranjan R., Patel V. M., Chellappa R. Hyperface: a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2019;41(1):121–135.
[22] Kumar P. M., Saravanakumar R., Karthick A., Mohanavel V. Artificial neural network-based output power prediction of grid-connected semitransparent photovoltaic system. Environmental Science and Pollution Research. 2022;29(7):10173–10182.
[23] Chen D., Hua G., Wen F., Sun J. Supervised transformer network for efficient face detection. European Conference on Computer Vision; 2016; Cham. pp. 122–138.
[24] Kogilavani S. V., Prabhu J., Sandhiya R., et al. COVID-19 detection based on lung CT scan using deep learning techniques. Computational and Mathematical Methods in Medicine. 2022;2022:13.
[25] Lu C., Tang X. Surpassing human-level face verification performance on LFW with GaussianFace. Twenty-Ninth AAAI Conference on Artificial Intelligence; 2014; Austin Texas, USA.
[26] Sun Y., Wang X., Tang X. Deep learning face representation by joint identification-verification. Advances in Neural Information Processing Systems. 2014;27
[27] Sun Y., Wang X., Tang X. Deep learning face representation from predicting 10,000 classes. Proceedings of the IEEE conference on computer vision and pattern recognition; 2014; Austin Texas, USA. pp. 1891–1898.
[28] Zhu Z., Luo P., Wang X., Tang X. Recover canonical-view faces in the wild with deep neural networks. 2014.
[29] Zhu Z., Luo P., Wang X., Tang X. Deep learning identity-preserving face space. Proceedings of the IEEE international conference on computer vision; 2013; pp. 113–120.
[30] Zhu Z., Luo P., Wang X., Tang X. Deep learning multi-view representation for face recognition. 2014.
[31] Kaliappan S., Saravanakumar R., Karthick A., et al. Hourly and day ahead power prediction of building integrated semitransparent photovoltaic system. International Journal of Photoenergy. 2021;2021:8.
Copyright © 2023 Shama Shilpi, Shwetank Aryan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET55806
Publish Date : 2023-09-20
ISSN : 2321-9653
Publisher Name : IJRASET