In most cases, sketch images show only basic profile contours and contain no facial detail, which makes precisely generating facial features difficult. We propose an image translation network built on a generative adversarial network conditioned on facial attributes. The generator network consists of a feature extraction network and a down-sampling/up-sampling network. A GAN contains a generator and a discriminator: the generator creates fake data samples (images, audio, etc.) intended to mislead the discriminator, while the discriminator attempts to distinguish real samples from fake ones.
Keywords: Deep Learning, Generative Adversarial Networks, Image Translation, Face Generation, Skip-Connection.
Introduction
Generative Adversarial Networks (GANs) are a form of unsupervised neural network introduced by Ian J. Goodfellow in 2014 [5]. A GAN is essentially two competing neural network models vying to analyse, capture, and replicate the variations within a dataset. Building a comparable image from a basic text description or sketch is a difficult problem in computer vision with a wide range of applications, including criminal investigation and game character development. Isola et al. [1] presented a pixel-to-pixel image translation network for the sketch-to-image challenge, which prompted a burst of image-to-image research; an interesting demo generates shoes (and cats) from edge maps. Lu et al. [2] posed the image creation task as an image completion problem, with the sketch serving as a weak contextual constraint.
Just as user preferences can be learned from large amounts of data [3], [4], a network can be trained to create spectacular photographic results from a large number of sketch-face image pairs. Given a training set, this technique learns to produce new data with much the same statistics as the training data. A GAN trained on images, for example, can generate new images that appear to human observers to be at least superficially authentic, with many realistic traits. Although originally proposed as a form of generative model for unsupervised learning, GANs have also proven helpful for semi-supervised learning, fully supervised learning, and reinforcement learning. As illustrated in Fig. 1, our network is a standard generative adversarial network [5] with two sub-networks, a generator and a discriminator. The generator comprises a feature extraction module Gf and a face reconstruction module Gr, and branches A and B together make up Gf. The justification for this structure is that there is relatively little overlap between the branches: the attribute vector supplies global, low-frequency information such as color, whereas the sketch image provides high-frequency structure such as contours and textures. The small overlap between branches A and B can also be seen in the convolution results depicted in Fig. 2.
GANs have recently been shown to generate high-quality images. The current tendency is to increase the number of layers to achieve better performance; the main drawback is the rapid rise in computational complexity as the number of layers grows, along with the other problems that very deep stacks introduce. Skip-connections can substantially reduce the required number of network layers without hurting performance. Because our network treats the sketch-to-face task as face-hallucination super-resolution reconstruction, skip-connections [6] keep our network from having to be as deep as approaches such as ResidualNet [7] and DenseNet [8].
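To make the role of skip-connections concrete, the following is a minimal U-Net-style [6] sketch, assuming PyTorch; the layer sizes and module names are illustrative assumptions rather than our actual configuration. The encoder feature map bypasses the bottleneck and is concatenated with the decoder input, so fine detail survives without adding depth.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Illustrative U-Net-style skip: encoder features jump over the bottleneck."""
    def __init__(self, ch=32):
        super().__init__()
        self.down = nn.Conv2d(1, ch, 4, stride=2, padding=1)       # 128 -> 64
        self.bottleneck = nn.Conv2d(ch, ch, 3, padding=1)
        # input has 2*ch channels because the skip is concatenated below
        self.up = nn.ConvTranspose2d(2 * ch, 1, 4, stride=2, padding=1)  # 64 -> 128

    def forward(self, x):
        d = torch.relu(self.down(x))
        b = torch.relu(self.bottleneck(d))
        # skip-connection: encoder output joins the decoder input channel-wise
        return self.up(torch.cat([d, b], dim=1))
```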
Fig. 1 Structure of the complete network.
Fig. 2 (a) A sketch image; (b) an extracted feature map.
Proposed Method
We propose a new GAN-based network that treats the task like face-hallucination reconstruction. During the feature extraction step, we extract profile information and high-level semantic information from the sketch images and attribute vectors. The generator throughout uses a combination of convolutional and skip-connection layers: each convolution layer's output is passed to the following convolution layer and, at the same time, concatenated at the next concatenation layer, as the sketch below illustrates.
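The following is a minimal sketch of this forwarding pattern, assuming PyTorch; channel counts and names are illustrative, not our exact configuration. Each convolution output feeds the next convolution and is also retained for a later channel-wise concatenation.

```python
import torch
import torch.nn as nn

class SkipConcatExtractor(nn.Module):
    """Each conv output is forwarded and also concatenated at the merge point."""
    def __init__(self, in_ch=1, ch=32):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, sketch):
        f1 = torch.relu(self.conv1(sketch))
        f2 = torch.relu(self.conv2(f1))
        f3 = torch.relu(self.conv3(f2))
        # skip-connections: all three conv outputs meet at one concatenation
        return torch.cat([f1, f2, f3], dim=1)
```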
Overall Framework
We developed a standard GAN consisting of a discriminator network and a generator network. The generator receives a sketch image and an 18-dimensional attribute vector as input. The discriminator receives the same 18-dimensional attribute vector together with a photographic face picture matching the sketch image. The difference lies in where the attribute vector enters: in the generator it is reshaped, after a fully connected layer, to the same size as the sketch picture, while in the discriminator it is reshaped after the fully connected layer and concatenated to the output of the third convolution layer.
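A minimal sketch of these two attribute injection points, assuming PyTorch; the module names and the 32 × 32 spatial size of the third convolution layer's output are our assumptions for illustration, since the exact sizes are not stated here.

```python
import torch
import torch.nn as nn

ATTR_DIM, H, W = 18, 128, 128

class AttrToPlane(nn.Module):
    """FC layer expands the attribute vector; the result is reshaped to a plane."""
    def __init__(self, h, w):
        super().__init__()
        self.fc = nn.Linear(ATTR_DIM, h * w)
        self.h, self.w = h, w

    def forward(self, a):                       # a: (N, 18)
        return self.fc(a).view(-1, 1, self.h, self.w)

a = torch.randn(4, ATTR_DIM)

# Generator side: the attribute plane matches the 128x128 sketch resolution
# and is concatenated with the sketch at the input.
sketch = torch.randn(4, 1, H, W)
g_input = torch.cat([sketch, AttrToPlane(H, W)(a)], dim=1)    # (4, 2, 128, 128)

# Discriminator side: assuming the third conv layer yields 32x32 feature maps,
# the attribute plane is reshaped to that size before concatenation.
feat3 = torch.randn(4, 64, 32, 32)                            # assumed shape
d_mid = torch.cat([feat3, AttrToPlane(32, 32)(a)], dim=1)     # (4, 65, 32, 32)
```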
Network Architecture
Generator Model: Is and A are the two inputs to our network. Branch A's input, Is, is a sketch picture; three convolution layers extract low-level characteristics such as the face profile, and their outputs are concatenated via skip-connections. A is an 18-dimensional attribute vector that conveys facial characteristics of the corresponding image. A fully connected layer expands it from 18 to 16,384 (128 × 128) units, after which it is reshaped to match the sketch picture. Branch B's convolution scheme is identical to branch A's. Fig. 3 shows the generator model.
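The following is a minimal two-branch generator sketch, assuming PyTorch; the channel counts and the `reconstruct` tail, a stand-in for the face reconstruction module Gr, are illustrative assumptions rather than our exact architecture.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        def branch(in_ch):
            return nn.ModuleList([
                nn.Conv2d(in_ch, ch, 3, padding=1),
                nn.Conv2d(ch, ch, 3, padding=1),
                nn.Conv2d(ch, ch, 3, padding=1),
            ])
        self.branch_a = branch(1)               # sketch Is
        self.fc = nn.Linear(18, 128 * 128)      # attribute A -> 16,384 units
        self.branch_b = branch(1)               # same conv scheme as branch A
        self.reconstruct = nn.Sequential(        # placeholder for Gr
            nn.Conv2d(6 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh(),
        )

    @staticmethod
    def run_branch(convs, x):
        feats = []
        for conv in convs:
            x = torch.relu(conv(x))
            feats.append(x)                     # kept for skip concatenation
        return torch.cat(feats, dim=1)          # 3*ch channels per branch

    def forward(self, sketch, attrs):
        fa = self.run_branch(self.branch_a, sketch)
        plane = self.fc(attrs).view(-1, 1, 128, 128)   # reshape to sketch size
        fb = self.run_branch(self.branch_b, plane)
        return self.reconstruct(torch.cat([fa, fb], dim=1))
```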
Discriminator Model: We intend to use the sketch image's attribute information to produce the corresponding photographic face, similar to Reed et al. [9]. The discriminator network must distinguish the real picture-attribute pair from the others. To train it, we use positive sample pairs {f, a}, where f is a ground-truth face image and a its accompanying ground-truth attributes. Negative data consist of generated faces f^ paired with their ground-truth attributes a, as well as real faces f paired with mismatched attributes a^; the negative sample pairs are therefore {f^, a} and {f, a^}. Fig. 4 shows the discriminator model.
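A minimal sketch of this matching-aware objective, assuming PyTorch; the discriminator D and the 0.5 weighting on the two negative terms follow the spirit of Reed et al. [9] but are illustrative assumptions rather than our exact loss.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, f_real, f_fake, a_true, a_mismatch):
    """D(face, attrs) is assumed to return a probability of 'real and matching'."""
    # {f, a}: real face with matching attributes -> label 1
    p_real = D(f_real, a_true)
    # {f^, a}: generated face with true attributes -> label 0
    p_fake = D(f_fake.detach(), a_true)
    # {f, a^}: real face with mismatched attributes -> label 0
    p_mis = D(f_real, a_mismatch)
    return (F.binary_cross_entropy(p_real, torch.ones_like(p_real))
            + 0.5 * (F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake))
                     + F.binary_cross_entropy(p_mis, torch.zeros_like(p_mis))))
```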
Conclusion
Several strong algorithms and networks already exist for text-to-image and image-to-image conversion, and they can generate clear images. However, because a face requires richer feature representation and colour, generating a convincing face remains difficult. For the sketch-to-face problem we propose a more suitable network by recasting the task as face-hallucination super-resolution reconstruction. Several experiments on sketch recognition have been carried out in recent years, and GANs have already been used successfully to predict colourful pictures from black-and-white input sketches. We employ high-level semantic information and incorporate attribute values into feature extraction; meanwhile, our network uses skip-connection techniques. All of these factors contribute to a more realistic and photorealistic face image, and the proposed approach is particularly effective at developing local features.
References
[1] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, ‘‘Image-to-image translation with conditional adversarial networks,’’ in Proc. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1125–1134. [Online]. Available: https://arxiv.org/abs/1611.07004
[2] Y. Lu et al., ‘‘Image generation from sketch constraint using contextual GAN,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2018, pp. 639–648. [Online]. Available: https://link.springer.com/book/10.1007/978-3-030-01270-0
[3] Z. Cheng, Y. Ding, L. Zhu, and M. Kankanhalli, ‘‘Aspect-aware latent factor model: Rating prediction with ratings and reviews,’’ in Proc. Int. World Wide Web Conf. Steering Committee (WWW), Apr. 2018, pp. 639–648.
[4] Z. Cheng, J. Shen, L. Nie, T. S. Chua, and M. Kankanhalli, ‘‘Exploring user-specific information in music retrieval,’’ in Proc. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr. (SIGIR), Aug. 2017, pp. 655–664.
[5] I. Goodfellow et al., ‘‘Generative adversarial nets,’’ in Proc. Adv. Neural Inf. Process. Syst., Dec. 2014, pp. 2672–2680.
[6] O. Ronneberger, P. Fischer, and T. Brox, ‘‘U-net: Convolutional networks for biomedical image segmentation,’’ in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. (MICCAI), Oct. 2015, pp. 234–241.
[7] J. Kim, J. K. Lee, and K. M. Lee, ‘‘Accurate image super-resolution using very deep convolutional networks,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1646–1654. [Online]. Available: https://arxiv.org/abs/1511.04587
[8] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, ‘‘Densely connected convolutional networks,’’ in Proc. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708. [Online]. Available: https://arxiv.org/abs/1608.06993
[9] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, ‘‘Generative adversarial text to image synthesis,’’ in Proc. Int. Conf. Mach. Learn. (ICML), May 2016, pp. 1681–1690. [Online]. Available: https://arxiv.org/abs/1605.05396