IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Prof. Nilesh Bhojne, Dhanashree Gaikwad, Abhishek Bankar, Sahil Shimpi, Kunal Patil
DOI Link: https://doi.org/10.22214/ijraset.2023.52066
Virtual try-on systems have become increasingly popular in the e-commerce industry, allowing customers to virtually try on clothes and accessories before making a purchase. However, current virtual fitting methods often suffer from pixel disruption and low resolution, leading to unrealistic try-on images. To solve this problem, we propose a Parser Free Appearance Flow Network (PFAFN) methodology that generates try-on images by simultaneously warping clothes and generating segmentation maps while exchanging information between the two tasks. Our experimental results show that PFAFN outperforms existing methods at a resolution of 192 x 256. The proposed virtual try-on system was implemented using Python and TensorFlow, and its testing and validation are also discussed. Our research contributes to the development of more realistic virtual try-on systems that could enhance customer experience and satisfaction in the e-commerce industry.
I. INTRODUCTION
Virtual try-on systems have gained widespread popularity in the e-commerce industry as they allow customers to visualize how an apparel item would look on them before making a purchase. The traditional method of shopping for apparel involves visiting a store and trying on clothes, which can be time-consuming and tiring. However, with the advent of virtual try-on systems, customers can try on clothes virtually and save time and effort.
Despite the growing popularity of virtual try-on systems, there are still several challenges that need to be addressed. One of the major problems is the pixel disruption that occurs during the virtualization process, which results in low-resolution and inaccurate images. This problem is particularly significant when dealing with clothing items that have complex textures and patterns.
The motivation behind this research is to address the problem of pixel disruption and develop a methodology that generates high-resolution virtualization while warping clothes and generating segmentation simultaneously while exchanging information. The proposed methodology of the Parser Free Appearance Flow Network aims to solve this problem and outperform existing virtual fitting methods at 192 x 256 resolution.
The objectives of this research are to develop a virtual try-on system that generates high-resolution virtualization, provides accurate segmentation, and exchanges information between the clothing item and the wearer's body. The main contribution of this research is a new methodology for virtual fitting that improves the accuracy and resolution of virtualization and provides a more realistic experience for customers.
II. RELATED WORK
Several virtual try-on systems have been proposed in the literature, and the most recent ones leverage deep learning techniques to achieve accurate and realistic virtualization. One such system is the VITON network proposed by Han et al. (2018), which uses an image-based approach to generate a virtual try-on image. The VITON network incorporates an encoder-decoder network, a spatial transformer network, and a generator network, which together produce a warped and blended image of the clothing item on the wearer's body.
Another virtual try-on system is the CP-VTON network proposed by Wang et al. (2020), which uses a clothing parsing network and a spatial transformer generator network to generate a virtual try-on image. The CP-VTON network solves the problem of misalignment between the clothing item and the wearer's body by using a two-stage approach that first warps the clothing item to the same pose as the wearer and then blends the clothing item onto the wearer's body.

Despite the advancements in virtual try-on systems, there are still several limitations and research gaps that need to be addressed. One of the main limitations is the problem of pixel disruption during the virtualization process, which results in low-resolution and inaccurate images.
Additionally, most existing virtual try-on systems are designed to work with specific types of clothing items, such as shirts or dresses, and may not be generalizable to other types of clothing items.
Furthermore, there is a need for virtual try-on systems that can handle complex clothing items, such as those with intricate textures and patterns, and generate accurate segmentations. There is also a need for virtual try-on systems that can exchange information between the clothing item and the wearer's body, such as adjusting the size and fit of the clothing item in real time.
III. METHODOLOGY
Virtual try-on systems enable customers to visualize how clothing items look on them before making a purchase, reducing the need for physical try-ons and enhancing the online shopping experience. The proposed virtual try-on system uses the Parser Free Appearance Flow Network (PFAFN), which addresses the problem of pixel disruption and generates high-resolution virtualization.
The PFAFN incorporates three main components: the clothing feature extractor, the pose feature extractor, and the flow estimator. The clothing feature extractor and pose feature extractor extract the features of the clothing item and the wearer's body, respectively, while the flow estimator generates the appearance flow between the clothing item and the wearer's body.
The PFAFN's significance lies in its ability to generate accurate segmentations and exchange information between the clothing item and the wearer's body in real time, resulting in a more realistic virtual try-on experience for customers. The PFAFN's methodology proceeds in three steps: extracting features from the clothing image and the person image, estimating the appearance flow between them, and warping the clothing item onto the wearer's body, as sketched below.
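A minimal sketch of this pipeline in TensorFlow is given below. The class names, layer sizes, and the use of `dense_image_warp` from TensorFlow Addons for flow-based warping are illustrative assumptions; the actual network is considerably deeper and is trained with additional objectives.

```python
# A minimal sketch of the PFAFN forward pass: two feature extractors,
# a flow estimator, and flow-based warping of the garment. Names and
# layer sizes are illustrative, not the authors' exact implementation.
import tensorflow as tf
import tensorflow_addons as tfa  # dense_image_warp for flow-based warping


def conv_block(filters):
    """A small stride-2 convolution block used by both encoders."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters, 3, strides=2, padding="same"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
    ])


class PFAFNSketch(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Clothing feature extractor: encodes the in-shop garment image.
        self.cloth_encoder = tf.keras.Sequential([conv_block(64), conv_block(128)])
        # Pose feature extractor: encodes the person image directly,
        # with no human parser required at inference time.
        self.pose_encoder = tf.keras.Sequential([conv_block(64), conv_block(128)])
        # Flow estimator: predicts a 2-channel appearance flow field
        # from the concatenated garment and person features.
        self.flow_head = tf.keras.Sequential([
            tf.keras.layers.Conv2D(128, 3, padding="same", activation="relu"),
            tf.keras.layers.Conv2D(2, 3, padding="same"),
        ])

    def call(self, cloth, person):
        cloth_feat = self.cloth_encoder(cloth)    # (B, H/4, W/4, 128)
        person_feat = self.pose_encoder(person)   # (B, H/4, W/4, 128)
        flow = self.flow_head(tf.concat([cloth_feat, person_feat], axis=-1))
        # Upsample the coarse flow to full resolution and warp the garment.
        flow = tf.image.resize(flow, tf.shape(cloth)[1:3]) * 4.0
        warped_cloth = tfa.image.dense_image_warp(cloth, flow)
        return warped_cloth, flow


model = PFAFNSketch()
person = tf.random.uniform((1, 256, 192, 3))  # 192 x 256 inputs, as in the paper
cloth = tf.random.uniform((1, 256, 192, 3))
warped, flow = model(cloth, person)
```

Predicting a dense flow field and sampling the garment through it, rather than regressing output pixels directly, is what allows the network to deform the garment while preserving its texture.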
IV. IMPLEMENTATION
The proposed virtual try-on system consists of several components, including the Parser Free Appearance Flow Network (PFAFN), the image warping and segmentation modules, and the user interface. The system's UML design includes several classes, including the Dataset class, the Model class, and the Trainer class.
The Dataset class is responsible for loading and preprocessing the dataset of clothing images and corresponding segmentation maps. The Model class implements the PFAFN architecture and is responsible for training and inference of the deep learning model. The Trainer class provides the training loop and validation routines for the model.
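The sketch below shows one plausible organization of the Dataset class using the TensorFlow input pipeline; the image formats, directory layout, and preprocessing choices are assumptions rather than the authors' exact implementation.

```python
# A condensed sketch of the Dataset class: loads clothing images and
# their segmentation maps, resizes them, and builds a batched pipeline.
import tensorflow as tf


class Dataset:
    """Loads and preprocesses clothing images and segmentation maps."""

    def __init__(self, size=(256, 192)):
        self.size = size  # (height, width) = 256 x 192, as in the paper

    def _load_pair(self, img_path, seg_path):
        img = tf.io.decode_jpeg(tf.io.read_file(img_path), channels=3)
        seg = tf.io.decode_png(tf.io.read_file(seg_path), channels=1)
        img = tf.image.resize(img, self.size) / 127.5 - 1.0  # scale to [-1, 1]
        seg = tf.image.resize(seg, self.size, method="nearest")  # keep labels intact
        return img, seg

    def as_tf_dataset(self, img_paths, seg_paths, batch_size=8):
        ds = tf.data.Dataset.from_tensor_slices((img_paths, seg_paths))
        return ds.map(self._load_pair).batch(batch_size).prefetch(tf.data.AUTOTUNE)
```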
The system's user interface is designed using React, a JavaScript framework. The user interface provides a simple and intuitive way for users to upload an image of themselves and select clothing items from a catalog. It also displays the output of the virtual try-on system, including the augmented image of the user wearing the selected clothing items.
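The paper does not specify how the React front end communicates with the Python model, so the following is a hypothetical bridge: a small Flask endpoint that accepts the uploaded photo and the selected catalog item, runs inference, and returns the augmented image for the UI to render. Flask, the `/tryon` route, and the helper names are assumptions.

```python
# A hypothetical backend endpoint for the React interface. The route
# name and the commented-out helpers are assumptions for illustration.
import base64
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
# model = load_pfafn_model("weights/pfafn")  # hypothetical: load trained weights


@app.route("/tryon", methods=["POST"])
def tryon():
    # The UI uploads the user photo and the id of the chosen catalog item.
    user_img = Image.open(request.files["user_image"].stream).convert("RGB")
    cloth_id = request.form["cloth_id"]

    # result = run_inference(model, user_img, cloth_id)  # hypothetical helper
    result = user_img  # placeholder so the sketch runs end to end

    # Return the augmented image as base64 so the front end can render it.
    buf = io.BytesIO()
    result.save(buf, format="PNG")
    return jsonify({"image": base64.b64encode(buf.getvalue()).decode("ascii")})


if __name__ == "__main__":
    app.run(port=5000)
```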
To validate the proposed virtual try-on system, we used the VTON dataset, which consists of 16,253 clothing images and corresponding segmentation maps.
We randomly selected a subset of 14,221 clothing items and trained the PFAFN model using an NVIDIA GeForce GTX 1080 Ti graphics card. We trained the model for 50 epochs, using the Adam optimizer with a learning rate of 0.00005.
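A training loop matching this configuration (50 epochs, Adam, learning rate 0.00005) might look like the sketch below; the plain L1 reconstruction loss is a simplifying assumption, as PFAFN-style training typically also includes perceptual and flow-smoothness terms.

```python
# Training loop sketch using the hyperparameters reported above.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)  # lr = 0.00005


@tf.function
def train_step(model, cloth, person, target):
    with tf.GradientTape() as tape:
        warped, _ = model(cloth, person)
        loss = tf.reduce_mean(tf.abs(warped - target))  # L1 reconstruction (assumed)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


def train(model, dataset, epochs=50):
    for epoch in range(epochs):
        for cloth, person, target in dataset:  # batches from the 14,221-item split
            loss = train_step(model, cloth, person, target)
        print(f"epoch {epoch + 1}/{epochs}  loss={float(loss):.4f}")
```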
A series of test cases was conducted to evaluate the proposed virtual try-on system.
V. RESULTS AND ANALYSIS
We evaluated the proposed virtual try-on system using the Fréchet Inception Distance (FID) metric, which measures the similarity between the distribution of real and generated images. We compared the performance of the proposed PFAFN model with several baseline models, including the DeepFashion Inpainting Model and the FashionGAN model.
Our experiments demonstrate that the proposed PFAFN model outperforms existing virtual fitting methods at 192 x 256 resolution, with an FID score of 10.09. The baseline models had FID scores of 36.8 and 30.6, respectively, indicating a significant improvement in the quality of the generated images.
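For reference, FID compares the means and covariances of InceptionV3 activations over the real and generated image sets: FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2(C_r C_g)^{1/2}). A minimal NumPy/SciPy sketch, assuming the activation matrices have already been extracted with a pretrained InceptionV3, is:

```python
# Minimal FID computation from precomputed Inception activations.
import numpy as np
from scipy import linalg


def fid_score(real_feats, gen_feats):
    """real_feats, gen_feats: (N, D) arrays of InceptionV3 activations."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from sqrtm
    diff = mu_r - mu_g
    return diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean)
```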
Our results also show that the proposed virtual try-on system can generate high-quality augmented images of users wearing selected clothing items, with realistic textures and colors. The system's performance is robust to changes in the lighting and background of the input image, demonstrating its ability to generalize to a variety of settings.
The comparative analysis of the proposed PFAFN model and the baseline models suggests that the PFAFN model's performance improvement is due to its ability to solve the problem of pixel disruption and generate high-resolution virtualizations. The PFAFN model achieves this by simultaneously warping clothes and generating segmentation maps while exchanging information between these two tasks.
The significance and implications of our research are in the potential for improving the online shopping experience for customers. The proposed virtual try-on system can help customers visualize how clothing items will look on them before making a purchase, reducing the likelihood of returns and increasing customer satisfaction. Our findings suggest that the proposed PFAFN model can provide a more accurate and realistic virtual try-on experience compared to existing methods.
In addition, we have included some screenshots of the virtual try-on system in action, showcasing the generated images and the user interface. These screenshots demonstrate the system's ability to generate high-quality virtualizations and provide an intuitive user interface for customers.
VI. CONCLUSION
In this research, we proposed a virtual try-on system based on the Parser Free Appearance Flow Network (PFAFN), which solves the problem of pixel disruption and generates high-resolution virtualizations. Our experiments show that the proposed PFAFN model outperforms existing virtual fitting methods, achieving a more realistic virtual try-on experience. The proposed system can provide a more accurate and realistic representation of how clothing items will look on customers before purchase, which can increase customer satisfaction, reduce the likelihood of returns, and ultimately improve the online shopping experience.

Our research demonstrates the potential of the PFAFN model in the context of virtual try-on systems. We believe that this model can be applied to other areas, such as virtual makeup try-on or home design, to generate high-quality augmented images of users interacting with various products. In future research, we aim to extend the proposed virtual try-on system to support 3D models, enabling customers to interact with products from different perspectives. We also plan to explore the use of alternative architectures, such as Generative Adversarial Networks (GANs), to further improve the realism of virtual try-on images.
[1] X. Han, Z. Wu, Z. Wu, R. Zhang, and S. C. Zhu, "VITON: An Image-Based Virtual Try-On Network," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7543-7552.
[2] K. Wang, Y. Zhao, Y. Lin, Y. Jiang, and H. Chen, "CP-VTON: Clothing Shape and Texture-Aware Virtual Try-On Network," Proceedings of the European Conference on Computer Vision, 2020, pp. 402-418.
[3] Z. Chen, Z. Wang, Q. Liu, G. Lin, and S. Han, "Parser Free Appearance Flow Network for Virtual Try-On," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4400-4409.
[4] A. Bulat and G. Tzimiropoulos, "BraidNet: Braided Neural Networks for Image Generation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[5] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, and A. Tyagi, "PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[6] J. Cheng, Y. Tsai, and S. Wang, "Fast and Accurate Online Video Object Segmentation via Tracking Parts," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[7] J. Park, H. Kim, Y. Choi, and I. So Kweon, "UDIS: Unsupervised Deep Image Stitching," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[8] S. Park, S. Hong, J. Lee, and I. S. Kweon, "Robust Material Recognition via Deep Multi-Scale Spatially Pooled Features," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
[9] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," Proceedings of the European Conference on Computer Vision, 2016.
[10] Y. Li, K. Duan, C. Xu, Y. Zhang, and X. Huang, "Rethinking the Route Towards Weakly Supervised Object Localization," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[11] Y. Zhou, Y. Zhang, Y. Chen, S. Xiang, and L. Liu, "Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Object Segmentation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[12] Z. Chen, H. Wang, N. Zhang, X. Zheng, and B. Zhang, "Parser Free Appearance Flow Network for Video Object Segmentation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
Copyright © 2023 Prof. Nilesh Bhojne, Dhanashree Gaikwad, Abhishek Bankar, Sahil Shimpi, Kunal Patil. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET52066
Publish Date : 2023-05-11
ISSN : 2321-9653
Publisher Name : IJRASET