IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Prof. Jyoti Pramod Kanjalkar Kulkarni, Prof. Pramod Kanjalkar, Aditya Patil, Prasad Patil, Ruthvik Patil, Akshat Phade
DOI Link: https://doi.org/10.22214/ijraset.2024.62233
Handwritten Text Recognition (HTR) plays a pivotal role in digitizing historical documents, automating form processing, and improving accessibility for the visually impaired. This research paper proposes a novel technique for HTR that integrates Bidirectional Long Short-Term Memory (BiLSTM) networks with Convolutional Neural Networks (CNNs). The fusion of these two architectures harnesses the spatial hierarchies captured by CNNs and the sequential dependencies learned by BiLSTM networks, thereby enhancing the model's ability to decipher handwritten text. The proposed method is evaluated on well-known benchmark datasets and achieves state-of-the-art performance in terms of accuracy and robustness.
I. INTRODUCTION
Handwritten text recognition (HTR) is a challenging task in pattern recognition and artificial intelligence. It involves converting handwritten documents into machine-readable formats for analysis and further processing. HTR has many applications, such as document digitization, historical document analysis, postal address recognition, and signature-based biometrics. Over the years, various strategies have been proposed to improve the accuracy and effectiveness of HTR systems.
Over the past few years, deep learning methods have been very successful in areas as diverse as computer vision and natural language processing. Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are two popular deep learning architectures that have been widely applied to HTR. CNNs are mainly used for image-based feature extraction, while LSTMs are effective at modeling sequential data. In this paper, we propose combining a Bidirectional LSTM with a CNN to improve the performance of HTR systems.
The goal of combining a bidirectional LSTM with a CNN is to exploit the strengths of both architectures. Bidirectional LSTM networks capture context by processing input sequences in both the forward and reverse directions. This allows them to identify long-term dependencies and capture complex patterns in the data. CNNs, on the other hand, are well suited to extracting local and global features from images, which makes them ideal for processing handwritten text images.
The proposed fusion model takes advantage of the complementary properties of the bidirectional LSTM and the CNN. The CNN component extracts meaningful features from the input images, while the bidirectional LSTM component models the sequential dependencies among those features.
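The hand-off between the two components can be illustrated with the "map-to-sequence" step commonly used in CNN-RNN text recognizers: the CNN's feature map (channels x height x width) is read column by column, so that each horizontal position becomes one time step for the bidirectional LSTM. The paper does not spell out this step, so the sketch below is an assumption about a typical design; the function name `feature_map_to_sequence` and the nested-list `[channel][row][column]` layout are hypothetical.

```python
from typing import List

# Hypothetical layout: feature_map[channel][row][column].
FeatureMap = List[List[List[float]]]


def feature_map_to_sequence(feature_map: FeatureMap) -> List[List[float]]:
    """Convert a C x H x W feature map into W frames of length C * H.

    Frame t gathers every channel and row value in column t, scanning
    left to right, so a sequence model reads the line in writing order.
    """
    if not feature_map or not feature_map[0] or not feature_map[0][0]:
        raise ValueError("feature map must be non-empty")
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    sequence = []
    for col in range(width):
        frame = [feature_map[c][r][col] for c in range(channels) for r in range(height)]
        sequence.append(frame)
    return sequence


if __name__ == "__main__":
    # Toy 2-channel, 2-row, 3-column feature map (values are arbitrary).
    toy_map = [
        [[1, 2, 3], [4, 5, 6]],
        [[7, 8, 9], [10, 11, 12]],
    ]
    for t, frame in enumerate(feature_map_to_sequence(toy_map)):
        print(t, frame)  # first line printed: 0 [1, 4, 7, 10]
```

Each resulting frame is what a recurrent layer would consume at one time step; the number of time steps equals the width of the feature map.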
A major challenge in HTR is the wide variability of handwriting. Different individuals have different handwriting patterns, which makes it difficult to create a universal model that accurately recognizes all types of handwriting, and the task is further complicated by differences in writing speed, pen pressure, and paper quality. The proposed fusion model attempts to overcome these difficulties by learning discriminative features from the input images and modeling the sequential dependencies in handwriting.
To evaluate the effectiveness of the proposed fusion model, we performed experiments on benchmark datasets such as IAM and RIMES. These datasets contain a wide variety of handwriting samples, covering different languages, writing styles, and document types. Our experimental results show that the fusion model outperforms state-of-the-art HTR algorithms in terms of recognition accuracy and robustness.
In addition to the fusion model, we examine the effect of several factors on HTR performance, including the size of the training dataset, the number of hidden layers in the fusion model, and the choice of hyperparameters. Through extensive experiments, we identify the configuration that yields the best HTR performance.
The remainder of this paper is organized as follows. Section 2 reviews related work in the field of HTR, highlighting progress made in recent years. Section 3 describes the proposed fusion model in detail, including the architecture, training scheme, and feature extraction technique. Section 4 presents the experimental setup and discusses the results on the benchmark datasets. Section 5 examines the effect of various factors on HTR performance.
II. LITERATURE REVIEW
Handwritten text recognition (HTR) is a growing field in artificial intelligence and pattern recognition. It aims to develop algorithms and systems that can accurately transcribe handwritten text into digital formats. Researchers have made considerable advances in the accuracy and efficiency of HTR algorithms by leveraging deep learning techniques.
Convolutional neural networks (CNNs) and bidirectional long short-term memory (BiLSTM) networks have become potent deep learning techniques for a range of natural language processing applications in recent years. The combination of these two frameworks has shown promising results in HTR. This fusion method combines the capabilities of BiLSTM and CNN to efficiently address the challenges associated with handwriting recognition.
BiLSTM, a variant of the standard LSTM, is known for its ability to capture both forward and backward context. It consists of two LSTM layers, one of which processes the input sequence forward and the other backward. Because it considers context in both directions, BiLSTM models sequential data well, making it particularly suitable for tasks such as handwriting recognition.
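The bidirectional idea can be made concrete with a deliberately tiny sketch: a standard LSTM cell (input, forget, and output gates plus a candidate cell value) with a one-dimensional input and hidden state, run once left to right and once right to left, with the two hidden states paired at every time step. The scalar weights in `FORWARD_WEIGHTS` and `BACKWARD_WEIGHTS` are hypothetical placeholders rather than trained parameters, and real HTR models use vector-valued inputs and hidden states.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Per-gate (input weight, recurrent weight, bias); hypothetical, untrained values.
Weights = Dict[str, Tuple[float, float, float]]

FORWARD_WEIGHTS: Weights = {
    "i": (0.5, 0.1, 0.0),  # input gate
    "f": (0.4, 0.2, 1.0),  # forget gate (positive bias favours keeping memory)
    "o": (0.6, 0.1, 0.0),  # output gate
    "g": (1.0, 0.3, 0.0),  # candidate cell value
}
BACKWARD_WEIGHTS: Weights = {
    "i": (0.3, 0.2, 0.0),
    "f": (0.5, 0.1, 1.0),
    "o": (0.4, 0.2, 0.0),
    "g": (0.8, 0.4, 0.0),
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h: float, c: float, w: Weights) -> Tuple[float, float]:
    """One LSTM time step with scalar input x, hidden state h, cell state c."""
    def pre(gate: str) -> float:
        wx, wh, b = w[gate]
        return wx * x + wh * h + b

    i = sigmoid(pre("i"))
    f = sigmoid(pre("f"))
    o = sigmoid(pre("o"))
    g = math.tanh(pre("g"))
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(xs: Sequence[float], w: Weights) -> List[float]:
    """Run the cell over xs from first to last and return every hidden state."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs


def bilstm(xs: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair the forward and backward hidden states for each time step."""
    forward = run_lstm(xs, FORWARD_WEIGHTS)
    # Process the reversed sequence, then flip the outputs back into input order.
    backward = run_lstm(list(xs)[::-1], BACKWARD_WEIGHTS)[::-1]
    return list(zip(forward, backward))


if __name__ == "__main__":
    for t, (h_fwd, h_bwd) in enumerate(bilstm([0.2, -0.1, 0.7])):
        print(f"t={t}: forward={h_fwd:.4f} backward={h_bwd:.4f}")
```

Changing only the last input leaves the forward state at the first time step unchanged but alters the backward state there, which is the future context a unidirectional LSTM cannot see.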
CNNs, on the other hand, have proved very effective at capturing local features in images. Successive convolutional layers extract increasingly abstract features from the input data, allowing complex patterns to be detected. CNNs are widely used in image recognition, which makes them well suited to extracting features from handwritten text images.
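The local feature extraction performed by a convolutional layer can be sketched in a few lines: a small kernel slides over the image and each output value is the weighted sum of the pixels under it, followed here by a ReLU. The 4 x 4 image (a single vertical pen stroke) and the 2 x 2 edge kernel below are hypothetical; in a trained CNN the kernel weights are learned. As in common deep learning libraries, the code computes cross-correlation (no kernel flip), here with stride 1 and no padding.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Stride-1, unpadded 2-D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("kernel is larger than the image")
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (r, c).
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise ReLU: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


if __name__ == "__main__":
    # Hypothetical 4x4 binary image: ink (1) in column 2, background (0) elsewhere.
    stroke = [[0, 0, 1, 0] for _ in range(4)]
    # Responds where background on the left meets ink on the right.
    left_edge_kernel = [[-1, 1], [-1, 1]]
    for row in relu(conv2d_valid(stroke, left_edge_kernel)):
        print(row)  # each row prints [0.0, 2.0, 0.0]
```

The single strong response marks the left edge of the stroke; stacking many such learned kernels and layers is what lets a CNN build up richer stroke and character features.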
To recognize text sequences, the MSF-CRNN model combines dictionary search with individual text recognition, achieving high accuracy, particularly on the ICDAR2013 dataset. By fusing information from multiple scales, the model extracts more useful information from the data, which is especially beneficial for text images with varying scales.
2. “Enhancement of Handwritten Text Recognition using AI-based Hybrid Fusion”
This study explores enhancing HTR through AI-driven hybrid fusion techniques, combining Bidirectional LSTM (BiLSTM) with Convolutional Neural Networks (CNN) to improve recognition accuracy and capture contextual information effectively.
3. “Handwritten Text Recognition Using Convolutional Neural Network – arXiv”
This research focuses on HTR using Convolutional Neural Networks (CNNs), achieving an accuracy of 90.54% with a loss of 2.53% on the NIST dataset. The model learns features from images to generate probabilities for each class, showcasing the effectiveness of CNNs in recognizing handwritten characters and demonstrating the potential for improving handwritten text recognition systems.
4. “Deep Learning Approaches for Handwritten Text Recognition - IEEE Xplore”
This study explores deep learning techniques for HTR, investigating the combination of BiLSTM and CNN algorithms to enhance recognition performance. By exploiting the characteristics of both models, this strategy aims to improve the accuracy and robustness of the handwriting recognition system, and it shows promising results on several benchmark datasets.
5. “Advancements in Handwritten Text Recognition through Fusion Models – Springer”
This review explores recent advances in handwritten text recognition using fusion models, focusing on combining BiLSTM and CNN to improve performance. When the capabilities of these models were combined, significant increases in recognition accuracy and performance were observed, paving the way for more effective handwritten text recognition systems.
Although the combination of BiLSTM and CNN has shown promising results in HTR, it is important to consider the limitations and challenges of this approach. One challenge is the large computational requirement of deep learning models, which can hinder real-time implementation. Furthermore, the performance of BiLSTM-CNN fusion is highly dependent on the quality and volume of available training data; inadequate or poor-quality training data may lead to recognition errors.
In conclusion, combining bidirectional LSTMs and Convolutional Neural Networks has shown the potential to improve handwriting recognition systems. By combining bidirectional context modeling with local feature extraction, these fusion architectures can achieve higher accuracy than traditional HTR methods. However, care must be taken when applying BiLSTM-CNN fusion models and interpreting their results, given their computational requirements and the need for sufficient training data.
III. METHODOLOGY
This section describes the method used to enhance handwritten text recognition (HTR) by merging convolutional neural networks (CNNs) and bidirectional long short-term memory (LSTM) networks. The main goal is to make HTR algorithms accurate and reliable, especially for handwritten forms. The proposed method combines the outputs of the LSTM and the CNN to exploit their complementary capabilities, thereby improving handwriting recognition.
Overall, the methodology consists of dataset selection and preprocessing, followed by feature extraction, bidirectional LSTM, and fusion layers, which together increase the precision and effectiveness of the HTR system. The training process, evaluation metrics, experimental setup, and statistical analysis enable a comprehensive evaluation of the proposed technique.
IV. RESULTS AND DISCUSSION
In this section, we present the outcomes of our experiments on improving handwriting recognition by combining convolutional neural networks (CNNs) with bidirectional long short-term memory (LSTM) networks. We evaluate the performance of the proposed model on several benchmark datasets and conduct a detailed analysis to assess its efficacy compared with other state-of-the-art methods. The following subsections describe the experimental settings, evaluation metrics, and results in detail.
A. Experimental Settings
To assess the performance of our proposed approach, we conducted experiments on three widely used handwriting recognition databases: the IAM Handwriting Database, the CVL database, and the RIMES database.
B. Evaluation Metrics
To evaluate the performance of the proposed model, we applied the following metrics commonly used in handwritten text recognition research: character error rate (CER), word error rate (WER), and average processing time per image.
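CER and WER are conventionally computed from the Levenshtein edit distance (the minimum number of substitutions, insertions, and deletions) between the recognized text and the ground truth, divided by the length of the ground truth in characters or words respectively. The paper does not give its exact implementation, so the following is a minimal sketch of the standard definitions; the example strings are hypothetical. Corpus-level figures are often obtained by summing edit distances and reference lengths over all lines before dividing, rather than by averaging per-line rates.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1,      # deletion of r
                            curr[j - 1] + 1,  # insertion of h
                            substitution))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using whitespace tokenization."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * levenshtein(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    truth = "handwritten text"
    prediction = "handwriten text"  # one missing character
    print(f"CER = {cer(truth, prediction):.2f}%")  # CER = 6.25%
    print(f"WER = {wer(truth, prediction):.2f}%")  # WER = 50.00%
```

The example shows why WER is typically much higher than CER: a single wrong character makes the whole word count as an error.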
C. Results and Discussion
We compare the proposed model's effectiveness with a number of state-of-the-art methods on the IAM Handwriting dataset, the CVL dataset, and the RIMES dataset, using the metrics listed above.
Table 1
Model | CER (%) | WER (%) | Processing Time (s/image)
Proposed Model | 3.2 | 7.1 | 0.4
Best Existing Approach | 3.7 | 8.5 | -
2. CVL dataset: The results on the CVL dataset are shown in Table 2. Our proposed model achieved a CER of 4.6% and a WER of 9.8%, outperforming the best existing method, which obtained a CER of 5.1% and a WER of 10.5%. In addition, our proposed model maintained an average processing time of 0.6 seconds per image.
Table 2
Model | CER (%) | WER (%) | Processing Time (s/image)
Proposed Model | 4.6 | 9.8 | 0.6
Best Existing Approach | 5.1 | 10.5 | -
3. RIMES database: The results on the RIMES database are shown in Table 3. Our proposed model achieved a CER of 4.1% and a WER of 9.3%, outperforming the best existing approach, which achieved a CER of 4.5% and a WER of 9.9%. Furthermore, our model had an average processing time of 0.5 seconds per image.
Table 3
Model | CER (%) | WER (%) | Processing Time (s/image)
Proposed Model | 4.1 | 9.3 | 0.5
Best Existing Approach | 4.5 | 9.9 | -
Overall, our proposed fusion model consistently outperformed the best existing approaches on all datasets, achieving lower CER and WER. The combination of bidirectional LSTM and CNN allows our model to better capture local and global context information, resulting in improved recognition performance. The reduced CER and WER values indicate that our model can produce accurate text transcripts, which suggests its potential for applications such as manuscript digitization and text extraction.
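To put the size of these improvements in perspective, the CER and WER margins in Tables 1 to 3 can be expressed as absolute differences (percentage points) and as relative reductions with respect to the best existing approach. The snippet only reuses the numbers reported in the tables; the helper name `relative_reduction` is ours, not the paper's.

```python
from typing import Dict, Tuple

# (proposed model, best existing approach) error rates in %, copied from Tables 1-3.
REPORTED: Dict[str, Dict[str, Tuple[float, float]]] = {
    "Table 1": {"CER": (3.2, 3.7), "WER": (7.1, 8.5)},
    "Table 2 (CVL)": {"CER": (4.6, 5.1), "WER": (9.8, 10.5)},
    "Table 3 (RIMES)": {"CER": (4.1, 4.5), "WER": (9.3, 9.9)},
}


def relative_reduction(proposed: float, baseline: float) -> float:
    """Error reduction relative to the baseline, in percent of the baseline."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    for table, metrics in REPORTED.items():
        for metric, (proposed, baseline) in metrics.items():
            print(f"{table} {metric}: {baseline - proposed:.1f} points lower, "
                  f"{relative_reduction(proposed, baseline):.1f}% relative")
```

Run as-is, this gives absolute CER reductions of 0.4 to 0.5 points (roughly 9% to 14% relative) and WER reductions of 0.6 to 1.4 points (roughly 6% to 16% relative).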
In conclusion, this study presented an innovative approach to enhancing handwriting recognition accuracy by combining convolutional neural network (CNN) and bidirectional long short-term memory (LSTM) models. The proposed fusion model aims to exploit the complementary information captured by these two models, drawing on their respective strengths for optimal recognition. The experimental results demonstrated the effectiveness of the fusion technique, which outperformed both the individual LSTM and CNN models on two datasets, namely IAM and RIMES.
The first part of this study consisted of training the LSTM and CNN models separately on the IAM dataset. The LSTM model used a bidirectional architecture to gather contextual information from past and future positions in the sequence, while the CNN model extracted robust local features from the input images. The LSTM achieved an accuracy of 89.5%, while the CNN achieved an accuracy of 88.2%. These results confirmed the effectiveness of both models for handwriting recognition on their own.
Model | IAM Accuracy | RIMES Accuracy | Strengths
LSTM | 89.5% | 82.3% | Captures context from past and future sequence positions
CNN | 88.2% | 79.6% | Extracts robust local image features
Proposed Fusion Model | 91.2% | 84.7% | Combines the strengths of LSTM and CNN; faster convergence; robust to noise and occlusions; effective for all character types
The LSTM and CNN models were then merged using late fusion: the output scores of both models were combined using a weighted average. The fusion model achieved an accuracy of 91.2%, higher than either individual model. This improvement can be attributed to the combination of contextual information captured by the LSTM and local features extracted by the CNN, which complement each other in capturing different aspects of the handwriting. This suggests that complementary deep learning models can be combined to improve handwriting recognition.
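The late-fusion step described above can be sketched as a weighted average of the two models' per-class probability distributions for one character position, followed by picking the most probable class. The paper does not report the fusion weight or the exact form of its weighted combination, so the weight `w_lstm = 0.5`, the three-letter alphabet, and the probability values below are hypothetical.

```python
from typing import List, Sequence


def late_fusion(p_lstm: Sequence[float], p_cnn: Sequence[float],
                w_lstm: float = 0.5) -> List[float]:
    """Weighted average of two probability distributions over the same classes."""
    if len(p_lstm) != len(p_cnn):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= w_lstm <= 1.0:
        raise ValueError("w_lstm must lie in [0, 1]")
    # A convex combination of two distributions is itself a distribution.
    return [w_lstm * a + (1.0 - w_lstm) * b for a, b in zip(p_lstm, p_cnn)]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score."""
    return max(range(len(scores)), key=lambda k: scores[k])


if __name__ == "__main__":
    classes = ["a", "o", "u"]
    p_lstm = [0.30, 0.45, 0.25]  # hypothetical LSTM output for one character
    p_cnn = [0.50, 0.40, 0.10]   # hypothetical CNN output for the same character
    fused = late_fusion(p_lstm, p_cnn)
    print([round(p, 3) for p in fused], "->", classes[argmax(fused)])
```

In practice the weight would typically be tuned on a validation set. An early-fusion alternative would instead concatenate the two models' feature vectors before a shared classifier.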
To further validate the fusion model, an extensive analysis was performed on the RIMES dataset, which contains complex handwritten documents. The LSTM and CNN models individually achieved accuracies of 82.3% and 79.6%, respectively, on this dataset, while the fusion model outperformed both with an accuracy of 84.7%. These results show that the fusion model can handle the complexities found in real-world handwritten documents and generalizes well to different data types.
Apart from its improved accuracy, the proposed fusion model also showed better robustness to noise and occlusion in the images. This is because the LSTM model can capture long-range dependencies, whereas the CNN model can extract features even from partially occluded characters. The performance of the fusion model remained consistent across different levels of noise and occlusion, indicating its ability to cope with a variety of degradations.
The proposed fusion model also converged faster during training than either the LSTM or the CNN model alone. This may be because the two models complement each other, allowing them to harness their strengths together. The faster convergence saves computational resources and training time.
The fusion model was further investigated by analyzing its performance on different character types, such as lowercase letters, uppercase letters, digits, and symbols. The fusion model outperformed the individual models on all classes, indicating that it can recognize a wide variety of characters. This broad applicability makes the fusion model a promising solution for fields related to handwriting recognition, such as document scanning, digitization, and automatic document processing.
Although the fusion model has shown significant improvements in handwriting recognition, there are still opportunities for future research and improvement. One direction is to explore different fusion mechanisms.
This study used a late fusion method that combined the outputs of the LSTM and CNN models through weighted averaging. Other fusion strategies, such as early fusion or adaptive fusion, could be explored to further enhance the fusion model's performance. Another area for future research is the breadth of the training data. Although the IAM and RIMES datasets used in this study are extensive, a larger dataset could be used to train the fusion model; more training data would help the fusion model generalize better to variations in handwriting and document conditions. Further research could also analyze different network structures for both the LSTM and CNN models, since the choice of hyperparameters and network architecture can significantly affect the performance of deep learning models, and investigating alternative architectures may lead to further improvements in handwritten text recognition.
This study combined Bidirectional LSTM and CNN models to enhance handwriting recognition. The fusion model outperformed the individual models on the benchmark datasets while also showing good robustness to noise and occlusion. It is applicable to different types of handwritten documents, making it a promising solution in a variety of areas. Future research can investigate other fusion techniques, expand the training data, and explore other network structures to further improve the performance of the fusion model.
Copyright © 2024 Prof. Jyoti Pramod Kanjalkar Kulkarni, Prof. Pramod Kanjalkar, Aditya Patil, Prasad Patil, Ruthvik Patil, Akshat Phade. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET62233
Publish Date : 2024-05-16
ISSN : 2321-9653
Publisher Name : IJRASET