Survey Paper: Automatic Title Generation for Text with RNN and Pre-trained Transformer Language Model

Authors: Vishal Lodhwal , Gowri Choudhary

DOI Link: https://doi.org/10.22214/ijraset.2023.49713

Abstract

Nowadays huge amounts of text data are available due to the evolution of the Internet. Although search engines are used to select text data, it is unfeasible to go through the entire search results of text that are related to search intent. Therefore, text summarization is the only method by which we can reduce text data without the loss of information. A new method for Title or Text generation is the Transformer language model that has been trained i.e. GPT-2, GPTNeo, Chat-GPT, and LSTM Model of RNN by which we can generate catchy titles to take readers’ attention and attract them to read a full article. This paper has discovered recent literature on all the previously mentioned topics which are related to Title or Text generators and examines several methods which propose to make use of various language models. The suggested approaches are also contrasted with one another to highlight their unique advantages, and to suggest constantly better ways.

Introduction

I. INTRODUCTION

Text summarization aims at solving “information explosion”, which with the help of various algorithms, may automatically produce a condensed version of the original content while containing its main idea[1].Automatic text summarization(ATS) is a process for automatically extracting important information from text using a particular algorithm or method.There are two main approaches used: abstractive summarization and extractive summarization Abstractive summarization performs poorly for language ATS tasks requiring little in the way of resources because of a lack of corpora. Because of this, academics frequently use extractive summarization in low-resource languages rather than an abstractive summary.Title generation is a significant and difficult issue in NLP (Natural Language Processing).

A specific case of summarization is title generation. The goal of summarization is to condense the given information into a relatively brief paragraph while maintaining the core idea of the underlying subject [2]. While title creation adheres to certain latent traits, such as linguistic construction, it captures the substance of the content in a few words or at most a phrase. For many years, automatic summarization has been the subject of intense research. A minor portion of this research deal with automatic title generation. The problem is still difficult to solve when large training data are not available, as well as in specialised domains like scientific and technical articles that have unique structural patterns and specialised technical vocabulary. Previous title generation systems have produced somewhat good and relevant titles in some contexts. To solve this, we provide a novel method of a supervised generative model for title generation constructed utilising a pre-trained GPT-2 (Generative Pre-trained Transformer 2) language model.

The simple goal of the GPT-2 language model, a sizable transformer-based language model trained on 40GB of internet text, is to predict the next word given all the terms in a sequence. The synthetic text samples produced by GPT-2 are cohesive continuations of the input and have a realistic appearance [2]. Using machine translation principles, we suggested a method for creating titles using a recurrent neural network [3]. The title generator uses Long Short Term Memory, one of the recurrent neural networks, to build its encoder and decoder. Several areas are starting to employ cutting-edge language models, such as Chat-GPT. These models can aid with homework assignments, business plan preparation, coding, and even survey creation.Additionally, it has been discovered that these systems can even produce phoney articles and abstracts. Examining Chat-GPT and human-generated texts for manuscript preparation is crucial to understanding the future of scientific writing in the Chat-GPT era. By comparing Chat-output GPTs with actual published content using supervised and unsupervised text mining methodologies, this study attempted to assess Chat-capacity GPTs to prepare an article for publication. The introductory chapters of 327 papers on road safety that have been published were used in this study.

II. LITERATURE REVIEW

An overview of the literature provides a complete overview of previous research on the topic. Each source is described and researched in a literature review and provided with a brief synopsis. Earlier researchers' contributions are acknowledged, so readers can be confident that their research has been carefully evaluated. To fully understand the progress of this topic, the reader needs the landscape provided by the literature review.In this section, we discuss various authors' work in the field of Text Generation using different relevant techniques.Hayashi et al.,[3] have presented a long short-term memory (LSTM) model, which, as the name suggests, is a type of recurrent neural network. The proposed title generator consists of two modules, an encoder and a decoder. The encoder first creates an intermediate representation using the body of the article. The decoder then uses the intermediate representation to create the title. Mishra et al., [2] describe a mechanism to automatically generate titles for given text using the pre-trained Transformer language model GPT-2. This model creates a pool of potential titles, selects a suitable title, and suggests a unique way to refine or denoise it to get the final title. This strategy involves a pipeline containing his three modules of generation, selection and refinement, followed by a scoring mechanism. The Generation and Refinement modules use the GPT-2 framework, while the Selection module uses a heuristics-based methodology.Gogoulou et al., [4] have performed Comprehensive and exciting research demonstrating GPT-enabled SW3 for producing quality writing. Compared to other autoregressive models of comparable size, GPT-SW3 is a powerful model capable of translating from Swedish to English in zero-shot, one-shot and few-shot settings. It was trained on a recently built 100GB Swedish corpus.Garg et al., [5] have explored features of pre-trained language models BART is an encoder/decoder model, whereas both GPT2 and GPT-Neo are decoder-only models that generate a structured set of MR tags as input. Dascalu et al., [6] have introduced RoGPT2 as the Romanian version of the GPT2 model trained on the largest corpus available in Romanian. The model is trained in three versions.Basic (124M parameters), Medium (354M parameters), and Large (774M parameters).

Table 1: Show the overview of various research in the field of title generation.

S. No.	Topic	Publishing organization	Year of Publication
1	Title Generation with Recurrent Neural Network	IIAI	2016
2	Automatic Title Generation for text with Pre-trained Transformer Language Model	ICSC	2021
3	Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish	ELRA	2022
4	Stylistic MR-to-Text Generation Using Pre-trained Language Models	NLPAI	2021
5	RoGPT2:Romanian GPT2 for Text Generation	NLPAI	2021
6	ChatGPT Scientific Writings: A Case Study on Traffic Safety	Research Gate	2023
7	ChatGPT for (Finance) research: The Bananarama Conjecture	Elsevier	2023
8	Medical Text Prediction and Suggestion Using Generative Pre-trained Transformer Models with Dental Medical Notes	Research Gate	2022
9	Chat2VIS: Generating Data Visualizations via Natural Language using ChatGPT, Codex and GPT-3 Large Language Models	SMCS	2023
10	ChatGPT for Next Generation Science Learning	XRDS	2023
11	Automated Title Generation in English Language Using NLP.	Research Gate	2016
12	Multilingual neural title generation for e-Commerce browse pages	ACL	2018

Kutela et al., [7] explored the use of ChatGPT and human-powered text for manuscript creation. A prompt containing instructions and the title of the 327 manuscripts was provided to ChatGPT to purchase the introductory section generated by ChatGPT. Five supervised text classifiers were used to distinguish between human-generated referrals and ChatGPT-generated referrals.It supports Vector Machines (SVM), Random Forests (RF), Naive Bayes (NB), LogitBoost, and Neural Networks (NNet). Using two of his unsupervised techniques, text network analysis (TNA) and text clustering, we identified differences in text content between human-generated and ChatGPT-generated testimonials. browsing et al., [8] Based on financial journal reviewers' evaluations of previously discovered AI-generated results, Chabot ChatGPT can be of great help to financial research. As a result, they choose cryptocurrencies as their financial topic. This is an important and generally well-defined area of ??modern finance. They also focus on letter-style articles, such as those found in Finance Research Letters, which are 2000-2500 words in length. Importantly, they show that the amount of personal data and the researcher's expertise are important determinants affecting the quality of the output. Finally, they address the implications of this new technology, especially the ethical aspects. Sirrianni et al., [9] explored the GPT model used in healthcare. They tested the effectiveness of his GPT-2 and GPT-Neo models for predicting medical text using 374,787 dental free text notes. Such an implementation could help reduce the time medical professionals spend on medical records and ameliorate physician burnout associated with time spent on medical records. Maddigan et al., [10] proposed to convert free-form natural language into visualizations using state-of-the-art direct free natural language-to-code conversion for large-scale language models (LLMs), such as ChatGPT and GPT-3, and corresponding visual representations. natural end-to-end solution for Chat2VIS, LLM combined with suggested prompts provides a reliable technique for creating visualizations from natural language queries, even if the query is inherently wrong and underspecified. indicates to provide. Zhai et al., [11] have presented using the performance expectation of the Next Generation Science Standards, ChatGPT develops a performance-based assessment problem automatically. They responded and asked ChatGPT to assess it as well as provide learning guidance and materials based on the feedback. Sethi et al., [12] Show how to generate or identify the main idea of a story or document in English without reading the whole thing. Prioritizing frequency, combining nouns and adjectives or idiom-based titles. It takes a story as input and generates a title. To find English articles and article titles, they proposed an algorithm. Database size directly affects system performance. The program is useful for both students and teachers as it provides information and tools for examining sentence structure.Mathur et al., [13] have presented that by training these models with multilingual data, we create a composite model that can serve titles in a variety of different languages.

A. GPT(Generative Pre-trained Transformer)

The first sign language model published in 2018 was a Generative Pre-Trained Transformer (GPT) model created by Open AI. its schematic representation is shown in fig 2. Using human-like completion, GPT was able to create content that could be read as if it were written by humans, answer queries, and assist with tasks such as translation and summarization [14]. Open AI eventually developed his GPT-2 and GPT-3 models based on this original model, each with more advanced capabilities shown in table 2.The introduction of GPT marks an important turning point for the NLP community, opening up a wealth of opportunities for adoption in both academic and commercial settings. Recent developments also include GPT-3 and chat GPT. These have been trained on much larger datasets. Text from a very large web corpus, showing state-of-the-art performance on a wide range of natural language tasks, from translation to question answering to coherent essay writing and computer programming. Also, much research has been invested in optimizing these models for smaller datasets and using transfer learning to solve new problems. This allows certain activities to be performed more effectively while using fewer data. Various parameters of the transformer-based language model are shown in Figure 1.

B. GPT-2

Language model (Generative Pre-trained Transformer 2). GPT-2 is a de facto transform-based language model trained on 40 GB of online content with the simple goal of predicting the next word given all the preceding words in a sequence [2]. Artificial text samples generated by GPT-2 are a coherent continuation of the input and look natural.

Advantages

In comparison to GPT-1, GPT-2 has 10 times as many parameters and data.
Without requiring training data on specific topics, GPT-2 can learn linguistic tasks such as reading, summarizing, and translating from raw text.

2. Disadvantages

GPT-2 was known to perform poorly when given tasks in specialised areas such as music and storytelling.
False Information: Generative pre-trained Transformer-2 has been trained on millions of websites, but because this dataset was used for training, the data distribution is biased and the relevance or Accuracy is a non-negligible issue

C. GPT-3

The GPT-3 language model, the most popular and largest language model available today with 175 billion parameters, was pre-trained on a large text dataset including books, magazines, websites, etc. GPT-3, like the other language models mentioned above, uses a transformer design to allow it to process sequential input quickly and produce more readable and well-contextualized text. The material produced by GPT-3 is almost unrecognizable from human written[14].GPT-3 can handle tasks it has not been specially trained for, thanks to its ability to perform zero-shot learning. This opens up a wide range of application possibilities, from automation (summaries, text entry from key points) to dialogues, systems, chatbots, and creative writing.

Advantages

The ability to generate large amounts of text makes the process of creating text-based content easier and more effective.
Dialog is more powerful than GPT-2 because it can handle more specialized topics. Tasks such as answering questions, writing essays, summarizing texts, translating languages, and writing computer code can now advance the GPT.

2. Disadvantages

The main drawback of GPT-3 is its inability to continuously learn. Already pre-trained, they lack continuous long-term memory to acquire knowledge from every interaction.
The biggest disadvantage of GPT-3 is its cost.

D. ChatGPT

An artificial intelligence language model called ChatGPT was released in November 2022 to provide conversational responses to query prompts [8]. Models with over 150 billion parameters are trained using a combination of reinforcement learning algorithms and human interaction.

Advantages

Build advanced chatbots that can perform tasks or provide rich information.
According to user needs by providing customized recommendations.
Save time and effort by automating routine tasks.
Fast and accurate answers to questions.

Students can receive one-on-one study guidance via ChatGPT as another option to support students. By providing targeted learning and resources, ChatGPT helps students acquire the essential knowledge and skills to meet their expectations of achievement as they engage in self-directed learning. ChatGPT saves time and effort and benefits teachers. Increase the effectiveness of assessment development, provide professional learning support, and assist in assessing and reporting student performance.

2. Disadvantages

ChatGPT cannot replace instructors. In this work, ChatGPT showed its excellence in natural languages processing tasks such as automated assessment development, automated scoring, automated guidance, and automated suggestions for academic learning. However, it is important to understand that ChatGPT cannot replace teachers. ChatGPT can provide targeted and relevant information, but it cannot help students emotionally or encourage critical thinking and problem-solving skills in science learning.

E. GPTNeo

Eleuthera created an open-source GPT-Neo large text generation model containing billions of parameters and modelled after the GPT-3 model 16. The Pile is an 800 GB collection of various texts from numerous sources and serves as a training ground for GPT-Neo 17. Using the GPT-Neo version with 1.3 billion weights. The ‘Neo’ in GPT-Neo stands for ‘New and Improvement’, indicating that this model has been improved to give even better results than the original[9].

Advantages

Ability to learn quickly from large amounts of material.
GPT-Neo is free as compared to other language models.

2. Disadvantages

GPT-Neo was trained on the Pile dataset, which is known to contain vulgar, vulgar, and another offensive language. GPT-Neo may deliver socially objectionable text in some usage scenarios.

Table 2: Shows the comparison of various GPT Models [16].

	Architecture	parameter count	training data
GPT-1	12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax.	0.12 billion	Book Corpus: 4.5 GB of text, from 7000 unpublished books of various genres.
GPT-2	GPT-1, but with modified normalization	1.5 billion	Web Text: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit.
GPT-3	GPT-2, but with modification to allow larger scaling.	175 billion	570 GB plaintext, 0.4 trillion tokens. Mostly Common Crawl, Web Text, English Wikipedia, and two books corpora (Books1 and Books2).

Unlike direct-acting neural networks, where outputs and activations propagate in only one direction, recurrent neural networks propagate activation outputs in both directions. Input to output, output to input [17]. As a result, neural network architectures include loops that act as "memory states" for neurons. An RNN "remembers" what it has learned over time, or maintains its state over time. Memory states come with both advantages and disadvantages. Contains disappearing gradients. While the network is learning with many layers, it is very difficult to learn and change the parameters of previous layers in this challenge. To solve this problem a new type of RNN has been developed; LSTM (long-term memory).

G. Title Generator with LSTM Model

Independent states in LSTM models, cell states, allow the network to effectively learn what to read, what to erase, and what to store in the long-term state[17]. Three layers are present in this model`s LSTM are shown in Fig 3:

The input layer accepts phrases as input.
LSTM layer: This layer computes the output using LSTM units.
The dropout layer is a regularization layer that prevents overfitting.
The output layer determines the probability that the next word will appear.

H. Brief Overview

The title of the work should be crafted carefully. The need to deal with word errors generated by speech recognition makes documenting speech documents very difficult [18]. With so many choices, academic publication titles are doubly important. Instead of reading the entire article, researchers first quickly determine the relevance of the article based only on the title [19]. The number of potential readers is affected by the title of the article, which also affects the number of citations. You can also use AraGPT-2 to translate text into Arabic. It is trained from scratch on a sizeable Arabic corpus of online texts and news articles [20]. AraGPT-2 is the first sophisticated Arabic generative model. We can also use mesh tensorflow with GPT-Neo[21] to transform large-scale autoregressive language modelling. Since titles are created using different language models, we can also automatically assess the quality of machine translation using n-gram co-occurrence statistics [22]. It is also possible to predict the meaning of better language models [23]. A study of the severity of traffic accidents provides important information for developing safety measures. Most of the currently accessible traffic accident datasets contain rich data, including verbal descriptions detailing accident events and circumstances, providing new insights into severity and related causality. You can [24]. We used ChatGPT to create a literature review article, demonstrating the level of OpenAI ChatGPT artificial intelligence applications, and selected applications of digital twins in healthcare [25]. However, automatic algorithms arguably perform worse than humans and often pick up humorous dataset artefacts, making it hard to come up with funny names. Ultimately, even without further fine-tuning [26], ChatGPT rivals the best fine-tuning systems. The accuracy and completeness of using large language models in academic papers such as B. ChatGPT are unknown [27]. These models can generate increasingly realistic text. The recent release of Generative Pre-trained Transformer 3 brings increasingly sophisticated natural language computing applications to medical environments where human interaction has traditionally been the norm. However, opportunities and risks should be carefully weighed before considering his GPT-3 and comparable techniques for use in health-related contexts [28].

Conclusion

In this review article, we developed a creative method to generate titles from short texts using pre-trained transform language models and LSTMs on RNNs. The model was able to generate syntactically and semantically valid titles without using a large training dataset. We study the pros and cons of various pre-trained transformer language models, LSTM models and compare their outcomes and find the best approach. Automatically creating useful summaries from very long documents is a potential future area of our research. This is beneficial if you have the entire pipeline from reading long documents to creating precise summaries and titles.

References

[1] Nankai Lin, Jinxian Li, & Shengyi Jiang, “A simple but effective method for Indonesian automatic text summarization”, Connection Science, 2022– Taylor & Francis, Volume 34, 2022. https://doi.org/10.1080/09540091.2021.1937942 [2] P. Mishra, C. Diwan, S. Srinivasa and G. Srinivasaraghavan, \"Automatic Title Generation for Text with Pre-trained Transformer Language Model\", IEEE 15th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, pp. 17-24, doi: 10.1109/ICSC50631.2021.00009,2021. [3] Y. Hayashi and H. Yanagimoto,” Title Generation with Recurrent Neural Network\",5thZf IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Kumamoto, Japan, pp. 250-255, doi: 10.1109/IIAI-AAI.2016.109, 2016. [4] Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman, Fredrik Carlsson, and Magnus Sahlgren, \'\'Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish\'\', In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 3509–3518, Marseille, France. European Language Resources Association,2022 [5] Kunal Pagarey, Kanika Kalra, Abhay Garg, Saumajit Saha, Mayur Patidar, and Shirish Karande, ‘’Stylistic MR-to-Text Generation Using Pre-trained Language Models’’, In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 93–99, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI), 2021. [6] M. A. Niculescu, S. Ruseti and M. Dascalu, “RoGPT2: Romanian GPT2 for Text Generation”, IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), Washington, DC, USA, 2021, pp. 1154-1161, doi: 10.1109/ICTAI52525.2021.00183, 2021. [7] Kutela, Boniphace and Msechu, Kelvin and Das, Subasish and Kidando, Emmanuel, “ChatGPT\'s Scientific Writings: A Case Study on Traffic Safety”,(Available at SSRN: https://ssrn.com/abstract=4329120 or http://dx.doi.org/10.2139/ssrn.4329120,(19 January 2023). [8] Michael Dowling, Brian Lucey, ChatGPT for (Finance) research: “The BananaramaConjecture, FinanceResearchLetters”,103662,ISSN1544-6123,https://doi.org/10.1016/j.frl.2023.103662. (https://www.sciencedirect.com/science/article/pii/S1544612323000363),2023 [9] Sirianni JW, Sezgin E, Claman D, Linwood SL. “Medical text prediction and suggestion using generative pre-trained transformer models with dental medical notes”, Methods Inf Med. doi: 10.1055/a-1900-7351. PMID: 35835447, 2022. [10] Paula Maddigan, Teo Susnjak” Chat2VIS: Generating Data Visualisations via Natural Language using ChatGPT, Codex and GPT-3 Large Language Models”,https://doi.org/10.48550/arXiv.2302.02094, 2023. [11] Zhai, Xiaoming, “ChatGPT for Next Generation Science Learning”, (Available at SSRN: https://ssrn.com/abstract=4331313 or http://dx.doi.org/10.2139/ssrn.4331313 20 January 2023. [12] Sethi, Nandini, et al. \"Automated title generation in English language using NLP\", International Journal of Control Theory and Applications 9.11 (2016): 5159-5168. [13] Prashant Mathur and Nicola Ueffing and Gregor Leusch\"Multi-lingual neural title generation for e-Commerce browses pages\",https://doi.org/10.48550/arXiv.1804.01041,2018. [14] Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Kasneci, G. “ ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education”, https://doi.org/10.35542/osf.io/5er8f 30 January 2023. [15] KelleyDamore,GarrerMann“ExploringGPT3architecture’’,2023https://www.techtarget.com/searchenterpriseai/feature/Exploring-GPT-3-architecture [16] https://en.wikipedia.org/wiki/GPT-2 [17] Thecleverprogrammer’’TitleGeneratorwithMachineLearning”,2020. https://thecleverprogrammer.com/2020/10/05/title-generator-with-machine-learning/#googele_vignette%22-%20%20%20%20learning/#googele_vignette [18] R. Jin and A. G. Hauptmann, “Automatic title generation for spoken broadcast news”, Proceedings of the ?rst international conference on Human language technology research, Association for Computational Linguistics, pp.1–3,2001 [19] J. W. G. Putra and M. L. Khodra, “Automatic title generation in scienti?c articles for authorship assistance: a summarization approach,” Journal of ICT Research and Applications, vol. 11, no. 3, pp. 253–267, 2017. [20] Antoun, W., Baly, F., and Hajj, H.“AraGPT2: Pre-trained transformer for Arabic language”, In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 196–207, Kyiv, Ukraine (Virtual), April. Association for Computational Linguistics.2021 [21] Sid Black, Leo GAO, Phil Wang, Connor Leahy, and Stella Biderman, “GPT-Neo: Large scale autoregressive language modelling with mesh- tensor?ow”,2021 [22] George Doddington. “Automatic evaluation of machine translation quality using n-gram co-occurrence statistics”, In Proceedings of the second international conference on Human Language Technology Research, pages 138–145,2002. [23] A. Radford, J. Wu, D. Amodei, J. Clark, M. Brundage, and I. Sutskever, “Better language models and their implications,” Open AI Blog https://openai. Com/blog/better-language-models, vol. 1, p. 2, 2019. [24] Arteaga C., Paz, A., & Park, J. W., “Injury severity on traffic crashes A text mining with an interpretable machine-learning approach”, Safety Science, pg.no.132, 2020. [25] Ayd?n Ö., & Karaarslan, E. “Open AI ChatGPT Generated Literature Review: Digital Twin in Healthcare”.SSRNElectronicJournal.https://doi.org/10.2139/SSRN.4308687, 2022. [26] Chen Y., Eger S., GPT-3, “Transformers go for the LOLs: Generating (humourous) titles from scientific abstracts end-to-end”.http://dx.doi.org/10.48550/ARXIV.2212.10522, URL arXiv: 2212.10522, 2022. [27] Gao, C.A., Howard, F.M., Markov, N.S., Dyer, E.C., Ramesh, S., Luo, Y., Pearson, A.T., “Comparing scientific abstracts generated by ChatGPT to original abstract using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers”, BioRxiv:10.1101/2022.12.23.521610,2022. [28] Korngiebel DM, Mooney SD.“Considering the possibilities and pitfalls of Generative Pre-trained Transformer 3 (GPT-3) in healthcare delivery”. NPJ Digit Med;4(1):1-3, 2021.

Copyright

Copyright © 2023 Vishal Lodhwal , Gowri Choudhary. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Download Paper

Paper Id : IJRASET49713

Publish Date : 2023-03-21

ISSN : 2321-9653

Publisher Name : IJRASET

DOI Link : Click Here