Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Sakshi Jawale, Pranit Londhe, Prajwali Kadam, Sarika Jadhav, Rushikesh Kolekar
DOI Link: https://doi.org/10.22214/ijraset.2023.51815
Text summarization is a Natural Language Processing (NLP) method that extracts and collects data from a source and summarizes it. It has become a requirement for many applications, since manually summarizing vast amounts of information is difficult, especially as the volume of data keeps expanding. Financial research, search engine optimization, media monitoring, question-answering bots, and document analysis all benefit from text summarization. This paper addresses several summarization strategies, grouped by intent, volume of data, and outcome. Our aim is to evaluate and convey a high-level view of the current state of research on text summarization.
I. INTRODUCTION
To summarize a piece of writing is to present the main points in a concise form. Work on automated text summarization began over 40 years ago [1]. The growth of the Internet invigorated this work in recent years [2], and summarization systems are beginning to be applied in areas such as healthcare and digital libraries [3]. Several commercially available text summarizers are now on the market. Examples include Capito from Semiotis, Inxight’s summarizer, the Brevity summarizer from LexTek International, the Copernic summarizer, Text Analyst from Megaputer, and Whiskey™ from Converspeech. These programs work by automatically extracting selected sentences from a piece of writing.
II. LITERATURE REVIEW OF EXISTING SURVEY
We have investigated the existing surveys of the automatic text summarization (ATS) domain, and a few of them are presented here to show the significance of this paper. Most surveys covered earlier methods and research on ATS. However, recent trends, applicability, effects, limitations, and challenges of ATS techniques were not covered. Table 1 summarizes and compares the existing surveys on ATS. Mishra et al. [5] reviewed studies from 2000 to 2013 and found methods such as hybrid statistical and machine learning (ML) approaches. The researchers did not include cognitive aspects or evaluations of the impact of ATS. Allahyari et al. [8] investigated different processes for ATS, such as topic representation, frequency-driven, graph-based, and machine learning methods. This research only includes the frequently used strategies. El-Kassas et al. [10] described graph-based, fuzzy logic-based, concept-oriented, and ML approaches, among others, with their advantages and disadvantages.
This research did not include abstractive or hybrid techniques. A further survey provided a taxonomy of text summarization methods and a variety of techniques. Although the author covered some time-consuming processes of ATS, recent and more efficient methods such as machine learning were missed. Abualigah et al. [18] conducted research on how to handle multiple documents and massive web data for text summarization.
III. AUTOMATIC TEXT SUMMARIZATION APPROACHES
A. Extractive Text Summarization
a. Supervised Learning Methods: In supervised learning methods, the first step is to train a model on labeled examples so that it learns to distinguish summarized from non-summarized documents.
b. Unsupervised Learning Methods: With unsupervised learning methods, the summarization process can be performed without any help from the user, such as selecting the introductory sentences of the document. These methods only require advanced algorithms, such as graph-based, concept-based, fuzzy logic, and latent semantic methods, to take user input and work automatically. These approaches are beneficial for extensive data.
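As an illustration of the graph-based family of unsupervised methods, the following is a minimal sketch in the style of TextRank, not a method taken from the surveyed papers. Sentences are nodes, edge weights are a word-overlap similarity, and a PageRank-style iteration ranks the sentences. The function names, the naive sentence splitter, the damping factor of 0.85, and the iteration count are all assumptions made for this example.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(a, b):
    """Shared-word count normalized by the log lengths of both sentences."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over the similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [0.0 if i == j else similarity(tokens[i], tokens[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores


def summarize(text, k=2):
    """Return the k highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

No labeled training data is needed: a sentence that shares vocabulary with many other sentences accumulates a high score, while an isolated sentence stays at the baseline score.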
B. Abstractive Text Summarization
Abstractive text summarization is the development and automation of the traditional method of text summarization. The abstractive process identifies the key sections and main ideas of a text document and paraphrases them. Abstractive summarization approaches are commonly grouped as follows:
a. Structure-Based Methods: The structure-based approach continuously filters the most critical data from documents by applying abstract or cognitive algorithms. Tree-based, template-based, ontology-based, and rule-based algorithms are the most commonly used.
b. Semantic-Based Methods: The semantic-based approach attempts to refine the sentences by applying NLP to the entire document. This approach can readily identify the noun and verb phrases.
IV. PRE-PROCESSING TECHNIQUES IN ATS
Several pre-processing steps are performed to clean noisy and unfiltered text. “Noisy” and “unfiltered” text refers to erroneous messages and chats, including slang or trash phrases. The approaches mentioned below appear to be some of the most often utilized pre-processing procedures:
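A minimal sketch of such cleaning is given here, assuming lowercasing, URL removal, punctuation removal, and stop-word filtering as the steps. These particular steps, the function name `preprocess`, and the tiny stop-word list are illustrative choices for this example rather than a list taken from this paper.

```python
import re

# Illustrative stop-word list (an assumption); real systems use far larger lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "it", "at"}


def preprocess(text):
    """Clean noisy text and return a list of content tokens."""
    text = text.lower()                        # normalize case
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop punctuation and symbols
    tokens = text.split()                      # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("LOL!!! The summary is at http://example.com :)"))
```

Slang such as "lol" survives this pass. Removing it would need a slang dictionary or a vocabulary filter, which this sketch leaves out.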
V. FEATURE EXTRACTION IN ATS
Feature extraction is a technique for discovering topic sentences and essential data traits or attributes in the source documents. ATS follows two phases to locate the important sentences in the text: feature extraction and text representation. This section describes the most often used extraction features and text representation approaches for generating sentences for text summarization.
Features
Collecting the essential features is the first phase of the feature extraction process. To find a vital sentence in a document, it is necessary to represent the sentences as vectors or to score them. Some features are used as attributes to define the text for this task. The most prevalent features for calculating the score of a sentence and indicating the degree to which it belongs to a summary are given below:
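Feature-based sentence scoring can be sketched as follows. The three features used here (average term frequency, sentence position, and relative sentence length), the function names, and the equal weights in the usage lines are assumptions chosen for illustration, not the feature set defined by this paper.

```python
import re
from collections import Counter


def sentence_features(sentences):
    """Compute a small feature dictionary for each sentence."""
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    freq = Counter(w for toks in tokenized for w in toks)
    max_len = max((len(t) for t in tokenized), default=0) or 1
    n = len(sentences)
    features = []
    for i, toks in enumerate(tokenized):
        # Average document-level frequency of the words in this sentence.
        tf = sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        features.append({
            "term_frequency": tf,
            "position": 1.0 - i / n,         # earlier sentences score higher
            "length": len(toks) / max_len,   # relative to the longest sentence
        })
    return features


def score_sentences(features, weights):
    """Weighted sum of the features for each sentence."""
    return [sum(weights[name] * value for name, value in f.items()) for f in features]


sentences = ["Summaries save time.", "Time matters.", "Bananas."]
weights = {"term_frequency": 1.0, "position": 1.0, "length": 1.0}  # hypothetical weights
scores = score_sentences(sentence_features(sentences), weights)
```

The sentences with the highest scores are the candidates for an extractive summary. In practice the weights would be tuned or learned rather than fixed by hand.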
Text Representation
Text representation models are used to represent the input documents in a form better suited to processing. In NLP, text representation approaches translate words into numbers so that computers can comprehend and decode patterns within a language. Generally, these approaches establish a connection between the chosen phrase and the context words in the document. Some popular text representation methods, such as bag-of-words, n-gram, and word embedding, are discussed below:
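The bag-of-words and n-gram representations can be sketched in a few lines. This is a minimal illustration with assumed function names and whitespace tokenization. Word embeddings are omitted because they require a trained model.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def ngrams(tokens, n):
    """All contiguous runs of n tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(documents):
    """Return the sorted vocabulary and one word-count vector per document."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocabulary])
    return vocabulary, vectors


vocabulary, vectors = bag_of_words(["the cat sat", "the cat saw the dog"])
bigrams = ngrams(tokenize("the cat sat"), 2)
```

Bag-of-words discards word order entirely, while n-grams keep local order within a window of n tokens at the cost of a larger vocabulary.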
VI. MOTIVATION AND APPLICATION OF ATS
Text summarization is a branch of Natural Language Processing (NLP) that focuses on shortening texts and making them more readable for users. Given the excess of data accessible on the internet and the need to comprehend it while saving the reader's time, text summarization techniques are widely used. This paper provides a quick overview of text preprocessing, which cleans the data so that summarization is effective. It then summarizes the many types of text summarization approaches, categorizing them according to input, output, content, and purpose. The paper's primary emphasis is on the output-based categories: extractive and abstractive summarization algorithms. Extractive summarization summarizes by simply extracting information from the input text. Abstractive summarization is a more complicated method because it summarizes the text in its own words. The abstractive technique produces better and more semantically connected summaries. Readers would benefit significantly from an overview of the benefits and drawbacks of different techniques, as well as a concise explanation. Text summarization techniques can be applied helpfully depending on the user's needs.
[1] H. P. Luhn, “The automatic creation of literature abstracts,” IBM J. Res. Develop., vol. 2, no. 2, pp. 159–165, Apr. 1958.
[2] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” 2013, arXiv:1301.3781.
[3] Z. S. Harris, “Distributional structure,” Word, vol. 10, nos. 2–3, pp. 146–162, 1954.
[4] S. Gholamrezazadeh, M. A. Salehi, and B. Gholamzadeh, “A comprehensive survey on text summarization systems,” in Proc. 2nd Int. Conf. Comput. Sci. Appl., Dec. 2009, pp. 1–6.
[5] C. Saranyamol and L. Sindhu, “A survey on automatic text summarization,” Int. J. Comput. Sci. Inf. Technol., vol. 5, no. 6, pp. 7889–7893, 2014.
[6] R. Mishra, J. Bian, M. Fiszman, C. R. Weir, S. Jonnalagadda, J. Mostafa, and G. D. Fiol, “Text summarization in the biomedical domain: A systematic review of recent research,” J. Biomed. Informat., vol. 52, pp. 457–467, Dec. 2014.
[7] N. Andhale and L. A. Bewoor, “An overview of text summarization techniques,” in Proc. Int. Conf. Comput. Commun. Control Autom. (ICCUBEA), Aug. 2016, pp. 1–7.
[8] S. K. Bharti and K. S. Babu, “Automatic keyword extraction for text summarization: A survey,” 2017, arXiv:1704.03242.
[9] R. Mihalcea and H. Ceylan, “Explorations in automatic book summarization,” in Proc. 2007 Joint Conf. Empirical Methods Natural Lang. Process. Comput. Natural Lang. Learn. (EMNLP-CoNLL), 2007, pp. 380–389.
[10] N. V. Kumar and M. J. Reddy, “Factual instance tweet summarization and opinion analysis of sport competition,” in Soft Computing and Signal Processing. Singapore: Springer, 2019, pp. 153–162.
Copyright © 2023 Sakshi Jawale, Pranit Londhe, Prajwali Kadam, Sarika Jadhav, Rushikesh Kolekar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET51815
Publish Date : 2023-05-08
ISSN : 2321-9653
Publisher Name : IJRASET