Research and advances in the field of “legal informatics” have resulted in the development of various legal databases, and a huge volume of legal information is now generated. This growth has accelerated the need to develop legal ontologies. Ontologies are widely used by legal practitioners, researchers and ordinary citizens for simulating legal actions, performing linguistic search and classification, and staying up to date with the continual modification of laws. This research contributes to that purpose by developing an open legal ontology.
I. INTRODUCTION
Due to the massive amount of legal information available on the internet and from various other sources, it has become necessary for the research community to carry out more in-depth analysis in the area of legal text processing. This growth in information has created a need for legal systems that let users obtain relevant legal information with little or no effort.
Over the last thirty years, AI & Law has provided breakthroughs in studies involving case-based reasoning, rule-based reasoning, information retrieval and, most recently, abstract models for knowledge representation and reasoning, called legal ontologies. Legal ontologies aim to supply a structured representation of legal concepts and their interconnections. These ontologies are then exploited to support information extraction and question answering within the legal domain. This study presents the results of a scientific mapping of the literature that aims to categorize legal ontologies along certain dimensions such as purpose, level of generality and underlying legal theories, among other aspects. By organizing and classifying what has already been done, it helps to avoid the old problem of reinventing the wheel.
II. LITERATURE REVIEW
There are many approaches to text mining. Valentina Leone et al. [1] described the need to develop a legal ontology. Ontologies represent the standard way to model knowledge about specific domains. This holds for the legal domain, where several ontologies have been put forward to model specific kinds of legal knowledge. For both ordinary users and law scholars, it is often difficult to get an overall view of the existing alternatives, their main features and their interlinking with other ontologies. To address this, the authors analysed the state of the art in legal ontologies and characterised them by their distinctive features. The analysis aims to guide generic users and law experts in selecting the legal ontology that best fits their needs and in understanding its specificity, so that proper extensions to the selected model can be investigated.
Bartolini C et al. [2] aimed to provide a bottom-up ontology describing the constituents of the data protection domain and their relationships. Their contribution envisions a methodology to highlight the new duties of data controllers and to foster the transition of IT-based systems, services, tools and businesses towards compliance with the new General Data Protection Regulation. The model may serve as the foundation for the design of data-protection-compliant information systems.
Suad A. Alsadi et al. [3] reviewed the various data pre-processing techniques used to clean raw data. Raw data is usually susceptible to missing values, noise, incompleteness, inconsistency and outliers, so it is important to process it before it is mined. Pre-processing is an essential step to enhance data efficiency, and it leads to data transformation that makes knowledge discovery more efficient. It includes several techniques such as cleaning, integration, transformation and reduction. The study gives a detailed description of the pre-processing techniques used for data mining.
Shahmin Sharafat et al. [4] worked towards smart legal systems, which carry immense potential to provide the legal community and the public with valuable insights from legal data. These systems can consequently help in analyzing and mitigating various social issues. In Pakistan, courts have for the last couple of years been reporting judgments online for public consumption. This public data, once processed, can be used for the betterment of society and for policy making in Pakistan. The study takes the first step towards a smart legal system by extracting entities such as dates, case numbers, reference cases and person names from legal judgments. Automatically extracting these entities first requires a dataset constructed from legal judgments. Hence, annotation guidelines are prepared first, followed by an annotated dataset for the extraction of the various legal entities.
III. PROPOSED SYSTEM
The entire flow of the system is depicted in Fig. 1.
A. Dataset
The first step is to download the Constitution of India dataset, which is collected from the Kaggle data repository. The dataset contains Articles 1 to 395 and their sub-articles. The Constitution of India is the longest written constitution of any country in the world, with 146,385 words in its English-language version.
B. Pre-processing
Pre-processing is the data mining technique used to transform raw data into a useful and efficient format. The following steps are followed to clean the data:
Tokenization: This is the process by which a large quantity of text is divided into smaller parts called tokens. Here, the sentences are split into words.
Converting into lowercase: The words are converted into lowercase.
Removing punctuation: Punctuation marks are removed from the raw data.
Removing stopwords: The stopwords are removed from the raw data.
Lemmatization: Words are lemmatized, that is, the different inflected forms of a word are grouped together under a common base form.
Stemming: Words are reduced to their root form.
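The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this work: the stop-word set, the lemma table and the suffix list are toy stand-ins (assumptions) for the resources an NLP library would normally provide, and the function names are hypothetical.

```python
import re
import string

# Toy resources, assumed for illustration only; a real pipeline would
# normally take these from an NLP library.
STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "that",
             "shall", "be", "by", "or"}
LEMMAS = {"states": "state", "territories": "territory", "laws": "law"}
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def lemmatize(token):
    """Map an inflected form to its base form via the toy lookup table."""
    return LEMMAS.get(token, token)


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Apply the cleaning steps in the order listed in this section."""
    tokens = tokenize(text)                                     # tokenization
    tokens = [t.lower() for t in tokens]                        # lowercase
    tokens = [t for t in tokens if t not in string.punctuation] # punctuation
    tokens = [t for t in tokens if t not in STOPWORDS]          # stop words
    tokens = [lemmatize(t) for t in tokens]                     # lemmatization
    tokens = [stem(t) for t in tokens]                          # stemming
    return tokens


print(preprocess("India, that is Bharat, shall be a Union of States."))
```

In practice, lemmatization and stemming are often treated as alternatives; the sketch applies both only because both steps are listed above.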
C. Data storing
In this step, the cleaned data obtained after pre-processing is imported into the database and stored.
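As an illustration of this step, the sketch below stores cleaned articles in SQLite using Python's standard `sqlite3` module. The choice of SQLite, the `articles` table, its column names and both helper functions are assumptions made for the example; the database actually used is not specified here.

```python
import sqlite3


def store_articles(conn, articles):
    """Create the (hypothetical) table if needed and insert the rows.

    `articles` is an iterable of (article_no, raw_text, tokens) tuples,
    where `tokens` is the list produced by pre-processing.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "article_no TEXT PRIMARY KEY, raw_text TEXT, tokens TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO articles VALUES (?, ?, ?)",
        [(no, raw, " ".join(tokens)) for no, raw, tokens in articles],
    )
    conn.commit()


def search_articles(conn, term):
    """Return the article numbers whose cleaned tokens contain `term`."""
    rows = conn.execute(
        "SELECT article_no FROM articles "
        "WHERE ' ' || tokens || ' ' LIKE ? ORDER BY article_no",
        (f"% {term.lower()} %",),
    )
    return [row[0] for row in rows]


# Two sample rows in an in-memory database, for demonstration only.
conn = sqlite3.connect(":memory:")
store_articles(conn, [
    ("1", "India, that is Bharat, shall be a Union of States.",
     ["india", "bharat", "union", "state"]),
    ("14", "Equality before law", ["equality", "law"]),
])
print(search_articles(conn, "union"))
```

Padding the token string with spaces on both sides lets the `LIKE` pattern match whole tokens only, so a search for "law" does not match "lawful".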
D. Creating a web application
Here, we developed a web application using the Python Flask web framework. The application provides a search engine to the user. The user can enter any query in the search bar; the query is passed to the database and the desired results (legal ontology) are displayed.
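A search endpoint of this kind might look like the following Flask sketch. The route name `/search`, the query parameter `q`, the `create_app` factory and the in-memory `SAMPLE` lookup are all hypothetical and chosen for illustration; in the described system the lookup would query the database instead.

```python
from flask import Flask, jsonify, request


def create_app(search_fn):
    """Build a Flask app whose /search route delegates to search_fn(query)."""
    app = Flask(__name__)

    @app.route("/search")
    def search():
        # Read the text typed into the search bar from the query string.
        query = request.args.get("q", "").strip()
        if not query:
            return jsonify(error="missing query parameter 'q'"), 400
        return jsonify(query=query, results=search_fn(query))

    return app


# Stand-in lookup over two sample rows (assumed data, for illustration).
SAMPLE = {"1": "india bharat union state", "14": "equality law"}


def sample_search(query):
    """Return article numbers whose cleaned tokens contain the query term."""
    term = query.lower()
    return [no for no, tokens in SAMPLE.items() if term in tokens.split()]
```

Calling `create_app(sample_search).run()` would start Flask's development server, after which a request such as `/search?q=union` returns the matching article numbers as JSON.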
IV. RESULTS
The proposed system was developed, and it provides the results (legal ontology) shown in Fig. 2. As can be seen, when an input is provided the desired result is generated, which is a legal ontology (a vocabulary that contains all legal information).
V. CONCLUSION
This research conducted a literature review to identify various data mining techniques and how they have been used in developing legal ontologies. The developed system will help citizens to follow the latest legislation. Citizens will be able to understand how Indian law is created and implemented, which will promote transparency. The system will also help users stay up to date with laws that are continuously modified.