Price Prediction of Used Cars Using Machine Learning

Authors: Mr. Ram Prashath R, NIthish C N, Ajith Kumar J

DOI Link: https://doi.org/10.22214/ijraset.2022.43459

Abstract

The goal of this study is to develop a model that can predict fair prices for used cars based on factors such as vehicle model, year of manufacture, fuel type, price, and kilometers driven. In the used car market, such a model can benefit sellers, buyers, and car manufacturers, because it produces a reasonably accurate price estimate from the data that users provide. Machine learning and data science techniques are used in the model-building process. The data was taken from classified ads for second-hand cars. To attain the highest accuracy, several regression approaches were tried, including linear regression, polynomial regression, support vector regression, decision tree regression, and random forest regression. The data was visualized to better understand the dataset before model building began. To support the regressions' performance, the dataset was partitioned and transformed to fit each regression. R-squared was used to evaluate the performance of each regression. The final model covers more attributes of used cars than earlier research while also achieving higher prediction accuracy.

I. INTRODUCTION

Because many factors influence a used vehicle's market price, determining whether an advertised price is fair is difficult. The goal of this research is to create machine learning models that can accurately predict the price of a used car from its attributes so that buyers can make informed decisions. We build and analyse several learning approaches on a dataset containing the sale prices of various brands and models. We compare the results of several machine learning algorithms, such as Linear Regression, Ridge Regression, Lasso Regression, Elastic Net, and Decision Tree Regressor, and pick the best one. The car's price is determined from a number of factors. Regression algorithms are employed because they output a continuous number rather than a category, which allows us to predict the actual price of a car rather than its price range. A user interface has also been created that takes input from a user and displays the predicted price of a car. The dataset covers three fuel types: Diesel, Petrol, and LPG.

II. LITERATURE SURVEY

The first paper reviewed is “Predicting the Price of Used Cars using Machine Learning Techniques” by Sameerchand Pudaruth [2]. The study looks at how supervised machine learning techniques can be used to estimate the price of second-hand cars in Mauritius. The predictions are based on historical data gathered from daily newspapers. Four techniques were employed: multiple linear regression, k-nearest neighbors, naive Bayes, and various decision tree algorithms. The best algorithm for prediction was identified after comparing all four. The author reported some difficulties in comparing the algorithms but succeeded in doing so.

Enis Gegic et al. [1] focus on collecting data from an online site using web scraping techniques. Several machine learning techniques were then compared on this data to predict vehicle prices. The authors divided the prices into predefined price classes. Artificial neural networks, support vector machines, and random forests were used to build classifier models on different datasets.

Wu et al. [4] present automobile price prediction using a neuro-fuzzy knowledge-based system. They proposed a model that achieves results similar to a simple regression model by taking into account the following attributes: brand, year of manufacture, and engine type. Because there is strong demand from car dealers to sell leased vehicles at the end of the lease term, an expert system called ODAV (Optimal Distribution of Auction Vehicles) was developed. This system provides information on the best vehicle prices as well as the best location to get them. A k-nearest neighbor machine learning approach, based on regression models, was used to estimate the price of the vehicles. Because a large number of vehicles have been transferred through this system, it is managed more effectively.

The work by Pattabiraman focuses more on the relationship between seller and buyer. More features are required to predict the price of four-wheelers, such as the listed price, mileage, make, model, trim, type, cylinder, litre, doors, cruise, sound, and leather. The price of a vehicle was predicted from these features using a statistical analysis method for exploratory data analysis.

III. METHODOLOGY

In this section, we go over the algorithms and the dataset used to create this model. The model is trained on a dataset with 92386 records. The value of a car is determined by factors such as kilometers travelled, year of registration, fuel type, car model, financial power, car brand, and gear type. Because this is a regression problem, we implemented three algorithms: Lasso Regression, Ridge Regression, and Linear Regression.
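The conclusion of this paper reports that 75% of the data was used for training and 25% for testing. A minimal, stdlib-only sketch of such a split is given here. The function name `split_train_test`, the fixed seed, and the choice to shuffle before splitting are assumptions made for illustration; they are not taken from the authors' code.

```python
import random


def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and hold out test_fraction of them."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    split_at = len(shuffled) - n_test
    return shuffled[:split_at], shuffled[split_at:]


# Hypothetical stand-in for the real records: 100 integer IDs.
records = list(range(100))
train, test = split_train_test(records)
print(len(train), len(test))
```

Each record lands in exactly one of the two portions, so the test portion is never seen during training.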

A. Lasso Regression

Lasso regression shrinks, or regularizes, the model coefficients to avoid overfitting so that the model generalizes better to different datasets. This type of regression is used when the dataset shows high multicollinearity or when you want to automate variable elimination and feature selection.

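Lasso minimizes the residual sum of squares plus an L1 penalty on the coefficients:

$$\sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} X_{ij}\,\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

Where,

$X_{ij}$ = value of the j-th feature (independent variable) for the i-th observation

$Y_i$ = dependent variable for the i-th observation

$\beta_j$ = weight whose magnitude shows the importance of the j-th feature

$\lambda$ = regularization parameter, typically chosen to minimize the cross-validation prediction error.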
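To illustrate how lasso can eliminate a variable, the following sketch minimizes the residual sum of squares plus lambda times the sum of absolute weights using coordinate descent in pure Python. It is a teaching sketch under stated assumptions, not the authors' implementation: the function names and the toy data are hypothetical, there is no intercept term, and in practice a library such as scikit-learn would normally be used.

```python
def soft_threshold(rho, threshold):
    """Shrink rho toward zero; values within [-threshold, threshold] become 0."""
    if rho > threshold:
        return rho - threshold
    if rho < -threshold:
        return rho + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize sum_i (y_i - sum_j X_ij * b_j)^2 + lam * sum_j |b_j|.

    Assumes no feature column is all zeros (otherwise z below would be 0).
    """
    n_features = len(X[0])
    beta = [0.0] * n_features
    for _ in range(n_iter):
        for j in range(n_features):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for xi, yi in zip(X, y):
                pred_without_j = sum(
                    xi[k] * beta[k] for k in range(n_features) if k != j
                )
                rho += xi[j] * (yi - pred_without_j)
                z += xi[j] ** 2
            # Setting the subgradient to zero gives a threshold of lam / 2.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta


# Hypothetical toy data: y depends only on the first feature.
X = [[1, 1], [2, -1], [3, 1], [4, -1]]
y = [2, 4, 6, 8]
for lam in (0.0, 4.0):
    print(lam, lasso_coordinate_descent(X, y, lam))
```

With a positive lambda, the weight of the first feature is shrunk slightly while the weight of the uninformative second feature stays exactly zero, which is the feature-selection behaviour described above.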

B. Ridge Regression

Ridge regression is a method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated. It has been used in many fields, including econometrics, chemistry, and engineering. Ridge regression is a form of linear regression that introduces a small amount of bias in order to improve long-term predictions.

Ridge regression is a regularization technique that reduces the model's complexity. It is also known as L2 regularization. In this method, the cost function is modified by adding a penalty term.

The ridge regression penalty controls the amount of bias introduced into the model. It is computed by multiplying lambda by the sum of the squared weights of the individual features:

$$\lambda \sum_{j=1}^{p} \beta_j^2$$
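The effect of the ridge penalty can be seen in a small pure-Python sketch. It computes the L2 penalty and the penalized cost, and uses the closed-form solution for the special case of one feature with no intercept, where the weight equals sum(x*y) / (sum(x^2) + lambda). The function names and the toy data are hypothetical and serve only as an illustration.

```python
def ridge_penalty(beta, lam):
    """L2 penalty: lambda times the sum of squared weights."""
    return lam * sum(b ** 2 for b in beta)


def ridge_cost(X, y, beta, lam):
    """Residual sum of squares plus the ridge penalty."""
    rss = sum(
        (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    )
    return rss + ridge_penalty(beta, lam)


def ridge_fit_1d(x, y, lam):
    """Closed-form ridge weight for one feature and no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi ** 2 for xi in x)
    return sxy / (sxx + lam)


# Hypothetical toy data with an exact relationship y = 2 * x.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
for lam in (0, 10, 30):
    print(lam, ridge_fit_1d(x, y, lam))
```

As lambda grows, the fitted weight moves from the unpenalized least-squares value toward zero. Unlike lasso, it does not become exactly zero for any finite lambda.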

C. Linear Regression

Linear regression is quick to train and test, so it serves as a baseline algorithm.
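As a rough illustration of such a baseline, the sketch below fits ordinary least squares with a single feature and scores it with R-squared, the metric named in the abstract. The helper names and the toy numbers (car age against price in arbitrary units) are hypothetical; the actual model uses several features.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: car age in years vs. price in arbitrary units.
age = [1, 2, 3, 4, 5]
price = [9.0, 8.1, 6.9, 6.2, 4.8]
slope, intercept = fit_simple_linear(age, price)
predicted = [intercept + slope * a for a in age]
print(slope, intercept, r_squared(price, predicted))
```

An R-squared close to 1 means the line explains most of the variation in price. The regularized models can then be judged against this baseline score.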

IV. OBJECTIVE

A. To create an efficient and effective model that estimates the price of a used car based on the inputs of the user.

B. To obtain high prediction accuracy.

C. To create a user-friendly User Interface (UI) that receives input from the user and predicts the price.

V. DATASET

A. Loading the Dataset into a Data Frame

The dataset is loaded into a data frame with the following columns:

Company
Model
Fuel type
Kilometers
Year of purchase
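Regression models need numeric inputs, so categorical columns such as fuel type have to be encoded before training. The sketch below parses a few rows with the columns listed above and one-hot encodes the fuel type. The sample rows, the `Price` target column, and the helper names are hypothetical, and one-hot encoding is an assumed choice because the paper does not state how the categories were encoded. The standard `csv` module is used only to keep the sketch self-contained; a data frame library would normally handle this step.

```python
import csv
import io

# Hypothetical sample rows; only the column names follow the list above.
SAMPLE_CSV = """Company,Model,Fuel type,Kilometers,Year of purchase,Price
BrandA,ModelX,Petrol,45000,2015,350000
BrandB,ModelY,Diesel,60000,2014,400000
BrandA,ModelZ,LPG,30000,2017,280000
"""

# The three fuel types named in the introduction.
FUEL_TYPES = ["Diesel", "Petrol", "LPG"]


def load_rows(text):
    """Parse CSV text into dicts, converting numeric columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column in ("Kilometers", "Year of purchase", "Price"):
            row[column] = int(row[column])
        rows.append(row)
    return rows


def encode_fuel(fuel):
    """One-hot encode a fuel type in the order given by FUEL_TYPES."""
    return [1 if fuel == known else 0 for known in FUEL_TYPES]


def to_features(row):
    """Numeric feature vector: kilometers, year, then the fuel indicators."""
    return [row["Kilometers"], row["Year of purchase"]] + encode_fuel(row["Fuel type"])


rows = load_rows(SAMPLE_CSV)
for row in rows:
    print(to_features(row), row["Price"])
```

Company and model could be encoded the same way, at the cost of one indicator column per distinct value.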

VI. HEAT MAP

Conclusion

Because of the large number of attributes that must be examined for an effective prediction, car price prediction is a difficult task. Collecting and preparing the data is the most crucial step in the prediction process. Car data collected from kaggle.com was converted into CSV format and used to train the machine learning models in this research. Three algorithms were used in this study: Linear, Lasso, and Ridge Regression. The data was separated into two portions for training and testing: 75% of the data was used for training and 25% for testing. The accuracy of the three machine learning models was tested and compared against one another. This is an important comparison between single and multiple groups of machine learning algorithms. As a result, this model will assist in predicting a car's actual price.

References

[1] Enis Gegic, Becir Isakovic, Dino Keco, Zerina Masetic, Jasmin Kevric, “Car Price Prediction Using Machine Learning” (TEM Journal, 2019).
[2] Sameerchand Pudaruth, “Predicting the Price of Used Cars using Machine Learning Techniques” (IJICT, 2014).
[3] Richardson, M. S. (2009). Determinants of used vehicle resale value.
[4] Wu et al. (2009). An expert system of price forecasting for used vehicles using adaptive neuro-fuzzy inference.
[5] Doan Van Thai, Luong Ngoc Son, Pham Vu Tien, Nguyen Nhat Anh, Nguyen Thi Ngoc Anh, “Prediction car prices using qualify qualitative data and knowledge-based system” (Hanoi National University).

Copyright

Copyright © 2022 Mr. Ram Prashath R, NIthish C N, Ajith Kumar J. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET43459

Publish Date : 2022-05-28

ISSN : 2321-9653

Publisher Name : IJRASET
