House Price Prediction: Comparative Analysis of Regression-Based Machine Learning Algorithms

Authors: David Emmanuel Aniobi, Chukwuemeka Oluebube Ochuba, Saater Benedicta Nguideen

DOI Link: https://doi.org/10.22214/ijraset.2023.56232

Abstract

Advances in technology have changed how many everyday tasks are carried out. One such advance is artificial intelligence, which has produced a wide range of methods for solving real-life problems. Many machine learning techniques can be used to predict house prices while taking several factors into account. House prices rise annually, which creates a need for house price prediction models. Accurate predictive models help families plan for and acquire a house of their choice. A significant number of articles have applied traditional machine learning algorithms to estimate house prices, but they rarely compare the performance of individual models. This study tests and compares several machine learning techniques and presents the best-performing model for use in developing a house price prediction system. Models were developed and compared using Linear Regression (LR), Least Absolute Shrinkage and Selection Operator Regression (LASSO-R), Ridge Regression (RR), K-Nearest Neighbours Regression (KNN-R), Decision Tree Regression (DTR) and Extra Trees Regression (ETR), all implemented in Python. ETR outperformed the others with MSE (16233.4), RMSE (128.7), MAE (49.6) and R² (0.63), while KNN-R performed worst with MSE (45763.3), RMSE (213.9), MAE (99.2) and R² (-0.04).
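The four metrics reported in the abstract can be illustrated with a minimal sketch, assuming their standard textbook definitions. The function name `regression_metrics` and the toy values are hypothetical and are not taken from the authors' code or dataset.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 using their standard definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}

# Hypothetical toy prices (not from the paper's dataset).
y_true = [3.0, 5.0, 7.0, 9.0]
good = regression_metrics(y_true, [2.0, 5.0, 8.0, 10.0])  # close predictions
bad = regression_metrics(y_true, [9.0, 7.0, 5.0, 3.0])    # reversed predictions
print(good)
print(bad)
```

In this sketch the second set of predictions yields a negative R², which is how a value such as the -0.04 reported for KNN-R arises: the model's squared error exceeds that of always predicting the mean price.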

I. INTRODUCTION

Houses are essential to humans because they provide shelter in which people can live comfortably; a house is not necessarily a luxury. Because housing is a necessity, the price of a property has to be ascertained. House price prediction tools have become a subject of interest among real estate professionals, data scientists, and homeowners [1]. While some argue that predicting house prices with machine learning algorithms is valuable to property buyers and sellers, others raise ethical concerns about it [2].

However, the relevance of house price prediction extends beyond real estate transactions. It has the potential to reshape the real estate market and town planning, and it offers a wide range of economic benefits [3]. Economically, accurate prediction of house prices can help developers, policymakers, and property investors make informed decisions. Societally, it can help with resource allocation, urban planning, and concerns related to housing affordability [4].

While house price prediction offers numerous benefits, it also raises ethical questions. Concerns about fairness, discrimination, and the potential for market manipulation should not be underestimated. Because machine learning models are trained on historical data, they may inadvertently perpetuate biases present in past housing practices. The need for an efficient house pricing system has given rise to house price prediction tools driven by machine learning algorithms [5]. These advanced tools can inadvertently exacerbate existing inequalities and disparities within the real estate market. There are concerns that they may perpetuate biases and discrimination in housing transactions, resulting in inequitable pricing and unequal access to housing opportunities [6]. These biases can be reduced, if not eliminated, by using the optimal algorithm in developing the system.

Although house price prediction using machine learning holds the potential for transformative insights in the real estate market, it also introduces ethical and fairness concerns. These demand thoughtful consideration and proactive solutions so that the tools are employed in a just, transparent, and responsible manner for the benefit of all stakeholders. Such care helps ensure that certain neighborhoods are not favored over others. It is therefore essential that predictive algorithms are designed with fairness and transparency in mind. This paper compares the performance of six regression-based machine learning algorithms and uses the best-performing algorithm to develop a house price prediction model.

II. RELATED LITERATURE

The price of a house, whether for short-term or long-term use, has to be ascertained because housing has become a necessity in life.

This need is not limited to prospective homebuyers alone but extends to everyone involved in the real estate industry, including sellers, investors, and the broader community. The real value of a house is influenced by several factors, including the number of rooms, location (with rural areas often having lower costs than cities), proximity to amenities such as highways, malls, supermarkets, and job opportunities, and access to quality educational facilities. Furthermore, the size of the property, its condition, the standing of the location and the overall economic climate all play vital roles in determining real estate values [7]. The combination of these factors and local market dynamics shapes the price of a house in any region.

A house price prediction model based on regression analysis and particle swarm optimization (PSO) was proposed by [8]. Reference [9] proposed a hybrid LASSO and gradient boosting regression model intended to improve prediction. LASSO was used for feature selection, and the authors carried out many iterations of feature engineering to find the number of features that gave the best prediction performance. Reference [5] suggested that using a mix of models is necessary. Their work indicated that a linear model tends to have high bias, leading to underfitting, while a highly complex model tends to have high variance, resulting in overfitting. Balancing these two tendencies is crucial for optimal model performance, leading to a fairer assessment of land prices and potentially increased revenue for the government. A comparison of artificial neural networks and multiple linear regression for house price prediction was presented by [10]. In their study, the impact of different morphological measures on live weight was modelled by artificial neural networks and multiple linear regression analysis. Genetic algorithms were employed by [11] to determine the parameters of machine learning models. Their empirical results indicated that attribute selection improved the forecasting accuracy of the machine learning models.
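The bias and variance trade-off attributed to [5] above can be sketched numerically. The example below is an illustration only and is not drawn from any of the cited works: the nonlinear target `true_price`, the noise level, the sample sizes and the polynomial degrees are all assumptions, and NumPy polynomial fits stand in for models of increasing complexity.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

def true_price(x):
    # Hypothetical nonlinear relationship, chosen only for illustration.
    return np.sin(2.5 * x)

# Small noisy training set and a larger held-out test set.
x_train = np.sort(rng.uniform(-1.0, 1.0, 15))
x_test = np.sort(rng.uniform(-1.0, 1.0, 200))
y_train = true_price(x_train) + rng.normal(0.0, 0.2, x_train.size)
y_test = true_price(x_test) + rng.normal(0.0, 0.2, x_test.size)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

errors = {}
for degree in (1, 3, 10):
    # Least-squares polynomial fit; a higher degree means a more complex model.
    coeffs = P.polyfit(x_train, y_train, degree)
    train_err = mse(y_train, P.polyval(x_train, coeffs))
    test_err = mse(y_test, P.polyval(x_test, coeffs))
    errors[degree] = (train_err, test_err)
    print(f"degree={degree:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```

Training error can only fall as the degree grows, because each lower-degree model is a special case of the higher-degree one. The test error is the quantity to watch: a degree that is too low typically underfits, while a degree that is too high typically fits the noise in the training sample.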

Reference [12] predicted house sale prices using several machine learning algorithms, namely Random Forest, XGBoost, LightGBM, Hybrid Regression and Stacked Generalization, and compared their accuracy. They found that each model has its advantages and limitations. The Random Forest method has the lowest error on the training set but is prone to overfitting, and its time complexity is high. XGBoost and LightGBM have the best time complexity. Hybrid Regression performs better because of its generalization. Stacked Generalization Regression is best when accuracy is the top priority, but it has a complicated architecture and the worst time complexity. Random Forest Regression, Decision Tree Regression, Ridge Regression, LASSO Regression, AdaBoost Regression and XGBoost Regression were compared by [13] for predicting house prices. Scores and Root Mean Square Error (RMSE) were used for evaluation, and the Decision Tree Regression algorithm was found to have the highest RMSE.

Reference [14] used the Random Forest (RF) algorithm to predict house prices in London. Despite the small dataset, the study found that RF outperformed the traditional regression approach based on Generalized Linear Models (GLMs) in terms of prediction accuracy. Their findings suggested that RF captured complex relationships and patterns in the data more effectively than the GLM, which could explain its superior performance in that study. Reference [15] predicted property values using different algorithms, namely Support Vector Regression (SVR), Decision Tree, Regression-Particle Swarm Optimization (R-PSO), and LUCE. The findings implied that LUCE provided a more effective and reliable solution for estimating property values, especially where recent sold prices are lacking and house data are sparse. Reference [16] used 18 years of housing property data to train models using stochastic gradient descent-based support vector regression, random forest and gradient boosting machine. They demonstrated that advanced machine learning algorithms can achieve very accurate prediction of property prices, as evaluated by the performance metrics.

Reference [17] compared the performance of several machine learning algorithms, including XGBoost, Random Forest, Decision Tree, and Linear Regression, to determine the most suitable algorithm for developing an automated house purchasing system. XGBoost outperformed the other algorithms because of its ability to handle complex relationships between features and target variables. Reference [18] presented a machine learning approach to the prediction and analysis of house prices. It employed linear regression, multiple linear regression, LASSO, and gradient boosting techniques. Reference [19] also proposed a house price prediction system that uses machine learning algorithms (linear regression, decision tree regression, random forest regression, and artificial neural networks) together with visualization to make accurate predictions. The results showed that the system can predict house prices with great accuracy.

A Random Forest-based house price prediction model was developed by [20] using the Boston dataset from the UCI Machine Learning Repository. They opined that housing prices are closely correlated with factors such as city, population and location, and that predicting individual housing prices requires information other than the House Price Index (HPI). A regression-based predictive system for house and rent prices was presented by [21], who observed that housing datasets often exhibit missing values, outliers, and inconsistent formats, which can hinder the performance of prediction models. Their system tackled these challenges and provided more accurate and reliable predictions for real estate. Although much work has been done on house price prediction using several algorithms, no single work has combined all the algorithms we are proposing in this paper.
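The data quality issues noted by [21], missing values and outliers in particular, can be illustrated with a short preprocessing sketch. This is not the method used in [21]; median imputation and clipping to the 1.5 × IQR fences are common techniques substituted here for illustration, and the `area` values are hypothetical.

```python
import numpy as np

# Hypothetical feature column (floor area) with one missing value and one outlier.
area = np.array([85.0, 120.0, np.nan, 95.0, 110.0, 2500.0, 100.0])

# 1) Impute missing entries with the median of the observed values.
median = np.nanmedian(area)
imputed = np.where(np.isnan(area), median, area)

# 2) Clip outliers to the 1.5 * IQR fences (Tukey's rule).
q1, q3 = np.percentile(imputed, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = np.clip(imputed, lower, upper)

print("median used for imputation:", median)
print("fences:", lower, upper)
print("cleaned:", cleaned)
```

The median is used rather than the mean because a single extreme value, such as the 2500.0 entry here, would otherwise distort the imputed value.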

Conclusion

This paper investigated different models for house price prediction. Six regression-based machine learning methods were compared: Linear Regression, LASSO Regression, Ridge Regression, K-Nearest Neighbors Regression, Decision Tree Regression and Extra Trees Regression. The models produced results of varying quality, and each has its pros and cons. Linear Regression offers simplicity but may not capture complex nonlinear relationships in the data. LASSO Regression introduces L1 regularization, which helps prevent overfitting and can set some coefficients exactly to zero, effectively performing feature selection and yielding a more parsimonious model. Ridge Regression employs L2 regularization to control the model's complexity; it shrinks coefficients toward zero without eliminating features, which can be beneficial when dealing with high-dimensional datasets. K-NN can capture complex patterns but is sensitive to the choice of the number of neighbors (K) and the distance metric. A Decision Tree can handle nonlinear relationships but is prone to overfitting, especially when the tree depth is not appropriately controlled. Extra Trees Regression is less prone to overfitting than a single decision tree. Comparatively, Extra Trees Regression performed better than the rest in predicting house prices. In future work, the dataset can be prepared with more features, and more advanced machine learning techniques can be used to construct the house price prediction model.
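The comparison summarized above can be sketched with scikit-learn [23]. This is a minimal illustration under stated assumptions and not the authors' pipeline: the data are synthetic (`make_regression`), and every hyperparameter (`alpha`, `n_neighbors`, `max_depth`, `n_estimators`) is a placeholder value.

```python
import math

from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; the paper's housing dataset is not reproduced here.
X, y = make_regression(n_samples=500, n_features=10, n_informative=5,
                       noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Hyperparameters are illustrative assumptions, not the authors' settings.
models = {
    "LR": LinearRegression(),
    "LASSO-R": Lasso(alpha=1.0),
    "RR": Ridge(alpha=1.0),
    "KNN-R": KNeighborsRegressor(n_neighbors=5),
    "DTR": DecisionTreeRegressor(max_depth=5, random_state=0),
    "ETR": ExtraTreesRegressor(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mean_absolute_error(y_test, pred),
        "R2": r2_score(y_test, pred),
    }
    print(f"{name:8s}" + "  ".join(f"{k}={v:.3f}" for k, v in results[name].items()))

# L1 regularization can set coefficients exactly to zero; L2 only shrinks them.
lasso_zeros = int((models["LASSO-R"].coef_ == 0).sum())
ridge_zeros = int((models["RR"].coef_ == 0).sum())
print("zero coefficients:", {"LASSO-R": lasso_zeros, "RR": ridge_zeros})
```

Because the synthetic target is linear in the features, the ranking printed by this sketch says nothing about the ranking reported in the paper; it only shows how the six models and four metrics fit together. Changing `n_neighbors` or `max_depth` is a quick way to see the sensitivity of K-NN and decision trees described above.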

References

[1] D. Flagella, “What is machine learning?,” 2019. [Online]. Available: https://emerj.com/ai-glosary-terms/what-is-machine-learning/
[2] A. Aladangady, “Housing Wealth and Consumption: Evidence from Geographically Linked Microdata,” Am Econ Rev, vol. 107, no. 11, pp. 3415–3446, 2017. [Online]. Available: http://www.jstor.org/stable/44871793
[3] K. Ngiam and I. Khor, “Big data and machine learning algorithms for health-care delivery,” Lancet Oncol, vol. 20, pp. e262–e273, May 2019, doi: 10.1016/S1470-2045(19)30149-4.
[4] P. Harrington, Machine Learning in Action. Manning Publications Co., Shelter Island, 2012. doi: 10.4018/978-1-4666-0059-1.ch008.
[5] A. Babu and A. S. Chandran, “Literature Review on Real Estate Value Prediction Using Machine Learning,” International Journal of Computer Science and Mobile Applications, vol. 7, pp. 8–15, 2019. [Online]. Available: www.ijcsma.com
[6] E. K. Ampomah, Z. Qin, and G. Nyame, “Evaluation of tree-based ensemble machine learning models in predicting stock price direction of movement,” Information (Switzerland), vol. 11, no. 6, 2020, doi: 10.3390/info11060332.
[7] R. Kanilelori, “Different Types Of Houses In Nigeria (With Pictures).” Accessed: Oct. 16, 2023. [Online]. Available: https://klrealtors.ng/different-types-of-houses-in-nigeria-with-pictures/
[8] N. A. Alfiyatin, R. E. Febrita, H. Taufiq, and F. M. Wayan, “Modeling House Price Prediction using Regression Analysis and Particle Swarm Optimization Case Study: Malang, East Java, Indonesia,” International Journal of Advanced Computer Science and Applications, vol. 8, no. 10, pp. 6–10, 2017, doi: 10.14569/ijacsa.2017.081042.
[9] S. Lu, Z. Li, Z. Qin, X. Yang, and R. S. M. Goh, “A hybrid regression technique for house prices prediction,” in 2017 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), 2017, pp. 319–323. doi: 10.1109/IEEM.2017.8289904.
[10] A. Ahmad and A. Nawar, “House Price Prediction,” Kristianstad University, Sweden, 2020. [Online]. Available: https://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-20945
[11] P. F. Pai and W. C. Wang, “Using machine learning models and actual transaction data for predicting real estate prices,” Applied Sciences (Switzerland), vol. 10, no. 17, pp. 1–11, 2020, doi: 10.3390/app10175832.
[12] T. Quang, N. Minh, D. Hy, and M. Bo, “Housing Price Prediction via Improved Machine Learning Techniques,” Procedia Comput Sci, vol. 174, no. 2019, pp. 433–442, 2020, doi: 10.1016/j.procs.2020.06.111.
[13] B. Sivasankar, A. P. Ashok, G. Madhu, and F. S., “House Price Prediction,” International Journal of Computer Sciences and Engineering, vol. 8, no. 7, pp. 762–767, 2020, doi: 10.37896/ymer21.05/87.
[14] S. Levantesi and G. Piscopo, “The importance of economic variables on London real estate market: A random forest approach,” Risks, vol. 8, no. 4, pp. 1–17, 2020, doi: 10.3390/risks8040112.
[15] H. Peng et al., “Lifelong Property Price Prediction: A Case Study for the Toronto Real Estate Market,” IEEE Trans Knowl Data Eng, vol. 35, no. 3, pp. 2765–2780, 2023, doi: 10.1109/TKDE.2021.3112749.
[16] W. K. O. Ho, B. S. Tang, and S. W. Wong, “Predicting property prices with machine learning algorithms,” Journal of Property Research, vol. 38, no. 1, pp. 48–70, 2021, doi: 10.1080/09599916.2020.1832558.
[17] S. Dabreo, S. Rodrigues, V. Rodrigues, and P. Shah, “Real Estate Price Prediction,” International Journal of Engineering Research & Technology (IJERT), vol. 10, no. 04, pp. 541–543, 2021, doi: IJERTV10IS040322.
[18] M. Kandasamy, R. Shanmugam, A. Dave, C. Chawda, K. Shah, and U. Seladiya, “Prediction and Analysis of House Price Through Machine Learning Approach,” International Journal for Multidisciplinary Research (IJFMR), vol. 5, no. 4, pp. 1–7, 2023, E-ISSN: 2582-2160.
[19] M. S. Supriya, G. S. Vinayak, V. R. Patgar, and V. Mahajan, “House Price Prediction System using Machine Learning Algorithms and Visualization,” in 2023 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), 2023, pp. 1–6. doi: 10.1109/CONECCT57959.2023.10234749.
[20] B. A. Adetunji, A. O. Noah, and A. F. Ajala, “House Price Prediction using Random Forest Machine Learning Technique,” ScienceDirect, Procedia Computer Science, vol. 199, pp. 806–813, 2022, doi: 10.1016/j.procs.2022.01.100.
[21] S. Khandaskar, C. Panjwani, V. Patil, D. Fernandes, and P. Bajaj, “House and Rent Price Prediction System using Regression,” in 2023 International Conference on Sustainable Computing and Smart Systems (ICSCSS), 2023, pp. 1733–1739. doi: 10.1109/ICSCSS57650.2023.10169290.
[22] K. Khushbu and Y. Suniti, “Linear regression analysis study,” Journal of the Practice of Cardiovascular Sciences, vol. 4, no. 1, p. 33, 2018, doi: 10.4103/jpcs.jpcs_8_18.
[23] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[24] L. E. Melkumova and S. Y. Shatskikh, “Comparing Ridge and LASSO estimators for data analysis,” Procedia Eng, vol. 201, pp. 746–755, 2017, doi: 10.1016/j.proeng.2017.09.615.
[25] K. G. Burcu and D. K. Ipek, “Regression Analyses or Decision Trees?,” Manisa Celal Bayar University Journal of Social Sciences, vol. 18, no. 4, pp. 251–260, 2020, doi: 10.18026/cbayarsos.

Copyright

Copyright © 2023 David Emmanuel Aniobi, Chukwuemeka Oluebube Ochuba, Saater Benedicta Nguideen. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET56232

Publish Date : 2023-10-20

ISSN : 2321-9653

Publisher Name : IJRASET