Abstract: While multivariate logistic regression classifiers are an effective way of implementing collaborative filtering, a method of making automatic predictions about the interests of a user by collecting preference or taste information from many other users, similar or better results can be achieved with neural networks. A recommender system is a subclass of information filtering system that provides suggestions for the items most pertinent to a particular user. A perceptron, or neural network, is a machine learning model designed to fit complex datasets using backpropagation and gradient descent. When coupled with advanced optimization techniques, the model can prove to be a strong substitute for classical logistic classifiers. These optimizations include feature scaling, mean normalization, regularization, hyperparameter tuning and the use of stochastic/mini-batch gradient descent in place of batch gradient descent. In this use case, the perceptron in the recommender system fits its parameters to the data collected from a multitude of users and uses them to predict the preferences of a particular user.
I. A BRIEF HISTORY AND INTRODUCTION
A. Neural Networks
A neural network in machine learning is a model made up of artificial neurons, or nodes, whose connections are modelled as weights. A positive weight reflects an excitatory connection, while a negative weight reflects an inhibitory connection. Each input is multiplied by its weight and the results are summed, an operation known as a linear combination. An activation function then controls the amplitude of the output, i.e., brings it into a desirable range, usually 0 to 1 or -1 to 1.
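As a minimal sketch of this forward pass (the inputs, weights and bias below are invented purely for illustration), a single artificial neuron with a sigmoid activation can be written in a few lines of Python:

```python
import numpy as np

def sigmoid(z):
    # Squashes the linear combination into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # Linear combination of inputs and weights plus a bias term,
    # followed by the activation function.
    return sigmoid(np.dot(w, x) + b)

# Hypothetical example: one excitatory (positive) and one
# inhibitory (negative) weight.
x = np.array([0.5, 0.8])
w = np.array([1.2, -0.7])
print(neuron_output(x, w, b=0.1))  # a value in (0, 1)
```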
In 1943, Warren McCulloch and Walter Pitts, from the University of Illinois and the University of Chicago, published research showing that the brain's production of complex patterns could be simplified down to a binary logic structure with only true/false connections. Frank Rosenblatt of the Cornell Aeronautical Laboratory is credited with the development of the perceptron in 1958. His research introduced weights to McCulloch and Pitts' work, and he leveraged it to demonstrate how a computer could use neural networks to detect images and make inferences.
The next step in the development of neural networks came in 1982 with John Hopfield's ‘Hopfield networks’. A Hopfield network is a fully interconnected recurrent neural network in which each unit is connected to every other unit. It behaves in a discrete manner, producing distinct outputs, generally in binary (0/1) or bipolar (-1/1) form. In a recurrent neural network, the outputs of the neurons are fed back into the network as ‘memory’, informing the network's subsequent inputs and outputs.
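As a hedged sketch of the discrete, bipolar behaviour just described (the weight matrix and initial state below are invented for illustration), an asynchronous Hopfield update can be written as:

```python
import numpy as np

def hopfield_update(W, s, i):
    # Asynchronous update of unit i: take the sign of the weighted
    # sum of the other units' states (bipolar -1/+1 convention).
    h = np.dot(W[i], s)
    return 1 if h >= 0 else -1

# Hypothetical 3-unit network: symmetric weights with a zero
# diagonal (no self-connections), as in a standard Hopfield network.
W = np.array([[ 0.0,  1.0, -1.0],
              [ 1.0,  0.0,  1.0],
              [-1.0,  1.0,  0.0]])
s = np.array([1, -1, 1])
for i in range(len(s)):
    s[i] = hopfield_update(W, s, i)
print(s)  # the state settles toward a stored pattern
```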
The backpropagation algorithm, which today forms the basis of neural network training, was independently discovered several times. Paul Werbos was the first in the US to propose, after deep analysis in his 1974 dissertation, that it could be used for neural networks. The method was rediscovered in 1985 by Parker and in 1986 by David Everett Rumelhart, Geoffrey Everest Hinton and Ronald J. Williams, and its modern form was proposed by Yann LeCun in 1987. The basics of continuous backpropagation, however, were derived in the context of control theory by Henry J. Kelley and Arthur E. Bryson in 1960 and 1961 respectively.
After a period of only minor developments during the 2000s, neural networks saw a resurgence in the 2010s as they benefitted from cheap, powerful GPU-based computing systems. This is most noticeable in the fields of speech recognition, machine vision, natural language processing (NLP) and language structure learning. Other use cases for neural networks include facial recognition, stock market prediction, social media, aerospace and healthcare.
Convolutional neural networks (CNNs) differ from both traditional (feedforward) neural networks and recurrent neural networks. While recurrent neural networks are commonly used for natural language processing and speech recognition, convolutional neural networks are more often utilized for classification and computer vision tasks. The convolutional layer is the core building block of a CNN, and it is where the majority of the computation occurs.
B. Collaborative Filtering
A recommender system is a subclass of information filtering system that provides suggestions for the items most appropriate to a particular user, based on data collected from a multitude of users. Recommender systems are particularly useful when an overwhelming number of options would otherwise make it hard for a user to choose well. Both users and the services that deploy these systems benefit from them, through better choices and an improved decision-making process.
Collaborative filtering is a technique used by recommender systems to make automated predictions about the interests of a user by collecting preferences from many users. In the more general sense, collaborative filtering is the process of filtering information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. It can be applied to many kinds of data, including sensing and monitoring data, financial data and more. The overwhelming amount of data necessitates mechanisms for efficient information filtering, and collaborative filtering is one of the techniques used to solve this problem.
Though various sources disagree, the discovery of the collaborative filtering algorithm is generally credited to Dave Goldberg and his colleagues at Xerox PARC, who introduced the concept of “collaborative filtering” in 1992 with an experimental mail system called Tapestry. The origins of modern recommender systems date back to the early 1990s, when they were mainly applied experimentally to personal email and information filtering. Today, 30 years later, personalized recommendations are ubiquitous, and research in this highly successful application area of AI is flourishing more than ever, driven largely by advances in machine learning technology.
II. THE CLASSICAL APPROACH
A. Use of Logistic Classifiers
Collaborative filtering is usually done using multivariate logistic regression, since the output is discrete in nature (logistic regression handles discrete outputs and is used for classification problems, while linear regression handles continuous outputs and is used for value-prediction problems). Conventional regularized multivariate logistic regression achieves appreciable accuracy, but neural networks (perceptrons) can achieve considerably more using backpropagation and gradient descent. The classical approach to collaborative filtering using logistic regression is sketched below:
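As a minimal sketch of this classical approach (an illustration under assumed data, not the paper's exact formulation: here each movie is represented by the ratings other users gave it, and a regularized logistic classifier predicts whether the target user will like it):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical ratings matrix: rows are movies, columns are other
# users' ratings (1-5) of that movie.
X = np.array([[5, 4, 5, 3],
              [1, 2, 1, 2],
              [4, 5, 4, 4],
              [2, 1, 2, 1]])
# Labels: whether the target user liked (1) or disliked (0) each movie.
y = np.array([1, 0, 1, 0])

# Feature scaling / mean normalization, as discussed in the abstract.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# L2-regularized logistic regression; C is the inverse of the
# regularization strength (lambda).
clf = LogisticRegression(penalty="l2", C=1.0).fit(X_scaled, y)

# Predicted probability that the user likes an unseen movie.
x_new = scaler.transform([[5, 5, 4, 5]])
print(clf.predict_proba(x_new)[0, 1])
```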
IV. ACKNOWLEDGEMENT
First, I would like to thank my parents and family members for their constant support. Next, I would like to thank all the teachers of the computer science department of Delhi Public School, Ruby Park for recognizing my potential and helping me develop it. Finally, I am heavily indebted to all the editors of the journal for their time and effort in going through the research paper.
Conclusion
While logistic regression classifiers provide a fairly accurate prediction of a user's preferences based on the data collected from a multitude of users, we can further improve the accuracy of the model by introducing neural networks. Because neural networks are designed to fit complex datasets, they can represent functions of higher complexity and thus fit the data better without running into high bias (under-fitting) or high variance (over-fitting). However, while neural networks can fit datasets better than logistic classifiers, they are often computationally more expensive, so we apply a series of optimization techniques to get the best possible results with the least computational work. These include preprocessing the data with mean normalization and feature scaling; using gradient checking while training to ensure backpropagation is implemented correctly; using regularization, tuned to balance under-fitting (high bias) against over-fitting (high variance); initializing the parameter vectors randomly; and, for computational efficiency, replacing batch gradient descent with stochastic/mini-batch gradient descent. Finally, to find the best set of hyperparameters, i.e., the learning rate (α), the regularization parameter (λ) and the mini-batch size (b), we use hyperparameter tuning with GridSearchCV or RandomizedSearchCV.
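As a hedged sketch of that tuning step (assuming scikit-learn's MLPClassifier stands in for the perceptron, with learning_rate_init, alpha and batch_size corresponding to α, λ and b; the grid values and data are invented for illustration):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Hypothetical preprocessed (scaled, mean-normalized) training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)

# Grid over the three hyperparameters named above: the learning rate
# (learning_rate_init), the regularization parameter (alpha) and the
# mini-batch size (batch_size).
param_grid = {
    "learning_rate_init": [0.001, 0.01, 0.1],
    "alpha": [0.0001, 0.001, 0.01],
    "batch_size": [16, 32, 64],
}

search = GridSearchCV(
    MLPClassifier(solver="sgd", max_iter=500, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```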
More advanced algorithms can be used in place of plain gradient descent to further improve computational efficiency and accuracy: Momentum (reduces the high variance of SGD and smooths convergence), Nesterov Accelerated Gradient (looks ahead so that Momentum does not overshoot local minima), AdaGrad (replaces the constant learning rate with a dynamic, per-parameter one), AdaDelta (removes the decaying-learning-rate problem of AdaGrad) and Adaptive Moment Estimation (Adam, which works with first- and second-order moment estimates of the gradient). For neural networks specifically, Adam tends to be the best optimizer.
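As a hedged sketch of the Adam update just mentioned (a minimal single-parameter illustration of the standard update rule, with an invented objective):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    # First-moment (mean) and second-moment (uncentered variance)
    # estimates of the gradient, with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Hypothetical use: minimize f(theta) = theta^2, whose gradient is 2*theta.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(theta)  # approaches the minimum at 0
```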
The objective (section III-A) and the working (section III-B) of the model have been explained with the specific example of recommending a movie to a user. However, a similar approach can be followed in other use cases, as the core concept remains the same. The most important step is implementing gradient checking just after computing the derivatives with backpropagation, to ensure that the backpropagation algorithm is working correctly; a sketch follows below. Perceptron-based collaborative filtering can be used on OTT platforms to recommend movies to a user based on other users' ratings, and on online shopping platforms to recommend products to people based on relatability and ratings/reviews.
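As a hedged sketch of that gradient check (a generic numerical-versus-analytic comparison under an assumed regularized logistic cost, not the paper's exact implementation):

```python
import numpy as np

def cost(theta, X, y, lam=0.1):
    # Regularized logistic cost; lambda fixed at 0.1 for illustration.
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return ((-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
            + lam * np.sum(theta**2) / (2 * m))

def analytic_grad(theta, X, y, lam=0.1):
    # Gradient obtained analytically (stand-in for backpropagation).
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (h - y) / m + lam * theta / m

def numerical_grad(theta, X, y, eps=1e-4):
    # Central-difference approximation, one parameter at a time.
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20).astype(float)
theta = rng.normal(size=3)
# The two gradients should agree to several decimal places.
print(np.max(np.abs(analytic_grad(theta, X, y) - numerical_grad(theta, X, y))))
```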