IJRASET Journal For Research in Applied Science and Engineering Technology
Authors: Khushi Udaysingh Chouhan, Nikita Pradeep Kumar Jha, Roshni Sanjay Jha, Shaikh Insha Kamaluddin, Dr. Nupur Giri
DOI Link: https://doi.org/10.22214/ijraset.2023.50826
Federated learning is a decentralized form of machine learning in which model training takes place on data subsets held by several edge devices, and only the resulting updates are aggregated at a centralized server. Federated applications store a local copy of the model on edge devices such as smartphones, where users interact with it directly. The model gradually learns from the inputs typed on the user's virtual keyboard and becomes smarter with each iteration. Devices transfer their results, in the form of model parameters, to the centralized server, where they are combined using federated aggregation algorithms. For next-word prediction with federated learning, we implemented and compared several algorithms: FedAvg, FedProx, and FedSGD. In traditional ML, data is collected in a centralized location and used to train the model there. Federated learning, by contrast, gives users more control over their own data, since raw data is never shared with the central server or with other devices. Federated learning thus improves both data privacy and feasibility for its users.
I. INTRODUCTION
The next-word prediction feature facilitates text entry and plays an important role in improving the user experience. Today, smart mobile devices ship with virtual keyboards that support more than 600 languages as well as emoticons. Current mobile keyboard prediction models are constrained in multiple ways, including limited vocabulary, weak contextual awareness, and limited personalization and multilingual support.
Users expect a visible response from the next-word prediction feature within a fraction of a second. They may also be concerned about the collection and remote storage of their personal data, despite data cleaning and strict access protocols, as is the case with traditional neural network or N-gram approaches.
This paper presents a federated learning environment that encrypts user-sensitive data and addresses privacy and latency issues in a practical way.
Such an approach is referred to as the secure aggregation principle: the server is only allowed to combine the encrypted results securely and to decrypt the aggregated result. It applies a federated averaging mechanism that aggregates encrypted chunks of results from edge devices. The paper presents a comparative study of different federated algorithms with respect to accuracy, loss, and speed.
II. DATASET AND RELATED WORK
For our work we have used the Shakespeare dataset, which contains approximately 337 sonnets by Shakespeare. We treat each sonnet as a user of our proposed system.
FSAs and FSTs have been applied to mobile keyboards' auto-correction and next-word prediction. LSTM models have also improved gesture decoding and yielded better results. RNN and N-gram language models optimize the word prediction rate and keystroke savings subject to latency and memory constraints. With the greatly increased focus on privacy and government regulation, research into distributed training of neural networks has gained significant momentum.
In particular, federated learning provides a useful extension of server-based distributed training to client-device-based training and computation over locally stored data. Gboard, a widely used virtual keyboard app, has previously used federated learning to train a model that suggests search engine queries based on typing context.
III. MODEL ARCHITECTURE
Federated learning has two main components: the aggregator and the remote training parties. The aggregator is the server, which averages local updates using federated learning algorithms. Preprocessing takes place on the local data, and model training takes place on the clients; the server's role is to aggregate the model updates received from the mobile devices. The remote training parties are the edge devices, often referred to as clients, which fetch the global model from the server and send their contributions back in the form of local updates.
A typical federated environment contains a large population of potentially hundreds of millions of client devices, of which only a fraction are active and available for training at any given time. In general, a training round involves only a randomly selected subset of the available clients, because it is impractical to coordinate an enormous number of clients at once. The server distributes the initial model and the parameters required for training to the subset of clients that will participate in a round of training and evaluation. On each client, the model is invoked iteratively, independently, and in parallel on a stream of local data batches, producing a new set of model parameters and a set of local metrics (also known as local aggregates). The model parameters and locally exported metrics are then accumulated across the system by a distributed aggregation protocol.
IV. OPEN SOURCE SYSTEM USED
A. TensorFlow Federated (TFF) Framework
TFF is an open source framework to build ML models on distributed and decentralized data. It allows users to train models using data distributed across multiple devices or servers while maintaining privacy and security. This is achieved by sending code to the device rather than sending the data to any central location, ensuring data stays on the device.
TFF is a high-level API that aims to support a variety of distributed learning environments, one of which is the topic of our interest, federated learning. Two important concepts in TFF are tff.learning and tff.computation. The tff.learning package is a higher-level module that provides a set of high-level APIs acting as building blocks for federated learning, while tff.computation is a core component of TFF that provides a way to define and execute computations on a distributed system. There are two phases in running a federated computation: a compile phase, in which TFF serializes the federated algorithm into an abstract representation of the entire distributed computation, and an execute phase, in which TFF runs that serialized representation.
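As a minimal illustration of these two phases (a sketch in the style of the TFF tutorials, not code from this work; exact symbol names vary across TFF versions), the snippet below defines a trivial federated computation and then executes it on simulated clients:

```python
import tensorflow as tf
import tensorflow_federated as tff

# Compile phase: the decorator serializes this federated computation into
# TFF's abstract representation of the distributed logic.
@tff.federated_computation(tff.type_at_clients(tf.float32))
def federated_average(client_values):
    # Average one float per client, producing a single value at the server.
    return tff.federated_mean(client_values)

# Execute phase: TFF's local runtime simulates the clients and the server.
print(federated_average([1.0, 2.0, 3.0]))  # -> 2.0
```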
TFF is a functional programming environment, yet many processes in federated learning are stateful. The state that changes from round to round contains the set of model parameters being trained and possibly additional state associated with the optimizer. To standardize the processing of simulated federated datasets, TFF can enumerate a set of clients and construct a dataset containing the data of a specific client, which can be fed directly into the generated federated computations. It also provides an abstract interface, tff.simulation.datasets.ClientData, for this purpose.
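The federated Shakespeare dataset bundled with TFF exposes exactly this interface. A brief sketch (in TFF's bundled version each client holds the lines of one speaking role, so its client partitioning and preprocessing may differ from the per-sonnet setup used in this paper):

```python
import tensorflow_federated as tff

# Load TFF's bundled federated Shakespeare dataset as ClientData objects.
train_data, test_data = tff.simulation.datasets.shakespeare.load_data()

print(len(train_data.client_ids))          # number of simulated clients
first_client = train_data.client_ids[0]
# Materialize one client's local data as an ordinary tf.data.Dataset.
local_ds = train_data.create_tf_dataset_for_client(first_client)
for example in local_ds.take(1):
    print(example['snippets'])             # a raw text snippet for this client
```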
V. FEDERATED LEARNING
Federated learning offers a distribution strategy for training a machine learning model. It is a decentralized technique in that training data stays on the mobile devices and is never collected centrally. The mobile devices act as clients and generate the large volumes of data used for training. Each client processes its local data and exchanges model updates with the server, rather than uploading the data itself as in centralized model training. The server aggregates the weights from a sample of clients to produce an improved, updated global model. The model is then sent back to the clients, and the process repeats iteratively. Various aggregation algorithms can be used on the server side to combine the client weights into a new, more accurate model.
Even when server-hosted data is anonymised, this decentralized on-device processing technique offers advantages in security and privacy over server-side storage. Previous research shows that privacy-preserving methods such as secure aggregation and differential privacy can complement federated learning. Users retain direct, hands-on control over their data, since confidential information is kept on the client devices only. The model updates each client transmits are ephemeral, focused, and aggregated: they are handled in memory and deleted immediately after being folded into the weight vector, and they are never stored on the server, preserving user privacy and enabling secure aggregation.
VI. METHODS
In this work we have used three different federated learning methods, namely FedAvg, FedSGD, and FedProx, which differ from one another in how aggregation is performed.
At each communication round, the central server selects a subset of clients S ⊆ [N] and sends them the current global model. Each selected client updates the global model on its own local data and sends the updated model back to the central server. The server aggregates the client inputs and updates the global model. This process repeats for several communication rounds until a termination criterion is reached, e.g., (a) a maximum number of rounds T is completed, or (b) a target validation accuracy is met. A client's dataset D_i written with a superscript t refers to the t-th communication round between the central server and that client, where t ∈ {1, ..., T}.
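The following self-contained sketch (ours, using a toy least-squares model rather than the paper's keyboard model) illustrates this round structure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the protocol above: N clients each hold local data D_i
# drawn from a shared linear model y = X @ w_true + noise.
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(10):                               # N = 10 clients
    X = rng.normal(size=(50, 2))                  # 50 local examples
    y = X @ w_true + 0.1 * rng.normal(size=50)
    clients.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    """Client-side refinement of the received global model on local data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(2)                            # initial global model
T, m = 20, 4                                      # T rounds, m clients per round
for t in range(T):
    sampled = rng.choice(len(clients), size=m, replace=False)  # S ⊆ [N]
    local_models = [local_update(w_global, *clients[i]) for i in sampled]
    # Server aggregation: plain average (all clients hold equal-sized data).
    w_global = np.mean(local_models, axis=0)

print(w_global)   # converges toward w_true over the T communication rounds
```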
A. FedAvg
Federated averaging, also known as FedAvg, is a simple and widely-used federated learning algorithm that involves simply averaging the model updates from the client devices to produce the new global model.
It has four hyperparameters: the fraction of clients C selected each round, the local minibatch size B, the number of epochs E each client trains over its local dataset each round, and the learning rate. FedAvg assumes that all sampled devices complete E epochs, and thus drops stragglers. It selects clients anew prior to each round to steer clear of federations with a low probability of participating clients. Additionally, the learning rate should be decayed as the objective is optimized and the model approaches the global minimum of the loss function, which improves the model's accuracy and produces better results.
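A hedged sketch of a single FedAvg round with these four hyperparameters (the `fit` and `num_examples` client interface is hypothetical, not from the paper):

```python
import random

def fedavg_round(w_global, clients, C=0.1, B=32, E=5, lr=0.01):
    """One FedAvg round; `client.fit` / `client.num_examples` are assumed."""
    m = max(1, int(C * len(clients)))            # fraction C of all clients
    sampled = random.sample(clients, m)
    results = []
    for client in sampled:
        # Each client runs E local epochs of minibatch SGD with batch size B.
        w_k = client.fit(w_global, batch_size=B, epochs=E, lr=lr)
        results.append((client.num_examples, w_k))
    n = sum(n_k for n_k, _ in results)
    # Weighted average: clients with more data contribute proportionally more.
    return sum((n_k / n) * w_k for n_k, w_k in results)
```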
B. FedSGD
FedSGD is the distributed version of SGD. It partitions the data across multiple compute nodes. In each round, every node performs a single SGD step, computing the gradient on its local data; the resulting weights are then averaged over all nodes. As a data-parallel approach, FedSGD utilizes the computation power of multiple compute nodes instead of one. This accelerates plain SGD and has been widely used given the growing size of datasets collected nowadays. Additionally, since FedSGD executes only one SGD step per node, averaging the updated weights is equivalent to averaging the gradients (η denotes the step size):

w_{t+1} = w_t − η ∑_k (n_k/n) g_k = ∑_k (n_k/n) (w_t − η g_k) = ∑_k (n_k/n) w_k^{t+1},

where n_k is the number of examples held by node k, n = ∑_k n_k, g_k is node k's local gradient, and w_k^{t+1} = w_t − η g_k is node k's locally updated model.
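This equivalence is easy to verify numerically; a small check (ours) with equal-sized clients:

```python
import numpy as np

rng = np.random.default_rng(1)
w_t = rng.normal(size=3)                          # current global model w_t
grads = [rng.normal(size=3) for _ in range(5)]    # local gradients g_k
eta = 0.1                                         # step size η

# FedSGD server view: average the gradients, then take one step.
w_from_grads = w_t - eta * np.mean(grads, axis=0)
# Equivalent client view: each node takes one local step, server averages.
w_from_weights = np.mean([w_t - eta * g for g in grads], axis=0)

assert np.allclose(w_from_grads, w_from_weights)
```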
A key difference between FedAvg and FedSGD is how they handle non-IID data, i.e., data that is not identically and independently distributed across clients. FedAvg tends to be sensitive to non-IID data: each client takes multiple local SGD steps, so local models can drift apart when client distributions differ. FedSGD, on the other hand, performs only a single gradient step per client per round, so the averaged update closely approximates centralized SGD, which reduces the impact of local data variations.
C. FedProx
FedProx is a generalization of FedAvg with modifications that address heterogeneity of data and systems. Learning again proceeds in rounds: at each round, the server samples a set of m clients and sends them the current global model. FedProx works by adding a proximal term to the objective being optimized locally. This term penalizes large deviations of the model parameters from the previous round's global model, encouraging convergence to a more stable solution: the local objective becomes F_k(w) + (μ/2)‖w − w_t‖², where the hyperparameter μ, called the "proximal coefficient", controls the strength of the penalty. Additionally, local optimization runs for a variable number of epochs according to each client's system resources.
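A minimal sketch of the FedProx local step (ours, using a least-squares loss purely for illustration):

```python
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.05, epochs=5):
    """Minimize F_k(w) + (mu / 2) * ||w - w_global||^2 on local data (X, y)."""
    w = w_global.copy()
    for _ in range(epochs):
        grad_loss = 2 * X.T @ (X @ w - y) / len(y)   # gradient of local loss
        grad_prox = mu * (w - w_global)   # pulls w back toward the global model
        w = w - lr * (grad_loss + grad_prox)
    return w
```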
VII. RESULTS
With the help of Fig. 7.1.8 it can be verified that FedProx gives better accuracy than the FedAvg and FedSGD models. It can also be observed that, for a greater number of rounds, the accuracy tends to approach a constant value while the loss continues to decrease. For comparison purposes, the learning rate was set to 0.001 for all models.
VIII. CONCLUSION
This paper discusses federated learning and its models in detail. The dataset used is the Shakespeare dataset, with about 18,500 samples comprising sonnets and plays. Mobile keyboard prediction is a model-centric, horizontal federated learning problem. We applied a variety of federated learning techniques, including FedAvg, FedSGD, and FedProx. The results show that the FedProx model not only converges faster but also gives better results on this dataset, and better results can be achieved by further increasing the number of rounds. In the future, we intend to experiment with other federated learning algorithms and employ other datasets in the same or different languages.
Copyright © 2023 Khushi Udaysingh Chouhan, Nikita Pradeep Kumar Jha, Roshni Sanjay Jha, Shaikh Insha Kamaluddin, Dr. Nupur Giri. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET50826
Publish Date : 2023-04-22
ISSN : 2321-9653
Publisher Name : IJRASET