Survey of the Use of Multimodal Systems for Chess Game Analysis

Authors: Gavin W. Lampkin

DOI Link: https://doi.org/10.22214/ijraset.2024.65295

Abstract

Multimodal systems represent a convergence of technologies and methodologies for solving complex problems, incorporating computer vision, artificial intelligence, robotics, and neuroscience to improve how we understand and interact with machines. This survey explores recent advancements in multimodal systems for chess, focusing on the integration of various modalities, such as optical and neural analysis as well as human-robot interaction, to improve chess understanding, game analysis, and player development. Key topics include the use of computer vision for real-time monitoring of physical chess games, the association of brain regions with chess expertise as a way of distinguishing experts from novices, and the development of chess-playing robots capable of interacting naturally with human opponents. Additionally, the survey highlights the disadvantages, benefits, and future research directions of applying these multimodal approaches in both industry and educational contexts. By integrating research from diverse avenues, this paper aims to provide an overview of how multimodal systems are shaping the future of competitive and casual chess, offering new opportunities for cognitive research, automated gameplay, and human-robot collaboration.

I. INTRODUCTION

A multimodal system processes information from several input channels and combines them to make a prediction or produce an informed output. Since their introduction these systems have spread quickly, with applications in multimodal biometrics, sentiment analysis, and language models such as the well-known ChatGPT. I will be surveying papers on one subset of these systems: chess analysis and robots that interact with the chess board. Chess has been regarded as the “human limit” among intellectual over-the-board games for centuries, a reputation dating back to the 12th or 13th century, when the game was taken up by Europeans. With the evolution of technology and the lengths to which machines can now go to beat humans at chess, we have entered a new era of the game, the machine age. I will be surveying researchers who have tackled chess as a multimodal task: some analyze over-the-board play using machine learning and computer vision, while others break from the norm of the peer-to-peer game and turn it into a machine-to-peer match.

II. MULTIMODAL ROBOTS

A. Turk-2

[1] The Turk-2 is a multimodal system that applies this family of techniques to what this survey treats as the final frontier of intellectual games, chess. The robot uses dual cameras, to process the board and to classify the opponent's face and posture, alongside a microphone for speech recognition. These components are connected to a central controller that hosts the internal chess engine and handles human-computer communication through an interactive user interface. The controller outputs its move to the board through a robot arm and conveys other signals through a “talking head”.

The Turk-2 is a sophisticated machine that takes in inputs from many sources; for instance, the opponent's face, voice, and body language are examined throughout the game. These features give it some advantages over its counterparts, but training with so many input channels also brings drawbacks. The main disadvantage is that the machine must process all of the information from its three input channels and still produce its output in a timely manner for the game to continue.

However, doing so systematically, so that the controller can distinguish between these inputs and account for the origin and weight of each, could be very beneficial to its move selection. The Turk-2 is an excellent example of such a system; however, it could be simplified in how it accepts input and in how many outputs it produces while retaining the same complexity of computation.
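
One way to picture a controller that tracks the origin and weight of each input is a weighted average over per-channel cues. The sketch below is only an illustration under stated assumptions: the `Cue` type, the channel names, the weights, and the idea of scoring "opponent uncertainty" on a 0 to 1 scale are all hypothetical and are not taken from the Turk-2 paper [1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    source: str    # input channel that produced the cue (hypothetical names)
    value: float   # estimated opponent uncertainty in [0, 1] (assumed scale)
    weight: float  # how much the controller trusts this channel (assumed)


def fuse_cues(cues):
    """Return the weighted average of cue values, or None if no cue is usable."""
    usable = [c for c in cues if c.weight > 0]
    total_weight = sum(c.weight for c in usable)
    if total_weight == 0:
        return None
    return sum(c.value * c.weight for c in usable) / total_weight


# Illustrative readings from the three input channels described above.
cues = [
    Cue("face", 0.8, 0.5),
    Cue("posture", 0.4, 0.3),
    Cue("speech", 0.2, 0.2),
]
fused = fuse_cues(cues)  # single score the move selector could consult
```

Keeping the source name on each cue means a channel can be down-weighted or dropped without touching the rest of the controller, which is one possible route to the simplification suggested above.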

Figure 2: [1] Structure of the Turk-2

Extending the examination of the Turk-2 as a competent and fair chess opponent: because it accepts so many input channels and its model works within these parameters, it can gain much more information than a streamlined online chess engine. The information gained from the opponent's face using the symmetrical cameras, for instance, is data that can be used to widen the set of moves that could be favorable for the machine. If the machine detects uncertainty before a move is made, that could indicate that the opponent has made a mistake, or a move that could be interpreted as a mistake; acting on a misread cue, however, could become a fault of the Turk-2.

B. Baxter Humanoid Robot

[2] The chess robot discussed in the Baxter humanoid robot paper is much more complex than the previous one in its structure, adaptability, and multimodal input channels. It combines computer vision for perceiving the board and its ever-changing state during a game, robotic control for moving the pieces in real time, and a chess decision-making component that integrates computational algorithms for selecting winning moves. At a high level, these components interact as follows: two cameras process the board and its pieces, and a robot arm, controlled by an intelligent system optimized for chess, physically interacts with the pieces.

[3] Figure 3: The Baxter Robot in action

The main advantage I found in the team's approach was the improved perception and interaction modules for Baxter, which uses multiple cameras for perception and two interactive hands to make the moves in real time. This allows for a more natural game of chess, in contrast to many research papers whose restrictions and constraints lead to games that no longer feel like chess. The incorporation of multiple modalities also allows for more complex task execution, because instructions and tasks can be distributed within the system to their own controllers rather than handled by a single centralized administrator.

Some of the system's strengths are also its drawbacks, including the high computational load, the increased system complexity that comes with each added component, and the cost and development time of the approach.

The computational load grows rapidly as modalities are added and can require significant processing power, while also slowing the system's ability to compute its output in real time. Each added component brings another layer of technical requirements to the development and maintenance list, potentially making the machine hard to reproduce. Lastly, because the machine is so complex, its subsystems need to be integrated, calibrated, and tested very carefully in a controlled environment, especially since its arms could cause bodily harm. This drives up cost and time simply through the amount of resources needed to ensure the machine is ready before running an actual game.

III. MULTIMODAL CHESS ANALYSIS

A. Classification of Solving Chess Problems

[4] The next study combines eye tracking, posture detection and analysis, and facial expression analysis to draw inferences about players solving challenging chess problems on a touch screen in a controlled setting. During the study, the 23 volunteer participants from a chess club, ranging from intermediate players to experts, were asked to sit 60 to 90 cm away from the screen and to avoid rapid or large head movements to ensure good results from the model. They were asked to solve a variety of problems, including openings, N-checkmate, and endgame puzzles, to give the researchers a broad range of results. The data collected focused on eye movement, to study the participant's focus and engaged cognitive processes; micro-expression tracking, to gather data about the player's cognitive state; and body posture, to detect nervousness or negative feelings as they arise. The figure below shows the Areas of Interest (AOI) that the researchers define to mark where on the board the user should be looking; in the figure, the AOIs are two key pieces and one square to which a piece could be moved.

[4] Figure 4: Three Areas of Interest (AOI)

The study does a very good job of correlating all of these factors in a way that lets the reader see the results clearly. For instance, the eye-tracking data showed that expert players spent much more time fixating on the pieces that matter for solving the puzzle, while intermediate players took longer to find those pieces. The data also show that participants faced with a very hard problem displayed more expressions while solving it, even if each would be considered a “micro-expression”. An advantage of this study is the way the researchers collected the data and the multimodal technologies they used to gather and compare it. For instance, using a Kinect 2.0 for posture and eye-tracking data collection and processing is an innovative yet inexpensive way to accomplish the challenging feat of collecting this information. All of this data, and more, was gathered and examined to determine classification and analysis techniques for chess players and for people solving problems on a touch screen.
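
To make the AOI comparison concrete, the time spent fixating inside each area can be computed by summing fixation durations per rectangle. The sketch below assumes rectangular AOIs in screen pixels and fixations already extracted from the gaze stream; the type names, coordinates, and durations are invented for illustration and do not come from [4].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    x: float            # gaze position on screen, in pixels
    y: float
    duration_ms: float  # how long the gaze stayed there


@dataclass(frozen=True)
class AOI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def dwell_time_per_aoi(fixations, aois):
    """Sum fixation durations inside each AOI (first matching AOI wins)."""
    totals = {aoi.name: 0.0 for aoi in aois}
    for fix in fixations:
        for aoi in aois:
            if aoi.contains(fix.x, fix.y):
                totals[aoi.name] += fix.duration_ms
                break
    return totals


# Hypothetical layout: two key pieces and one target square.
aois = [
    AOI("queen_aoi", 100, 100, 200, 200),
    AOI("rook_aoi", 300, 100, 400, 200),
    AOI("target_square_aoi", 200, 300, 300, 400),
]
fixations = [
    Fixation(150, 150, 300),
    Fixation(350, 120, 200),
    Fixation(160, 180, 250),
    Fixation(50, 50, 400),    # outside every AOI, so it is ignored
    Fixation(250, 350, 100),
]
totals = dwell_time_per_aoi(fixations, aois)
```

Comparing such totals between expert and intermediate players is one simple way to express the difference in fixation behavior described above.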

Disadvantages of this study include the constraints placed on participants during the problem-solving task, such as not moving their head rapidly and staying at a consistent distance from the screen. The study would not expand easily to a larger set of participants because of these specifics and the constraints imposed on participants for an extended period of time.

B. Integrating Vision and AI for Chess Game Monitoring

[5] This paper introduced a technique for real-time tracking and analysis of physical chess games using computer vision and machine learning, with a focus on creating a fully automated system that observes the game and makes decisions in real time. The main component I will be discussing is the model's ability to analyze the game and make informed decisions while the players make moves. One difficulty the paper addresses is detecting and analyzing pieces without a sensor-augmented board, relying instead on computer vision and multimodal cameras for detection.

[5] Figure 5: Process used to detect and process board

They achieved the desired accuracy without a sensor-augmented board by leveraging OpenCV, an open-source computer vision library with Python bindings, to convert the real-time feed to grayscale, apply Canny edge detection, and cluster line intersections, among other techniques. This methodology is widely adopted in the machine learning chess community for board detection, and the method detailed in this paper has both strengths and weaknesses. One disadvantage is that all four corners of the board must be in the camera's view for the system to remain functional, given its current edge-detection process. Without these four corners the model becomes confused and is at risk of race conditions. The model also struggles when conditions on the board are not ideal; for example, unexpected lines or too many objects can lead to unexpected results and inaccuracy in the machine's calculations. The suggested improvement was to use the Czyzewski algorithm for board analysis, which digitizes the board and achieves a success rate of 99.5% in non-ideal conditions and would be an enhancement over the current approach.
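
The "clustering intersections" step can be sketched without any image library, assuming that line candidates are already available in the (rho, theta) normal form that a Hough transform typically yields. That assumption, the greedy clustering rule, the radius, and the sample lines are all illustrative choices here and are not the procedure used in [5].

```python
import math


def intersect(line_a, line_b):
    """Intersect two lines given as (rho, theta); return None if parallel."""
    rho1, th1 = line_a
    rho2, th2 = line_b
    # Each line satisfies x*cos(theta) + y*sin(theta) = rho.
    det = math.cos(th1) * math.sin(th2) - math.sin(th1) * math.cos(th2)
    if abs(det) < 1e-9:
        return None
    x = (rho1 * math.sin(th2) - rho2 * math.sin(th1)) / det
    y = (rho2 * math.cos(th1) - rho1 * math.cos(th2)) / det
    return (x, y)


def cluster_points(points, radius):
    """Greedily merge each point into the first cluster centroid within radius."""
    clusters = []  # each entry is [sum_x, sum_y, count]
    for x, y in points:
        for c in clusters:
            cx, cy = c[0] / c[2], c[1] / c[2]
            if math.hypot(x - cx, y - cy) <= radius:
                c[0] += x
                c[1] += y
                c[2] += 1
                break
        else:
            clusters.append([x, y, 1])
    return [(c[0] / c[2], c[1] / c[2]) for c in clusters]


# Hypothetical detections in pixels: three vertical lines, two of which are
# duplicate detections of the same grid line, and two horizontal lines.
vertical = [(100.0, 0.0), (200.0, 0.0), (202.0, 0.0)]
horizontal = [(50.0, math.pi / 2), (150.0, math.pi / 2)]

raw = []
for v in vertical:
    for h in horizontal:
        point = intersect(v, h)
        if point is not None:
            raw.append(point)

corners = cluster_points(raw, radius=10.0)  # duplicates collapse to one corner
```

A sketch like this also shows why stray lines hurt: every extra line adds a row of spurious intersections that the clustering step cannot tell apart from real grid corners.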

The model does a great job of detecting piece movement, even using the presence of the players' hands as an indicator of when to run the analysis. A scan is then completed with some error checking before a decision is made. One advantage of this approach is that it has few bells and whistles: the multimodal side consists of the camera and the central component. The approach does not delve deeply into physically moving pieces; it is more concerned with detection and analysis.
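
Once a scan is triggered, one simple way to recover the move is to compare square occupancy before and after it. The following is a minimal sketch under assumptions: the snapshot format (square name mapped to "w", "b", or None) and the `infer_move` helper are hypothetical, and the rejection of ambiguous changes merely stands in for the error checking mentioned above rather than reproducing the logic in [5].

```python
def infer_move(before, after):
    """Infer a simple move or capture from two occupancy snapshots.

    Each snapshot maps square names such as "e2" to "w", "b", or None.
    Returns (from_square, to_square), or None when the change is not a
    single-piece move (castling and en passant are not handled here).
    """
    squares = set(before) | set(after)
    changed = [sq for sq in squares if before.get(sq) != after.get(sq)]
    if len(changed) != 2:
        return None
    emptied = [sq for sq in changed if after.get(sq) is None]
    filled = [sq for sq in changed if after.get(sq) is not None]
    if len(emptied) != 1 or len(filled) != 1:
        return None
    src, dst = emptied[0], filled[0]
    if before.get(src) != after.get(dst):
        return None  # the arriving piece must have the mover's colour
    return (src, dst)


# A quiet pawn move, then a capture on d5 (illustrative positions only).
quiet = infer_move(
    {"e2": "w", "e4": None, "d5": "b"},
    {"e2": None, "e4": "w", "d5": "b"},
)
capture = infer_move(
    {"e4": "w", "d5": "b"},
    {"e4": None, "d5": "w"},
)
```

Returning None for anything that is not a clean two-square change gives the caller a natural point to rescan the board instead of committing to a wrong move.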

C. Chess Recognition with 3D Imaging

[6] This paper introduced the novel idea of processing the chess board with a three-dimensional patterned illumination camera to track the state of the board. The team uses 3D image processing to improve the clarity of results in real-world environments, with potential use in real tournaments. The model uses a 3D component while also capturing traditional 2D images, which can improve accuracy and provide a fail-safe. Recognizing pieces with a multimodal system of 2D and 3D components reduces error significantly. Some advantages of this model are that 3D detection improves the accuracy of piece recognition and avoids many of the problems that arise with 2D-only models. This helps especially when parts of the board are under different lighting levels and still need to be processed; conditions are not always perfect, and the model is able to adapt to those present during recognition. Future work for this project includes using the model for detection and object tracking in other over-the-board games, since this robust solution could be carried over to different gaming environments. The approach also leaves room for the researchers to develop the project into an analysis and tracking tool, since it is, for now, just a recognition tool. If they plugged in a central controller to analyze the game and produce some output, this could become a revolutionary chess analysis concept.
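
The benefit of pairing 2D and 3D recognition, including the fail-safe behavior, can be illustrated with a small late-fusion rule. This is a sketch under assumptions rather than the method of [6]: the per-class probability dictionaries, the product rule, and the example numbers are all hypothetical.

```python
def fuse_predictions(probs_2d, probs_3d):
    """Pick a piece label from two per-class probability dicts.

    If one modality is missing (None or empty), fall back to the other.
    Otherwise multiply the probabilities of the shared classes and take
    the most likely one. Returns None if no decision is possible.
    """
    if not probs_2d and not probs_3d:
        return None
    if not probs_2d:
        return max(probs_3d, key=probs_3d.get)
    if not probs_3d:
        return max(probs_2d, key=probs_2d.get)
    shared = set(probs_2d) & set(probs_3d)
    if not shared:
        return None
    return max(shared, key=lambda label: probs_2d[label] * probs_3d[label])


# Under dim lighting the 2D view is unsure, while the 3D height profile
# clearly favors the taller piece (numbers invented for illustration).
dim_light_2d = {"pawn": 0.55, "bishop": 0.45}
height_3d = {"pawn": 0.10, "bishop": 0.90}

fused_label = fuse_predictions(dim_light_2d, height_3d)
fallback_label = fuse_predictions(dim_light_2d, None)  # 3D data unavailable
```

In this toy case the 2D view alone would pick the pawn, while the combined evidence picks the bishop, which is the kind of error reduction the paragraph above attributes to the 2D plus 3D design.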

Disadvantages of this research include the cost of the machine compared with 2D models; the components needed to build a machine of this kind that completes its tasks in a timely manner are not cheap. The scope of the machine is also not very relevant beyond chess: expanding this multimodal machine into other areas, even other board games, would be quite challenging. The 3D calibration should not be taken lightly either. It takes someone expert in the machine to calibrate it correctly for a simple chess game, so if the research team is looking to expand into different games or markets, the setup would be challenging to reproduce.

D. Neural Correlates of Expertise in Multimodal Chess Systems

[7] This study is unique among the rest in how it approaches its analysis: it examines the relationship between players' brain structure and their chess expertise by studying surface-based cortical features in multimodal association regions of the cerebrum. The paper adds a neurological dimension to multimodal chess by showing how brain structure and certain signals can relate to expertise and to characteristics we do not usually think about. The researchers aim to determine how structural differences in brain areas can predict the expertise level of chess players.

[7] Figure 6: Increased gyrification index in (a) the posterior part of the right anterior cingulate cortex (a24pr, in red) and decreased in (b) the superior and posterior part of the right superior temporal sulcus (STSdp, in blue) predicted chess expertise using logistic regression. R, right.

The multimodal aspect of this paper is somewhat abstract, but it is also the most interesting in how it approaches the problem. The study focuses on the multiple associations the brain makes, which align with the broader idea of modality and its use cases. The authors emphasize the role of integrating sensory and cognitive processing in complex problem solving, including chess, though the idea could be extended to other domains. The brain draws on so many modalities that studying them lets us examine at a deeper level what is really going on when people solve these complex problems.

The study takes a neuroimaging approach, focusing on cortical thickness and surface area measurements in regions of the brain associated with problem solving and deeper thinking. The skills involved include decision-making, memory, and spatial reasoning, to name a few, and they are among the most essential for chess problem-solving. The research links structural differences in multimodal association regions to these skills, in particular studying thicker cortices and larger surface areas. Using cortical measures, the researchers are also able to build predictive models that estimate a participant's chess expertise from brain imaging alone.
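
As a worked form of such a predictive model, a logistic regression restricted to the two regions named in the Figure 6 caption could be written as below. The symbols $g_{\text{a24pr}}$ and $g_{\text{STSdp}}$ (gyrification indices of the two regions) and the coefficients $\beta_0, \beta_1, \beta_2$ are notation assumed here for illustration. No fitted values from [7] are reproduced; only the signs follow the directions stated in the caption (increased in a24pr, decreased in STSdp).

```latex
P(\text{expert} \mid g) =
  \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \beta_1\, g_{\text{a24pr}} + \beta_2\, g_{\text{STSdp}}\right)\right)},
\qquad \beta_1 > 0, \quad \beta_2 < 0
```

Under this form, a higher gyrification index in a24pr raises the predicted probability of expertise, while a higher index in STSdp lowers it.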

Future work could include extending this model to other games or forms of cognitive work, potentially including online games that are more interactive. Since the research gives so much insight into how a participant's cognitive abilities are reflected in brain structure, it could be broadened to a larger project that relates brain structure not just to games but to everyday activities.

Conclusion

The use of multimodal systems in tandem with chess has expanded the way we consume, play, and understand the game often thought of as the pinnacle of over-the-board intellectual games. By integrating computer vision, machine learning, human-robot interaction, and even cognitive analysis, researchers have been able to study how both the best players and novices take in the board and the game in general. The research surveyed here demonstrates that using multimodal systems for this kind of analysis is a viable option that merits use in the future of the game at all skill levels. However, there is still work to be done in optimizing these systems, especially in integrating them into real-world scenarios such as tournaments, managing the complexity of the inputs while simplifying the outputs, and making the technology available to a broader audience than just researchers. The future of the research looks promising, but refining the algorithms and enhancing human-robot interaction are goals that most of the papers identified for future work. In conclusion, the research surveyed represents a promising start to the study of chess with multimodal systems in the pursuit of “solving” chess. By continuing to build on this research and on multimodal approaches in general, not just for chess, we can create robust solutions to some of the toughest machine learning and real-world problems we face today. Multimodal systems allow humans and computers to work together on solutions that could hardly have been imagined even 50 years ago, and we now have machines beating the very best human players at chess, the final frontier of over-the-board games.

References

[1] L. Sajo, G. Kovacs and A. Fazekas, "An application of multi-modal human-computer interaction - the chess player turk 2," 2008 IEEE International Conference on Automation, Quality and Testing, Robotics, Cluj-Napoca, Romania, 2008, pp. 316-319, doi: 10.1109/AQTR.2008.4588846.

[2] A. T.-Y. Chen and K. I.-K. Wang, "Computer vision based chess playing capabilities for the Baxter humanoid robot," 2016 2nd International Conference on Control, Automation and Robotics (ICCAR), Hong Kong, China, 2016, pp. 11-14, doi: 10.1109/ICCAR.2016.7486689.

[3] E. Mukhamedov, "Chesska defends world champion title in robot chess," Chess News, May 23, 2012.

[4] T. Guntz, R. Balzarini, D. Vaufreydaz and J. Crowley, "Multimodal Observation and Classification of People Engaged in Problem Solving: Application to Chess Players," Multimodal Technologies and Interaction, 2018, 2(2):11. https://doi.org/10.3390/mti2020011

[5] A. I. Bugarin, "Real Time Tracking and Analysis of Physical Chess Games Using Computer Vision and Machine Learning," UC Riverside: University Honors, 2024. Retrieved from https://escholarship.org/uc/item/31d7k315

[6] L. Brunner, M. Salvator, P. Roebrock and U. J. Birk, "Chess recognition using 3D patterned illumination camera," Proc. SPIE 11605, Thirteenth International Conference on Machine Vision, 116051T (4 January 2021). https://doi.org/10.1117/12.2587054

[7] N. Trevisan, A. Jaillard, G. Cattarinussi, P. De Roni and F. Sambataro, "Surface-Based Cortical Measures in Multimodal Association Brain Regions Predict Chess Expertise," Brain Sci. 2022, 12, 1592. https://doi.org/10.3390/brainsci12111592

Copyright

Copyright © 2024 Gavin W. Lampkin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET65295

Publish Date : 2024-11-15

ISSN : 2321-9653

Publisher Name : IJRASET