Recently, HLP lab was given the opportunity to evaluate the ASL MobileEye wearable eye tracking system and its accompanying analysis software, GazeMap. ASL generously offered to lend us the system for a month so that we could evaluate it in a set of short experiments. We attempted to put together simplified and shortened versions of four different experimental paradigms. In choosing these, we sought a variety that would tax different aspects of both the hardware and the analysis software.
After finishing the trial period, we compiled our feedback and sent it to ASL. They promptly responded with answers to our concerns and proposed solutions to many of the points that we made (see below). Since then, they have confirmed to us that their newest version includes a number of changes that directly addressed our comments.
This post contains a description of our testing methods, the results, and the feedback we sent back to ASL, modified only to reflect the changes that they claim to have made in their newest version. The primary goal of the tests we conducted was to determine what GazeMap and the MobileEye can do that might be of interest to psycholinguists. We’re particularly excited about Experiment 4 below.
MobileEye and GazeMap
The MobileEye consists of a small eye- and scene-camera module that attaches to the wearable headgear. Once mounted, the angle and position of the eye camera can be adjusted with a tiny screwdriver. The kit we received contained this module along with a pair of standard plastic goggles and, additionally, a pair of large plastic frames without lenses, meant to better accommodate subjects with glasses. Each headgear uses an adjustable, clear plastic monocle mounted at an angle to reflect the IR image of the eye into the camera.
The wearable unit is attached by coaxial cable to a small recording device, which is actually a modified off-the-shelf DV recorder/player that uses standard DV tapes. A FireWire interface allows recording and calibration to be done online through a connected PC. The standard recording software we were given was the ASL Eye Vision program, which was generally used to adjust the eye images and calibrate them to the scene camera. Alternatively, video could be recorded offline onto the DV tapes for later processing with Eye Vision.
GazeMap is perhaps the most distinctive feature of the system we evaluated. It is an analysis program that effectively allows automatic coding of regions of interest on a flat plane in a natural scene. In essence, it constructs a simplified model of the scene environment using a Simultaneous Localization and Mapping (SLAM) algorithm and superimposes it onto the video so that point-of-regard can be mapped onto geometrical regions of interest defined within the model. While somewhat limited in the conditions and types of environments that can be used, the ability to go from raw point-of-regard data to fully coded data without human intervention is an obvious convenience.
GazeMap requires users to record a short video of the environment that will contain the areas of interest. In order to establish a reference for ground-truth scale, a printout of a black box of known size and proportions must be shown in the center of the first frames of this map-creation video. Once the box is detected and measured, the software analyzes the rest of the video, detecting distinct feature points in the environment, and uses this data to construct its model of the environment.
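The role of the reference box is simply to anchor the model's units. A minimal sketch of that scale step, with made-up numbers (GazeMap's actual internals are not public, so this is only our illustration of the idea):

```python
# Hypothetical illustration of the ground-truth scale step: a box of known
# physical size is detected in the first frames of the map video, and the
# ratio of its known width to its measured pixel width gives a scale that
# can be applied to distances in the reconstructed model.

def scale_from_reference(box_width_mm: float, box_width_px: float) -> float:
    """Return millimetres per pixel implied by the detected reference box."""
    return box_width_mm / box_width_px

def px_to_mm(distance_px: float, mm_per_px: float) -> float:
    """Convert a pixel distance in the model to physical millimetres."""
    return distance_px * mm_per_px

# e.g. a 200 mm wide box that spans 400 pixels in the map video:
mm_per_px = scale_from_reference(200.0, 400.0)
print(px_to_mm(120.0, mm_per_px))  # 60.0
```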
To arrive at usable data, it was necessary to run GazeMap's output files through a statistical viewer program. This step involved loading the data files individually and converting them into CSV files containing the point-of-regard data and region coding.
Each of the following experiments was run on two different subjects and feedback was collected both from the experimenters and the subjects themselves. The pilot experiments presented below were conducted by Dan Pontillo with the help of Andrew Watts (HLP/Jaeger lab manager), Dana Subik (MTanLab manager), and Caitlin Hilliard (former HLP RA).
1 ) Magnet Board Experiment
Subjects were seated in front of a magnet board that was bisected into two different colored sides. Small moveable magnet images of a boy and a girl were placed on opposite sides of the board. In each of 16 trials, several different objects were also placed on each side. The subject was asked to make the board reflect an auditory command that was played for them. The verbal commands were a short descriptive sentence followed by a direct command such as “give ___ a cup of coffee”, after which the subject would move the coffee magnet onto the boy or girl magnet, depending on whether the name was masculine or feminine.
MobileEye and GazeMap were mainly successful within this paradigm. Calibration and data collection were fine aside from the calibration issues caused by subjects with glasses. It was difficult to create a good environment map due to the relative sparseness of the magnet boards themselves. We surrounded the edges with pieces of electrical tape and put three small black magnets in the center of the board to help the software localize, but it still required very careful maneuvering of the camera during map creation. The software was relatively successful during the automatic coding, but there were videos in which it completely failed to localize, again probably due to the sparseness of the board. It's clear that experiments must be designed with very distinctive markers around areas of interest if the intention is to use GazeMap for analysis.
2 ) Card Experiment
We attempted to replicate a simple version of the Eberhard, Spivey-Knowlton, Sedivy & Tanenhaus (1995) experiment, in which subjects were seated in front of a well-defined 5×5 grid of playing cards and empty spaces. In each trial there were two instances of a five of hearts, each positioned relative to unique cards so that either could be unambiguously specified. Subjects were asked first to look at each card on the board, then to move an arbitrarily selected unique card to a specific place on the board, relative to other cards. Finally, they were asked to move a specific instance of the five of hearts to a different place on the board. There were ten trials in which the board was rearranged and the commands were shifted, and several “dummy” trials in which the fives of hearts were not the cards of interest.
This experiment was the most successful of all those we attempted. GazeMap consistently had no problem localizing on the grid, and the AOIs that we drew were generally very accurate. For this type of paradigm, there is no question that GazeMap is highly effective. The analysis easily results in a neat region-coded point-of-regard dataset with no human intervention.
3 ) Computer Based Visual World Study
Subjects were seated at a computer with a CRT monitor and asked to move small icons of different objects between spaces on a grid. Commands were phrased so that there was only one correct place to move the specified icon, but the other icons presented were selected to produce varying levels of ambiguity during the perception of the verbal command.
The technical challenges here stemmed from the flickering of the CRT and the lighting conditions of the test room, which was illuminated by a single incandescent bulb and the ambient light from the computer screen. The CRT monitor was likewise a fixture of the room setup, and it is probably not the optimal choice for the MobileEye. We were able to use GazeMap successfully after several failed initial attempts at making an environment map, but the calibration step was much more difficult for subjects than in the other conditions.
4 ) Gesture Study
A subject and an administrator/interlocutor were seated across from one another at a small table. On the subject's side was a grid of objects, some of which had doubles. A small divider piece sat in the center of the table for later use as a fiducial marker in GazeMap. The interlocutor gave the subject a series of commands to move objects around the grid in various ways. The objects with doubles were specified only by pointing gestures, so subjects needed to see the gestures in order to disambiguate the objects referred to in the commands.
There were no major issues in calibration and data collection, though one subject could not be used because the frames of the glasses-compatible headset were too big vertically for correct eye positioning. This was considered a boundary test for GazeMap, as the area of interest in this study was in fact a region of space ranging from the fiducial marker on the table to the wall behind and above the interlocutor. This formed an invisible slanted plane; point-of-regard within it was roughly assumed to indicate a fixation on the interlocutor, and specifically on the disambiguating gesture.
We found that GazeMap actually was able to overlay this invisible AOI on the scene fairly reliably. This is very good news for us, since it means that we can run similar studies in which the AOI is not a solid physical plane, provided that the region of interest is formed by known points in the scene.
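Once such an AOI, visible or invisible, has been projected into video coordinates, classifying a point-of-regard reduces to a point-in-polygon test. The ray-casting sketch below is our own illustration of that geometric check, not GazeMap's actual code, and the coordinates are invented:

```python
# Ray-casting point-in-polygon test: count how many polygon edges a
# horizontal ray extending right from (x, y) crosses; an odd count means
# the point is inside.

def in_aoi(x, y, polygon):
    """Is (x, y) inside the polygon given as a list of (px, py) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge straddles the ray's y level?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# A slanted quadrilateral AOI in hypothetical scene-video coordinates:
gesture_aoi = [(100, 400), (500, 380), (520, 60), (80, 90)]
print(in_aoi(300, 200, gesture_aoi))  # True
print(in_aoi(600, 200, gesture_aoi))  # False
```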
Feedback and Issues
While overall our experience with the MobileEye and GazeMap was positive, there were several specific issues that we sent as feedback to ASL. Their responses were very helpful, and many of the issues we initially found were addressed.
1 ) During our testing, there was a checkbox, unselected by default, that when left in that state caused the software to produce no data files.
ASL responded by saying that this issue has since been addressed.
2 ) The stat software was somewhat buggy. There were entire sets of output files which, when loaded into the “view statistics” GUI, immediately crashed all of GazeMap. The requirement of using this GUI to arrive at the output CSV file was an inconvenient added step, and it was also troubling that there was no mechanism for going directly from raw, unreadable data to usable CSV data. This meant that a considerable amount of mindless, scriptable work was necessary before arriving at usable data. While there was a batch-processing option for video mapping, there was no such option for processing the output data into readable files. A large percentage of the statistics files we produced in some experiments crashed GazeMap when we tried to view them, and all of that data was thus completely unusable, since there was no other way to arrive at a readable data format.
ASL has since responded by saying that they have fixed the bug that caused this crashing problem, and that they intend to include batch processing of the data in the future, so much of this issue should be resolved.
3 ) Sometimes GazeMap simply crashed while running the batch video mapping process. We would leave to do something else and find that it had only gone through half of the files.
We requested that ASL attempt to handle crashes so that individual video errors could be bypassed and the software could proceed to the next video in the batch. They responded by saying that this problem would be taken care of in a future release.
4 ) We noted in our feedback that it would be convenient to have an offline calibration option and the ability to pause the video during the scene calibration step. This would allow the experimenter to more easily draw calibration points in the video if the subject was having a hard time fixating for a long enough time.
ASL responded by noting that this was indeed possible when recording onto the DV tapes and calibrating after the fact. A frame pause for calibration during online recording remains impossible.
5 ) We once encountered a bug in which Eye Vision only showed the scene video, even when the pupil detection settings panes were selected. It was only fixed by restarting Eye Vision. It wasn’t a big deal, but it was confusing and caused a delay in running a subject.
6 ) We would have liked the ability to skip individual files while running the batch coding system. Having a simple “skip this file” button would have saved time during analysis: on some videos we would notice that tracking was failing completely (around 95% lost frames in GazeMap), and it would have been better simply to skip those videos rather than process them.
In their response ASL noted that canceling a video process would indeed cause the system to move on to the next video in the batch, which was not apparent to us at the time. They also mentioned that they were working on improvements that would skip frames during periods in which the mapped environment was lost in order to speed up the processing.
7 ) It would potentially speed up the analysis process if there was an option to not display the frames as they were being mapped, or even to not produce the output video. Though it does look cool, if you are batch processing, there is often no need to have the video displayed or even created at all, as you are only interested in the raw data.
ASL said that they have included such display customization as checkbox options in their newest version.
8 ) A large portion of our data was not usable within GazeMap because of issues like the subject's close proximity to the testing grids, a lack of unique fiducial markers in our scenes, and the simple limitations of GazeMap in detecting points. It would possibly be useful to have a manual correspondence-point selection option in GazeMap. Wide-angle lenses for the scene camera would also help us design interactive experiments with more room for fiducial points.
ASL agreed that this was an issue, though they said addressing it would be a long-term project.
9 ) We would have liked more options for manually adjusting image contrast and other settings for lighting conditions in which the image is washed out and there is difficulty finding quality feature points.
10 ) We would like the ability to place precisely synched audio- or signal-based markers at specific timestamps in the data files, as this would be very useful for certain temporally sensitive psycholinguistic paradigms.
11 ) We would have liked the software to output, for each encoded point-of-regard, some indication of whether the subject was in a fixation, even if it was not a fixation in an AOI. Since we were told that a saccade or smooth-pursuit glide over an AOI is not counted or encoded, and that the fixation detection method is embedded inside GazeMap, it should not be hard to include fixation durations, start times, and end times in the output data.
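For concreteness, the fixation fields requested above are the kind of thing a standard dispersion-threshold (I-DT) pass over the point-of-regard samples would produce. The sketch below is our own illustration of that algorithm, not ASL's embedded detector; the thresholds and the (time, x, y) sample format are assumptions:

```python
# Dispersion-threshold (I-DT) fixation detection: a window of consecutive
# samples counts as a fixation if its spatial dispersion stays below a
# threshold for at least a minimum number of samples.

def dispersion(window):
    """Dispersion of a window of (t, x, y) samples: x-range plus y-range."""
    xs = [x for _, x, _ in window]
    ys = [y for _, y, _ in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, max_dispersion=1.0, min_samples=5):
    """samples: list of (t, x, y). Returns (start_t, end_t, duration) tuples."""
    fixations = []
    i = 0
    while i + min_samples <= len(samples):
        j = i + min_samples
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            start_t, end_t = samples[i][0], samples[j - 1][0]
            fixations.append((start_t, end_t, end_t - start_t))
            i = j
        else:
            i += 1
    return fixations

# 30 Hz samples: a stable cluster, then a jump to a second location.
samples = [(t / 30, 10.0, 10.0) for t in range(10)] + \
          [(t / 30, 50.0, 40.0) for t in range(10, 20)]
for fix in detect_fixations(samples):
    print(fix)
```

On this toy input the pass finds two fixations of roughly 300 ms each, one per cluster.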
12 ) We would have preferred less of an emphasis on the summary statistics and more of an emphasis on the individual data points. This relates back to an easier route from the video to a straightforward raw data file. We will be developing outside statistics scripts with tools like R and Matlab anyway, so it is best for us to have the finest possible grain of detail as output from GazeMap, without pushing it through other software.
13 ) It would be interesting to have the 3D POR estimates in the output file from GazeMap. It is potentially useful to have a metric of distance from the subject to the POR.
Other Hardware Issues
1 ) We found it consistently difficult to deal with subjects with glasses. Only one of several glasses wearers was calibrated successfully while wearing glasses; reflections and the required camera angle rendered the pupil detector ineffective in almost all cases.
2 ) The camera needs a wider-angle lens that is calibrated to work with GazeMap. It would be ideal if this were easily interchangeable with the normal lens and if the GazeMap software allowed you to select which camera calibration matrix to use on a per-video basis.
3 ) The headset is relatively heavy compared to other devices, and its weight is relatively unevenly distributed, according to subject feedback. We also found that the headset for glasses wearers was too large for some people with small faces; it may be better to remove the bottom part of the frames.
4 ) It was very difficult to focus the camera while it was being worn. Refocusing was necessary in most cases to arrive at a clear eye image, and reaching the focus dial was often problematic.
We hope that some of this is useful to folks out there. Please feel free to post questions and comments below and we'll try our best to respond to them.
Since this post was written ASL has made several improvements to the software, which are posted as a comment below. Please read on.