New apps for visually impaired users provide virtual labels for controls and a way to explore images
With VizLens, users can touch buttons while their phones read out the labels, and ImageExplorer provides a workaround for bad or missing alt text.
Visually impaired iPhone users have two new free tools at their disposal, developed by a team now based at the University of Michigan. One can read the labels on control panels while the other identifies features in an image so that users can explore it through touch and audio feedback.
VizLens is essentially a screen reader that can function in the real world. It reads labels at the direction of the user, who points with their fingers at buttons of interest on control panels. With it, users can employ their smartphone cameras to understand and operate a variety of interfaces in their everyday environments, including home appliances and public kiosks.
“A blind user can take a picture of an interface, and we use optical character recognition to automatically detect the text labels. A user can first familiarize themself with the layout on their smartphone touchscreen. Then, they can move their finger on the physical appliance control panel, and the app will speak out the button under the user’s finger,” said Anhong Guo, U-M assistant professor of computer science and engineering, who led the development of both apps.
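The lookup step Guo describes, matching a fingertip position to the nearest detected label, can be illustrated with a minimal sketch. This is not VizLens code: the `Label` class, the `label_under_finger` function, and the `max_dist` tolerance are hypothetical names chosen for illustration, and the sketch assumes the label boxes and the fingertip position have already been mapped into the same image coordinates.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Label:
    """A text label found by OCR, with its bounding box in image pixels (hypothetical)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def label_under_finger(labels: List[Label], fx: float, fy: float,
                       max_dist: float = 20.0) -> Optional[str]:
    """Return the text of the label closest to the fingertip, or None if none is near.

    The distance is measured from the fingertip to the label's rectangle,
    so it is zero whenever the fingertip is inside the box.
    """
    best_text, best_dist = None, float("inf")
    for lb in labels:
        dx = max(lb.x - fx, 0.0, fx - (lb.x + lb.w))
        dy = max(lb.y - fy, 0.0, fy - (lb.y + lb.h))
        dist = math.hypot(dx, dy)
        if dist < best_dist:
            best_text, best_dist = lb.text, dist
    # max_dist is an assumed tolerance, in pixels, for near misses.
    return best_text if best_dist <= max_dist else None


panel = [Label("Start", 10, 10, 40, 20), Label("Stop", 70, 10, 40, 20)]
spoken = label_under_finger(panel, 75, 25)  # fingertip inside the "Stop" box
```

In a real app the returned text would be handed to a speech synthesizer, and a result of None would mean the finger is not near any button.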
The second app, ImageExplorer, helps visually impaired individuals better understand the content of images. For this purpose, Guo and his team have integrated a suite of object detection and segmentation models, including Meta’s Detectron2 visual recognition library and Google OCR (optical character recognition) and image analysis models, to enable visually impaired users to explore what is in an image and how the different objects relate to one another.
Guo’s aim is to offer visually impaired people agency when alt text is missing or incomplete, as AI-generated captions are often not sufficient.
“There are a number of automated caption programs out there that blind people use to understand images, but they often have errors, and it’s impossible for users to debug them because they can’t see the images,” Guo said. “Our goal, then, was to stitch together a bunch of AI tools to give users the ability to explore images in more detail with a greater degree of agency.”
Upon uploading an image, ImageExplorer provides a thorough analysis of the image’s content. It gives a general overview of the image, including the objects detected, relevant tags and a caption. The app also features a touch-based interface that allows users to explore the spatial layout and content of the image by pointing to different areas.
ImageExplorer is unique in the level of detail it provides. It gives users a comprehensive description of the objects in an image, down to the level of what type of clothing a person is wearing and what activities they are engaged in, as well as the position of these objects in the image.
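One way to picture touch-based exploration is as a hit test against the regions the detection models return. The sketch below is an assumption-laden illustration, not ImageExplorer's implementation: the `Region` class and `describe_touch` function are hypothetical, coordinates are assumed to be normalized to the range 0 to 1, and the choice to report the smallest region first is a design guess that puts the most specific detail ahead of its surroundings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A detected object or attribute with a normalized bounding box (hypothetical)."""
    description: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def area(self) -> float:
        return (self.right - self.left) * (self.bottom - self.top)


def describe_touch(regions: List[Region], x: float, y: float) -> List[str]:
    """List descriptions of every region under the touch point, smallest region first."""
    hits = [r for r in regions if r.contains(x, y)]
    hits.sort(key=Region.area)
    return [r.description for r in hits]


scene = [
    Region("person", 0.20, 0.10, 0.60, 0.90),
    Region("shirt", 0.25, 0.30, 0.55, 0.60),
    Region("dog", 0.70, 0.60, 0.95, 0.95),
]
under_finger = describe_touch(scene, 0.40, 0.40)  # touch lands on the shirt
```

Here a touch on the shirt yields the shirt first and the person wearing it second, while a touch on empty background yields an empty list.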
“ImageExplorer helps users understand the content of an image even though they cannot see it,” Guo said.
Hundreds of visually impaired participants have tested VizLens and ImageExplorer and offered feedback to Guo’s team, which is continuing to develop the tools. First discussed in 2022, ImageExplorer is a much newer concept than VizLens, which made its academic debut in 2016. Some of ImageExplorer’s details need further refinement: for instance, most tops are simplified to “shirts,” and different tools within ImageExplorer sometimes give conflicting information.
“The accuracy relies on the models we use, and as they improve, ImageExplorer will improve,” Guo said. “In spite of these errors, the results we presented in 2022 show that ImageExplorer enables users to make more informed judgements of the accuracy of the AI-generated captions.”
Guo is also looking forward to the feedback that will come with public deployment.
“We will be able to observe how people use these tools and adapt them to their lives,” he said.
The research is funded by the University of Michigan with additional support from Google.
Written by Emily France.