
Digital images are essential to communication, but pose accessibility challenges to people with visual impairments (VIPs). Image recognition (IR) technology aims to bridge the divide by converting visuals into VIP-friendly formats like audio or text. Unfortunately, the algorithms and artificial intelligence (AI) in existing IR systems perform poorly, especially on complex images, and fall well short of user needs.
The authors address these issues by examining the relationship between (a) complex images vital to average citizens and (b) the spectrum of visual impairments, from color blindness to low vision to total blindness. They study International Organization for Standardization (ISO) 9241-110:2006 (Ergonomics of Human-System Interaction, Part 110: Dialogue Principles), which specifies the general principles for effective user-interface design, namely, suitability for the task, self-descriptiveness, controllability, conformity with expectations, error tolerance, suitability for individualization, and suitability for learning. They also survey the firsthand experiences of various categories of VIPs, including their expertise, confidence, accuracy, desired features, and criticisms of IR tools.
They recommend that future IR designs for VIPs align with the user-focused ISO standard, with usability testing by end users, a robust user interface, and options for individual customization. They also suggest improved AI for object detection, real-time recognition, natural language processing (NLP), and contextual understanding.
Overall, the paper is interesting and readable, and provides a basic overview of IR systems for VIPs. I recommend it to users with visual impairments who are new to IR and hence need a brief rundown of popular tools.
Having said that, as a researcher in requirements engineering and the unique needs of people with disabilities, I have some notes for future studies.
The present survey is not sufficient to fulfill the objective of establishing viable guidelines for designers of IR tools for users with visual impairments. The number of participating users (16) is severely limited, so the statistical significance of the findings is in doubt. The qualitative responses are largely what rehabilitation practitioners would already predict. Replies like “there is potential for improvement [of IR tools] in terms of precision, integration and functionalities” are hardly useful to designers.
In addition, research should not rest on a single published standard. It is particularly inappropriate to use ISO 9241-110:2006 when ISO 9241-110:2020 is already available. Instead, we should critically examine real-life practices to look for deficiencies in current designs.
Finally, the authors still use stigmatizing labels like “the blind” and “the visually impaired.” Blindness or visual impairment does not represent users holistically. Users have many other attributes, such as educational background and computer experience.