
This paper offers a timely and comprehensive overview of assistive image recognition technologies (IRTs), focusing on the inclusion of blind or low vision (BLV) users. It contextualizes the topic within broader digital inclusion goals, referencing United Nations (UN) Sustainable Development Goals (SDGs)--notably SDG 9--and ISO standards on accessible technology, such as braille displays and audio output.
Section 2 maps the features of 21 IRTs (for example, Seeing AI, Envision, Supersense), highlighting object recognition, text reading, currency detection, and auditory feedback. Table 1 details tool coverage, while platforms like Be My Eyes--integrated with GPT-4V--stand out for their community-driven approach. The paper emphasizes AI-powered multimodal systems (for example, OpenAI’s GPT-4V) that enable text-to-speech, spoken commands, and contextual feedback.
The paper discusses preprocessing steps (for example, contrast enhancement and noise reduction) and deep learning models for real-time object detection, including You Only Look Once (YOLO), convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and recurrent neural networks (RNNs). Mobile apps use OpenCV, Ultralytics, and pyttsx3 to convert visual data into speech [1]; others integrate VGG16 with gated recurrent units (GRUs)/LSTMs for caption generation [2], or propose edge-AI solutions on low-cost hardware such as Intel's Myriad X and advanced driver assistance systems (ADAS) [3].
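To illustrate the kind of OpenCV + Ultralytics + pyttsx3 pipeline described in [1], a minimal detect-and-speak sketch might look like the following; the weights file name ("yolov8n.pt") and the frame-skip interval are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch: Ultralytics YOLO detects objects in webcam frames,
# pyttsx3 speaks the detected labels aloud (offline text-to-speech).
# The model file and frame-skip rate are assumptions, not from the paper.
import cv2
import pyttsx3
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # pretrained COCO detector (assumed weights)
engine = pyttsx3.init()         # offline TTS engine

cap = cv2.VideoCapture(0)       # default camera
frame_idx = 0
try:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frame_idx += 1
        if frame_idx % 30 != 0:  # announce roughly once per second at 30 fps
            continue
        result = model(frame, verbose=False)[0]
        labels = {model.names[int(c)] for c in result.boxes.cls}
        if labels:
            engine.say("I see " + ", ".join(sorted(labels)))
            engine.runAndWait()
finally:
    cap.release()
```

This sketch only announces class labels; the systems surveyed in the paper add further cues such as spatial position, currency values, or full-scene captions.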
Section 3 presents survey data from ten BLV participants, exploring usability, satisfaction, and areas needing improvement (for example, multilingual support, high costs, and software stability). ISO ergonomics principles are considered in relation to user-centered human–computer interaction (HCI).
Overall, this is a well-researched, technically sound, and interdisciplinary contribution to accessible computing.