Smart AI Glasses for Blind to Transcribe Text to Audio in Realtime Using Raspberry Pi Zero
by akhilnagori in Circuits > Raspberry Pi
The Smart AI glasses have the ability to scan and read text aloud just by looking at the text. They are designed to help visually impaired individuals have more access to existing text.
I was inspired to pursue this project when I went to India. In the building where I stayed, there was a blind child who enjoyed listening to stories read to him by his parents. However, he couldn't read any stories by himself. Although he had access to a small number of braille books, there were many pieces of text that he needed help accessing. His story inspired me to create something that would enable him to access more pieces of text.
My design uses a Raspberry Pi Zero 2 W as the main computer, a spy camera for Raspberry Pi (to capture the text), two 1-watt 8-ohm mini speakers (to play the spoken text), and a custom soundboard using GPIO pins (to send the audio to the speakers). It also uses 3D-printed parts for the glasses frame to hold the components and a 3.7 V, 1200 mAh lithium-ion polymer battery.
This project uses a trained model I made using Python and machine learning. The code will be explained further on in the tutorial.
Supplies
Materials used:
- Raspberry Pi Zero 2 W
- Zero Spy Camera for Raspberry Pi Zero
- Two mini speakers
- Any type of PLA Filament (I used Bambu Lab Filament)
- PCB Circuit Board
- 3.7 volt Lithium Ion Battery - preferably with a capacity of more than 1200 mAh (1.2 Ah)
- Jumper Wires - male to male
- Power Booster (step-up converter) to raise the 3.7 volt battery output to the 5 volts the Raspberry Pi needs
Tools Needed:
- 3D Printer
- Solder Gun
Software
To start developing the software that converts text into audio, I used a Windows machine running Ubuntu under WSL (Windows Subsystem for Linux), so that it would be easier to port the code to a Raspberry Pi Zero later, as both run Linux. Inside Visual Studio Code, I used a virtual environment to hold all the libraries and modules, along with the images used to test the OCR model. Because I was running inside a subsystem, I could not use my computer's camera or speakers, so instead I used test images taken on my computer and copied them into the project folder. For the text-to-speech engine, I used pyttsx3, a simple and easy-to-use text-to-audio library, and converted the detected text to an mp3 file. I then played the mp3 locally on my computer and confirmed that the text had been converted to audio.
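As a rough illustration of the text-to-speech step described above, here is a minimal sketch using pyttsx3. It is not the project's tts.py; the function names clean_for_speech and save_speech, the output name speech.mp3, and the speaking rate of 150 are all assumptions made for the example.

```python
import re


def clean_for_speech(text: str) -> str:
    """Tidy OCR output so the speech engine reads it as continuous prose."""
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse line breaks and repeated whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def save_speech(text: str, out_path: str = "speech.mp3", rate: int = 150) -> None:
    """Write spoken audio for `text` to `out_path` using pyttsx3.

    The file name and rate are hypothetical defaults, not values from the project.
    """
    import pyttsx3  # imported lazily so clean_for_speech works without it

    engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.save_to_file(clean_for_speech(text), out_path)
    engine.runAndWait()  # blocks until the queued speech has been written
```

Calling save_speech("Hello from the\nsmart glasses.") should leave an audio file next to the script. The actual audio encoding may depend on the speech driver installed on your system, so test that the file plays on the device you plan to use.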
Deploying to the Raspberry Pi:
Unless you are modifying the code, you will most likely just need to upload the code to the Raspberry Pi. To start off, enable SSH on the Raspberry Pi, so you can use the terminal directly through your other computer.
Steps:
- Create a virtual environment with the command python3 -m venv <myvenvname>. Activate it with the command source /pathtoyourvenv/bin/activate
- Now that you are inside the virtual environment, install docTR, the OCR library we are using for this project. (I used the PyTorch version, but you can use the TensorFlow version too.) In the terminal, type pip install "python-doctr[torch]@git+https://github.com/mindee/doctr.git" (this requires git; if it is missing, run sudo apt install git).
- Also install the pycairo package with pip, which is a required dependency. (Install matplotlib as well if it was not installed along with docTR.)
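Once the installation steps above finish, a short script can confirm that docTR works. The sketch below is not the project's main.py: it uses docTR's default pretrained models rather than any custom-trained model, and the function names words_from_export and read_image are hypothetical.

```python
def words_from_export(export: dict) -> str:
    """Flatten docTR's nested export (pages > blocks > lines > words) into text."""
    lines = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [word["value"] for word in line.get("words", [])]
                if words:
                    lines.append(" ".join(words))
    return "\n".join(lines)


def read_image(image_path: str) -> str:
    """Run docTR's pretrained OCR pipeline on one image and return its text."""
    # Imported lazily: loading docTR and PyTorch is slow, especially on a Pi Zero.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    # Assumption: the default pretrained detection and recognition models.
    model = ocr_predictor(pretrained=True)
    document = DocumentFile.from_images(image_path)
    result = model(document)
    return words_from_export(result.export())
```

Running print(read_image("<imagename>.jpg")) on one of your test images should print the recognized text, one detected line per row. The first call downloads the pretrained weights, so it needs an internet connection.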
Code:
After following the steps above, you are ready to run the code. Upload the main.py and tts.py files to the Raspberry Pi. Then copy the code from tts.py to the end of main.py so that they run in one execution. You should now have working text-to-audio glasses, but you must change the test image name in main.py to <imagename>.jpg. This will be used later when setting up the sound on the Raspberry Pi.
3D Model
These are the files you will need to print on a 3D printer. Use the slicer software for your printer to slice each STL file.
The Sound
For the sound, we are using Bluetooth. Connect any pair of headphones to the glasses by opening the Bluetooth preferences on the Raspberry Pi and choosing your device. From then on, the headset should connect to the Raspberry Pi Zero every time you turn it on, and the text-to-speech engine will play its audio through it.
Taking Pictures
In our project, we used OpenCV to take a picture every ten seconds. For more accurate pictures of the text you are looking at, consider wiring a pushbutton to the GPIO pins so that pressing it runs the command raspistill -o <imagename>.jpg. This captures a picture with the camera connected to the Raspberry Pi Zero, and the text-to-audio code can then run on that specific image. Unfortunately, we did not have enough time to complete that step.
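Since the pushbutton step was left unfinished, here is one way it might look, as a sketch only. It assumes the gpiozero library, a button wired between GPIO 17 and ground, and the file name capture.jpg; none of these come from the project. It also adds the raspistill options -n (no preview window) and -t (delay in milliseconds before the picture is taken).

```python
import subprocess


def capture_command(image_path: str, warmup_ms: int = 1000) -> list:
    """Build the raspistill call: -n disables the preview, -t sets the delay."""
    return ["raspistill", "-n", "-t", str(warmup_ms), "-o", image_path]


def capture(image_path: str = "capture.jpg") -> str:
    """Take one still image and return the path it was saved to."""
    # check=True raises CalledProcessError if the camera command fails.
    subprocess.run(capture_command(image_path), check=True)
    return image_path


def wait_for_button(pin: int = 17) -> None:
    """Capture a picture each time the pushbutton on `pin` is pressed."""
    from gpiozero import Button  # only usable on the Raspberry Pi itself
    from signal import pause

    # Assumption: the button connects the GPIO pin to ground
    # (gpiozero enables the internal pull-up by default).
    button = Button(pin)
    button.when_pressed = lambda: capture()
    pause()  # keep the script alive, waiting for presses
```

To read the text aloud after each press, the callback would pass the saved image to the OCR and text-to-speech code instead of only capturing it. On newer Raspberry Pi OS releases, raspistill has been replaced by libcamera-still, so the command name may need to change.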
Summary
In conclusion, we successfully developed a pair of glasses equipped with an Optical Character Recognition (OCR) system that can scan and read text aloud. This project was inspired by the pressing need to enhance accessibility for individuals with visual impairments or reading difficulties. By leveraging Python libraries such as docTR, we created a solution that enables users to engage with the written word more easily. Our focus on optimizing the accuracy and speed of text recognition has resulted in a device that functions effectively in various lighting conditions and text formats, promoting independence and enhancing everyday experiences.