MaTouch Works With OpenAI & ChatGPT
by Lan_Makerfabs in Design > Software
The MaTouch AI ESP32S3 2.8" TFT ST7789V board integrates I2S voice input, an I2S speaker, a 3-megapixel OV3660 camera, and a 320×240 display. Combined with the ESP32S3's processing power and Wi-Fi capability, this makes the board a good platform for AI development with the ESP32.
Recently, we successfully connected the MaTouch AI 2.8" board to OpenAI, enabling real-time voice interaction. With just your voice, you can talk directly to the device — it listens, understands, thinks, and responds with natural speech output.
Supplies
Hardware:
- MaTouch AI ESP32S3 2.8" TFT ST7789V × 1
- Type-C USB cable × 1

Software:
- ESP-IDF development environment
- OpenAI API key

What Is OpenAI?
OpenAI provides a suite of AI models that understand natural language and generate human-like responses. Its APIs cover speech-to-text (STT), text-to-speech (TTS), and chat models. By integrating OpenAI’s API, developers can bring conversational abilities into embedded systems, turning traditional hardware into “smart” devices.
How to Implement in MaTouch?
The MaTouch AI board communicates with OpenAI in three major steps: Speech-to-Text (STT), then AI model processing (GPT), then Text-to-Speech (TTS).
- Speech-to-Text (STT)

The user’s voice is recorded through the microphone and sent to OpenAI’s STT model, which converts the audio into text.
- AI Model Processing (GPT)

The recognized text is sent to an OpenAI chat model (such as GPT-3.5). The model interprets the context and generates a response.
- Text-to-Speech (TTS)

The AI-generated text is sent to OpenAI’s TTS model, which produces a voice response. The MaTouch AI board then plays this voice output through the I2S speaker.
Set Up the ESP-IDF Development Environment
Before you begin, make sure ESP-IDF is installed on your computer. If it is not, follow the "Get Started with ESP-IDF" guide to complete the installation.
Get OpenAI API Keys
- Sign in or register on the OpenAI platform.
- Click Start building and fill in the relevant information.
- Enter the project name and key name. You may also use the defaults.
- Copy your key and click “Continue”.
- Make sure your account has sufficient funds; otherwise, the key will not work.
- To view the API information later, return to the overview page and click the Settings button.
 
How the Code Works
ai_task() is the core function that implements the entire AI dialogue system. It performs three steps:
- Speech-to-Text (STT): Sends audio recorded by the microphone to OpenAI to obtain text results.
- Language Understanding and Generation: Passes the recognized text to the GPT model to generate a brief response.
- Text-to-Speech (TTS): Converts the GPT response back into speech and plays it aloud.
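To make the flow of the three steps easier to picture, here is a minimal host-side sketch in plain C. It is not the project's ai_task(); the names ai_pipeline_t and ai_run_turn, and the stub stages, are hypothetical and only illustrate how each stage feeds the next and how a failure in one stage stops the turn.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stage signatures: each stage returns 0 on success. */
typedef int (*stt_fn)(const uint8_t *audio, size_t len, char *text, size_t cap);
typedef int (*chat_fn)(const char *prompt, char *reply, size_t cap);
typedef int (*tts_fn)(const char *reply);

typedef struct {
    stt_fn  transcribe;  /* audio -> text       */
    chat_fn complete;    /* text  -> reply text */
    tts_fn  speak;       /* reply -> speech     */
} ai_pipeline_t;

/* Run one conversation turn.
 * Returns 0 on success, or 1, 2, 3 for the stage that failed. */
int ai_run_turn(const ai_pipeline_t *p, const uint8_t *audio, size_t len)
{
    char text[256];
    char reply[512];

    if (p->transcribe(audio, len, text, sizeof text) != 0) return 1;
    if (p->complete(text, reply, sizeof reply) != 0) return 2;
    if (p->speak(reply) != 0) return 3;
    return 0;
}

/* Stub stages standing in for the real cloud calls. */
static char last_spoken[512];

static int stub_stt(const uint8_t *audio, size_t len, char *text, size_t cap)
{
    (void)audio;
    if (len == 0) return -1;              /* nothing was recorded */
    snprintf(text, cap, "hello");
    return 0;
}

static int stub_chat(const char *prompt, char *reply, size_t cap)
{
    snprintf(reply, cap, "You said: %s", prompt);
    return 0;
}

static int stub_tts(const char *reply)
{
    snprintf(last_spoken, sizeof last_spoken, "%s", reply);
    return 0;
}
```

On the board, the three function pointers would be replaced by calls into the OpenAI client, and the speak stage would end by writing audio to the I2S speaker.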
 
Key Parts of the Code
- Create the OpenAI client.
 
Initializes the OpenAI client using your API key. This client handles all communication with the OpenAI cloud services.
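For context, OpenAI's HTTP API expects the key in an "Authorization: Bearer ..." header on every request, which is what a client object takes care of for you. The helper below is a small illustrative sketch of that formatting step; build_auth_header is a hypothetical name and is not part of the project's code or of any library.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format the value of the HTTP "Authorization" header for an API key.
 * Returns 0 on success, -1 if the key is missing or empty, or if the
 * output buffer is too small to hold the full value. */
int build_auth_header(const char *api_key, char *out, size_t cap)
{
    if (api_key == NULL || api_key[0] == '\0') return -1;

    int n = snprintf(out, cap, "Bearer %s", api_key);
    if (n < 0 || (size_t)n >= cap) return -1;   /* error or truncated */
    return 0;
}
```

Rejecting an empty key early gives a clear local error instead of a rejected request from the server.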
- Create functional modules
 
Audio Transcription (STT) – Converts recorded speech into text.
Chat Completion (GPT) – Generates a response based on recognized text.
Audio Speech (TTS) – Converts the response text back into speech.
- Set module parameters
 
STT settings: Set the language to English and the temperature to 0.2 (lower values give more stable output).
AI settings: Set the chat model to gpt-3.5-turbo and define its role as an assistant.
TTS settings: Set the voice model to tts-1-hd and the voice type to alloy.
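One way to keep these parameters together is a small settings struct. The sketch below is an illustration only: ai_settings_t, AI_DEFAULTS, and ai_settings_valid are hypothetical names, and it assumes the 0.2 value is the transcription temperature, which OpenAI documents as a number between 0 and 1.

```c
#include <stddef.h>

/* Hypothetical container for the parameters described above. */
typedef struct {
    const char *stt_language;     /* ISO-639-1 code, "en" for English   */
    double      stt_temperature;  /* assumed to be the 0.2 "stability"  */
    const char *chat_model;
    const char *chat_role;
    const char *tts_model;
    const char *tts_voice;
} ai_settings_t;

/* Values taken from the text of this article. */
static const ai_settings_t AI_DEFAULTS = {
    "en", 0.2, "gpt-3.5-turbo", "assistant", "tts-1-hd", "alloy"
};

static int nonempty(const char *s)
{
    return s != NULL && s[0] != '\0';
}

/* Returns 1 if every field is set and the temperature is within 0..1. */
int ai_settings_valid(const ai_settings_t *s)
{
    if (s->stt_temperature < 0.0 || s->stt_temperature > 1.0) return 0;
    return nonempty(s->stt_language) && nonempty(s->chat_model) &&
           nonempty(s->chat_role) && nonempty(s->tts_model) &&
           nonempty(s->tts_voice);
}
```

Checking the settings once at startup makes a typo in a model or voice name show up before any audio is recorded.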
- Speech-to-Text (STT)
 
Sends the recorded audio buffer to OpenAI for transcription, receiving text output.
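The transcription endpoint accepts audio files such as WAV rather than bare samples, so a raw recording needs a file header before upload. The sketch below writes the standard 44-byte PCM WAV header; it assumes the recording is raw PCM, and wav_write_header is a hypothetical helper (the project's own code may already handle this step).

```c
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_SIZE 44

/* Store 16-bit and 32-bit values in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill h[0..43] with a canonical RIFF/WAVE header for uncompressed PCM.
 * data_bytes is the size of the sample data that follows the header. */
void wav_write_header(uint8_t *h, uint32_t sample_rate, uint16_t channels,
                      uint16_t bits_per_sample, uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));
    uint32_t byte_rate   = sample_rate * block_align;

    memcpy(h, "RIFF", 4);
    put_le32(h + 4, 36 + data_bytes);   /* file size minus 8 bytes  */
    memcpy(h + 8, "WAVE", 4);
    memcpy(h + 12, "fmt ", 4);
    put_le32(h + 16, 16);               /* size of the fmt chunk    */
    put_le16(h + 20, 1);                /* audio format 1 = PCM     */
    put_le16(h + 22, channels);
    put_le32(h + 24, sample_rate);
    put_le32(h + 28, byte_rate);
    put_le16(h + 32, block_align);
    put_le16(h + 34, bits_per_sample);
    memcpy(h + 36, "data", 4);
    put_le32(h + 40, data_bytes);
}
```

For example, one second of 16 kHz, 16-bit mono audio is 32000 bytes of samples, so the call would be wav_write_header(buf, 16000, 1, 16, 32000) with the samples copied in right after the header.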
- Text-to-Response (ChatCompletion)
 
Passes the transcribed text to the GPT model, which generates a text response.
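Under the hood, a chat completion request is a JSON body with a model name and a list of messages. The sketch below shows roughly what that body looks like and why the transcribed text must be escaped before it is embedded. json_escape and build_chat_body are hypothetical helpers, not the project's code, and the system prompt used in the example is an assumption.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Escape src so it can sit inside a JSON string literal.
 * Returns the escaped length, or -1 if dst is too small. */
static int json_escape(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    for (; *src; ++src) {
        unsigned char c = (unsigned char)*src;
        char buf[8];
        size_t k;
        if (c == '"' || c == '\\') { buf[0] = '\\'; buf[1] = (char)c; k = 2; }
        else if (c == '\n')        { buf[0] = '\\'; buf[1] = 'n';     k = 2; }
        else if (c < 0x20)         { k = (size_t)snprintf(buf, sizeof buf, "\\u%04x", c); }
        else                       { buf[0] = (char)c; k = 1; }
        if (n + k >= cap) return -1;        /* keep room for the NUL */
        memcpy(dst + n, buf, k);
        n += k;
    }
    if (n >= cap) return -1;
    dst[n] = '\0';
    return (int)n;
}

/* Build a chat request body with one system and one user message.
 * Returns the body length, or -1 if any buffer is too small. */
int build_chat_body(const char *model, const char *system_prompt,
                    const char *user_text, char *out, size_t cap)
{
    char sys[256];
    char usr[512];
    if (json_escape(system_prompt, sys, sizeof sys) < 0) return -1;
    if (json_escape(user_text, usr, sizeof usr) < 0) return -1;

    int n = snprintf(out, cap,
        "{\"model\":\"%s\",\"messages\":["
        "{\"role\":\"system\",\"content\":\"%s\"},"
        "{\"role\":\"user\",\"content\":\"%s\"}]}",
        model, sys, usr);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

Spoken input can contain quotes or line breaks after transcription, so escaping keeps the request valid JSON no matter what the user says.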
- Text-to-Speech (TTS)
 
Converts the GPT-generated response into speech and plays it through the speaker.
Upload the Code
- Open stt_llm_tts in VS Code.
- Paste the key you copied earlier from OpenAI into the code.
- Set the target chip to ESP32S3.
- Replace the WiFi information with your own network name and password.
- Set Partition Table to “Custom partition table CSV”.
- Set Flash size to 16MB.
- Enable “Support for external, SPI-connected RAM”, set the mode to “Octal Mode PSRAM”, and click “Save”.
- Connect the board to the PC with the Type-C USB cable, select the corresponding port, and click Flash Device.
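If you prefer to keep these settings in a file instead of clicking through the configuration menu, they correspond roughly to the sdkconfig options below. This is a sketch, not the project's shipped configuration: the option names are those used by ESP-IDF v5.x and may differ in other versions, and the partition CSV filename is an assumption.

```
# Target chip
CONFIG_IDF_TARGET="esp32s3"

# Custom partition table (filename is an assumption)
CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="partitions.csv"

# 16MB flash
CONFIG_ESPTOOLPY_FLASHSIZE_16MB=y

# External PSRAM in octal mode
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
```

Placed in a sdkconfig.defaults file in the project root, options like these are applied when the project is configured for the first time.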
 
Result
- Click “RECORD” to start a conversation.
- Click “PLAY RECORD” to play back the most recent recording.