Skill Processing
Glossary
| Term | Description |
|---|---|
| ASR | Automatic Speech Recognition, a technology that converts a user's speech input into text. |
| NLG | Natural Language Generation, a technology that converts structured data or intent into natural-language text. |
| Skill | A capability command returned by the AI model, including emotion expression, music playback, story playback, playback control, and more. |
| Cloud event | An event proactively pushed by the cloud to control device behavior, such as TTS playback. |
Overview
ai_skill is the text-processing component in the TuyaOpen AI application framework. It handles the text payloads that ai_agent delivers: ASR results, NLG-generated text, skill commands, and cloud events. The module dispatches each payload by text type, then publishes the corresponding user event or executes the related action.
Core Capabilities
- ASR processing: Processes speech recognition results and publishes ASR events to the application layer.
- NLG processing: Processes natural-language text streams and supports streaming text output.
- Skill processing: Parses and executes skill instructions, including emotion skills, music/story skills, and playback-control skills.
- Cloud event processing: Processes cloud-pushed events such as TTS playback commands.
- Event notification: Notifies the application layer of text-processing results through the event system.
Workflow
Module Architecture
Text Processing Flow
After ai_agent receives text data, it dispatches the payload to the corresponding handler based on the text type.
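The dispatch step can be pictured as a switch over the text type. The following is a minimal sketch, not the module's actual code: the type values are the ones listed under Text Types in this document, while `text_route_t` and `text_route_for` are hypothetical names, and treating AI_TEXT_OTHER as unhandled is an assumption.

```c
#include <stdint.h>

/* Text type values as listed under "Text Types" in this document. */
typedef uint8_t AI_TEXT_TYPE_E;
#define AI_TEXT_ASR         0x00
#define AI_TEXT_NLG         0x01
#define AI_TEXT_SKILL       0x02
#define AI_TEXT_OTHER       0x03
#define AI_TEXT_CLOUD_EVENT 0x04

/* Hypothetical: identifies which handler a payload would be routed to. */
typedef enum {
    ROUTE_ASR,
    ROUTE_NLG,
    ROUTE_SKILL,
    ROUTE_CLOUD_EVENT,
    ROUTE_IGNORED
} text_route_t;

/* Hypothetical helper: map a text type to its handler. */
text_route_t text_route_for(AI_TEXT_TYPE_E type)
{
    switch (type) {
    case AI_TEXT_ASR:         return ROUTE_ASR;
    case AI_TEXT_NLG:         return ROUTE_NLG;
    case AI_TEXT_SKILL:       return ROUTE_SKILL;
    case AI_TEXT_CLOUD_EVENT: return ROUTE_CLOUD_EVENT;
    default:                  return ROUTE_IGNORED; /* assumption: AI_TEXT_OTHER and unknown values are not handled */
    }
}
```

Keeping the routing decision in one small function makes it easy to see which types reach a handler and which fall through.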
ASR Processing Flow
Processes speech recognition results and publishes an event that depends on whether the recognized content is empty: AI_USER_EVT_ASR_EMPTY when it is empty, AI_USER_EVT_ASR_OK otherwise.
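As a rough illustration of the empty-versus-non-empty decision, the sketch below picks an event for a recognition result. The enum values are stand-ins for the AI_USER_EVT_ASR_OK and AI_USER_EVT_ASR_EMPTY events used in the reference examples, `asr_event_for` is a hypothetical helper, and the exact emptiness test the module applies is an assumption.

```c
#include <stddef.h>

/* Stand-ins for the SDK events; the numeric values are placeholders. */
typedef enum {
    SKETCH_ASR_EMPTY = 0, /* stands in for AI_USER_EVT_ASR_EMPTY */
    SKETCH_ASR_OK    = 1  /* stands in for AI_USER_EVT_ASR_OK */
} sketch_asr_event_t;

/* Hypothetical helper: choose the event to publish for an ASR result.
 * Assumption: a NULL pointer, zero length, or leading NUL counts as empty. */
sketch_asr_event_t asr_event_for(const char *text, size_t len)
{
    if (text == NULL || len == 0 || text[0] == '\0') {
        return SKETCH_ASR_EMPTY;
    }
    return SKETCH_ASR_OK;
}
```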
NLG Processing Flow
Processes natural-language text streams and supports both streaming text output and image output.
Skill Processing Flow
Parses skill codes and performs corresponding operations based on the skill type.
Dependency Components
- Audio component (ENABLE_COMP_AI_AUDIO): Optional; required for music/story skills and playback-control skills.
Skill Module Details
The ai_skill module includes the following submodules for different skill and event types:
Emotion
An emotion-skill processing module that parses and handles emotion-expression instructions returned by the AI model.
- Function: Parses emotion-skill JSON payloads, extracts emotion tags and emojis, and publishes emotion events to the application layer.
- Supported emotion types: neutral, happy, laughing, sad, angry, fear, love, awkward, surprised, shocked, thinking, wink, cool, relaxed, delicious, kiss, confident, sleepy, silly, confused, among others.
Music/Story
A music/story skill-processing module responsible for parsing and playing music or story content.
- Function: Parses music/story skill JSON payloads, builds playlists, and invokes the audio player for playback.
- Supported operations: Play, pause, resume, stop, previous, next, replay, single-track loop, ordered loop, and other playback controls.
Cloud event processing
A cloud-event processing module responsible for handling event commands proactively pushed by the cloud.
- Function: Parses cloud-event JSON data and processes TTS playback instructions (playTts and alert).
- Supported event types: TTS playback (playTts) and prompt-tone playback (alert).
- Features: Supports TTS URL playback, background-music playback, and multiple audio formats (MP3, WAV, SPEEX, OPUS, OGGOPUS).
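Once the event name has been read from the cloud-event JSON, telling the two supported events apart is a plain string comparison. The sketch below shows one way to do that. Only the names playTts and alert come from this document; the enum, the `cloud_evt_kind` helper, and the case-sensitive exact match are assumptions, and how the name is extracted from the payload is not covered here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of cloud events. */
typedef enum {
    CLOUD_EVT_UNKNOWN = 0,
    CLOUD_EVT_PLAY_TTS, /* "playTts": TTS playback */
    CLOUD_EVT_ALERT     /* "alert": prompt-tone playback */
} cloud_evt_kind_t;

/* Hypothetical helper: classify an event by its name.
 * Assumption: names are matched exactly and case-sensitively. */
cloud_evt_kind_t cloud_evt_kind(const char *name)
{
    if (name == NULL) {
        return CLOUD_EVT_UNKNOWN;
    }
    if (strcmp(name, "playTts") == 0) {
        return CLOUD_EVT_PLAY_TTS;
    }
    if (strcmp(name, "alert") == 0) {
        return CLOUD_EVT_ALERT;
    }
    return CLOUD_EVT_UNKNOWN;
}
```

Returning an explicit unknown value lets the caller ignore events it does not recognize instead of guessing.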
Development Flow
Data Structures
Text Types
typedef uint8_t AI_TEXT_TYPE_E;
#define AI_TEXT_ASR 0x00 // ASR text
#define AI_TEXT_NLG 0x01 // NLG text
#define AI_TEXT_SKILL 0x02 // Skill data
#define AI_TEXT_OTHER 0x03 // Other text
#define AI_TEXT_CLOUD_EVENT 0x04 // Cloud event
Text Notification Structure
typedef struct {
    char *data;          // Text data
    uint32_t datalen;    // Data length
    uint32_t timeindex;  // Time index
} AI_NOTIFY_TEXT_T;
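This document does not say whether data is NUL-terminated, so a cautious consumer can rely on datalen instead. The sketch below copies the text into a caller-supplied buffer and always terminates it. The struct is repeated from above so the example stands alone; `notify_text_copy` is a hypothetical helper, and treating datalen as a byte count is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Repeated from this document so the sketch is self-contained. */
typedef struct {
    char *data;
    uint32_t datalen;
    uint32_t timeindex;
} AI_NOTIFY_TEXT_T;

/* Hypothetical helper: copy at most dst_size - 1 bytes of text into dst
 * and always NUL-terminate. Returns the number of bytes copied. */
size_t notify_text_copy(const AI_NOTIFY_TEXT_T *text, char *dst, size_t dst_size)
{
    if (dst == NULL || dst_size == 0) {
        return 0;
    }
    dst[0] = '\0';
    if (text == NULL || text->data == NULL) {
        return 0;
    }
    size_t n = text->datalen;
    if (n > dst_size - 1) {
        n = dst_size - 1; /* truncate rather than overflow */
    }
    memcpy(dst, text->data, n);
    dst[n] = '\0';
    return n;
}
```

The copy can then be passed safely to `%s`-style logging even if the original payload was not terminated.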
API Description
Process Text Data
Processes text data from ai_agent by text type.
/**
 * @brief Process AI text data based on type
 * @param type Text type (ASR, NLG, SKILL, CLOUD_EVENT)
 * @param root JSON root object containing text data
 * @param eof End of file flag indicating if this is the last data chunk
 * @return OPERATE_RET Operation result code
 */
OPERATE_RET ai_text_process(AI_TEXT_TYPE_E type, cJSON *root, bool eof);
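The eof parameter marks the final chunk of a payload. In normal operation ai_agent makes this call, so the sketch below only illustrates that contract: it feeds a sequence of chunks to a function with the same shape as ai_text_process and sets eof on the last one only. `feed_chunks`, `text_process_fn`, and `counting_process` are hypothetical names; defining OPERATE_RET as an int with 0 meaning success is an assumption, and cJSON is left opaque so the example builds without the library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t AI_TEXT_TYPE_E;
typedef int OPERATE_RET;    /* assumption: integer result code, 0 = success */
typedef struct cJSON cJSON; /* opaque here; the real definition lives in cJSON.h */

/* Same shape as ai_text_process(), so the real function could be passed in. */
typedef OPERATE_RET (*text_process_fn)(AI_TEXT_TYPE_E type, cJSON *root, bool eof);

/* Hypothetical helper: feed chunks in order and flag only the last as eof.
 * Stops at the first non-zero result and returns it. */
OPERATE_RET feed_chunks(text_process_fn process, AI_TEXT_TYPE_E type,
                        cJSON *const *chunks, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        OPERATE_RET rt = process(type, chunks[i], i + 1 == count);
        if (rt != 0) {
            return rt;
        }
    }
    return 0;
}

/* Counting stand-in so the sketch can run without the SDK. */
typedef struct {
    int calls;
    int eof_calls;
} feed_stats_t;

static feed_stats_t g_stats;

static OPERATE_RET counting_process(AI_TEXT_TYPE_E type, cJSON *root, bool eof)
{
    (void)type;
    (void)root;
    g_stats.calls++;
    if (eof) {
        g_stats.eof_calls++;
    }
    return 0;
}
```

Calling `feed_chunks(counting_process, 0x01, chunks, 3)` with three chunks results in three calls, exactly one of which has eof set.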
Development Steps
- Ensure dependent components are initialized: If music/story skills are enabled, ensure the audio player is initialized.
- Register text callbacks: During ai_agent initialization, ensure text callbacks are properly registered.
- Handle events: Subscribe to corresponding user events (ASR, NLG, SKILL, etc.) in the application layer to process text-handling results.
Reference Examples
Handle ASR Results
#include "ai_user_event.h"
// Subscribe to ASR events
void handle_asr_event(AI_NOTIFY_EVENT_T *event)
{
    if (event->type == AI_USER_EVT_ASR_OK) {
        AI_NOTIFY_TEXT_T *text = (AI_NOTIFY_TEXT_T *)event->data;
        PR_NOTICE("ASR recognition result: %s", text->data);
    } else if (event->type == AI_USER_EVT_ASR_EMPTY) {
        PR_NOTICE("ASR recognition result is empty");
    }
}
Handle NLG Text Stream
// Subscribe to NLG text-stream events
void handle_nlg_stream(AI_NOTIFY_EVENT_T *event)
{
    AI_NOTIFY_TEXT_T *text = (AI_NOTIFY_TEXT_T *)event->data;
    switch (event->type) {
    case AI_USER_EVT_TEXT_STREAM_START:
        PR_NOTICE("NLG text stream started");
        break;
    case AI_USER_EVT_TEXT_STREAM_DATA:
        PR_NOTICE("NLG text data: %s", text->data);
        break;
    case AI_USER_EVT_TEXT_STREAM_STOP:
        PR_NOTICE("NLG text stream ended");
        break;
    default:
        break;
    }
}
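An application that needs the complete reply rather than individual chunks can accumulate the stream between the start and stop events. The sketch below shows that pattern in isolation. The enum values stand in for the AI_USER_EVT_TEXT_STREAM_* events used in the handler above, and `nlg_accum_t`, `nlg_accum_feed`, and the 256-byte buffer size are hypothetical choices; text beyond the buffer is silently dropped.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for the stream start, data, and stop user events. */
typedef enum { STREAM_START, STREAM_DATA, STREAM_STOP } stream_evt_t;

/* Hypothetical accumulator for one NLG reply. */
typedef struct {
    char buf[256];
    size_t len;
    int complete; /* set to 1 once STREAM_STOP has been seen */
} nlg_accum_t;

/* Hypothetical helper: apply one stream event to the accumulator. */
void nlg_accum_feed(nlg_accum_t *acc, stream_evt_t evt,
                    const char *data, size_t datalen)
{
    switch (evt) {
    case STREAM_START:
        /* A new reply begins: discard anything left from the previous one. */
        acc->len = 0;
        acc->buf[0] = '\0';
        acc->complete = 0;
        break;
    case STREAM_DATA: {
        size_t room = sizeof(acc->buf) - 1 - acc->len;
        size_t n = datalen < room ? datalen : room;
        if (data != NULL && n > 0) {
            memcpy(acc->buf + acc->len, data, n);
            acc->len += n;
            acc->buf[acc->len] = '\0';
        }
        break;
    }
    case STREAM_STOP:
        acc->complete = 1;
        break;
    }
}
```

In a real handler, each case of the switch shown earlier would forward `text->data` and `text->datalen` to the accumulator, and the application would read `buf` once `complete` is set.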
Handle Skill Events
// Subscribe to skill events
void handle_skill_event(AI_NOTIFY_EVENT_T *event)
{
    if (event->type == AI_USER_EVT_SKILL) {
        cJSON *skill_data = (cJSON *)event->data;
        PR_NOTICE("Received skill data");
        // Parse and handle custom skills
        // ...
    }
}