Agent

Glossary

| Term | Description |
| --- | --- |
| Agent | An AI entity that can perceive, think, make decisions, and act independently. |
| ASR | Automatic Speech Recognition: a technology that converts the user's voice input into text. |
| NLG | Natural Language Generation: a technology that converts structured data or intentions into natural language text. |
| Skill | A skill or ability module: an independent, pluggable AI functional unit that specializes in one task. |

Overview

ai_agent is a core component in the TuyaOpen AI application framework. It communicates with Tuya AI cloud services. As a middleware layer, it connects local applications to cloud AI services for intelligent dialogue, speech recognition, and natural language understanding.

Multimodal data input

  • Audio input: Supports several audio codec formats:
      • PCM: Uncompressed raw audio format, suitable for local processing.
      • OPUS: Efficient, low-latency audio codec format, suitable for network transmission.
      • SPEEX: Speech-optimized codec format, suitable for voice communication.
  • Text input: Supports sending text commands or queries directly to the cloud.
  • Image input: Supports uploading image data to the cloud for image recognition and analysis, suitable for scenarios such as visual question answering and image understanding.
  • File input: Supports uploading file data to the cloud, suitable for scenarios such as document processing and file analysis.

Output processing

  • Text callback: Processes text payloads such as ASR, NLG, and skill data.

  • Media data callback: Processes media streams such as audio, video, image, and file data.

  • Media property callback: Provides metadata such as audio codec type.

AI session event management

The module manages the full AI dialogue lifecycle and notifies the application layer through the event callback mechanism:

  • Session start event: Triggered when the cloud starts returning data. Typically used to start the TTS player and prepare to receive the audio stream.
  • Session end event: Triggered when the cloud finishes sending data. Used to stop the TTS player and complete playback.
  • Session interruption event: Triggered when the cloud actively interrupts the conversation. The application must stop the current playback immediately and release resources. Common causes include user interruption and cloud timeout.
  • Session exit event: Triggered when the conversation exits completely. Used to clean up all related resources.
  • Server VAD event: Cloud-side voice activity detection (VAD) event that notifies the application layer of the voice activity status detected in the cloud.
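The event handling described above can be sketched as a small state machine. This is a minimal illustration only: the `demo_` event names and the state struct are hypothetical, since this document does not show the SDK's actual event type or callback signature. Each branch records what the application would do (start or stop the player, discard queued audio, release the session).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event identifiers mirroring the list above. The SDK's own
 * event type and names are not shown in this document. */
typedef enum {
    DEMO_EVT_SESSION_START,
    DEMO_EVT_SESSION_END,
    DEMO_EVT_SESSION_INTERRUPT,
    DEMO_EVT_SESSION_EXIT,
    DEMO_EVT_SERVER_VAD
} demo_session_evt_t;

/* Hypothetical application-side state driven by the events. */
typedef struct {
    bool session_active;   /* a conversation is in progress */
    bool player_running;   /* TTS player is accepting audio */
    bool audio_discarded;  /* queued audio was dropped by an interruption */
    int  vad_reports;      /* number of server VAD notifications seen */
} demo_session_state_t;

void demo_on_session_event(demo_session_state_t *st, demo_session_evt_t evt)
{
    assert(st != NULL);

    switch (evt) {
    case DEMO_EVT_SESSION_START:
        /* Cloud started returning data: start the player. */
        st->session_active = true;
        st->player_running = true;
        st->audio_discarded = false;
        break;
    case DEMO_EVT_SESSION_END:
        /* Cloud finished sending: stop the player normally. */
        st->player_running = false;
        break;
    case DEMO_EVT_SESSION_INTERRUPT:
        /* Cloud interrupted: stop at once and drop queued audio. */
        st->player_running = false;
        st->audio_discarded = true;
        break;
    case DEMO_EVT_SESSION_EXIT:
        /* Conversation is over: release everything. */
        st->player_running = false;
        st->session_active = false;
        break;
    case DEMO_EVT_SERVER_VAD:
        /* Only record the notification in this sketch. */
        st->vad_reports++;
        break;
    }
}
```

Keeping the interruption and end branches separate makes it explicit that an interruption discards pending audio, while a normal end does not.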

Cloud prompt tone management

  • Request a cloud prompt tone: Generates the corresponding prompt token (cmd:0 to cmd:5) based on the prompt type. The AI then returns the corresponding prompt audio. The agent must be configured on the platform with an explicit prompt response for each of cmd:0 to cmd:5.

  • Play a cloud prompt tone: After receiving the audio data returned by the cloud, call the player interface to play it.

  • Prompt tone mapping table:

| Alert type | Prompt token | Description |
| --- | --- | --- |
| AT_NETWORK_CONNECTED | cmd:0 | Network connection successful |
| AT_WAKEUP | cmd:1 | Wake-up response |
| AT_LONG_KEY_TALK | cmd:2 | Long-press key to talk |
| AT_KEY_TALK | cmd:3 | Key to talk |
| AT_WAKEUP_TALK | cmd:4 | Wake-up talk |
| AT_RANDOM_TALK | cmd:5 | Random conversation |
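The mapping in the table above can be sketched as a small formatting helper. The `demo_` enum is a local stand-in whose numeric values are assumed to match the cmd index; the SDK's `AI_ALERT_TYPE_E` may be numbered differently, and the module's real token generation is not shown in this document.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of the alert types in the table above. The numeric values
 * are an assumption chosen to equal the cmd index. */
typedef enum {
    DEMO_AT_NETWORK_CONNECTED = 0,
    DEMO_AT_WAKEUP,
    DEMO_AT_LONG_KEY_TALK,
    DEMO_AT_KEY_TALK,
    DEMO_AT_WAKEUP_TALK,
    DEMO_AT_RANDOM_TALK,
    DEMO_AT_MAX
} demo_alert_type_t;

/* Write the prompt token ("cmd:0" to "cmd:5") for an alert type into buf.
 * Returns 0 on success, -1 for an unknown type or a buffer that is too small. */
int demo_alert_to_token(demo_alert_type_t type, char *buf, size_t buf_len)
{
    int n;

    if (buf == NULL || (int)type < 0 || (int)type >= (int)DEMO_AT_MAX) {
        return -1;
    }
    n = snprintf(buf, buf_len, "cmd:%d", (int)type);
    /* snprintf returns the length it wanted to write; reject truncation. */
    return (n > 0 && (size_t)n < buf_len) ? 0 : -1;
}
```

A caller would pass the resulting token as the text of the prompt request; on a device, `ai_agent_cloud_alert()` presumably performs the equivalent step internally.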

Agent role switching

The module supports dynamic switching of AI Agent roles. Different roles can have different conversation styles, knowledge bases and skill sets, suitable for multi-scenario applications.

Workflow

Initialization

Input processing

Output processing

Session event management

Cloud prompt tone

Callback function diagram

Development process

Interface description

Initialization

Initialize the AI Agent module. If ENABLE_AI_MONITOR is enabled, the monitoring module is also initialized for debugging with tyutool.

Call this function only after the MQTT connection has been established.

/**
 * @brief Initialize the AI agent module
 * @return OPERATE_RET Operation result
 */
OPERATE_RET ai_agent_init(void);

Deinitialization

Release the resources occupied by the AI Agent module.

/**
 * @brief Deinitialize the AI agent module
 * @return OPERATE_RET Operation result
 */
OPERATE_RET ai_agent_deinit(void);

Text input

Send text data to the AI.

/**
 * @brief Send text input to AI agent
 * @param content Text content to send
 * @return OPERATE_RET Operation result
 */
OPERATE_RET ai_agent_send_text(char *content);
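Because the prototype takes a non-const `char *`, a caller holding constant text may want a small wrapper that validates the input and hands over a mutable copy. The sketch below is hypothetical: every `demo_` name is invented for illustration, `int` stands in for `OPERATE_RET`, the 256-byte limit is arbitrary, and the code assumes the send function is finished with the buffer by the time it returns, which this document does not confirm. The sender is passed as a function pointer so the sketch can run off-device with a stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_OK          0
#define DEMO_ERR_PARAM  (-1)
#define DEMO_TEXT_MAX    256   /* arbitrary limit for this sketch */

/* Same shape as ai_agent_send_text(), with int standing in for OPERATE_RET. */
typedef int (*demo_send_text_fn)(char *content);

/* Reject NULL, empty, or oversized text, then send a mutable copy. */
int demo_send_text_checked(demo_send_text_fn send, const char *text)
{
    char buf[DEMO_TEXT_MAX];
    size_t len;

    if (send == NULL || text == NULL) {
        return DEMO_ERR_PARAM;
    }
    len = strlen(text);
    if (len == 0 || len >= sizeof(buf)) {
        return DEMO_ERR_PARAM;
    }
    memcpy(buf, text, len + 1);   /* copy including the terminator */
    return send(buf);
}

/* Stand-in sender for off-device testing: records the last text it received. */
static char demo_last_text[DEMO_TEXT_MAX];

int demo_fake_send_text(char *content)
{
    assert(content != NULL);
    strncpy(demo_last_text, content, sizeof(demo_last_text) - 1);
    demo_last_text[sizeof(demo_last_text) - 1] = '\0';
    return DEMO_OK;
}
```

On a device, `ai_agent_send_text` could be passed as the `send` argument if `OPERATE_RET` is `int`-compatible in the SDK headers in use.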

File input

Send file data to the AI.

/**
 * @brief Send file data to AI agent
 * @param data Pointer to file data
 * @param len File data length
 * @return OPERATE_RET Operation result
 */
OPERATE_RET ai_agent_send_file(uint8_t *data, uint32_t len);

Image input

Send image data to the AI.

/**
 * @brief Send image data to AI agent
 * @param data Pointer to image data
 * @param len Image data length
 * @return OPERATE_RET Operation result
 */
OPERATE_RET ai_agent_send_image(uint8_t *data, uint32_t len);

Play cloud prompt tone

Generates a prompt token from the prompt tone type, then uses that token to request AI-generated prompt audio for playback.

/**
 * @brief Request cloud alert from AI agent
 * @param type Alert type
 * @return OPERATE_RET Operation result
 */
OPERATE_RET ai_agent_cloud_alert(AI_ALERT_TYPE_E type);

Switch agent roles

Switch the AI agent to the specified role.

/**
 * @brief Switch AI agent role
 * @param role Role name to switch to
 * @return OPERATE_RET Operation result
 */
OPERATE_RET ai_agent_role_switch(char *role);
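One way an application might use role switching is to cycle through a fixed list of roles, for example on a button press. The sketch below is an assumption-heavy illustration: the `demo_` names and the role strings used with it are hypothetical, `int` stands in for `OPERATE_RET`, and a return value of 0 is assumed to mean success. The current index advances only when the switch call succeeds, so a failed request leaves the local state consistent with the cloud.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as ai_agent_role_switch(), with int standing in for OPERATE_RET. */
typedef int (*demo_role_switch_fn)(char *role);

typedef struct {
    char  **roles;    /* mutable strings, matching the char * parameter */
    size_t  count;    /* number of entries in roles */
    size_t  current;  /* index of the role currently in effect */
} demo_role_ring_t;

/* Switch to the next role in the list, wrapping around at the end.
 * The index is committed only if the switch call reports success (0). */
int demo_role_next(demo_role_ring_t *ring, demo_role_switch_fn do_switch)
{
    size_t next;
    int rt;

    if (ring == NULL || do_switch == NULL || ring->count == 0) {
        return -1;
    }
    next = (ring->current + 1) % ring->count;
    rt = do_switch(ring->roles[next]);
    if (rt == 0) {
        ring->current = next;
    }
    return rt;
}

/* Stand-in for off-device testing: records the role, returns a preset result. */
static char demo_last_role[32];
static int  demo_fake_result = 0;

int demo_fake_role_switch(char *role)
{
    assert(role != NULL);
    if (demo_fake_result == 0) {
        strncpy(demo_last_role, role, sizeof(demo_last_role) - 1);
        demo_last_role[sizeof(demo_last_role) - 1] = '\0';
    }
    return demo_fake_result;
}
```

The role names themselves would come from the agent configuration on the platform; none are specified in this document.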

Development steps

Reference code

// Set once ai_agent_init() has succeeded, so the module is initialized only once
static bool sg_ai_agent_inited = false;

// MQTT connection event callback
int __ai_mqtt_connected_evt(void *data)
{
    OPERATE_RET rt = OPRT_OK; // the TUYA_CALL_ERR_* macros store the call result in rt

    (void)data;

    if (!sg_ai_agent_inited) {
        // Initialize the AI Agent module now that MQTT is connected
        TUYA_CALL_ERR_LOG(ai_agent_init());
        if (rt == OPRT_OK) {
            sg_ai_agent_inited = true; // leave the flag clear on failure so a later connect retries
        }
    }
    return OPRT_OK;
}

// Initialization function
OPERATE_RET example_init(void)
{
    OPERATE_RET rt = OPRT_OK;

    // Initialize the audio input and playback modules
#if defined(ENABLE_COMP_AI_AUDIO) && (ENABLE_COMP_AI_AUDIO == 1)
    AI_AUDIO_INPUT_CFG_T input_cfg = {
        .vad_mode      = AI_AUDIO_VAD_MANUAL,
        .vad_off_ms    = 1000,
        .vad_active_ms = 200,
        .slice_ms      = 80,
        .output_cb     = __ai_audio_output, // application-defined callback, not shown here
    };
    TUYA_CALL_ERR_RETURN(ai_audio_input_init(&input_cfg));
    TUYA_CALL_ERR_RETURN(ai_audio_player_init());
#endif

    // Subscribe to the MQTT connection event and initialize the AI Agent after the connection succeeds
    TUYA_CALL_ERR_RETURN(tal_event_subscribe(EVENT_MQTT_CONNECTED, "ai_agent_init",
                                             __ai_mqtt_connected_evt, SUBSCRIBE_TYPE_EMERGENCY));

    return rt;
}

// Usage example: send text
void send_text_to_ai(void)
{
    ai_agent_send_text("How is the weather today?");
}

// Usage example: request a prompt tone
void request_alert(void)
{
    ai_agent_cloud_alert(AT_WAKEUP);
}

// Usage example: switch roles
void switch_role(void)
{
    ai_agent_role_switch(""); // replace "" with the target role name
}