developer docs
Connect your first AI API in under a minute.
Pick an endpoint, copy a working request, and connect it with a server-side API key. The first example starts with a text request.
endpoint-first integration
Test in Workbench and copy the request.
Each API has a clearly defined endpoint and request shape. This document starts with the smallest code path you can attach to a real backend.
Create an API key
Create an API key in your account and store it in a server-side environment variable. Never include it in a client bundle.
Call the selected endpoint
Pick the API you tested in Workbench and call it with the request body from the endpoint reference.
Read the result
Responses are kept in a shape that is easy to wire into a product. Check cost and latency in the usage view.
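The three steps above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the URL, the `AI_OMAKASE_API_KEY` variable name, and the request body are taken from the cURL quickstart in this document, while the function names `load_api_key`, `build_chat_request`, and `send` are hypothetical. The response shape is not assumed here; `send` simply decodes whatever JSON comes back.

```python
import json
import os
import urllib.request

# URL and body mirror the cURL quickstart in this document.
CHAT_URL = "http://aiapi.kogrobo.com:11115/llm/v1/chat/completions"


def load_api_key(env_var: str = "AI_OMAKASE_API_KEY") -> str:
    """Read the key from a server-side environment variable."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def build_chat_request(prompt: str, api_key: str,
                       temperature: float = 0.1) -> urllib.request.Request:
    """Build (but do not send) the POST request for the text endpoint."""
    body = json.dumps({"input": prompt, "temperature": temperature}).encode("utf-8")
    return urllib.request.Request(
        CHAT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A backend would typically call `send(build_chat_request("Summarize this incident report.", load_api_key()))` and log the result while wiring up the integration.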
cURL quickstart
curl -X POST "http://aiapi.kogrobo.com:11115/llm/v1/chat/completions" \
  -H "Authorization: Bearer $AI_OMAKASE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Summarize this incident report and suggest next actions.",
    "temperature": 0.1
  }'
credentials
API key flow
Read the API key on the server and send it as a Bearer token with every request. Expose only a safe proxy route to the frontend.
headers
Authorization: Bearer <your_access_token>
Content-Type: application/json
When calling from the browser, use a same-origin relative path such as /api/chat.
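The proxy pattern described above could look roughly like the following on the server. This is a sketch under assumptions: the upstream URL is borrowed from the cURL quickstart, the allow-list of forwarded fields is a design choice of this example, and `build_upstream_request` is a hypothetical helper name. The point it illustrates is that the browser never sees the key; the server attaches it.

```python
import json
import urllib.request

# Assumption: the proxy forwards to the URL shown in the cURL quickstart.
UPSTREAM_URL = "http://aiapi.kogrobo.com:11115/llm/v1/chat/completions"
# Assumption: forward only the fields documented for the text endpoint.
ALLOWED_FIELDS = {"input", "temperature"}


def build_upstream_request(client_body: bytes, api_key: str) -> urllib.request.Request:
    """Turn a browser request body into an authenticated upstream request."""
    payload = json.loads(client_body)
    if not isinstance(payload, dict) or not payload.get("input"):
        raise ValueError("request body must be a JSON object with an 'input' field")
    # Drop anything the browser sent that is not on the allow-list.
    forwarded = {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}
    return urllib.request.Request(
        UPSTREAM_URL,
        data=json.dumps(forwarded).encode("utf-8"),
        method="POST",
        headers={
            # The key is added here, on the server, and never sent to the client.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

The handler behind a same-origin path such as /api/chat would pass the raw body it receives to this helper and relay the upstream response.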
production patterns
Build with the API endpoints you choose
Each capability has its own endpoint, and auth, usage, and examples are all available within a single platform workflow.
/api/chat
Support triage
Ticket summaries, priority classification, and next-action suggestions can all start from a single text endpoint.
/api/embedding + /api/rerank
RAG search
Use Embedding to produce candidates, then use Re-ranking to order them by contextual relevance.
/api/stt + /api/tts
Voice workflow
Connect STT, TTS, and Voice Clone under the same auth flow and usage model.
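For the RAG search pattern above, one possible flow is to shortlist passages by cosine similarity over their embedding vectors and then hand the shortlist to the re-ranking endpoint. The sketch below assumes the vectors were already fetched from /api/embedding; using cosine similarity for the shortlist is an assumption of this example, not something the platform prescribes, and the helper names are hypothetical. Only the `query` and `input` field names come from the endpoint reference.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_candidates(query_vector: Sequence[float],
                   passages: List[Tuple[str, Sequence[float]]],
                   k: int = 3) -> List[str]:
    """Return the k passage texts whose vectors are closest to the query."""
    ranked = sorted(
        passages,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_rerank_body(query: str, candidates: List[str]) -> dict:
    """Body for /api/rerank, using the field names from the endpoint reference."""
    return {"query": query, "input": candidates}
```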
endpoint reference
Copy the request and connect your workflow.
The specs below reflect the API surface the app currently provides. Use the same request shape as the endpoint you tested in Workbench.
Text (LLM)
Generates a text response from a prompt through the text generation endpoint.
/api/chat
Body (application/json)
request
{
  "input": "Question or instruction",
  "temperature": 0.1
}
Success (200)
response
{
  "text": "Generated response text"
}
- `temperature` defaults near 0.1 when omitted.
- Usage limits may return 429 with an `error` payload.
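A caller might separate the success path from the usage-limit path along these lines. This is a sketch: the `text` and `error` field names and the 429 status come from the spec above, while `UsageLimitError` and `parse_chat_response` are hypothetical names, and treating every other status as a generic failure is a simplification.

```python
import json


class UsageLimitError(Exception):
    """Raised when the endpoint answers 429 (usage limit reached)."""


def _decode(raw_body: bytes) -> dict:
    """Decode a JSON object, tolerating empty or non-JSON bodies."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return {}
    return payload if isinstance(payload, dict) else {}


def parse_chat_response(status: int, raw_body: bytes) -> str:
    """Return the generated text, or raise with the server's error message."""
    payload = _decode(raw_body)
    text = payload.get("text")
    if status == 200 and isinstance(text, str):
        return text
    message = str(payload.get("error", "no error message returned"))
    if status == 429:
        raise UsageLimitError(message)
    raise RuntimeError(f"chat request failed with status {status}: {message}")
```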
Embedding
Converts input text into an embedding vector.
/api/embedding
Body (application/json)
request
{
  "input": "Text to embed",
  "input_type": "string"
}
Success (200)
response
{
  "embeddingVector": [0.012, -0.034, ...]
}
- `text` is also accepted and normalized to `input` internally.
- Upstream failures may return a non-200 status with an `error` message.
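Building the request body and pulling the vector out of the response can be kept in two small helpers. The field names `input`, `input_type`, and `embeddingVector` come from the spec above; the helper names and the decision to reject an empty vector are assumptions of this sketch.

```python
from typing import List


def build_embedding_body(text: str) -> dict:
    """Request body for /api/embedding, matching the documented shape."""
    if not text:
        raise ValueError("text to embed must not be empty")
    return {"input": text, "input_type": "string"}


def extract_vector(payload: dict) -> List[float]:
    """Read `embeddingVector` from a decoded 200 response."""
    vector = payload.get("embeddingVector")
    # Assumption: an absent or empty vector is treated as a failed response.
    if not isinstance(vector, list) or not vector:
        raise ValueError("response does not contain an embeddingVector")
    return [float(value) for value in vector]
```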
Re-ranking
Ranks candidate passages by relevance to a query.
/api/rerank
Body (application/json)
request
{
  "query": "Search query",
  "input": ["Candidate passage 1", "Candidate passage 2"]
}
Success (200)
response
// The upstream Re-rank API response is passed through.
- `query` must not be empty, and `input` must be an array of strings.
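The validation rule above can be checked on the client before the request is sent. In this sketch, `validate_rerank_body` is a hypothetical helper, and treating a whitespace-only `query` as empty is an assumption; the server may be stricter or more lenient.

```python
from typing import List


def validate_rerank_body(body: dict) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    errors = []
    query = body.get("query")
    # Assumption: whitespace-only queries count as empty.
    if not isinstance(query, str) or not query.strip():
        errors.append("query must be a non-empty string")
    candidates = body.get("input")
    if not isinstance(candidates, list) or not all(
        isinstance(item, str) for item in candidates
    ):
        errors.append("input must be an array of strings")
    return errors
```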
TTS (Text-to-Speech)
Synthesizes speech from text through the upstream TTS service. Use `GET /api/tts` to fetch available speakers.
/api/tts
Body (application/json)
request
{
  "text": "Text to read (required)",
  "language": "ko | korean | english | japanese (optional)",
  "speaker": "ryan (optional)",
  "instruct": "Style or tone instruction (optional)",
  "style_instruction": "Alias for instruct (optional)"
}
Success (200)
response
Content-Type: audio/mpeg or another upstream audio type
<binary audio stream>
- `text` is the only required field. `language` accepts short codes such as ko/en or names such as korean/english.
- Errors return JSON with an `error` field and may use 400, 429, 500, or 504.
- Synthesis requests have an approximate 58-second execution limit.
- Speaker list: `GET /api/tts`, proxied from upstream `/tts/speakers`.
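Because only `text` is required and the response is a binary stream whose type can vary, a client may want one helper that omits unset fields and another that picks a file extension from the response `Content-Type`. Both helpers below are hypothetical, and the extension table is an assumption that covers only the audio types named in this document.

```python
from typing import Optional

# Assumption: only the audio types mentioned in this document are mapped.
AUDIO_EXTENSIONS = {"audio/mpeg": ".mp3", "audio/wav": ".wav", "audio/x-wav": ".wav"}


def build_tts_body(text: str, language: Optional[str] = None,
                   speaker: Optional[str] = None,
                   instruct: Optional[str] = None) -> dict:
    """Request body for /api/tts; optional fields are left out when unset."""
    if not text:
        raise ValueError("text is required")
    body = {"text": text}
    for key, value in (("language", language), ("speaker", speaker), ("instruct", instruct)):
        if value is not None:
            body[key] = value
    return body


def audio_extension(content_type: str) -> str:
    """Choose a file extension from the response Content-Type header."""
    mime = content_type.split(";")[0].strip().lower()
    return AUDIO_EXTENSIONS.get(mime, ".bin")
```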
STT (Speech-to-Text)
Uploads an audio file and converts it to text with `multipart/form-data`.
/api/stt
Body (multipart/form-data)
request
Fields:
file — audio file (required)
language — optional
task — optional
beam_size — optional
vad_filter — optional
Success (200)
response
The upstream STT JSON response is returned as-is.
- Long-running jobs may hit the execution limit and return 504.
- Missing `file` returns 400 with a validation message.
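The Python standard library has no one-line multipart encoder, so a small helper is one way to build the upload body without extra dependencies. This sketch follows the general `multipart/form-data` layout; the helper names are hypothetical, only the `file`, `language`, and `task` field names come from the spec above, and the accepted values for the optional fields are not documented here, so they are passed through unchanged.

```python
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart(fields: Dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes,
                     file_type: str = "application/octet-stream") -> Tuple[bytes, str]:
    """Encode text fields plus one file; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode("utf-8")
    )
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_stt_form(audio: bytes, filename: str, language: Optional[str] = None,
                   task: Optional[str] = None) -> Tuple[bytes, str]:
    """Multipart body for /api/stt; unset optional fields are omitted."""
    optional = (("language", language), ("task", task))
    fields = {key: value for key, value in optional if value is not None}
    return encode_multipart(fields, "file", filename, audio)
```

The returned header string goes into `Content-Type` as-is, since it carries the boundary the server needs to split the parts.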
Voice Clone
Uploads a reference voice and synthesizes text with that speaker profile. Send as `multipart/form-data`.
/api/voice-clone
Body (multipart/form-data)
request
Fields:
ref_audio — reference audio file (required, WAV/MP3, etc.)
text — text to synthesize (required)
language — optional, default Korean
  Korean | English | Chinese | Japanese
x_vector_only_mode — true | false (optional, default true)
  true → clone timbre only (faster, no ref_text)
  false → clone timbre, prosody, and style (ref_text recommended)
ref_text — reference transcript, recommended when x_vector_only_mode=false
Success (200)
response
Content-Type: audio/wav or another upstream audio type
<binary audio stream>
- `ref_audio` and `text` are required.
- `x_vector_only_mode=true` works without `ref_text` and is faster.
- `x_vector_only_mode=false` can use `ref_text` to preserve prosody and speaking style.
- Requests have a 60-second execution limit and may return 504.
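The interaction between `x_vector_only_mode` and `ref_text` is easy to get wrong, so it may help to centralize it. The sketch below builds only the text fields of the form (the `ref_audio` file would be attached separately); `build_voice_clone_fields` is a hypothetical helper, and sending the boolean as the strings `true` and `false` is an assumption based on the values listed above.

```python
from typing import Dict, Optional

LANGUAGES = ("Korean", "English", "Chinese", "Japanese")


def build_voice_clone_fields(text: str, language: str = "Korean",
                             x_vector_only_mode: bool = True,
                             ref_text: Optional[str] = None) -> Dict[str, str]:
    """Text fields for /api/voice-clone; the ref_audio file is added separately."""
    if not text:
        raise ValueError("text is required")
    if language not in LANGUAGES:
        raise ValueError(f"language must be one of {LANGUAGES}")
    fields = {
        "text": text,
        "language": language,
        # Assumption: the form expects the literal strings "true" / "false".
        "x_vector_only_mode": "true" if x_vector_only_mode else "false",
    }
    # ref_text only matters when prosody and style are cloned as well.
    if not x_vector_only_mode and ref_text:
        fields["ref_text"] = ref_text
    return fields
```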
Image2Text (Vision OCR)
Uploads an image for vision analysis and text extraction with `multipart/form-data`.
/api/image2text
Body (multipart/form-data)
request
Fields:
image — image file (required, JPG/PNG/WEBP/GIF)
prompt — optional analysis instruction
temperature — optional float from 0 to 1, default 0.1
Success (200)
response
{
  "text": "Image analysis and extracted text"
}
- `image` is the only required field. Omitting `prompt` uses the default image description and text extraction instruction.
- Images are base64-encoded server-side before being sent to the Vision API.
- Empty or malformed upstream responses may return 502.
- Requests have a 115-second execution limit and may return 504.
- Usage limits may return 429, same as `/api/chat`.
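Checking the file type and the `temperature` range before uploading avoids spending part of the execution limit on a request that will fail validation. In this sketch, `build_image2text_fields` is a hypothetical helper, and accepting `.jpeg` alongside `.jpg` is an assumption, since the spec lists only JPG/PNG/WEBP/GIF.

```python
import os
from typing import Dict, Optional

# Assumption: ".jpeg" is treated the same as ".jpg".
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def build_image2text_fields(filename: str, prompt: Optional[str] = None,
                            temperature: Optional[float] = None) -> Dict[str, str]:
    """Validate inputs and return the optional text fields for the form."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {extension or 'none'}")
    fields = {}
    if prompt:
        fields["prompt"] = prompt
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    return fields
```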
Text-to-Music
Generates music audio from a text prompt. Use this endpoint when the product needs background music, short loops, or prompt-driven audio assets.
/api/t2m
Body (application/json)
request
{
  "prompt": "Upbeat jazz with piano and saxophone, 120bpm",
  "audio_duration": 10
}
Success (200)
response
Content-Type: audio/wav or another upstream audio type
<binary audio stream>
- `prompt` is required and should describe mood, instruments, tempo, and style.
- `audio_duration` controls the generated clip length when supported by the upstream model.
- Long generation requests may return 504; reduce duration or prompt complexity before retrying.
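The advice to reduce duration before retrying can be automated. The sketch below halves `audio_duration` after each 504, down to a floor; the halving policy, the floor, and the attempt count are assumptions of this example, and `send` is a hypothetical callable supplied by the caller that posts a body to /api/t2m and returns `(status, content)`.

```python
from typing import Callable, Tuple


def generate_with_shorter_retries(send: Callable[[dict], Tuple[int, bytes]],
                                  prompt: str, audio_duration: int,
                                  min_duration: int = 5,
                                  max_attempts: int = 3) -> Tuple[int, bytes, int]:
    """Retry on 504 with a shorter clip; returns (status, content, duration tried last)."""
    duration = audio_duration
    status, content = 504, b""
    for attempt in range(max_attempts):
        status, content = send({"prompt": prompt, "audio_duration": duration})
        # Stop on any non-timeout answer, or when the clip cannot get shorter.
        if status != 504 or duration <= min_duration:
            break
        if attempt < max_attempts - 1:
            duration = max(min_duration, duration // 2)
    return status, content, duration
```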
Image Generation
Generates an image from a text prompt. Use this endpoint for creative assets, product concepts, and visual prototypes.
/api/t2i
Body (application/json)
request
{
  "prompt": "A serene mountain landscape at sunset, photorealistic",
  "negative_prompt": "low quality, blurry",
  "width": 1024,
  "height": 1024,
  "num_inference_steps": 50
}
Success (200)
response
Content-Type: image/png or another upstream image type
<binary image stream>
- `prompt` is required. `negative_prompt` can be an empty string when not needed.
- `width`, `height`, and `num_inference_steps` affect generation time and output detail.
- Large images or high step counts may take longer; tune settings in Workbench before shipping.
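Keeping the generation settings in one builder makes it easier to carry values tuned in Workbench into production code. `build_t2i_body` is a hypothetical helper, and its defaults simply mirror the example request above; they are not documented server defaults.

```python
def build_t2i_body(prompt: str, negative_prompt: str = "",
                   width: int = 1024, height: int = 1024,
                   num_inference_steps: int = 50) -> dict:
    """Request body for /api/t2i with basic sanity checks on the settings."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    settings = (("width", width), ("height", height),
                ("num_inference_steps", num_inference_steps))
    for name, value in settings:
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")
    return {
        "prompt": prompt,
        # An empty string is allowed when no negative prompt is needed.
        "negative_prompt": negative_prompt,
        "width": width,
        "height": height,
        "num_inference_steps": num_inference_steps,
    }
```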
resilient integration
Error handling
Use the status code to separate retry, input correction, and usage guidance on the client.
400
The request body, file, or a required field is invalid.
Check the field names and Content-Type first.
401 / 403
The API key or account permission is not valid.
Check the Authorization header and the key status.
429
The daily trial limit or usage limit has been reached.
Check your usage or choose a higher plan.
500 / 502
Internal processing or the upstream model response failed.
Add retry handling, and check the logs if the same input keeps failing.
504
A voice, vision, or long-form job exceeded the time limit.
Reduce the input size or split the work into an async flow.
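The status-code guidance above maps naturally onto a small dispatch function. The action labels, the exponential backoff schedule, and the function names below are assumptions of this sketch; only the grouping of status codes follows the list above.

```python
from typing import List


def classify_status(status: int) -> str:
    """Map an HTTP status to the client-side action suggested in this section."""
    if 200 <= status < 300:
        return "ok"
    if status == 400:
        return "fix_input"          # check field names and Content-Type
    if status in (401, 403):
        return "check_credentials"  # check the Authorization header and key status
    if status == 429:
        return "check_usage"        # usage limit reached
    if status in (500, 502):
        return "retry"              # internal or upstream failure
    if status == 504:
        return "reduce_input"       # time limit exceeded
    return "unknown"


def backoff_delays(max_retries: int = 3, base_seconds: float = 0.5) -> List[float]:
    """Exponential backoff delays to use between retries of 500/502 responses."""
    return [base_seconds * (2 ** attempt) for attempt in range(max_retries)]
```

Only the `retry` action is retried automatically in this design; the other categories need a change to the input, the key, or the plan before another attempt makes sense.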
APIs may continue to be added and extended, and endpoint limits may change. Check the latest behavior in Workbench and in live responses.