Chat Completions
Create conversational responses using a wide range of AI models through a single unified API.
Create a chat completion
POST https://ciyuanx.io/v1/chat/completions
bash
curl https://ciyuanx.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }'
python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ciyuanx.io/v1",
)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://ciyuanx.io/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model ID to use |
| `messages` | array | Yes | An array of message objects |
| `temperature` | number | No | Sampling temperature (0–2). Default: 1 |
| `max_tokens` | integer | No | Maximum number of tokens to generate |
| `top_p` | number | No | Nucleus sampling parameter |
| `stream` | boolean | No | Whether to stream the response |
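As a minimal sketch of how the parameters in the table fit together, the hypothetical helper below assembles a request body and leaves out any optional parameter that was not set, so the API falls back to its own defaults. `build_chat_request` is not part of any SDK. Its checks reflect only the range documented for `temperature`, plus two assumptions made for illustration: that `messages` should not be empty and that `max_tokens` should be positive.

```python
from typing import Any, Dict, List, Optional


def build_chat_request(
    model: str,
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    stream: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a request body from the documented parameters."""
    if not model:
        raise ValueError("model is required")
    # Assumption: an empty messages array is treated as invalid.
    if not messages:
        raise ValueError("messages must contain at least one message")
    # The table documents temperature as a value between 0 and 2.
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    # Assumption: a token cap below 1 is treated as invalid.
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    body: Dict[str, Any] = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "stream": stream,
    }
    # Omit anything the caller did not set so the API applies its defaults.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


request_body = build_chat_request(
    "gpt-4.1",
    [{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.2,
    max_tokens=100,
)
print(request_body)
```

The resulting dictionary can be sent as the JSON body of the POST request, or unpacked into `client.chat.completions.create(**request_body)`.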
Message roles
Each message must include a `role` and `content`:
- `system`: sets the assistant's behavior or persona
- `user`: a message from the end user
- `assistant`: a previous response from the AI
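To illustrate how the three roles combine in a multi-turn conversation, here is a small sketch. `make_message` and `VALID_ROLES` are hypothetical names, the check covers only the roles listed in this section, and the assistant reply is made-up sample text rather than real model output.

```python
from typing import Dict, List

# Only the roles described in this section; a hypothetical constant.
VALID_ROLES = {"system", "user", "assistant"}


def make_message(role: str, content: str) -> Dict[str, str]:
    """Hypothetical helper: build one message object with a role and content."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {"role": role, "content": content}


# Earlier turns are replayed in order so the model sees the full context.
conversation: List[Dict[str, str]] = [
    make_message("system", "You are a helpful assistant."),
    make_message("user", "What is the capital of France?"),
    # Illustrative text standing in for a previous model response.
    make_message("assistant", "The capital of France is Paris."),
    make_message("user", "What is its population?"),
]

for message in conversation:
    print(f"{message['role']}: {message['content']}")
```

The `conversation` list can be passed as the `messages` parameter; including the earlier `assistant` turn is what lets the follow-up question refer to "its" population.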
Streaming responses
Enable streaming to receive responses incrementally:
python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ciyuanx.io/v1",
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

for chunk in stream:
    # Skip chunks that carry no choices or no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://ciyuanx.io/v1",
});

const stream = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
Best practices
Optimize your requests
- Set `max_tokens` to cap costs
- Use an appropriate `temperature` (lower for factual tasks, higher for creative ones)
- Include a system message to guide behavior
- Stream responses for a better user experience
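When streaming for a better user experience, an application often still needs the complete text once the stream ends, for example to store it as the next `assistant` message. The sketch below shows one way to do that. `collect_stream` and `fake_chunk` are hypothetical names, and the stand-in chunks only mimic the `choices[0].delta.content` shape read in the streaming examples so that the snippet runs without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def collect_stream(chunks: Iterable[Any]) -> str:
    """Hypothetical helper: print streamed text as it arrives and return all of it."""
    parts: List[str] = []
    for chunk in chunks:
        # Skip chunks that carry no choices.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Stand-in for a streamed chunk, for demonstration only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk("Once "),
    fake_chunk("upon "),
    fake_chunk(None),
    fake_chunk("a time"),
]
full_text = collect_stream(fake_stream)
```

With a real client, the object returned by `client.chat.completions.create(..., stream=True)` would take the place of `fake_stream`.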
Note
Mind each model's token limit. Long conversations may need conversation-history management.
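One possible approach to conversation-history management is to keep the system message and drop the oldest turns until the history fits a token budget. The sketch below is only an illustration: `estimate_tokens` and `trim_history` are hypothetical helpers, and the four-characters-per-token estimate is a rough assumption, since real token counts depend on each model's tokenizer.

```python
from typing import Dict, List


def estimate_tokens(message: Dict[str, str]) -> int:
    # Rough assumption: about 4 characters per token. Real tokenizers differ.
    return max(1, len(message["content"]) // 4)


def trim_history(messages: List[Dict[str, str]], max_tokens: int) -> List[Dict[str, str]]:
    """Hypothetical helper: keep system messages plus the newest turns that fit."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m) for m in system)

    kept: List[Dict[str, str]] = []
    # Walk backwards so the most recent turns are considered first.
    for message in reversed(rest):
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    # System messages go first, followed by the kept turns in original order.
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_history(history, max_tokens=30)
print([m["role"] for m in trimmed])
```

Here the oldest user turn is dropped because the estimated total exceeds the budget of 30. For accurate limits, the estimate would need to be replaced with a tokenizer that matches the chosen model.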