Chat Completions

Create conversational responses using a wide range of AI models through a single unified API.

Create a chat completion

POST https://ciyuanx.io/v1/chat/completions

bash
curl https://ciyuanx.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-4.1",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "temperature": 0.7
  }'
python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ciyuanx.io/v1",
)

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://ciyuanx.io/v1",
});

const response = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" },
  ],
  temperature: 0.7,
});

console.log(response.choices[0].message.content);
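If you would rather not depend on an SDK, the same request can be built with Python's standard library. The sketch below mirrors the endpoint, headers, and body of the curl example above. The helper names `build_chat_request` and `extract_reply` are hypothetical. The response parsing assumes the OpenAI-compatible shape `{"choices": [{"message": {"content": ...}}]}`, which the SDK examples rely on but this page does not spell out.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://ciyuanx.io/v1/chat/completions"


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Return the assistant text from a JSON response body.

    Assumes the OpenAI-compatible shape {"choices": [{"message": {"content": ...}}]}.
    """
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]


payload = {
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,
}
request = build_chat_request("YOUR_API_KEY", payload)

# To actually send it (requires network access and a valid key):
# with urllib.request.urlopen(request) as resp:
#     print(extract_reply(resp.read()))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.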

Request parameters

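Parameter     Type      Required  Description
model         string    Yes       ID of the model to use, e.g. gpt-4.1
messages      array     Yes       The conversation so far, as an array of message objects
temperature   number    No        Sampling temperature, from 0 to 2. Higher values give more varied output. Default: 1
max_tokens    integer   No        Maximum number of tokens to generate in the response
top_p         number    No        Nucleus sampling: sample only from the smallest set of tokens whose cumulative probability reaches top_p
stream        boolean   No        If true, return the response incrementally instead of all at once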
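A client-side check against the parameter table can catch simple mistakes before a request is sent. The following is a minimal sketch with a hypothetical `validate_params` helper. It checks only the types and the temperature range listed in the table. Treating `max_tokens` as "at least 1" is an assumption, and `top_p` is not range-checked because the table does not state its bounds. The server remains the final authority on what it accepts.

```python
from typing import Any

REQUIRED = ("model", "messages")


def validate_params(params: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means none detected."""
    errors: list[str] = []
    for name in REQUIRED:
        if name not in params:
            errors.append(f"missing required parameter: {name}")

    if "model" in params and not isinstance(params["model"], str):
        errors.append("model must be a string")
    if "messages" in params and not isinstance(params["messages"], list):
        errors.append("messages must be an array")

    temperature = params.get("temperature")
    if temperature is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(temperature, (int, float)) and not isinstance(temperature, bool)
        if not is_number or not 0 <= temperature <= 2:
            errors.append("temperature must be a number between 0 and 2")

    max_tokens = params.get("max_tokens")
    if max_tokens is not None:
        # Assumption: the API expects a positive integer here.
        if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
            errors.append("max_tokens must be a positive integer")

    stream = params.get("stream")
    if stream is not None and not isinstance(stream, bool):
        errors.append("stream must be a boolean")
    return errors


print(validate_params({"model": "gpt-4.1", "messages": [], "temperature": 3}))
```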

Message roles

Each message must include a role and content:

  • system — sets the assistant's behavior or persona
  • user — a message from the end user
  • assistant — a previous response from the AI
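Because each request is stateless, multi-turn chats work by resending earlier turns, with the model's past replies stored as assistant messages. The sketch below uses a hypothetical `Conversation` class that accumulates the message list. It restricts roles to the three listed above; the API itself may accept others.

```python
from dataclasses import dataclass, field

VALID_ROLES = {"system", "user", "assistant"}


@dataclass
class Conversation:
    """Hypothetical helper that keeps the running message list for a chat."""

    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        # Reject roles outside the three documented on this page.
        if role not in VALID_ROLES:
            raise ValueError(f"unknown role: {role!r}")
        self.messages.append({"role": role, "content": content})


convo = Conversation()
convo.add("system", "You are a helpful assistant.")
convo.add("user", "What is the capital of France?")
# After each request, store the model's reply as an assistant message...
convo.add("assistant", "The capital of France is Paris.")
# ...so the next user turn is answered with the earlier context.
convo.add("user", "What is its population?")
```

Pass `convo.messages` as the `messages` parameter on each request.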

Streaming responses

Set stream to true to receive the response incrementally, as a series of chunks, instead of waiting for the full completion:

python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://ciyuanx.io/v1",
)

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)

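for chunk in stream:
    # Some chunks (e.g. a final usage-only chunk) may carry no choices
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()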
typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://ciyuanx.io/v1",
});

const stream = await client.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Tell me a story" }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
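The SDKs parse the stream for you. If you call the endpoint directly (for example, the curl request above with "stream": true), you have to parse the events yourself. The sketch below assumes the endpoint uses the OpenAI-compatible server-sent events format, which this page does not confirm. In that format, each event is a `data: {json chunk}` line and the stream ends with `data: [DONE]`. The function name `iter_deltas` is hypothetical.

```python
import json
from collections.abc import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from server-sent event lines (assumed format)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. usage-only chunks carry no choices
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Once upon"}}]}',
    'data: {"choices":[{"delta":{"content":" a time"}}]}',
    "data: [DONE]",
]
full_text = "".join(iter_deltas(sample))
print(full_text)  # Once upon a time
```

Joining the fragments gives the complete reply, which you can then store as an assistant message for the next turn.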

Best practices

Optimize your requests

  • Set max_tokens to cap response length and cost
  • Use an appropriate temperature: lower for factual answers, higher for creative writing
  • Include a system message to guide behavior
  • Stream responses so users see output as it is generated

Note

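Each model has a token limit (its context window), which covers the messages you send plus the tokens it generates. Long conversations can exceed this limit, so you may need to trim or summarize older messages.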
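One simple history-management strategy is to keep the system message and drop the oldest turns until the rest fits a token budget. The sketch below uses hypothetical helpers `estimate_tokens` and `trim_history`. It approximates token counts at about 4 characters per token, a rough heuristic for English text; use the model's actual tokenizer when you need exact counts. Choosing the budget, for example the model's context window minus max_tokens, is left to you, since this page does not list per-model limits.

```python
# Rough heuristic only: about 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(message: dict[str, str]) -> int:
    """Approximate the token cost of one message (heuristic, not exact)."""
    return len(message["content"]) // CHARS_PER_TOKEN + 1


def trim_history(messages: list[dict[str, str]], budget: int) -> list[dict[str, str]]:
    """Keep system messages plus the most recent turns that fit in `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m) for m in system)
    kept: list[dict[str, str]] = []
    # Walk backwards so the newest messages are kept first.
    for message in reversed(rest):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    # Restore chronological order after the system messages.
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "a" * 40},
    {"role": "assistant", "content": "b" * 40},
    {"role": "user", "content": "c" * 40},
]
trimmed = trim_history(history, budget=30)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

Dropping whole messages from the oldest end keeps the remaining turns intact; summarizing the dropped turns into a single message is a common alternative when older context still matters.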