LLM API Request
Updated: 2025-11-25
Now that you have made your first LLM interaction, let's learn how to structure requests, configure parameters, and handle responses.
Messages
Single Interaction
Most LLM APIs follow a similar pattern using a chat completion interface. Here is a request example:
```python
import os

from dotenv import load_dotenv
from openai import OpenAI
from openai.types.chat import ChatCompletion

load_dotenv()


def single_interaction(client: OpenAI) -> ChatCompletion:
    return client.chat.completions.create(
        model="gpt-5-nano",
        messages=[
            {"role": "user", "content": "Who are you?"}
        ],
    )


if __name__ == "__main__":
    c = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
    r = single_interaction(c)
    print(r.choices[0].message.content)
```
Note the type hints in the function signature: the parameter is annotated as `OpenAI` and the return type as `ChatCompletion`.
Example output:
```
I’m ChatGPT, an AI assistant created by OpenAI. I’m built on the GPT-4 architecture and I’m here to help with a wide range of tasks—answering questions, explaining ideas, drafting or editing text, writing code, brainstorming, translating, planning, and more.

A few notes:
- I don’t access personal data about you unless you share it in the chat.
- I don’t browse the web unless a browsing tool is enabled.
- I try to be accurate, but I can make mistakes; feel free to double-check important details.
- I can remember context within this conversation, but I don’t retain memory between chats.

What would you like to do or know today?
```
The messages array represents the conversation history and follows a structured format with different roles:
- `user`: Represents messages from the user
- `system`: Sets the behavior and context for the model
- `assistant`: Represents previous responses from the model
If you are making a single, independent API call (as above), messages contains one dictionary where role is set to user and content contains the input prompt.
Multi-Turn Interactions
To facilitate multi-turn interactions with an LLM, you typically specify its system role first and then provide the user prompt. In this case, the messages array contains an additional dictionary where role is set to system and content describes the model's persistent behavior for this conversation.
To maintain context across multiple interactions, you should include the entire conversation history in each request. Add the LLM's previous response to messages as an additional dictionary, where role is assistant and content is the LLM's output (for example, an earlier answer such as "20").
Q1: How does an LLM handle the following three messages differently?
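As an illustration, here is a minimal sketch of such a three-message history being extended with a new user turn. The system prompt, user question, and recorded answer are hypothetical placeholders rather than the course's exact example:

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# The full history is re-sent on every call; the model itself is stateless.
messages = [
    # system: persistent behavior for this conversation
    {"role": "system", "content": "You are a concise math tutor."},
    # user: the original prompt
    {"role": "user", "content": "What is 4 times 5?"},
    # assistant: the model's earlier reply, appended to preserve context
    {"role": "assistant", "content": "20"},
]

# Continue the conversation by appending a new user turn.
messages.append({"role": "user", "content": "Now divide that by 2."})

response = client.chat.completions.create(model="gpt-5-nano", messages=messages)
print(response.choices[0].message.content)
```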
Parameters
Let's explore the important parameters you can configure when making API requests:
model
Specifies which LLM to use. Different models have different capabilities, costs, and context windows.
max_tokens
Controls the maximum number of tokens (sub-word units, roughly three-quarters of an English word each) the model can generate in its response.
{% hint style="warning" %} Setting max_tokens too low may cause responses to be cut off mid-sentence. Setting it too high increases costs and latency. {% endhint %}
temperature
Controls the randomness of the model's output. Range: 0.0 to 2.0
- Low values (0.0 - 0.3): More deterministic and focused responses
- Medium values (0.5 - 0.7): Balanced creativity and coherence
- High values (0.8 - 2.0): More creative and diverse responses
Q5: When would you use a temperature of 0.0 versus 0.9?
top_p (nucleus sampling)
An alternative to temperature that controls diversity by considering only the top tokens whose cumulative probability adds up to top_p. Range: 0.0 to 1.0
{% hint style="info" %} It's generally recommended to alter either temperature OR top_p, but not both simultaneously. {% endhint %}
frequency_penalty
Reduces repetition by penalizing tokens based on how frequently they've appeared. Range: -2.0 to 2.0
presence_penalty
Encourages the model to talk about new topics by penalizing tokens that have appeared at all. Range: -2.0 to 2.0
stop
Specifies sequences where the API will stop generating further tokens.
Complete Example with Multiple Parameters
Here's a comprehensive example showing how these parameters work together:
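The sketch below combines several of these parameters in a single request. The model name and prompt are placeholder choices, and some newer models restrict which sampling parameters they accept, so adjust for the model you actually use:

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o-mini",        # placeholder; pick the model you have access to
    messages=[
        {"role": "system", "content": "You are a helpful writing assistant."},
        {"role": "user", "content": "Write a two-sentence product description for a reusable water bottle."},
    ],
    max_tokens=150,             # cap the length of the reply
    temperature=0.7,            # balanced creativity (leave top_p at its default)
    frequency_penalty=0.5,      # discourage repeated phrases
    presence_penalty=0.3,       # nudge the model toward new topics
    stop=["\n\n"],              # stop generating at the first blank line
)

print(response.choices[0].message.content)
```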
Handling API Responses
Understanding the response structure is crucial for extracting the information you need:
Response Object Structure
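Here is a sketch of the fields you will typically read from a `ChatCompletion` object. Field names follow the OpenAI v1 Python SDK, and `response` is assumed to come from a call like the one above:

```python
# Assuming `response` is a ChatCompletion returned by client.chat.completions.create(...)
print(response.id)                          # unique ID of this completion
print(response.model)                       # model that actually served the request
print(response.choices[0].message.role)     # "assistant"
print(response.choices[0].message.content)  # the generated text
print(response.choices[0].finish_reason)    # "stop", "length", etc.
print(response.usage.prompt_tokens)         # tokens in your input
print(response.usage.completion_tokens)     # tokens in the model's output
print(response.usage.total_tokens)          # sum of the two
```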
Provider-Specific Differences
While the basic structure is similar across providers, there are some differences:
Anthropic (Claude)
Key differences:
- `max_tokens` is required (not optional)
- No system role in the messages array; use a separate `system` parameter
- Response structure: `response.content[0].text` instead of `response.choices[0].message.content`
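A minimal sketch with the Anthropic Python SDK illustrating these differences (the model name is a placeholder; check Anthropic's model list for current names):

```python
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

message = client.messages.create(
    model="claude-3-5-haiku-latest",        # placeholder model name
    max_tokens=256,                         # required for Anthropic, unlike OpenAI
    system="You are a concise assistant.",  # system prompt is a separate parameter
    messages=[
        {"role": "user", "content": "Who are you?"}
    ],
)

print(message.content[0].text)  # note: content[0].text, not choices[0].message.content
```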
Google (Gemini)
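For Gemini, here is a minimal sketch using the `google-generativeai` package (the package and model names are assumptions; Google also ships a newer `google-genai` SDK with a different interface). Note that the generated text is exposed directly on the response rather than through a choices array:

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))

model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name

response = model.generate_content(
    "Who are you?",
    generation_config=genai.types.GenerationConfig(
        temperature=0.3,
        max_output_tokens=256,
    ),
)

print(response.text)  # text is available directly, not via choices[...]
```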
Error Handling
Always implement proper error handling when working with APIs:
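A sketch of retry logic with exponential backoff, using exception classes from the OpenAI v1 Python SDK (the retry counts and delays are arbitrary choices):

```python
import os
import time

from openai import APIConnectionError, APIStatusError, OpenAI, RateLimitError

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))


def complete_with_retries(prompt: str, max_retries: int = 3) -> str:
    delay = 1.0
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-5-nano",
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except (RateLimitError, APIConnectionError) as err:
            # Transient errors: wait and retry with exponential backoff.
            if attempt == max_retries - 1:
                raise
            print(f"Transient error ({err}); retrying in {delay:.0f}s...")
            time.sleep(delay)
            delay *= 2
        except APIStatusError as err:
            # Non-transient API errors (bad request, auth, etc.): fail immediately.
            raise RuntimeError(f"API returned status {err.status_code}") from err


if __name__ == "__main__":
    print(complete_with_retries("Who are you?"))
```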
Best Practices
- Start with lower temperature values (0.0-0.3) for factual tasks, increase for creative tasks
- Set appropriate max_tokens to balance cost and completeness
- Include system messages to set consistent behavior
- Monitor token usage to manage costs effectively
- Implement retry logic for production applications
- Store API keys securely using environment variables
- Cache responses when appropriate to reduce API calls
Cost Considerations
API usage is typically priced per token:
- Input tokens (your prompts) are usually cheaper
- Output tokens (model responses) are usually more expensive
- Different models have different pricing tiers
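As a quick sanity check on spend, you can estimate the cost of a single response from its usage field. The per-token prices below are made-up placeholders; substitute the current rates from your provider's pricing page:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    # Hypothetical example rates (USD per token); check your provider's pricing page.
    input_price = 0.15 / 1_000_000
    output_price = 0.60 / 1_000_000
    return prompt_tokens * input_price + completion_tokens * output_price


# e.g. using the usage field of a ChatCompletion response:
# estimate_cost(response.usage.prompt_tokens, response.usage.completion_tokens)
print(f"${estimate_cost(1_200, 300):.6f}")  # -> $0.000360
```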
Practical Exercise
Try experimenting with different parameter values to see their effects:
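A sketch of such an experiment, holding the prompt fixed and varying only temperature (the prompt and the temperature values are arbitrary choices, and some newer models restrict which temperatures they accept):

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

prompt = "Invent a name and one-line slogan for a coffee shop."

for temperature in (0.0, 0.7, 1.5):
    response = client.chat.completions.create(
        model="gpt-4o-mini",    # placeholder; pick the model you have access to
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=60,
    )
    print(f"temperature={temperature}: {response.choices[0].message.content}")
```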
Q6: Run the above experiment and describe how the outputs differ across temperature settings.
Summary
In this section, you learned:
- How to structure API requests with the messages array
- Key parameters: model, max_tokens, temperature, top_p, penalties, and stop sequences
- How to handle multi-turn conversations
- Response structure and accessing generated content
- Provider-specific differences (OpenAI, Anthropic, Google)
- Error handling and retry logic
- Best practices for cost management
These skills form the foundation for all programmatic interactions with LLMs and will be essential for your course projects.
{% hint style="success" %} Practice making API calls with different parameters to develop an intuition for how they affect model behavior. This experimentation is key to becoming proficient in working with LLMs. {% endhint %}