Laravel is the most widely used PHP framework for custom SaaS and web applications, and it integrates cleanly with AI APIs. This guide covers the implementation end to end: installing the SDK, chat completions, RAG, streaming responses, background job queuing, caching, rate limiting, error handling, and common production pitfalls.
Setting Up the OpenAI / Anthropic SDK in Laravel
Install the official PHP packages:
composer require openai-php/client
# or, for Anthropic Claude:
composer require anthropic-ai/sdk
Add your API keys to .env:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
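The service provider shown next reads the key with config('services.openai.key'), so the .env values need a matching entry in config/services.php. A minimal sketch of that mapping follows; the anthropic entry uses an assumed key name that mirrors the OpenAI one and is not something the guide specifies.

```php
<?php
// config/services.php (excerpt; merge these entries into the array the file already returns)

return [
    // Read by config('services.openai.key') in the service provider.
    'openai' => [
        'key' => env('OPENAI_API_KEY'),
    ],

    // Assumed key name for Anthropic; adjust it to match your own provider binding.
    'anthropic' => [
        'key' => env('ANTHROPIC_API_KEY'),
    ],
];
```

Reading keys through config() rather than env() keeps them available after php artisan config:cache, because env() calls outside config files return null once the configuration is cached.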
Register a singleton in AppServiceProvider:
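// app/Providers/AppServiceProvider.php
use OpenAI;

public function register(): void
{
    // Build the client once and reuse it wherever OpenAI\Client is type-hinted.
    // Assumes config/services.php maps 'openai.key' to env('OPENAI_API_KEY').
    $this->app->singleton(OpenAI\Client::class, function () {
        return OpenAI::client(config('services.openai.key'));
    });
}
Type-hint the client in constructors instead of creating it with new in controllers. This makes testing and swapping providers trivial.
Basic Chat Completion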
A simple controller action calling GPT-4o:
namespace App\Http\Controllers;

use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use OpenAI\Client as OpenAIClient;

class AiController extends Controller
{
    // The container injects the singleton registered in AppServiceProvider.
    public function __construct(private OpenAIClient $ai) {}

    public function chat(Request $request): JsonResponse
    {
        // validate() returns only the validated fields.
        $validated = $request->validate(['message' => 'required|string|max:2000']);

        $response = $this->ai->chat()->create([
            'model' => 'gpt-4o',
            'messages' => [
                ['role' => 'system', 'content' => 'You are a helpful assistant for [Product].'],
                ['role' => 'user', 'content' => $validated['message']],
            ],
            'max_tokens' => 800,   // caps reply length and cost
            'temperature' => 0.3,  // lower values give more consistent answers
        ]);

        return response()->json([
            'reply' => $response->choices[0]->message->content,
        ]);
    }
}
Adding RAG: Retrieval-Augmented Generation
Without RAG, the LLM has no knowledge of your product. Here's the pattern using pgvector (PostgreSQL extension):
1. Store document embeddings in the database
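The indexing method in this step calls splitIntoChunks(), which the guide leaves undefined. A rough sketch follows, assuming about 0.75 English words per token (a rule of thumb, not a tokenizer) and splitting on whitespace only. A production splitter would more likely respect sentence or paragraph boundaries and count tokens with a real tokenizer.

```php
<?php
declare(strict_types=1);

/**
 * Split text into chunks of roughly $maxTokens tokens.
 *
 * Assumption: ~0.75 words per token. This is an approximation for
 * illustration; it does not call any tokenizer.
 *
 * @return list<string>
 */
function splitIntoChunks(string $content, int $maxTokens = 500): array
{
    // Break the text into words, ignoring runs of whitespace.
    $words = preg_split('/\s+/', trim($content), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false || $words === []) {
        return [];
    }

    // Convert the token budget into an approximate word budget.
    $wordsPerChunk = max(1, (int) floor($maxTokens * 0.75));

    // Group the words and join each group back into a string.
    return array_map(
        static fn (array $group): string => implode(' ', $group),
        array_chunk($words, $wordsPerChunk)
    );
}
```

In the guide's service class this would be a private method; it is written as a plain function here so it runs on its own.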
// Migration (the pgvector extension must be enabled: CREATE EXTENSION IF NOT EXISTS vector)
Schema::create('document_chunks', function (Blueprint $table) {
    $table->id();
    $table->string('source');
    $table->text('content');
    $table->vector('embedding', 1536); // pgvector column; 1536 is the default size of text-embedding-3-small
    $table->timestamps();
});

// Indexing a document
public function indexDocument(string $source, string $content): void
{
    // splitIntoChunks() is not shown here; any splitter that returns an array of strings works.
    $chunks = $this->splitIntoChunks($content, maxTokens: 500);

    foreach ($chunks as $chunk) {
        $embedding = $this->ai->embeddings()->create([
            'model' => 'text-embedding-3-small',
            'input' => $chunk,
        ])->embeddings[0]->embedding;

        DocumentChunk::create([
            'source' => $source,
            'content' => $chunk,
            // pgvector accepts the '[0.1,0.2,...]' text form, which json_encode produces.
            'embedding' => json_encode($embedding),
        ]);
    }
}
2. Retrieve relevant chunks at query time
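The query in this step ranks chunks with pgvector's <=> operator, which returns cosine distance, that is, 1 minus cosine similarity. For intuition, or for unit tests that should not touch the database, the same quantity can be computed in plain PHP. cosineSimilarity() below is a hypothetical helper written for this illustration, not part of Laravel, pgvector, or the OpenAI client.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity of two equal-length vectors.
 * pgvector's <=> operator returns 1 minus this value.
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosineSimilarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for a zero vector.');
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}
```

Vectors pointing the same way score close to 1 (distance near 0), orthogonal vectors score 0, and opposite vectors score -1.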
use Illuminate\Support\Facades\DB;

public function retrieve(string $query, int $topK = 5): array
{
    $queryEmbedding = $this->ai->embeddings()->create([
        'model' => 'text-embedding-3-small',
        'input' => $query,
    ])->embeddings[0]->embedding;

    $vector = json_encode($queryEmbedding);

    // pgvector cosine distance query: <=> returns distance, so 1 - distance is similarity.
    // The explicit ::vector cast keeps PostgreSQL from having to infer the parameter type.
    return DB::select("
        SELECT content, 1 - (embedding <=> ?::vector) AS similarity
        FROM document_chunks
        ORDER BY embedding <=> ?::vector
        LIMIT ?
    ", [$vector, $vector, $topK]);
}
3. Inject retrieved context into the system prompt
$chunks = $this->retrieve($request->message);
$context = collect($chunks)->pluck('content')->join("\n\n---\n\n");

$systemPrompt = "You are a helpful assistant for [Product].
Answer ONLY using the context provided below.
If the answer is not in the context, say you're not sure and suggest contacting support.
CONTEXT:
{$context}";

$response = $this->ai->chat()->create([
    'model' => 'gpt-4o',
    'messages' => [
        ['role' => 'system', 'content' => $systemPrompt],
        ...$conversationHistory, // earlier user/assistant turns, oldest first
        ['role' => 'user', 'content' => $request->message],
    ],
]);
Streaming Responses with Server-Sent Events
Users expect to see responses appear word-by-word. Use SSE:
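On the wire, each SSE message is one or more data: lines terminated by a blank line. JSON-encoding the token escapes any newline inside it, so a token always fits on a single data: line. A small sketch of that framing, using a hypothetical sseFrame() helper (not part of Laravel or the OpenAI client):

```php
<?php
declare(strict_types=1);

/**
 * Format one Server-Sent Events message carrying a JSON payload.
 * The trailing blank line ("\n\n") marks the end of the event.
 */
function sseFrame(array $payload): string
{
    return 'data: ' . json_encode($payload, JSON_THROW_ON_ERROR) . "\n\n";
}
```

The streaming controller action then looks like this: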
use Symfony\Component\HttpFoundation\StreamedResponse;

public function stream(Request $request): StreamedResponse
{
    return response()->stream(function () use ($request) {
        $stream = $this->ai->chat()->createStreamed([
            'model' => 'gpt-4o',
            'messages' => $this->buildMessages($request),
        ]);

        foreach ($stream as $response) {
            $delta = $response->choices[0]->delta->content ?? '';
            // Compare against '' explicitly: the string "0" is falsy in PHP and would be dropped.
            if ($delta !== '') {
                echo "data: " . json_encode(['token' => $delta]) . "\n\n";
                // ob_flush() raises a notice when no output buffer is active.
                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
            }
        }

        echo "data: [DONE]\n\n";
        flush();
    }, 200, [
        'Content-Type' => 'text/event-stream',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no', // disable Nginx buffering
    ]);
}
Queuing Long AI Jobs
For document processing, report generation, or other slow AI tasks, use Laravel Queues:
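use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use OpenAI\Client as OpenAIClient;

class ProcessDocumentWithAI implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $timeout = 120; // seconds before the worker stops the job
    public int $tries = 3;     // attempts before failed() is called

    // Assumption: Document is your Eloquent model; the original snippet used it without defining it.
    public function __construct(public Document $document) {}

    public function handle(OpenAIClient $ai): void
    {
        $result = $ai->chat()->create([/* ... */]);

        $this->document->update([
            'ai_summary' => $result->choices[0]->message->content,
            'processed_at' => now(),
        ]);

        // Broadcast to frontend via Laravel Echo / Pusher
        event(new DocumentProcessed($this->document));
    }

    public function failed(\Throwable $e): void
    {
        $this->document->update(['ai_status' => 'failed']);
    }
}
Caching AI Responses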
LLM API calls cost money. Cache responses for repeated prompts. The method below keys the cache on a hash of the exact prompt text, so only byte-identical prompts share an entry:
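Near-identical queries hit the same cache entry only if the prompt is normalized before it is hashed. A minimal sketch of one way to do that, assuming that whitespace and letter case do not change the meaning of your prompts (not true for every application, for example code or case-sensitive identifiers). promptCacheKey() is a hypothetical helper; it also folds the model name into the key so that switching models does not serve stale answers.

```php
<?php
declare(strict_types=1);

/**
 * Build a cache key that is stable across whitespace and case differences.
 * Requires the mbstring extension, which Laravel already depends on.
 */
function promptCacheKey(string $prompt, string $model = 'gpt-4o'): string
{
    // Collapse runs of whitespace into single spaces and trim the ends.
    $normalized = preg_replace('/\s+/u', ' ', trim($prompt)) ?? $prompt;

    // Fold case so "What is Laravel?" and "what is laravel?" match.
    $normalized = mb_strtolower($normalized);

    return 'ai:' . hash('sha256', $model . '|' . $normalized);
}
```

The caching method that follows hashes the raw prompt; building the key this way instead is one option for widening cache hits.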
public function cachedCompletion(string $prompt): string
{
    $cacheKey = 'ai:' . hash('sha256', $prompt);

    return Cache::remember($cacheKey, now()->addHours(24), function () use ($prompt) {
        return $this->ai->chat()->create([
            'model' => 'gpt-4o',
            'messages' => [['role' => 'user', 'content' => $prompt]],
        ])->choices[0]->message->content;
    });
}
Rate Limiting and Cost Control
Protect your OpenAI bill from runaway usage:
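// routes/api.php
Route::middleware(['auth:sanctum', 'throttle:ai'])->group(function () {
    Route::post('/chat', [AiController::class, 'chat']);
});

// App\Providers\RouteServiceProvider.php (on Laravel 11+, AppServiceProvider::boot())
RateLimiter::for('ai', function (Request $request) {
    // Fall back to the IP address if the route is ever used without authentication.
    $id = $request->user()?->id ?: $request->ip();

    // Prefix each key: limits that share the same by() value would otherwise share one counter.
    return [
        Limit::perMinute(10)->by('minute:' . $id),
        Limit::perDay(100)->by('day:' . $id),
    ];
});
Error Handling and Fallbacks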
AI APIs can fail: timeouts, rate limits, model overload. Always wrap calls:
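For transient failures, a bounded retry with exponential backoff is a common complement to the try/catch that follows. The sketch below is a generic helper written for this illustration (retryWithBackoff() and its parameters are hypothetical); it retries on any Throwable, whereas real code would usually retry only network errors and rate limits.

```php
<?php
declare(strict_types=1);

/**
 * Run $operation, retrying with exponential backoff when it throws.
 *
 * $operation receives the 1-based attempt number.
 * $sleep receives the delay in milliseconds; it is injectable so tests need not wait.
 */
function retryWithBackoff(
    callable $operation,
    int $maxAttempts = 3,
    int $baseDelayMs = 200,
    ?callable $sleep = null
): mixed {
    $sleep ??= static function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Throwable $e) {
            // Out of attempts: surface the last error to the caller.
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
            // Delay doubles each time: base, 2x base, 4x base, ...
            $sleep($baseDelayMs * 2 ** ($attempt - 1));
        }
    }
}
```

Laravel also ships a retry() helper that covers the simple cases; a hand-written version is mainly useful when you want control over which exceptions are retried. Keep the total wait well below your request or job timeout.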
try {
    $response = $this->ai->chat()->create([/* ... */]);
    return $response->choices[0]->message->content;
} catch (\OpenAI\Exceptions\TransporterException $e) {
    // Network error: log it and let the caller retry, for example via a queued job.
    Log::error('OpenAI network error', ['error' => $e->getMessage()]);
    // AiTemporarilyUnavailableException is an application-defined exception.
    throw new AiTemporarilyUnavailableException();
} catch (\OpenAI\Exceptions\ErrorException $e) {
    // API error (rate limit, invalid request, etc.)
    if ($e->getErrorCode() === 'rate_limit_exceeded') {
        // fallbackResponse() is an application-defined helper.
        return $this->fallbackResponse('I\'m temporarily busy. Please try again in a moment.');
    }
    throw $e;
}
Production Checklist
- API keys stored in environment variables, never in code or git
- All AI calls wrapped in try/catch with appropriate fallbacks
- Rate limiting per user to prevent cost spikes
- Response caching for deterministic prompts
- Long-running AI tasks dispatched as background jobs
- Token usage logged per user for cost attribution
- System prompt reviewed and tested for prompt injection resistance
- PII scrubbed before sending to external API
- Model version pinned (don't use a floating alias such as gpt-4; pin to a dated snapshot such as gpt-4o-2024-11-20)
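For the PII item in the checklist, a first line of defence is to redact obvious identifiers before the text leaves your server. The sketch below is a deliberately rough, regex-based illustration; scrubPii() and its placeholder tokens are hypothetical, and the patterns will both miss some PII (names, addresses) and over-redact some harmless digit runs such as dates.

```php
<?php
declare(strict_types=1);

/**
 * Replace email addresses, card-like numbers, and phone-like numbers
 * with placeholder tokens. Patterns are applied in the order listed,
 * so long card-like digit runs are redacted before the phone pattern runs.
 */
function scrubPii(string $text): string
{
    $patterns = [
        // Email addresses
        '/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i' => '[EMAIL]',
        // 13 to 16 digits, optionally separated by spaces or dashes
        '/\b(?:\d[ -]?){12,15}\d\b/' => '[CARD]',
        // Optional +, then at least 9 characters of digits and common separators
        '/\+?\d[\d\s().-]{7,}\d/' => '[PHONE]',
    ];

    return preg_replace(array_keys($patterns), array_values($patterns), $text) ?? $text;
}
```

Running user input through a function like this before building the messages array keeps the redaction in one place; a dedicated PII detection service is the sturdier option when the data is sensitive.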
Building an AI feature in your Laravel app? Get a free estimate or email hello@csnexa.com — our team has delivered AI integrations for SaaS products across Australia, the US, and the UK.