AI Integration with Laravel: Complete Developer Guide 2026

Laravel is the most widely used PHP framework for custom SaaS and web applications — and it integrates cleanly with AI APIs. This guide covers the complete implementation: from installing the SDK to handling streaming responses, background job queuing, caching, and avoiding common production pitfalls.

Setting Up the OpenAI / Anthropic SDK in Laravel

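Install the PHP client package for your provider. Note that openai-php/client is a community-maintained client for the OpenAI API rather than an OpenAI-published package: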

composer require openai-php/client
# or for Anthropic Claude:
composer require anthropic-ai/sdk

Add your API keys to .env:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

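Expose the key through config/services.php. Application code should read it with config() rather than env(), because env() stops reading the .env file once the configuration is cached:

// config/services.php
'openai' => [
    'key' => env('OPENAI_API_KEY'),
],

Then register the client as a singleton:

// app/Providers/AppServiceProvider.php
use OpenAI;

public function register(): void
{
    $this->app->singleton(OpenAI\Client::class, function () {
        return OpenAI::client(config('services.openai.key'));
    });
}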

Always inject the client via the service container — never instantiate it with new in controllers. This makes testing and swapping providers trivial.
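
To make the testing and provider-swapping claim concrete, here is a minimal sketch of coding against an interface instead of a concrete SDK client. ChatProvider, FakeChatProvider, and SupportAssistant are hypothetical names invented for this example; they are not part of openai-php/client or Laravel. In a real app you would presumably bind the interface to an SDK-backed implementation in the container and substitute the fake in tests.

```php
<?php

// Hypothetical abstraction: these names are illustrative, not part of any SDK.
interface ChatProvider
{
    /** @param array<int, array{role: string, content: string}> $messages */
    public function complete(array $messages): string;
}

// Test double that returns a canned reply and records what it was asked.
final class FakeChatProvider implements ChatProvider
{
    /** @var array<int, array> */
    public array $calls = [];

    public function __construct(private string $reply) {}

    public function complete(array $messages): string
    {
        $this->calls[] = $messages;

        return $this->reply;
    }
}

// Application code depends on the interface only, never on a concrete client.
final class SupportAssistant
{
    public function __construct(private ChatProvider $provider) {}

    public function answer(string $question): string
    {
        return $this->provider->complete([
            ['role' => 'system', 'content' => 'You are a helpful assistant.'],
            ['role' => 'user', 'content' => $question],
        ]);
    }
}

$assistant = new SupportAssistant(new FakeChatProvider('Hello from the fake.'));
echo $assistant->answer('Hi'), PHP_EOL; // prints "Hello from the fake."
```

Because SupportAssistant never names a vendor class, moving from one provider to another means writing one new ChatProvider implementation and changing one container binding.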

Basic Chat Completion

A simple controller action calling GPT-4o:

use Illuminate\Http\JsonResponse;
use Illuminate\Http\Request;
use OpenAI\Client as OpenAIClient;

class AiController extends Controller
{
    public function __construct(private OpenAIClient $ai) {}

    public function chat(Request $request): JsonResponse
    {
        $request->validate(['message' => 'required|string|max:2000']);

        $response = $this->ai->chat()->create([
            'model'    => 'gpt-4o',
            'messages' => [
                ['role' => 'system', 'content' => 'You are a helpful assistant for [Product].'],
                ['role' => 'user',   'content' => $request->message],
            ],
            'max_tokens'  => 800,
            'temperature' => 0.3,
        ]);

        return response()->json([
            'reply' => $response->choices[0]->message->content,
        ]);
    }
}

Adding RAG: Retrieval-Augmented Generation

Without retrieval, the model knows nothing about your product beyond whatever happened to be in its training data. Here's the pattern using pgvector (a PostgreSQL extension):

1. Store document embeddings in the database

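// Migration (requires the pgvector extension: CREATE EXTENSION IF NOT EXISTS vector;)
Schema::create('document_chunks', function (Blueprint $table) {
    $table->id();
    $table->string('source');
    $table->text('content');
    $table->vector('embedding', 1536); // pgvector column; 1536 is the default size of text-embedding-3-small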
    $table->timestamps();
});

// Indexing a document
public function indexDocument(string $source, string $content): void
{
    $chunks = $this->splitIntoChunks($content, maxTokens: 500);

    foreach ($chunks as $chunk) {
        $embedding = $this->ai->embeddings()->create([
            'model' => 'text-embedding-3-small',
            'input' => $chunk,
        ])->embeddings[0]->embedding;

        DocumentChunk::create([
            'source'    => $source,
            'content'   => $chunk,
            'embedding' => json_encode($embedding),
        ]);
    }
}
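
The indexing code above calls splitIntoChunks() without showing it. One possible sketch follows, assuming a rough heuristic of about four bytes per token for English text; that ratio and the estimateTokens() helper are assumptions made for this example, not something a tokenizer guarantees. For exact budgets you would swap in a real tokenizer.

```php
<?php

// Rough heuristic (assumption): about 4 bytes per token for English text.
function estimateTokens(string $text): int
{
    return (int) ceil(strlen($text) / 4);
}

/**
 * Greedily packs whitespace-separated words into chunks that stay within
 * the estimated token budget. A single word that exceeds the budget on its
 * own is emitted as one oversized chunk rather than being cut.
 *
 * @return string[]
 */
function splitIntoChunks(string $content, int $maxTokens = 500): array
{
    $words = preg_split('/\s+/', trim($content), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $current = '';

    foreach ($words as $word) {
        $candidate = $current === '' ? $word : $current . ' ' . $word;

        if ($current !== '' && estimateTokens($candidate) > $maxTokens) {
            // Adding this word would exceed the budget: close the chunk.
            $chunks[] = $current;
            $current = $word;
        } else {
            $current = $candidate;
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}
```

Splitting on word boundaries is the simplest option. Splitting on paragraphs or headings, with a small overlap between chunks, often retrieves better but is left out here to keep the sketch short.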

2. Retrieve relevant chunks at query time

public function retrieve(string $query, int $topK = 5): array
{
    $queryEmbedding = $this->ai->embeddings()->create([
        'model' => 'text-embedding-3-small',
        'input' => $query,
    ])->embeddings[0]->embedding;

    // pgvector cosine distance query
    return DB::select("
        SELECT content, 1 - (embedding <=> ?::vector) AS similarity
        FROM document_chunks
        ORDER BY embedding <=> ?::vector
        LIMIT ?
    ", [json_encode($queryEmbedding), json_encode($queryEmbedding), $topK]);
}
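
For readers unfamiliar with the <=> operator: pgvector defines it as cosine distance, which is why the query above computes 1 minus the distance to get a similarity score. The following plain-PHP sketch shows the same arithmetic; cosineSimilarity() and cosineDistance() are illustrative helper names, and in production the database does this work for you.

```php
<?php

/**
 * Cosine similarity between two equal-length numeric vectors.
 *
 * @param float[] $a
 * @param float[] $b
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // convention chosen for this sketch when a vector is all zeros
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Cosine distance is the quantity the query sorts by (smaller means closer).
function cosineDistance(array $a, array $b): float
{
    return 1.0 - cosineSimilarity($a, $b);
}
```

Vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1, so ordering by ascending distance returns the most similar chunks first.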

3. Inject retrieved context into the system prompt

$chunks   = $this->retrieve($request->message);
$context  = collect($chunks)->pluck('content')->join("\n\n---\n\n");

$systemPrompt = "You are a helpful assistant for [Product].
Answer ONLY using the context provided below.
If the answer is not in the context, say you're not sure and suggest contacting support.

CONTEXT:
{$context}";

$response = $this->ai->chat()->create([
    'model'    => 'gpt-4o',
    'messages' => [
        ['role' => 'system', 'content' => $systemPrompt],
        ...$conversationHistory, // earlier user/assistant turns, loaded from your own storage
        ['role' => 'user', 'content' => $request->message],
    ],
]);
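
Conversation history grows without bound unless you trim it, and history supplied by the client should not be allowed to smuggle in its own system messages. Here is a minimal sketch of a message builder that handles both. This buildMessages() function, its signature, and the $maxHistory default are assumptions made for this example.

```php
<?php

/**
 * Assembles the messages array: system prompt first, then the most recent
 * history turns, then the new user message.
 *
 * @param array<int, array{role?: string, content?: mixed}> $history
 * @return array<int, array{role: string, content: string}>
 */
function buildMessages(string $systemPrompt, array $history, string $userMessage, int $maxHistory = 10): array
{
    // Keep only well-formed user/assistant turns; drop anything else,
    // including "system" entries that did not come from the application.
    $safe = array_values(array_filter(
        $history,
        static fn ($m) => is_array($m)
            && in_array($m['role'] ?? null, ['user', 'assistant'], true)
            && is_string($m['content'] ?? null)
    ));

    // Keep only the last $maxHistory turns so the prompt stays bounded.
    $recent = $maxHistory > 0 ? array_slice($safe, -$maxHistory) : [];

    return [
        ['role' => 'system', 'content' => $systemPrompt],
        ...$recent,
        ['role' => 'user', 'content' => $userMessage],
    ];
}
```

Counting turns is a blunt limit. Trimming by estimated tokens is more precise but needs a tokenizer, so it is left out of this sketch.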

Streaming Responses with Server-Sent Events

Users expect a response to appear incrementally as the model generates it, not after a long pause. Stream tokens to the browser with Server-Sent Events (SSE):

use Symfony\Component\HttpFoundation\StreamedResponse;

public function stream(Request $request): StreamedResponse
{
    return response()->stream(function () use ($request) {
        $stream = $this->ai->chat()->createStreamed([
            'model'    => 'gpt-4o',
            'messages' => $this->buildMessages($request),
        ]);

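        foreach ($stream as $response) {
            $delta = $response->choices[0]->delta->content ?? '';
            // Compare against '' explicitly: a bare if ($delta) would drop the token "0".
            if ($delta !== '') {
                echo "data: " . json_encode(['token' => $delta]) . "\n\n";
                // ob_flush() raises a notice when no output buffer is active.
                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
            }
        }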

        echo "data: [DONE]\n\n";
    }, 200, [
        'Content-Type'  => 'text/event-stream',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no', // disable Nginx buffering
    ]);
}
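
An SSE frame is plain text: optional field lines such as event:, one or more data: lines, and a blank line that ends the frame. A raw newline inside the payload would end the frame early, which is why the controller above JSON-encodes each token. The sketch below wraps that in a small helper; sseFrame() is a hypothetical name for this example, not a Laravel or SDK function.

```php
<?php

/**
 * Formats one Server-Sent Events frame. The payload is JSON-encoded, so any
 * newline in the data is escaped and cannot terminate the frame early.
 */
function sseFrame(array $payload, ?string $event = null): string
{
    $frame = '';

    if ($event !== null) {
        $frame .= 'event: ' . $event . "\n";
    }

    // The blank line (second "\n") marks the end of the frame.
    $frame .= 'data: ' . json_encode($payload, JSON_THROW_ON_ERROR) . "\n\n";

    return $frame;
}

echo sseFrame(['token' => 'Hello']);
```

On the browser side, each frame arrives as one message event whose data can be passed straight to JSON.parse.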

Queuing Long AI Jobs

For document processing, report generation, or other slow AI tasks, use Laravel Queues:

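use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Queue\Queueable;
use OpenAI\Client as OpenAIClient;

class ProcessDocumentWithAI implements ShouldQueue
{
    use Queueable;

    public int $timeout = 120;
    public int $tries   = 3;

    // Document is assumed to be your Eloquent model (e.g. App\Models\Document).
    // Queued models are serialized by ID and re-fetched when the job runs.
    public function __construct(private Document $document) {}
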
    public function handle(OpenAIClient $ai): void
    {
        $result = $ai->chat()->create([/* ... */]);

        $this->document->update([
            'ai_summary' => $result->choices[0]->message->content,
            'processed_at' => now(),
        ]);

        // Broadcast to frontend via Laravel Echo / Pusher
        event(new DocumentProcessed($this->document));
    }

    public function failed(\Throwable $e): void
    {
        $this->document->update(['ai_status' => 'failed']);
    }
}

Caching AI Responses

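LLM API calls cost money and add latency. The example below caches responses for identical prompts. The key is a hash of the exact prompt text, so even a one-character difference is a cache miss: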

public function cachedCompletion(string $prompt): string
{
    $cacheKey = 'ai:' . hash('sha256', $prompt);

    return Cache::remember($cacheKey, now()->addHours(24), function () use ($prompt) {
        return $this->ai->chat()->create([
            'model'    => 'gpt-4o',
            'messages' => [['role' => 'user', 'content' => $prompt]],
        ])->choices[0]->message->content;
    });
}

Don't cache user-specific or session-specific responses — only cache responses to deterministic, context-free queries like content generation, summarisation, or classification tasks.
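
A cache key built from the prompt alone has two weaknesses: cosmetic whitespace differences cause misses, and changing the model or sampling parameters silently reuses stale answers. The sketch below is one way to address both. aiCacheKey() and its parameters are assumptions made for this example, and it deliberately does not lowercase the prompt, since case can change meaning.

```php
<?php

/**
 * Builds a cache key from everything that affects the output: the model,
 * the request parameters, and a whitespace-normalised prompt.
 *
 * @param array<string, mixed> $params e.g. ['temperature' => 0.0]
 */
function aiCacheKey(string $prompt, string $model, array $params = []): string
{
    // Collapse runs of whitespace and trim, so cosmetic differences share a key.
    $normalised = trim(preg_replace('/\s+/', ' ', $prompt) ?? $prompt);

    // Sort parameters so the key does not depend on array order.
    ksort($params);

    $material = json_encode([$model, $params, $normalised], JSON_THROW_ON_ERROR);

    return 'ai:' . hash('sha256', $material);
}

$key = aiCacheKey("Summarise   this\narticle", 'gpt-4o', ['temperature' => 0.0]);
```

You could pass $key to Cache::remember() in place of the prompt-only hash shown earlier. Setting the temperature to 0 for cached tasks also makes the stored answer more representative of what a fresh call would return.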

Rate Limiting and Cost Control

Protect your OpenAI bill from runaway usage:

// routes/api.php
Route::middleware(['auth:sanctum', 'throttle:ai'])->group(function () {
    Route::post('/chat', [AiController::class, 'chat']);
});

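// app/Providers/AppServiceProvider.php, inside boot()
// (Laravel 10 and earlier: RouteServiceProvider)
RateLimiter::for('ai', function (Request $request) {
    $userId = $request->user()?->id ?: $request->ip();

    // Prefix each key: limits that share an identical by() value would share one counter.
    return [
        Limit::perMinute(10)->by('minute:' . $userId),
        Limit::perDay(100)->by('day:' . $userId),
    ];
});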
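
Request counts are only a proxy for spend, because one long prompt can cost more than many short ones. A rough sketch of per-request cost estimation and a daily budget check follows. The function names are hypothetical, and the prices in the usage line are placeholders for illustration only, so check your provider's current pricing. With openai-php/client, the token counts would come from the usage object on the response.

```php
<?php

/**
 * Estimates the cost of one request in US dollars.
 * Both prices are expressed per one million tokens.
 */
function estimateCostUsd(
    int $promptTokens,
    int $completionTokens,
    float $inputPricePerMillion,
    float $outputPricePerMillion
): float {
    return ($promptTokens * $inputPricePerMillion
        + $completionTokens * $outputPricePerMillion) / 1_000_000;
}

/** Returns true when the request still fits inside the user's daily budget. */
function withinDailyBudget(float $spentTodayUsd, float $requestCostUsd, float $dailyBudgetUsd): bool
{
    return $spentTodayUsd + $requestCostUsd <= $dailyBudgetUsd;
}

// Placeholder prices, not real quotes: 2.50 USD input, 10.00 USD output per million tokens.
$cost = estimateCostUsd(1200, 300, inputPricePerMillion: 2.50, outputPricePerMillion: 10.00);
```

Logging this figure per user after every call gives you the cost attribution data the checklist asks for, and the budget check can reject requests before they reach the API.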

Error Handling and Fallbacks

AI APIs can fail — timeouts, rate limits, model overload. Always wrap calls:

try {
    $response = $this->ai->chat()->create([/* ... */]);
    return $response->choices[0]->message->content;

} catch (\OpenAI\Exceptions\TransporterException $e) {
    // Network error — retry via queue
    Log::error('OpenAI network error', ['error' => $e->getMessage()]);
    throw new AiTemporarilyUnavailableException();

} catch (\OpenAI\Exceptions\ErrorException $e) {
    // API error (rate limit, invalid request, etc.)
    if ($e->getErrorCode() === 'rate_limit_exceeded') {
        return $this->fallbackResponse('I\'m temporarily busy. Please try again in a moment.');
    }
    throw $e;
}
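
For transient failures, retrying with exponential backoff before giving up is a common pattern. Below is a minimal plain-PHP sketch. withRetries() is a hypothetical helper written for this example, and it retries on any exception for brevity, whereas real code should retry only errors known to be transient (network failures, rate limits) and rethrow the rest immediately.

```php
<?php

/**
 * Calls $operation and retries on failure, doubling the delay each time.
 * $sleep is injectable so tests can record delays instead of waiting.
 */
function withRetries(
    callable $operation,
    int $maxAttempts = 3,
    int $baseDelayMs = 200,
    ?callable $sleep = null
): mixed {
    $sleep ??= static function (int $ms): void {
        usleep($ms * 1000);
    };

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation($attempt);
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // out of attempts: surface the last error
            }

            // Delay doubles per attempt: base, 2x base, 4x base, ...
            $sleep($baseDelayMs * 2 ** ($attempt - 1));
        }
    }
}
```

Inside a queued job you would normally lean on the job's own tries setting instead, and keep an in-process retry like this for short, user-facing requests where one quick second attempt is cheaper than failing.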

Production Checklist

API keys stored in environment variables, never in code or git
All AI calls wrapped in try/catch with appropriate fallbacks
Rate limiting per user to prevent cost spikes
Response caching for deterministic prompts
Long-running AI tasks dispatched as background jobs
Token usage logged per user for cost attribution
System prompt reviewed and tested for prompt injection resistance
PII scrubbed before sending to external API
Model version pinned to a dated snapshot such as gpt-4o-2024-11-20, not a floating alias such as gpt-4o (the examples in this guide use the alias only for brevity)
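
As an illustration of the PII item in the checklist, here is a deliberately simple scrubbing sketch. scrubPii() and its two regular expressions are assumptions made for this example. They catch common email and phone formats only, will miss other kinds of PII such as names and addresses, and may over-match long digit sequences, so treat this as a starting point and not a compliance control.

```php
<?php

/**
 * Replaces email addresses and phone-like digit runs with placeholders
 * before the text leaves your infrastructure.
 */
function scrubPii(string $text): string
{
    $patterns = [
        // local-part@domain.tld
        '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/' => '[EMAIL]',
        // Optional "+", a digit, then 7 or more digits, spaces, dots, dashes
        // or parentheses, ending in a digit.
        '/\+?\d[\d\s().-]{7,}\d/' => '[PHONE]',
    ];

    // preg_replace returns null only on a regex engine error; fall back to the input.
    return preg_replace(array_keys($patterns), array_values($patterns), $text) ?? $text;
}

echo scrubPii('Mail jane.doe@example.com or call +61 400 123 456.'), PHP_EOL;
// prints "Mail [EMAIL] or call [PHONE]."
```

Running the scrubber on both the user message and any retrieved context keeps the two paths consistent.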

Building an AI feature in your Laravel app? Get a free estimate or email hello@csnexa.com — our team has delivered AI integrations for SaaS products across Australia, the US, and the UK.

Written by Rohitash Kumar

Founder & CEO, CSNexa — 17+ Years of software engineering experience.
