AI Integration in React & Next.js: Best Practices 2026 Guide
Integrating large language models (LLMs) into frontend apps has transformed how we build user interfaces. In this post, we explore best practices for bringing AI into your React and Next.js applications.
Why integrate AI in the frontend?
Moving AI interactions closer to the user provides:
- Lower latency: streamed responses reach the user without waiting for full server roundtrips
- More privacy: with on-device or edge inference, user data does not have to leave the device
- Better UX: real-time conversational experiences
Recommended stack for 2026
Framework
- React 19+ or Next.js 16+ for maximum performance
- Server Components for sensitive logic
- Client Components for real-time interactivity
LLMs
A minimal, non-streaming call with the Anthropic SDK (@anthropic-ai/sdk):
import Anthropic from "@anthropic-ai/sdk";
// The client reads ANTHROPIC_API_KEY from the environment by default
const client = new Anthropic();
const message = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Hi, how are you?" }
  ],
});
Streaming responses
One of the biggest UX upgrades is enabling streaming:
const stream = client.messages.stream({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});
for await (const event of stream) {
  // Only text deltas carry generated text; other delta types exist (e.g. tool input)
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    console.log(event.delta.text);
  }
}
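In Next.js, this streaming call usually lives in a Route Handler so the API key stays on the server, and the browser consumes the result with fetch. A minimal sketch, assuming a route at app/api/ai/stream/route.ts (the path and request shape are illustrative):
// app/api/ai/stream/route.ts (assumed path)
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic(); // API key never reaches the client
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const stream = anthropic.messages.stream({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });
  const encoder = new TextEncoder();
  // Re-emit only the text deltas as a plain text stream the browser can read
  const body = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const event of stream) {
        if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
          controller.enqueue(encoder.encode(event.delta.text));
        }
      }
      controller.close();
    },
  });
  return new Response(body, { headers: { "Content-Type": "text/plain; charset=utf-8" } });
}
The client then reads this stream with response.body.getReader(), exactly as in the error handling section later in this post.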
Latency optimization
1. Token budgets
Do not send the whole context. Calculate how many tokens you really need:
const estimateTokens = (text: string) => {
  return Math.ceil(text.length / 4); // Rough estimate
};
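Building on that estimate, one way to keep a chat history inside a budget is to drop the oldest messages until the estimate fits. A minimal sketch reusing the estimateTokens helper above (the 3,000-token budget and message shape are assumptions):
type ChatMessage = { role: "user" | "assistant"; content: string };
const TOKEN_BUDGET = 3_000; // assumed budget, tune per model and use case
function trimToBudget(messages: ChatMessage[], budget = TOKEN_BUDGET): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let total = 0;
  // Walk from the newest message backwards so recent context survives
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (total + cost > budget) break;
    kept.unshift(messages[i]);
    total += cost;
  }
  return kept;
}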
2. Context caching
Reuse frequent contexts to reduce cost and latency:
const cachedContext = {
  userProfile: `Name: Daily, AI expert`,
  systemPrompt: `You are a helpful assistant...`,
};
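With the Anthropic API specifically, prompt caching lets you mark a stable system prompt so repeated requests reuse it instead of reprocessing it on every call. A sketch using the cachedContext object above (caching behavior and savings depend on the model and your usage):
const response = await client.messages.create({
  model: "claude-3-5-sonnet-20241022",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: cachedContext.systemPrompt,
      // Marks this block as cacheable so identical prefixes are reused across requests
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Summarize my profile" }],
});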
3. Parallel requests
Process multiple requests simultaneously whenever possible.
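For example, independent prompts (a summary, a title, follow-up suggestions) can run concurrently with Promise.all instead of one after another. A sketch, where callModel and articleText are illustrative:
// callModel wraps a single non-streaming request and returns the text
async function callModel(prompt: string): Promise<string> {
  const message = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 512,
    messages: [{ role: "user", content: prompt }],
  });
  // Concatenate the text blocks of the response
  return message.content
    .map((block) => (block.type === "text" ? block.text : ""))
    .join("");
}
// The three prompts are independent, so run them in parallel
const [summary, title, followUps] = await Promise.all([
  callModel(`Summarize this article:\n${articleText}`),
  callModel(`Suggest a title for this article:\n${articleText}`),
  callModel(`Suggest 3 follow-up questions about:\n${articleText}`),
]);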
Practical use cases
Conversational chat
Replace static forms with natural conversations where the LLM extracts data automatically.
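One hedged way to do the extraction is to ask the model for a strict JSON object that mirrors your form fields and parse it before updating state. A sketch where the field names and prompt are illustrative:
type BookingForm = { name: string | null; date: string | null; partySize: number | null };
async function extractBookingFields(conversation: string): Promise<BookingForm> {
  const message = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 256,
    messages: [{
      role: "user",
      content:
        `Extract the booking details from this conversation as JSON with keys ` +
        `"name", "date" and "partySize". Use null for anything not mentioned. ` +
        `Reply with JSON only.\n\n${conversation}`,
    }],
  });
  const first = message.content[0];
  const text = first.type === "text" ? first.text : "{}";
  return JSON.parse(text) as BookingForm; // validate with a schema library in real code
}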
Generative UI
The LLM generates dynamic React components based on user context:
const generatedComponent = await generateUI(userContext);
return <>{generatedComponent}</>;
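generateUI above is a placeholder. One safe way to implement it is to have the model return a small JSON spec and map it onto a whitelist of your own components, rather than rendering model-generated JSX directly. A sketch where the component names and spec shape are assumptions:
import { type ReactNode } from "react";
// Only components you wrote can ever be rendered
const registry: Record<string, (props: any) => ReactNode> = {
  PricingTable: ({ plan }) => <div>Pricing for {plan}</div>,
  FaqList: ({ topic }) => <div>FAQ about {topic}</div>,
};
// The shape the LLM is asked to return (as JSON), not raw JSX
type UISpec = { component: string; props: Record<string, unknown> };
function renderSpec(spec: UISpec): ReactNode {
  const Component = registry[spec.component];
  // Unknown component names are ignored instead of executed
  if (!Component) return null;
  return <Component {...spec.props} />;
}
Because the model only picks from a registry you control, generative UI stays declarative instead of turning into arbitrary code execution.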
RAG (Retrieval-Augmented Generation)
RAG connects an LLM to your own knowledge base so it answers from your data, not from its training data. The frontend pattern in React:
// 1. User asks a question
// 2. Embed the question and search your vector DB
// 3. Inject retrieved docs into the prompt
// 4. Stream the answer
async function ragQuery(userQuestion: string) {
  // Step 1: embed the question
  const embedding = await embedText(userQuestion);
  // Step 2: find relevant chunks from your knowledge base
  const relevantDocs = await vectorDB.search(embedding, { topK: 5 });
  // Step 3: build context-enriched prompt
  const context = relevantDocs.map(d => d.content).join('\n\n');
  // Step 4: stream answer to the UI
  const stream = anthropic.messages.stream({
    model: 'claude-opus-4-5',
    max_tokens: 1024,
    messages: [{
      role: 'user',
      content: `Answer based on this context:\n\n${context}\n\nQuestion: ${userQuestion}`
    }]
  });
  return stream; // pipe directly to a ReadableStream in your route handler
}
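embedText and vectorDB are placeholders for your embedding provider and vector store. For small knowledge bases you can even do the search in memory with cosine similarity. A sketch where the Doc shape is an assumption:
type Doc = { content: string; embedding: number[] };
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
// In-memory stand-in for vectorDB.search(embedding, { topK })
function searchDocs(docs: Doc[], queryEmbedding: number[], topK: number): Doc[] {
  return [...docs]
    .sort((a, b) =>
      cosineSimilarity(b.embedding, queryEmbedding) -
      cosineSimilarity(a.embedding, queryEmbedding)
    )
    .slice(0, topK);
}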
When to use RAG on the frontend:
- Product documentation chatbots (answers from your docs, not hallucinations)
- Support agents with access to your FAQ and ticket history
- Internal tools that query company knowledge bases
Connecting AI to external tools with MCPs
Model Context Protocol (MCP) is an open standard for giving LLMs structured access to tools, APIs, and data sources. Instead of hardcoding every integration, the model decides which tool to call and when. The underlying pattern is the same tool-use flow the Messages API exposes directly:
// Define tools the LLM can use
const tools = [
  {
    name: 'get_user_orders',
    description: 'Fetch order history for a user ID',
    input_schema: {
      type: 'object',
      properties: {
        userId: { type: 'string', description: 'The user ID' },
        limit: { type: 'number', description: 'Max orders to return' }
      },
      required: ['userId']
    }
  },
  {
    name: 'update_shipping_address',
    description: 'Update the shipping address for an order',
    input_schema: {
      type: 'object',
      properties: {
        orderId: { type: 'string' },
        address: { type: 'string' }
      },
      required: ['orderId', 'address']
    }
  }
];
// The model decides when to call them
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024,
  tools,
  messages: [{ role: 'user', content: 'What are my last 3 orders?' }]
});
// Handle a tool call from the model
if (response.stop_reason === 'tool_use') {
  const toolUse = response.content.find(b => b.type === 'tool_use');
  if (toolUse && toolUse.type === 'tool_use') {
    const result = await executeToolCall(toolUse.name, toolUse.input);
    // Feed the result back and continue the conversation (see the round-trip sketch below)
  }
}
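The round trip back to the model uses a tool_result content block that references the tool_use id, so Claude can turn the raw data into a natural-language answer. A sketch continuing the example above (toolUse and result come from the branch just shown):
const followUp = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024,
  tools,
  messages: [
    { role: 'user', content: 'What are my last 3 orders?' },
    // Echo back the assistant turn that requested the tool
    { role: 'assistant', content: response.content },
    {
      // Tool results are sent as a user turn referencing the tool_use id
      role: 'user',
      content: [{
        type: 'tool_result',
        tool_use_id: toolUse.id,
        content: JSON.stringify(result),
      }],
    },
  ],
});
// followUp.content now holds the natural-language answer (or another tool call)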
Tool use and MCPs are the reason AI chat UIs can now perform real actions (place orders, update records, call internal APIs) without you writing a custom parser for every use case.
Error handling patterns designed for AI latency
Standard API error handling doesn't work for LLMs. You need patterns built around three new realities: responses take 2-15 seconds, they can stop mid-generation, and rate limits are counted in tokens as well as requests.
// ❌ What breaks with LLMs
try {
  const data = await fetch('/api/ai').then(r => r.json());
  setContent(data.text);
} catch (e) {
  setError('Something went wrong');
}
// ✅ What actually works
async function streamWithRecovery(prompt: string, retriesLeft = 1) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 30_000); // 30s hard limit
  try {
    const response = await fetch('/api/ai/stream', {
      method: 'POST',
      body: JSON.stringify({ prompt }),
      signal: controller.signal
    });
    if (!response.ok) {
      // Differentiate between rate limit (429) and model error (500)
      if (response.status === 429 && retriesLeft > 0) {
        const retryAfter = response.headers.get('retry-after') ?? '5';
        await delay(parseInt(retryAfter) * 1000);
        return streamWithRecovery(prompt, retriesLeft - 1); // retry once
      }
      throw new Error(`AI service error: ${response.status}`);
    }
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const chunk = decoder.decode(value, { stream: true });
      setContent(prev => prev + chunk); // update UI incrementally
    }
  } catch (e) {
    if (e instanceof DOMException && e.name === 'AbortError') {
      setError('Response timed out. Try a shorter question.');
    } else {
      setError('AI service unavailable. Please try again.');
    }
  } finally {
    clearTimeout(timeoutId);
  }
}
The 3 loading states every AI feature needs (not the usual 1):
- thinking: request sent, waiting for first token (spinner)
- streaming: tokens arriving, show them progressively (no spinner, show partial text)
- complete: full response, enable copy/share/follow-up actions
Users tolerate 10-15 seconds of wait time when they can see tokens appearing. The same wait with a spinner causes abandonment.
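A minimal sketch of those three states in a client component, reading the same /api/ai/stream route as above (error handling is omitted here; combine with streamWithRecovery in practice):
'use client';
import { useState } from 'react';
type AIStatus = 'idle' | 'thinking' | 'streaming' | 'complete';
export function AskAI() {
  const [status, setStatus] = useState<AIStatus>('idle');
  const [content, setContent] = useState('');
  async function ask(prompt: string) {
    setStatus('thinking'); // request sent, no tokens yet
    setContent('');
    const response = await fetch('/api/ai/stream', {
      method: 'POST',
      body: JSON.stringify({ prompt }),
    });
    const reader = response.body!.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      setStatus('streaming'); // first token arrived, drop the spinner
      setContent(prev => prev + decoder.decode(value, { stream: true }));
    }
    setStatus('complete'); // enable copy/share/follow-up actions
  }
  return (
    <div>
      {status === 'thinking' && <p>Thinking…</p>}
      <p>{content}</p>
      {status === 'complete' && (
        <button onClick={() => navigator.clipboard.writeText(content)}>Copy</button>
      )}
      <button onClick={() => ask('Explain RAG in one paragraph')}>Ask</button>
    </div>
  );
}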
Conclusions
AI integration on the frontend is the natural evolution of modern web development. The difference between a good implementation and a frustrating one is mostly in the details: streaming from the first token, error states designed for AI latency, and RAG that gives the model your actual data.
Key takeaways:
- ✅ Use streaming — it's the single biggest UX improvement
- ✅ Design 3 loading states, not 1
- ✅ Use RAG to ground answers in your data
- ✅ Use MCPs to give the model real actions, not just text
- ✅ Handle rate limits and timeouts explicitly
Need help integrating AI into your React or Next.js app? Contact me — I specialize in exactly this.