Azure OpenAI Streaming Chat in Flutter: Real-Time AI for Mobile Apps

Why Streaming Chat Matters in Flutter Apps

When you call an LLM endpoint from a Flutter app and wait for the full response before displaying it, users see a blank screen or a loading spinner for several seconds. That delay undermines the conversational feel. Streaming fixes this: text appears token by token as the model generates it, the app feels responsive, and users get immediate feedback that something is happening.

The challenge is managing state correctly. Device rotations, app backgrounding, and network interruptions can break your stream. If you rebuild your widget hierarchy, lose your stream reference, or don’t properly handle cancellation, you end up with orphaned streams or duplicate requests.

This guide walks through a production-oriented pattern using Azure OpenAI, the dart_openai package, and Riverpod for state management. We’ll cover setup, the streaming implementation, error handling, and how the pieces fit together in an app.

Setting Up Azure OpenAI

First, you need an Azure OpenAI deployment. If you don’t have one already, create a resource in the Azure Portal, deploy a model (gpt-4 or gpt-35-turbo), and note your endpoint URL, deployment name, and API key.

In your Flutter project, add the dart_openai package to your pubspec.yaml:

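dependencies:
  flutter:
    sdk: flutter
  dart_openai: ^3.0.0
  # flutter_riverpod re-exports the core riverpod package, so it is the only
  # Riverpod dependency this guide needs (no code generation is used here).
  flutter_riverpod: ^2.4.0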

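Create a configuration file for your credentials. String.fromEnvironment reads values supplied at build time (for example with --dart-define=AZURE_OPENAI_KEY=<your-key>), which keeps them out of source control but still compiles them into the app binary. That is acceptable for local development; in production, keep the key on a backend service rather than shipping it in the app: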

class AzureOpenAIConfig {
  static const String apiKey = String.fromEnvironment('AZURE_OPENAI_KEY');
  static const String endpoint = String.fromEnvironment('AZURE_OPENAI_ENDPOINT');
  static const String deploymentId = String.fromEnvironment('AZURE_DEPLOYMENT_ID');
  static const String apiVersion = '2024-02-15-preview';
}
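
The four configuration values combine into a single request URL and an authentication header when you call Azure OpenAI's REST API. The sketch below shows that composition in plain Dart. The helper names buildAzureChatUri and buildAzureHeaders are hypothetical (they are not part of dart_openai), and the example endpoint is a placeholder.

```dart
/// Builds the chat-completions URL for an Azure OpenAI deployment.
/// Hypothetical helper for illustration; not part of dart_openai.
Uri buildAzureChatUri({
  required String endpoint,
  required String deploymentId,
  required String apiVersion,
}) {
  // Tolerate a trailing slash on the configured endpoint.
  final base = endpoint.endsWith('/')
      ? endpoint.substring(0, endpoint.length - 1)
      : endpoint;
  return Uri.parse(
    '$base/openai/deployments/$deploymentId/chat/completions',
  ).replace(queryParameters: {'api-version': apiVersion});
}

/// Key-based Azure OpenAI requests authenticate with an `api-key` header.
Map<String, String> buildAzureHeaders(String apiKey) => {
      'api-key': apiKey,
      'Content-Type': 'application/json',
    };
```

Calling buildAzureChatUri with the values from AzureOpenAIConfig is a quick way to check that the endpoint, deployment name, and API version you noted actually produce the URL you expect.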

Initializing the OpenAI Client

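Set up the OpenAI client once at app startup. Create a provider that initializes and exposes the client. Keep in mind that Azure OpenAI expects an api-key header and a deployment-specific URL with an api-version query parameter, which differs from the public OpenAI API that dart_openai targets by default. Verify that your package version can be configured for this, or call the endpoint directly over HTTP: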

import 'package:dart_openai/dart_openai.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';

final openAIClientProvider = Provider<OpenAI>((ref) {
  OpenAI.apiKey = AzureOpenAIConfig.apiKey;
  OpenAI.baseUrl = AzureOpenAIConfig.endpoint;
  OpenAI.requestsTimeOut = const Duration(seconds: 120);
  return OpenAI.instance;
});

Building the Chat State Management with Riverpod

Create a state class to hold chat messages and loading state:

class ChatMessage {
  final String id;
  final String content;
  final bool isUser;
  final DateTime timestamp;

  ChatMessage({
    required this.id,
    required this.content,
    required this.isUser,
    required this.timestamp,
  });
}

class ChatState {
  final List<ChatMessage> messages;
  final bool isLoading;
  final String? error;
  final String currentStreamingContent;

  ChatState({
    this.messages = const [],
    this.isLoading = false,
    this.error,
    this.currentStreamingContent = '',
  });

  ChatState copyWith({
    List<ChatMessage>? messages,
    bool? isLoading,
    String? error,
    String? currentStreamingContent,
  }) {
    return ChatState(
      messages: messages ?? this.messages,
      isLoading: isLoading ?? this.isLoading,
      // Not carried over: every update clears the previous error unless a new one is passed.
      error: error,
      currentStreamingContent: currentStreamingContent ?? this.currentStreamingContent,
    );
  }
}

Implementing the Streaming Chat Notifier

Create a StateNotifier that handles sending messages and streaming responses:

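import 'dart:async'; // StreamSubscription

import 'package:dart_openai/dart_openai.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';

class ChatNotifier extends StateNotifier<ChatState> {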
  ChatNotifier(this._openAIClient) : super(ChatState());

  final OpenAI _openAIClient;
  StreamSubscription? _currentStream;

  Future<void> sendMessage(String userMessage) async {
    // Cancel any ongoing stream
    await _currentStream?.cancel();

    // Add user message to history
    final userMsg = ChatMessage(
      id: DateTime.now().millisecondsSinceEpoch.toString(),
      content: userMessage,
      isUser: true,
      timestamp: DateTime.now(),
    );

    state = state.copyWith(
      messages: [...state.messages, userMsg],
      isLoading: true,
      error: null,
      currentStreamingContent: '',
    );

    try {
      // Build conversation history for context
      final messages = state.messages.map((msg) {
        return OpenAIChatCompletionChoiceMessageModel(
          content: msg.content,
          role: msg.isUser ? OpenAIChatMessageRole.user : OpenAIChatMessageRole.assistant,
        );
      }).toList();

      // Create the streaming request
      final stream = _openAIClient.chat.createStream(
        model: AzureOpenAIConfig.deploymentId,
        messages: messages,
        temperature: 0.7,
        maxTokens: 500,
      );

      String fullResponse = '';

      _currentStream = stream.listen(
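        (event) {
          // Some chunks may carry no choices, so guard before reading the delta.
          if (event.choices.isEmpty) return;
          final delta = event.choices.first.delta.content ?? '';
          fullResponse += delta;
          state = state.copyWith(
            currentStreamingContent: fullResponse,
          );
        },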
        onError: (error) {
          String errorMessage = 'An error occurred';
          if (error.toString().contains('429')) {
            errorMessage = 'Rate limit reached. Please wait a moment before sending another message.';
          } else if (error.toString().contains('401')) {
            errorMessage = 'Authentication failed. Check your API credentials.';
          } else if (error.toString().contains('content_filter')) {
            errorMessage = 'Content filtering applied. Please rephrase your message.';
          }
          state = state.copyWith(
            isLoading: false,
            error: errorMessage,
            currentStreamingContent: '',
          );
        },
        onDone: () {
          // Add assistant response to message history
          final assistantMsg = ChatMessage(
            id: DateTime.now().millisecondsSinceEpoch.toString(),
            content: fullResponse,
            isUser: false,
            timestamp: DateTime.now(),
          );
          state = state.copyWith(
            messages: [...state.messages, assistantMsg],
            isLoading: false,
            currentStreamingContent: '',
          );
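        },
        // Stop listening after the first error so onDone does not also fire
        // and append an empty or partial assistant message.
        cancelOnError: true,
      );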
    } catch (e) {
      state = state.copyWith(
        isLoading: false,
        error: e.toString(),
        currentStreamingContent: '',
      );
    }
  }

  void clearChat() {
    _currentStream?.cancel();
    state = ChatState();
  }

  @override
  void dispose() {
    _currentStream?.cancel();
    super.dispose();
  }
}
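
The listen, accumulate, and cancel cycle in the notifier can be tried in isolation without any network calls. The following is a minimal sketch in plain Dart. fakeTokenStream and StreamAccumulator are hypothetical names made up for this illustration, and the fake stream stands in for the API response.

```dart
import 'dart:async';

/// Hypothetical stand-in for the API stream: emits tokens with a short delay.
Stream<String> fakeTokenStream(List<String> tokens) async* {
  for (final token in tokens) {
    await Future<void>.delayed(const Duration(milliseconds: 10));
    yield token;
  }
}

/// Accumulates tokens in the same way the notifier builds fullResponse.
class StreamAccumulator {
  StreamSubscription<String>? _subscription;
  String content = '';
  bool completed = false;
  Object? lastError;

  /// Starts listening, cancelling any stream that is still running.
  Future<void> start(Stream<String> tokens) async {
    await _subscription?.cancel();
    content = '';
    completed = false;
    lastError = null;
    _subscription = tokens.listen(
      (token) => content += token,
      onError: (Object error) => lastError = error,
      onDone: () => completed = true,
      cancelOnError: true,
    );
  }

  /// A cancelled subscription never receives onDone, so `completed`
  /// stays false and the partial content is left as-is.
  Future<void> cancel() async {
    await _subscription?.cancel();
    _subscription = null;
  }
}
```

The detail worth noticing is that cancelling a subscription suppresses onDone. In the notifier this means a cancelled response is never appended to the message history, which is usually what you want when the user sends a new message mid-stream.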

Creating the Riverpod Provider

Expose the chat notifier as a StateNotifierProvider:

final chatProvider = StateNotifierProvider<ChatNotifier, ChatState>((ref) {
  final openAIClient = ref.watch(openAIClientProvider);
  return ChatNotifier(openAIClient);
});

Building the UI with a ConsumerWidget

Create a simple chat screen that listens to the chat state and displays messages as they stream in:

class ChatScreen extends ConsumerWidget {
  @override
  Widget build(BuildContext context, WidgetRef ref) {
    final chatState = ref.watch(chatProvider);

    return Scaffold(
      appBar: AppBar(
        title: const Text('AI Support Assistant'),
      ),
      body: Column(
        children: [
          Expanded(
            child: ListView.builder(
              itemCount: chatState.messages.length + (chatState.currentStreamingContent.isNotEmpty ? 1 : 0),
              itemBuilder: (context, index) {
                if (index < chatState.messages.length) {
                  final message = chatState.messages[index];
                  return MessageBubble(
                    content: message.content,
                    isUser: message.isUser,
                  );
                } else {
                  // Show the streaming response
                  return MessageBubble(
                    content: chatState.currentStreamingContent,
                    isUser: false,
                  );
                }
              },
            ),
          ),
          if (chatState.error != null)
            Container(
              color: Colors.red.shade100,
              padding: const EdgeInsets.all(16),
              child: Text(
                chatState.error!,
                style: TextStyle(color: Colors.red.shade900),
              ),
            ),
          ChatInputField(
            onSend: (message) {
              ref.read(chatProvider.notifier).sendMessage(message);
            },
            isLoading: chatState.isLoading,
          ),
        ],
      ),
    );
  }
}

class MessageBubble extends StatelessWidget {
  final String content;
  final bool isUser;

  const MessageBubble({
    required this.content,
    required this.isUser,
  });

  @override
  Widget build(BuildContext context) {
    return Align(
      alignment: isUser ? Alignment.centerRight : Alignment.centerLeft,
      child: Container(
        margin: const EdgeInsets.symmetric(vertical: 8, horizontal: 16),
        padding: const EdgeInsets.all(12),
        decoration: BoxDecoration(
          color: isUser ? Colors.blue : Colors.grey.shade200,
          borderRadius: BorderRadius.circular(12),
        ),
        child: Text(
          content,
          style: TextStyle(
            color: isUser ? Colors.white : Colors.black,
          ),
        ),
      ),
    );
  }
}

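class ChatInputField extends StatefulWidget {
  final void Function(String) onSend;
  final bool isLoading;

  const ChatInputField({
    super.key,
    required this.onSend,
    required this.isLoading,
  });

  @override
  State<ChatInputField> createState() => _ChatInputFieldState();
}

class _ChatInputFieldState extends State<ChatInputField> {
  // Created once and held in State so typed text survives rebuilds.
  // Creating the controller inside build() would reset it on every rebuild.
  final _controller = TextEditingController();

  @override
  void dispose() {
    _controller.dispose();
    super.dispose();
  }

  void _submit() {
    final text = _controller.text.trim();
    if (text.isEmpty) return;
    widget.onSend(text);
    _controller.clear();
  }

  @override
  Widget build(BuildContext context) {
    return Container(
      padding: const EdgeInsets.all(16),
      child: Row(
        children: [
          Expanded(
            child: TextField(
              controller: _controller,
              enabled: !widget.isLoading,
              decoration: InputDecoration(
                hintText: 'Type your question...',
                border: OutlineInputBorder(
                  borderRadius: BorderRadius.circular(8),
                ),
              ),
            ),
          ),
          const SizedBox(width: 8),
          IconButton(
            // Disabled while a response is streaming.
            onPressed: widget.isLoading ? null : _submit,
            icon: const Icon(Icons.send),
          ),
        ],
      ),
    );
  }
}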

Handling Network Interruptions and State Recovery

In a real app, you need to handle backgrounding and reconnection. Add a connectivity listener to your app:

import 'package:connectivity_plus/connectivity_plus.dart';

final connectivityProvider = StreamProvider<ConnectivityResult>((ref) {
  return Connectivity().onConnectivityChanged;
});

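// Inside ChatScreen.build, use ref.listen so the SnackBar is shown as a side
// effect of a connectivity change rather than during the build itself.
// Note: connectivity_plus 6.0 and later emit List<ConnectivityResult> instead
// of a single value, so adjust the provider type and the check accordingly.
ref.listen<AsyncValue<ConnectivityResult>>(connectivityProvider, (previous, next) {
  next.whenData((result) {
    if (result == ConnectivityResult.none) {
      ScaffoldMessenger.of(context).showSnackBar(
        const SnackBar(content: Text('You are offline. Check your connection and try again.')),
      );
    }
  });
});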

Practical Example: A UAE Retail Support Chat

Imagine a Dubai-based retail app where customers ask about product availability, returns, and shipping. Your system prompt can be customized for this domain:

final systemPrompt = '''You are a helpful customer support assistant for a Dubai retail store. 
You help customers with:
- Product availability and specifications
- Return and exchange policies
- Shipping information within the UAE
- Payment methods accepted

Always be professional and friendly. If you don't know something, direct the customer to contact support at support@retailapp.ae.
''';

Pass this to your chat request by adding it to the messages list as a system role message before sending user messages. This keeps responses focused on your domain and reduces off-topic outputs.
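
As a rough illustration of that ordering, the sketch below builds the message list in the plain role/content shape used by the chat completions API, with the system prompt first. ChatTurn is a hypothetical, reduced stand-in for the ChatMessage class above, and buildRequestMessages is likewise a made-up helper. With dart_openai you would build the equivalent list from its message model using the system role.

```dart
/// Hypothetical, reduced stand-in for the article's ChatMessage class.
class ChatTurn {
  const ChatTurn({required this.content, required this.isUser});
  final String content;
  final bool isUser;
}

/// Returns messages in role/content form with the system prompt first,
/// followed by the conversation history in its original order.
List<Map<String, String>> buildRequestMessages(
  String systemPrompt,
  List<ChatTurn> history,
) {
  return [
    {'role': 'system', 'content': systemPrompt},
    for (final turn in history)
      {
        'role': turn.isUser ? 'user' : 'assistant',
        'content': turn.content,
      },
  ];
}
```

The system message is prepended on every request rather than stored in the chat history, so it never shows up as a bubble in the UI.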

Error Handling Deep Dive

Azure OpenAI can return several types of errors. Handle them gracefully:

429 (Rate Limit): Implement exponential backoff or queue messages for retry. Azure enforces tokens-per-minute and requests-per-minute quotas on each deployment.
401 (Authentication): Check API key and endpoint validity. This typically means your credentials have expired or are misconfigured.
Content Filter: Azure applies content filtering by default. As noted in Microsoft’s documentation, content is buffered and vetted before returning to the user. A filtered prompt is rejected with an HTTP 400 error whose code is content_filter, while a filtered completion ends with finish_reason set to content_filter rather than an error, so check for both. Inform the user without breaking the UI.
Network Timeout: If the stream doesn’t complete within your timeout window (set to 120 seconds in the example), cancel and retry with user notification.
Malformed Response: Always validate the stream event structure before accessing delta content. Check that choices is non-empty and that delta exists before reading content.
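
For the 429 case, exponential backoff can be sketched in a few lines of plain Dart. This is an illustrative outline, not a drop-in implementation: backoffDelay and retryWithBackoff are hypothetical helpers, the default delays are assumptions, and the sketch omits random jitter, which production code would normally add.

```dart
import 'dart:async';
import 'dart:math';

/// Delay before retry number [attempt] (0-based): base * 2^attempt, capped.
Duration backoffDelay(
  int attempt, {
  Duration base = const Duration(seconds: 1),
  Duration cap = const Duration(seconds: 30),
}) {
  // Clamp the exponent so the multiplication cannot overflow.
  final factor = pow(2, min(attempt, 16)).toInt();
  final ms = base.inMilliseconds * factor;
  return Duration(milliseconds: min(ms, cap.inMilliseconds));
}

/// Runs [action], retrying with growing delays while [shouldRetry]
/// accepts the thrown error and attempts remain.
Future<T> retryWithBackoff<T>(
  Future<T> Function() action, {
  required bool Function(Object error) shouldRetry,
  int maxAttempts = 4,
  Duration base = const Duration(seconds: 1),
}) async {
  var attempt = 0;
  while (true) {
    try {
      return await action();
    } catch (error) {
      final isLastAttempt = attempt >= maxAttempts - 1;
      if (isLastAttempt || !shouldRetry(error)) rethrow;
      await Future<void>.delayed(backoffDelay(attempt, base: base));
      attempt++;
    }
  }
}
```

Only retry errors that are likely to be transient, such as rate limits and timeouts. Retrying a 401 or a content filter rejection just repeats the failure.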

Performance Considerations

Keep these points in mind as your app scales:

Store conversation history in a local database (Hive or Drift) so chats survive app restarts. This is essential for UAE users on spotty connections.
Limit message history sent to the API to the last 10-15 messages to reduce token usage and latency. Older context can be summarized or discarded.
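Prevent duplicate requests when the user taps send several times in quick succession. The simplest guard is to check isLoading before allowing a new send, which the input field does by disabling itself while a response streams. A message queue is only needed if you want to accept new messages while a response is still in progress.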
Test on lower-end devices. Riverpod rebuilds can be expensive if your state is large. Profile with DevTools to spot bottlenecks.
Monitor token usage in Azure to stay within budget and quota limits. Set up alerts in the Azure Portal so you’re not surprised by overages.
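
Limiting the history sent to the API is a small slicing operation. The sketch below is one way to do it in plain Dart. recentHistory is a hypothetical helper, and the default of 12 is simply a value inside the 10-15 range suggested above.

```dart
/// Returns a copy of the most recent [maxMessages] entries,
/// or a copy of the whole list when it is already short enough.
List<T> recentHistory<T>(List<T> messages, {int maxMessages = 12}) {
  if (messages.length <= maxMessages) return List<T>.of(messages);
  return messages.sublist(messages.length - maxMessages);
}
```

Trim the history first and add any system prompt afterwards, so the prompt is never the message that gets dropped.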

Wrapping Up

Streaming LLM responses in Flutter isn’t just about showing text as it arrives. It’s about managing state correctly so your UI stays responsive across device rotations, app backgrounding, and network interruptions. Using Riverpod as your state container, properly cancelling streams, and handling errors gracefully makes the difference between a prototype and a production app.

The pattern shown here scales from a simple support chatbot to more complex conversational features. Start with this foundation, test thoroughly on real devices in your target market, and iterate based on user feedback. Your UAE mobile users will appreciate the snappy, responsive feel of real-time AI conversations.

Do I need to use Riverpod for state management?

No, you can use any state management solution you prefer. BLoC, Provider, or even StatefulWidget work fine. Riverpod is shown here because it handles stream lifecycle well, but the streaming logic itself is framework-agnostic.

Can I use the http package instead of dart_openai?

Yes, absolutely. The dart_openai package is a convenience wrapper. If you prefer direct HTTP calls, use http or dio to POST to your Azure endpoint and parse the streaming response yourself. You’ll have more control but more boilerplate.
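
If you go the direct HTTP route, the response arrives as server-sent events, one "data:" line per chunk, ending with a [DONE] marker. The sketch below shows one defensive way to pull the text delta out of a single line using only dart:convert. parseDelta is a hypothetical helper, and it assumes you have already split the response body into lines.

```dart
import 'dart:convert';

/// Extracts the text delta from one server-sent-events line.
/// Returns null when the line carries no text: comments, the [DONE]
/// marker, invalid JSON, or chunks with an empty choices array.
String? parseDelta(String line) {
  if (!line.startsWith('data:')) return null;
  final payload = line.substring(5).trim();
  if (payload.isEmpty || payload == '[DONE]') return null;

  Object? decoded;
  try {
    decoded = jsonDecode(payload);
  } on FormatException {
    return null; // Skip malformed chunks instead of failing the stream.
  }
  if (decoded is! Map<String, dynamic>) return null;

  final choices = decoded['choices'];
  if (choices is! List || choices.isEmpty) return null;

  final first = choices.first;
  if (first is! Map<String, dynamic>) return null;

  final delta = first['delta'];
  if (delta is! Map<String, dynamic>) return null;

  final content = delta['content'];
  return content is String ? content : null;
}
```

Every level is checked before it is read, so a chunk with no choices or no delta is skipped quietly rather than throwing in the middle of a response.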

How do I store chat history locally?

Use Hive for simple key-value storage or Drift for a more structured database. Store messages locally, then load them when the user reopens the app. Sync with a backend if needed for multi-device support.

What’s the token cost for streaming vs non-streaming?

Azure charges per token, regardless of whether you stream or not. Streaming doesn’t reduce cost, but it improves user experience by showing responses incrementally. Use streaming for interactive chat, batch requests for background processing.

How do I handle user authentication with Azure OpenAI?

Never expose your Azure API key in client code. Use a backend service that validates user identity, then calls Azure OpenAI on the user’s behalf. Return the stream or response to the client. This keeps your credentials secure and lets you audit usage per user.