Recently I built a small experimental AI search application to better understand how modern AI assistants work internally.

Instead of relying on abstractions or frameworks alone, I wanted to implement the entire flow myself:

user query
tool usage
reasoning
token streaming
frontend updates in real time

The project uses FastAPI, LangChain, and Server-Sent Events (SSE) to stream responses directly to the browser.

The repository is public, and sensitive credentials are stored in .env, which is excluded from version control via .gitignore.
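A minimal sketch of environment-based configuration, assuming a hypothetical required key named LLM_API_KEY (the post does not say which variables the app reads). In practice a library such as python-dotenv (`load_dotenv()`) parses the file. The stdlib version below only illustrates the idea: read `KEY=VALUE` lines, let real environment variables win, and fail early if a required key is missing.

```python
import os
from pathlib import Path


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="abc"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(
    env_path: str = ".env",
    required: tuple[str, ...] = ("LLM_API_KEY",),  # hypothetical key name
) -> dict[str, str]:
    """Merge .env values with os.environ; real environment variables take precedence."""
    path = Path(env_path)
    settings = parse_env_file(path.read_text()) if path.exists() else {}
    settings.update({k: v for k, v in os.environ.items() if k in settings or k in required})
    missing = [k for k in required if not settings.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return settings
```

Failing at startup when a key is missing gives a clearer error than failing on the first request.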

The Initial Version

The first working version was extremely simple.

Everything lived in a single Python file. Even the HTML frontend was embedded inside Python strings.

Example:

return f"""
<html>
  ...
</html>
"""

It worked, but it quickly became messy.

Frontend structure, styling, scripts, and backend logic were all mixed together.

At that point the code was functioning but not maintainable.

Core Architecture

The system pipeline looks like this:

User query
   ↓
FastAPI endpoint
   ↓
LangChain agent
   ↓
DuckDuckGo search tool
   ↓
LLM reasoning
   ↓
Streaming answer
   ↓
Browser UI
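A stdlib-only sketch of how these stages can be chained into a single stream of (event, data) pairs for the browser. `fake_search` and `fake_llm_tokens` are hypothetical stand-ins for the DuckDuckGo tool and the LangChain agent, not their real APIs. In the real app the agent decides whether to search; here the order is fixed for clarity.

```python
import asyncio
from typing import AsyncIterator


async def fake_search(query: str) -> str:
    """Hypothetical stand-in for the DuckDuckGo search tool."""
    await asyncio.sleep(0)  # simulate I/O
    return f"snippet about {query}"


async def fake_llm_tokens(query: str, context: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the LLM streaming its answer token by token."""
    for token in ["Answer", " based", " on", " search"]:
        await asyncio.sleep(0)
        yield token


async def answer_stream(query: str) -> AsyncIterator[tuple[str, str]]:
    """Walk query -> search -> LLM and yield (event, data) pairs for the UI."""
    yield ("status", "thinking...")
    yield ("status", "searching DuckDuckGo...")
    context = await fake_search(query)
    yield ("status", "generating answer...")
    async for token in fake_llm_tokens(query, context):
        yield ("token", token)
    yield ("done", "completed")


async def collect(query: str) -> list[tuple[str, str]]:
    return [event async for event in answer_stream(query)]


if __name__ == "__main__":
    for event in asyncio.run(collect("python release")):
        print(event)
```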

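The agent follows the ReAct (reasoning + acting) pattern: the model alternates between writing a thought, calling a tool, and reading the tool's result as an observation, until it can give a final answer.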

Typical execution looks like this in the logs:

Thought
Action: DuckDuckGoSearch
Observation
Thought
Action: DuckDuckGoSearch
Observation
Final Answer

This allows the LLM to decide when it needs external information.
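To make the log format concrete, here is a hypothetical parser for a single ReAct step. It assumes the model's text follows the `Action:` / `Action Input:` / `Final Answer:` layout used by the classic LangChain ReAct prompt; the `Step` type and `parse_react_step` name are illustrative, not LangChain APIs.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    tool: Optional[str] = None          # set when the model wants to call a tool
    tool_input: Optional[str] = None
    final_answer: Optional[str] = None  # set when the model is done


ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)


def parse_react_step(text: str) -> Step:
    """Turn one chunk of model output into either a tool call or a final answer."""
    final = FINAL_RE.search(text)
    if final:
        return Step(final_answer=final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        return Step(tool=action.group(1), tool_input=action.group(2).strip())
    raise ValueError(f"Could not parse model output: {text!r}")
```

The surrounding loop calls the named tool, appends its result to the prompt as an `Observation:` line, and asks the model again until a final answer appears.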

Streaming Responses

Instead of waiting for the full response, the server streams tokens as they are generated.

The backend emits SSE events such as:

event: status
data: thinking...

event: status
data: searching DuckDuckGo...

event: token
data: The

event: token
data: 2026

event: done
data: completed

The browser listens to these events using:

const source = new EventSource("/stream?query=" + encodeURIComponent(query))

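Because the server sends named events (status, token, done), listeners have to be registered per event name, for example source.addEventListener("token", ...); the generic onmessage handler only fires for events without an event: line. Each token's data is appended to the result area as it arrives, and on done the page should call source.close(), since EventSource otherwise reconnects automatically and would resend the query.

This produces the same token-by-token streaming experience users see in modern AI chat tools.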
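Outside the browser, the same stream can be consumed with a few lines of Python, which is handy for testing the endpoint without a UI. This is a simplified sketch of the parsing rules EventSource applies (it handles `event:` and `data:` fields, skips comments, and ignores `id:` and `retry:`). `parse_sse` is a hypothetical helper, and the sample stream is modeled on the one shown earlier, except that the second token carries its own leading space, as real model tokens usually do.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from decoded SSE lines: a blank line dispatches,
    one space after the colon is dropped, and data: lines are joined with "\n"."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # only the first space is part of the syntax
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


stream = "event: token\ndata: The\n\nevent: token\ndata:  2026\n\nevent: done\ndata: completed\n\n"
answer = "".join(d for e, d in parse_sse(stream.splitlines()) if e == "token")
```

Because only one space after the colon is stripped, a token's own leading space survives the round trip.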

Handling Async + Threads

One of the more interesting issues appeared when the search tool ran inside a worker thread.

The error looked like this:

There is no current event loop in thread 'ThreadPoolExecutor'

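This happened because the tool tried to fetch the current asyncio event loop from inside a worker thread. Event loops are per-thread, and a ThreadPoolExecutor worker has no loop of its own, so the lookup fails.

The fix was to capture a reference to the main event loop before handing work to the thread, then schedule updates back onto that loop using: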

asyncio.run_coroutine_threadsafe(queue.put(...), loop)

This allowed the background tool execution to safely send updates to the SSE stream.
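A self-contained reproduction of the fix, using only asyncio and a thread pool. `blocking_search` is a hypothetical stand-in for the synchronous search tool. The key points are that the loop reference is captured in the event-loop thread and passed to the worker, and that the worker never looks up a loop itself. asyncio.Queue is not thread-safe, which is why the put is scheduled on the loop instead of called directly.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_search(query: str, queue: asyncio.Queue, loop: asyncio.AbstractEventLoop) -> str:
    """Runs in a worker thread. Hypothetical stand-in for the synchronous search tool."""
    # Schedule queue.put() on the loop that owns the queue and wait until it has run.
    asyncio.run_coroutine_threadsafe(queue.put(f"searching: {query}"), loop).result()
    time.sleep(0.05)  # simulate a slow network call
    asyncio.run_coroutine_threadsafe(queue.put("search finished"), loop).result()
    return f"results for {query}"


async def main() -> list[str]:
    loop = asyncio.get_running_loop()  # captured in the event-loop thread
    queue: asyncio.Queue = asyncio.Queue()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The loop stays free while awaiting, so the worker's scheduled puts can run.
        result = await loop.run_in_executor(pool, blocking_search, "fastapi sse", queue, loop)
    updates = []
    while not queue.empty():
        updates.append(queue.get_nowait())
    return updates + [result]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In an SSE endpoint the generator would read from the queue while the tool is still running instead of draining it afterwards. For an unbounded queue, `loop.call_soon_threadsafe(queue.put_nowait, item)` is an equivalent, slightly lighter option.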

Separating the Frontend

After the backend stabilized, I moved all HTML into Jinja templates.

Instead of returning raw HTML strings, FastAPI now renders templates.

Project structure became:

app/
 ├ templates/
 │   ├ base.html
 │   └ index.html
 ├ static/
 └ main.py

FastAPI renders the page like this:

templates.TemplateResponse(
    "index.html",
    {"request": request}
)

This change cleaned up the backend significantly.

Python now focuses on routing and the AI pipeline, while the markup, styling, and scripts live in templates/ and static/.

UI Improvements

The frontend includes a small animated status indicator showing the system state:

routing question...
thinking...
searching DuckDuckGo...
generating answer...

The animation uses a lightweight CSS loader for the trailing dots.

This makes the streaming behavior clearer for users.

Observations from the Experiment

Building the system revealed a few useful lessons.

Tool quality matters

The LLM can only reason over the data it receives.

The DuckDuckGo API sometimes returns irrelevant snippets, which limits answer quality.

In those cases the bottleneck is the retrieved context, not the model itself.

Streaming changes the user experience

Even if the model takes several seconds to finish, streaming tokens keeps the interface responsive.

This pattern is now standard across AI chat interfaces.

Agents need limits

Without limits, the ReAct loop may keep searching repeatedly.

Capping the number of iterations prevents unnecessary tool calls and bounds how long a single answer can take.
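A minimal sketch of such a limit, with a scripted stand-in for the model that never stops asking for another search. LangChain's classic `AgentExecutor` exposes the same idea through its `max_iterations` parameter; the loop below is a hypothetical illustration, not that class, and `run_agent`, `stubborn_model`, and `fake_search` are illustrative names.

```python
from typing import Callable


def run_agent(
    decide: Callable[[list[str]], dict],
    search: Callable[[str], str],
    max_iterations: int = 3,
) -> tuple[str, int]:
    """Hypothetical ReAct-style loop with a hard cap on tool calls.

    `decide` receives the observations so far and returns either
    {"action": "search", "input": ...} or {"final": ...}.
    Returns (answer, number_of_tool_calls).
    """
    observations: list[str] = []
    for _ in range(max_iterations):
        step = decide(observations)
        if "final" in step:
            return step["final"], len(observations)
        observations.append(search(step["input"]))
    # Cap reached: stop searching and fall back to a best-effort answer.
    return f"Stopped after {max_iterations} searches; best effort from {len(observations)} results.", len(observations)


def stubborn_model(observations: list[str]) -> dict:
    """Stand-in model that always wants one more search."""
    return {"action": "search", "input": f"query #{len(observations) + 1}"}


def fake_search(q: str) -> str:
    """Stand-in search tool."""
    return f"snippet for {q}"
```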

Debug visibility is critical

Watching the reasoning process in logs makes it much easier to understand why the agent produced a particular answer.

Current State

The application now includes:

FastAPI backend
SSE token streaming
LangChain agent with tool usage
DuckDuckGo search integration
Jinja template frontend
animated UI state indicators
environment-based configuration

While the project itself is small, the architecture reflects the same design patterns used by modern AI assistants that combine language models with external tools.

Source Code: https://github.com/vikaskbh/langchain-fastapi

Vikas Bhagwagar

Full Stack Software Engineer | AI Author – Credentials
Experience: 20+ Years in Software Development
Credentials: B.E. Computer, SVNIT Surat (2004)
