Designing an AI Foundation with Mastra in a Microservices Architecture

Introduction

I’m Oya from the Developer Experience & Performance team. On October 30, 2025, PLAID, Inc. announced the direction for AI‑native transformation of the KARTE product series, called “KARTE AI” (https://plaid.co.jp/news/20251030/), and released several features in beta. Back in April 2025, we had launched the KARTE AI Project to drive development of these features, and I have mainly focused on platform‑level work. This article explains how we ran the project, why we chose a “centralized” architecture, and the advantages of using Mastra ↗.

About the KARTE AI Project

Before the project began, AI integration into KARTE products had barely progressed. Some products did ship AI features, but frameworks varied and we lacked a unified push to share knowledge and improve efficiency across teams. These challenges led us to launch the KARTE AI Project.

Project Goals

Based on the issues above, we set two goals:

  • Develop talent who can lead AI development.
  • Design how to build AI features in a microservices architecture.

Releasing AI‑powered features matters, but within this project, releases are a means to those ends, not the goal itself.

Project Structure Aligned to the Goals

Creating a dedicated AI team to build features would not produce the state where “each team has engineers skilled in AI Agent development.” So we adopted a dual‑assignment model. Centered on a Project Owner, Product Engineers embed AI into their respective products, Platform Engineers develop from a platform perspective, and Customer Engineers build an internal bot that answers questions about KARTE. The project members collaborate closely, with Platform Engineers owning operation of the AI service.

team2.png

Platform Engineers mainly:

  • Keep application implementations simple by shaping the environment.
  • Aggregate learnings and review effectively.
  • Fix framework bugs or suboptimal implementations upstream.
  • Proactively design workarounds for likely pitfalls.
  • Regularly bump framework versions.
  • Explore and stay current on libraries/frameworks.
  • Evaluate what new versions enable.
  • Prepare systems for operations: observability, tests, evals.

While PLAID sometimes prioritizes speed by skipping early reviews and batch‑reviewing around beta releases, for AI features we required reviews for all changes, even small ones. Since training leaders in AI development is a primary goal, code review is a highly effective lever.

Also, Platform Engineers need hands‑on experience to know where Product Engineers struggle. In parallel, we implemented multiple AI features—including chat‑UI agents—to gain practical insight.

KARTE Microservices

Before getting into the AI system architecture, here is an overview of KARTE’s microservices setup.

KARTE microservices are split into Web API, Internal API, and Frontend.

  • Frontend

    We build SPA JavaScript bundles with React.js or Vue.js and serve them from the Web API.

  • Web API

    Serves HTML, CSS, JavaScript assets for Frontend and exposes APIs called by the SPA.

  • Internal API

    APIs called from other services’ Web or Internal APIs. They assume authentication/authorization has already been handled by the calling Web API and do not implement auth themselves.

microservice2.png

Each service’s owning team develops, operates, and maintains it, keeping independence under these principles:

  • Independent deploys: build and deploy per service.
  • Independent DBs: dedicated DB per service.
  • API‑based communication: use Internal APIs between services.

Tech choices are largely up to each team. APIs are often TypeScript with MongoDB. Web API, Internal API, and Frontend live in a monorepo (e.g., pnpm workspaces), allowing free codebase dependencies—such as sharing Web API types with Frontend.

Elements of AI Features and SDKs

AI features range from simple to advanced. We mostly build AI Agents, so this article focuses on them.

AI Agents execute tasks using Tools and Memory and, as needed, perform multi‑step LLM calls.

https://mastra.ai/ja/docs/agents/overview ↗

agents-overview.webp

Tools are common across many LLM SDKs, but Memory in TypeScript is supported by frameworks like LangChain.js, Mastra, and VoltAgent.

These frameworks also include RAG, Workflows, Evals, Observability, and Web APIs. From the start, we planned to use a framework. While frameworks trade off flexibility, when you don’t yet know how to implement AI Agents or what’s possible, not adopting one carries bigger downsides. We chose Mastra, and Memory in particular is far easier than rolling our own.

Mastra

Mastra is a framework layered atop Vercel’s AI SDK. The AI SDK provides basics like model routing and tools, while Mastra adds more advanced features such as Memory and Evals.

https://mastra.ai/ja/docs/frameworks/ai-sdk ↗

mastra-ai-sdk.webp

There’s Mastra Cloud, a managed environment that makes deploying Mastra apps easier, but KARTE embeds Mastra into existing applications, so we don’t use it. In this article, “Mastra server” means a KARTE‑side server with Mastra installed, not Mastra Cloud.

KARTE AI System Architecture

In our microservices architecture, we created a single KARTE AI microservice using Mastra and have other services call it, i.e., a centralized model. Alternatively, each microservice could install Mastra (decentralized). We saw these trade‑offs:

Decentralized

  • Faster if each team has engineers skilled in AI Agent development.
  • Fewer inter‑service dependencies.
    • Increased system stability.
    • Easier to roll out version/config changes in stages.

Centralized

  • Agents inside the AI system can easily interoperate (cross‑product multi‑agent).
  • Knowledge can be centralized.

AI Agent development patterns are still evolving, and not all teams have AI Agent skills. Strategically, KARTE is strengthening multi‑product elements, so we need an architecture amenable to future multi‑agent designs. We judged centralized has bigger near‑term benefits, so we consolidated AI code into one microservice.

Two key architectural questions are where Tools execute and how Storage is designed. We explain those first, and then describe the two architectures we have actually run.

Tools

Tools enable agents to access data outside their training data or to execute actions (e.g., Claude Code editing files or running shell commands). With Mastra, tools fall into three types depending on where they execute:

  • server tool

    Runs on the Mastra server. You can debug with Mastra Playground. Define server tools inside Mastra and it executes them.

  • client tool

    Runs on the client relative to the Mastra server. You implement the client‑side execution.

    With @mastra/client-js, pass a function to clientTools.execute for auto‑execution. With @ai-sdk/react useChat, run your logic via onToolCall.

  • provider tool

    Runs in the provider’s environment. For example, Vertex AI provides Google Search or Code Execution.

Whether a custom tool should be a server tool or a client tool depends on the application. Our custom tools typically access data in KARTE DBs, create KARTE configuration values (e.g., Action configs), or call an LLM separate from the main agent. Provider tools fit general tasks such as numerical computation in Python or Google search.
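To make the server‑tool case concrete, here is a minimal sketch of a server tool that reads KARTE data through an Internal API and is registered on an agent. It assumes Mastra’s createTool and Agent APIs; the tool name, endpoint, and schema are illustrative, and the exact execute signature varies slightly across Mastra versions.

    import { openai } from "@ai-sdk/openai";
    import { Agent } from "@mastra/core/agent";
    import { createTool } from "@mastra/core/tools";
    import { z } from "zod";

    // Server tool: runs on the Mastra server, so it can call Internal APIs directly.
    // The endpoint and fields below are illustrative, not KARTE's actual API.
    const getActionConfig = createTool({
      id: "get-action-config",
      description: "Fetch the configuration of a KARTE Action by ID",
      inputSchema: z.object({ actionId: z.string() }),
      execute: async ({ context }) => {
        const res = await fetch(`http://action-internal-api/actions/${context.actionId}`);
        return (await res.json()) as { name: string; settings: unknown };
      },
    });

    // Registering the tool on an agent makes it available for the LLM's tool calls,
    // and the Mastra Playground can be used to debug it.
    export const actionAgent = new Agent({
      name: "action-agent",
      instructions: "You help users configure KARTE Actions.",
      model: openai("gpt-4o"),
      tools: { getActionConfig },
    });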

Storage

We mainly use Memory for chat history and Evals for performance evaluation. With a centralized Mastra, Storage also usually ends up as a single instance. Currently, storage adapters other than PostgreSQL don’t create indexes by default, so performance may degrade as the number of agents grows. Another issue is separating data whose classification differs (e.g., sensitive data). Memory can switch Storage instances per Memory instance (allowing separate clusters or DBs), but Evals is configured once per Mastra instance and likely cannot be switched. If Evals can’t be separated while Memory can, separation has limited value, because evaluation data may include the same kind of content as Memory. So we didn’t split. If we build agents that handle highly confidential data in the future, we may reconsider.
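As a rough sketch of the separation discussed above, storage can be set on the Mastra instance (shared by per‑instance features such as Evals) and overridden per Memory instance. This assumes the @mastra/pg PostgresStore adapter; the connection strings and names are illustrative.

    import { Mastra } from "@mastra/core";
    import { Memory } from "@mastra/memory";
    import { PostgresStore } from "@mastra/pg";

    // Storage on the Mastra instance is shared by per-instance features such as Evals.
    const sharedStorage = new PostgresStore({
      connectionString: process.env.KARTE_AI_DB_URL!, // illustrative env var
    });

    // A Memory instance can point at its own storage, e.g. a separate DB or cluster
    // for agents that handle sensitive data.
    export const sensitiveMemory = new Memory({
      storage: new PostgresStore({
        connectionString: process.env.KARTE_AI_SENSITIVE_DB_URL!, // illustrative env var
      }),
    });

    export const mastra = new Mastra({
      storage: sharedStorage,
      // agents: { ... } (agents for confidential data would be constructed with sensitiveMemory)
    });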

Initial Architecture

We initially implemented KARTE AI as an Internal API. The Frontend could not call it directly; instead, each service’s Web API called the Mastra server.

karte-ai.png

For tool calls that read or update data, we must confirm the user’s access permissions. Authorization lives in Web APIs (requests from Frontend). Internal APIs assume auth is already done and carry that information; they do not implement auth themselves. Since tool call parameters are largely chosen by the LLM, an implementation might attempt to access data the user shouldn’t. Therefore, we planned to run such tools as client tools in the Frontend.

This architecture had several issues:

  • Dependency on the client library: using @mastra/client-js on the server side isn’t really an anticipated usage pattern, and upgrades sometimes broke streaming or tool execution. It also depends on @mastra/server, so bumping @mastra/server past certain versions can break older versions of @mastra/client-js. With Mastra still at major version 0, the compatibility window felt short. In practice, tracking the latest Mastra sometimes required bumping @mastra/client-js in the calling services and deploying them together.
  • Frontend coupling: the Frontend ultimately depends on useChat from @ai-sdk/react, so teams end up juggling three components: @mastra/server, @mastra/client-js, and @ai-sdk/react.
  • Redundant Web APIs: many Web APIs wrapping Mastra server calls looked similar across services.

To address this, we adopted the following approach.

Frontend Direct Requests to the Mastra Server

front-mastra.png

In this approach, the Frontend calls the Mastra server directly and drops the dependency on @mastra/client-js, reducing minor bugs and debugging costs. When the Vercel AI SDK introduced major changes, this architecture let us focus on migrating the Frontend’s useChat. It also eliminates the need to build redundant Web APIs.

Previously, Frontend could only call its own service’s Web API. We removed that rule here. While it may introduce some confusion, the benefits outweighed the downsides. That rule exists to avoid other services depending on a Web API built for one service’s Frontend, which harms independence. KARTE AI is designed to be called by other services, so independence concerns don’t apply.
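For illustration, here is a sketch of what the Frontend side can look like when useChat from @ai-sdk/react talks to the Mastra server directly. It assumes AI SDK v5’s transport option and a Mastra per‑agent stream route; the URL, agent ID, and client tool name are illustrative.

    import { useChat } from "@ai-sdk/react";
    import { DefaultChatTransport } from "ai";

    export function KarteAiChat() {
      const { messages, sendMessage, addToolResult } = useChat({
        // The Frontend requests the KARTE AI (Mastra) service directly; the path is illustrative.
        transport: new DefaultChatTransport({
          api: "https://karte-ai.internal.example/api/agents/action-agent/stream",
        }),
        // Client tools execute here, inside the user's authenticated browser session,
        // so data access goes through the calling service's existing authorization.
        onToolCall: ({ toolCall }) => {
          if (toolCall.toolName === "getUserSegments") {
            fetch("/api/segments")
              .then((r) => r.json())
              .then((output) =>
                addToolResult({ tool: "getUserSegments", toolCallId: toolCall.toolCallId, output }),
              );
          }
        },
      });

      // ...render `messages` and an input that calls sendMessage({ text })
      return null;
    }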

Advantages of Mastra and Usage Examples

We won’t do a detailed comparison, but initial reasons were: TypeScript‑native framework, released earlier than VoltAgent, and stronger docs and momentum than LangChain.js. In practice, updates are fast with frequent new features, and questions on Discord and PRs get quick reactions—great speed and flexibility, crucial in fast‑moving AI. The codebase is clear and easy to develop against; I’ve had 24 PRs merged. If rapid updates worry you, Mastra v1 beta was just announced, so stability should improve.

In May of this year, Mastra’s CTO visited Japan and stopped by our office; we discussed PLAID’s AI efforts and Mastra’s roadmap, and shared our requests.

mastra-plaid.png

Mastra is a strong choice for most TypeScript AI Agent development. I’ll focus on two especially valuable features: Memory and Stream.

Memory

Building Mastra‑like Memory from scratch would have been unrealistic. Memory is essentially chat history, but with options essential to AI Agents. All AI Agents available in the KARTE admin UI include Memory.

Resources and Threads

A thread contains multiple messages; passing past messages in a thread to the LLM as context reduces repeated prompt writing and prevents the same mistakes from recurring. A resource groups multiple threads. In KARTE, we restrict which chat messages each admin user can view. Even within the same tenant, users cannot see the instructions other users gave KARTE AI. We include the authenticated user’s ID as a prefix in resource IDs to prevent viewing other users’ histories.
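A small sketch of how that looks when calling an agent, assuming Mastra’s memory options on stream (the function, IDs, and naming scheme are illustrative; older Mastra versions used top‑level threadId/resourceId options instead):

    import type { Agent } from "@mastra/core/agent";

    // Prefixing the resource ID with the authenticated user's ID keeps each user's
    // threads invisible to other users in the same tenant.
    export async function streamForUser(agent: Agent, userId: string, threadId: string, text: string) {
      const resourceId = `${userId}:karte-ai`; // illustrative ID scheme

      return agent.stream(text, {
        memory: {
          thread: threadId,     // one conversation
          resource: resourceId, // groups this user's conversations
        },
      });
    }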

Processor

Memory processors modify the message list before it is added to the agent’s context window and sent to the LLM. This helps manage context size, filter content, and optimize performance. KARTE agents use a TokenLimiter processor that limits how far back past messages are included so the total token count stays within the LLM’s limit.
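A minimal sketch of that configuration, assuming the TokenLimiter processor bundled with @mastra/memory (the limit is illustrative and tuned per model):

    import { Memory } from "@mastra/memory";
    import { TokenLimiter } from "@mastra/memory/processors";

    // The oldest messages are dropped first so the recalled history fits the model's context window.
    export const memory = new Memory({
      processors: [new TokenLimiter(127_000)],
    });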

Semantic Recall

This retrieves semantically similar past history to include as context for a user’s question or instruction. It supports cross‑thread recall, searching across accessible threads. Technically it uses RAG. We tried it once, but a MongoDB bug at the time (quickly fixed) kept us from using it. Having such features available and ready in Mastra is a big plus for future needs.
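For reference, enabling it mostly comes down to Memory options. Here is a sketch assuming Mastra’s semanticRecall settings; a vector store and an embedder also have to be configured, which is omitted here.

    import { Memory } from "@mastra/memory";

    // A vector store and an embedder must also be passed for semantic recall to work.
    export const recallMemory = new Memory({
      options: {
        semanticRecall: {
          topK: 3,           // how many semantically similar past messages to retrieve
          messageRange: 2,   // surrounding messages included with each match
          scope: "resource", // search across all threads in the resource, not just the current thread
        },
      },
    });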

Message Conversion

Vercel AI SDK moved from v4 to v5 with significant message format changes. We needed v4‑saved chat messages to work under v5. Mastra handles this compatibility internally, making the migration smooth.

Stream

The next evolution of Mastra streaming ↗

Mastra implements its own streaming (“Mastra streaming”). With complex setups such as multi‑agent systems or workflows, the Frontend needs to visualize clearly what is happening. Previously, an agent’s stream method returned streams defined by the AI SDK stream protocol, which couldn’t stream the outputs of LLM calls made inside workflows or tools. Mastra streaming can surface these nested LLM outputs as streams. In KARTE’s admin UI, our multi‑agent setup calls agents from tools, and we stream those tool outputs.

Mastra converts Mastra streaming into types compatible with Vercel AI SDK v5’s stream protocol. While the protocol lacks parts for streaming tool outputs, Mastra achieves compatibility using Custom Data (Streaming Custom Data ↗). Developers can define their own custom data types, which improves DX.

Mastra will likely support AG‑UI and is developing @mastra/react, making UIs that leverage Mastra’s features easier to build.

Conclusion

To continuously develop AI features in KARTE’s microservices environment, we designed and operated an AI foundation using Mastra in a centralized model first. Centering the foundation on Mastra brings major advantages: rapid, continuous feature additions and usability improvements without building everything ourselves.

The balance between centralized and decentralized will evolve. As AI development skills spread across the Product org and we learn to build multi‑agent networks quickly and securely across services, we’ll gradually decentralize. KARTE’s microservices should also evolve to ease AI development—for example, implementing auth in Internal APIs or offering internal MCP.