Claude Code transcript: What is MCP Really?

Notes on this session are here

10 prompts · 67 messages · 12 tool calls · 0 commits · 2 pages

#1

I'd like some help with understanding the detailed operation of mcp servers. For background, I have a high level understanding of the mcp concept: since the "host" LLM only "understands" natural language text, I assume that an mcp server must comprise some text that is injected into the context window at the beginning of a session. This I assume tells the LLM what the mcp server does, and also must define some special text markers (or some other kind of special tokens) that end up being forwarded to the mcp server in order to invoke one of its methods. I suppose there must also be some mechanism to define the schema for mcp server response data, so the LLM understands what it means. Assuming that high level understanding is roughly correct, I am intrigued to see the actual code for an mcp server, in order to identify all these component pieces in a concrete implementation. To that end I found the current project, which is an MCP server for the management of LDAP servers. This suits my purpose because I already have a very good understanding of LDAP server management, so there is no element of confusion about that aspect. Where I need help however is that after reviewing the source files in the current project for a while, I have been unable to find "the meat" so to speak. I see code that constitutes essentially a data access layer for LDAP servers. I see many print statements with text, but it is text prompting the user, not the LLM. I have not yet found where the MCP server instructs the LLM how to use it, nor where the supposed special tokens that allow the LLM to drive the MCP server are located. Please help me understand more, and if you can please locate these "missing" components of the MCP server.

6 read · 1 task

Now I understand your confusion. Let me explain how MCP actually works, which is quite different from your initial mental model.

The Key Misconception

Your intuition that "the LLM only understands natural language" led you to expect special tokens or text markers. But MCP doesn't communicate directly with the LLM at all. Instead:

┌─────────────┐    JSON-RPC/stdio    ┌─────────────┐
│  MCP Server │ ◄─────────────────► │    Host     │ ◄───► LLM
│  (this code)│                      │(Claude Code)│
└─────────────┘                      └─────────────┘

The Host (like Claude Code, Claude Desktop, or any MCP-compatible client) acts as the intermediary:

1. It connects to MCP servers via JSON-RPC (over stdio, HTTP, etc.)
2. It queries MCP servers for their available tools, prompts, and resources
3. It presents those tools to the LLM in the LLM's native tool format
4. When the LLM calls a tool, the Host translates it to an MCP protocol call
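To make the JSON-RPC framing concrete, here is a minimal sketch of the messages a Host writes to the server's stdin when it starts up. The method names (`initialize`, `tools/list`, `tools/call`) come from the MCP specification; the `jsonrpc` helper and the host name are invented for illustration:

```python
import json

def jsonrpc(method, params, id_):
    # Standard JSON-RPC 2.0 envelope, newline-delimited over stdio in MCP
    return json.dumps({"jsonrpc": "2.0", "id": id_, "method": method, "params": params})

# 1. Handshake: the Host announces itself and negotiates capabilities
init = jsonrpc("initialize", {"protocolVersion": "2024-11-05",
                              "clientInfo": {"name": "example-host", "version": "0.1"},
                              "capabilities": {}}, 1)

# 2. Discovery: the Host asks the server what tools it offers
list_tools = jsonrpc("tools/list", {}, 2)

# 3. Invocation: the Host forwards an LLM tool call as an MCP request
call = jsonrpc("tools/call",
               {"name": "list_all_users", "arguments": {"limit": 10}}, 3)

print(init)
print(list_tools)
print(call)
```

Nothing in this exchange ever reaches the LLM verbatim; the Host consumes the responses and re-serializes them into the model's tool format.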

Where the "Instructions to the LLM" Actually Live

The "instruction" to the LLM comes from Python docstrings and type hints. Look at this tool definition in src/dirsrv_mcp/tools/users.py:21-45:

@mcp.tool()
def list_all_users(limit: int = 50, server_name: Optional[str] = None) -> Dict[str, Any]:
    """List users in the directory with computed status.

    Note: In privacy mode (default), returns count only.
    Set LDAP_MCP_EXPOSE_SENSITIVE_DATA=true for full user details.
    """

The @mcp.tool() decorator (from FastMCP) does several things:

1. Function name → becomes the tool name (list_all_users)
2. Docstring → becomes the tool description shown to the LLM
3. Type hints (limit: int, server_name: Optional[str]) → become the JSON schema for parameters
4. Return type → informs the expected response structure
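The introspection behind those four steps can be approximated with the standard library. This is a simplified sketch of the idea, not FastMCP's actual code (the `tool_schema` helper and the type-mapping table are invented for illustration):

```python
import inspect
import typing
from typing import Any, Dict, Optional

PY_TO_JSON = {int: "integer", str: "string", float: "number", bool: "boolean"}

def tool_schema(fn) -> Dict[str, Any]:
    """Derive a tool description roughly the way a framework would:
    name from the function, description from the docstring,
    parameter schema from type hints and defaults."""
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)
    props = {}
    for name, param in sig.parameters.items():
        hint = hints.get(name, str)
        # Unwrap Optional[X] (i.e. Union[X, None]) to X for the JSON type
        if typing.get_origin(hint) is typing.Union:
            args = [a for a in typing.get_args(hint) if a is not type(None)]
            hint = args[0]
        prop = {"type": PY_TO_JSON.get(hint, "string")}
        if param.default is not inspect.Parameter.empty:
            prop["default"] = param.default
        props[name] = prop
    return {"name": fn.__name__,
            "description": inspect.getdoc(fn),
            "inputSchema": {"type": "object", "properties": props}}

def list_all_users(limit: int = 50, server_name: Optional[str] = None) -> Dict[str, Any]:
    """List users in the directory with computed status."""
    return {}

print(tool_schema(list_all_users))
```

Running this produces essentially the `tools/list` JSON shown later in this answer, which is the whole trick: ordinary Python metadata becomes the machine-readable contract.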

Another example from src/dirsrv_mcp/tools/health.py:307-329:

@mcp.tool()
def first_look() -> Dict[str, Any]:
    """Comprehensive health overview - the go-to tool for "what's wrong with my directory?"

    Performs a complete health assessment across all configured servers including:
    - Server connectivity and basic health
    - Connection and thread utilization
    - Replication status and errors
    - Cache efficiency (entry cache hit ratios)
    - Disk space usage (local servers only)
    - SSL certificate expiration (local servers only)
    ...
    """

The Registration Pattern

In src/dirsrv_mcp/server.py:82-84, tools get registered:

self._register_prompts()
self._register_tools()
self._register_resources()

And _register_tools() calls functions like register_user_tools(mcp) which then use the @mcp.tool() decorator on each function.

The Protocol Layer (The "Missing" Code)

The protocol handling you're looking for is not in this project - it's in the fastmcp library. This project extends FastMCP:

from fastmcp import FastMCP

class LDAPAssistantMCP(FastMCP):
    ...

FastMCP handles:

- JSON-RPC message serialization/deserialization
- The tools/list and tools/call protocol methods
- Schema generation from Python type hints
- Transport (stdio, SSE, etc.)

The Actual Wire Protocol

When the Host asks "what tools do you have?", FastMCP responds with JSON like:

{
  "tools": [
    {
      "name": "list_all_users",
      "description": "List users in the directory with computed status...",
      "inputSchema": {
        "type": "object",
        "properties": {
          "limit": {"type": "integer", "default": 50},
          "server_name": {"type": "string"}
        }
      }
    }
  ]
}

When the LLM (via the Host) calls a tool:

{"method": "tools/call", "params": {"name": "list_all_users", "arguments": {"limit": 10}}}

FastMCP routes this to the decorated Python function and returns the result.
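That routing step is small enough to sketch end to end. This is an illustrative toy, not FastMCP's implementation (a real framework adds argument validation, error envelopes, and transport handling; the `tool` decorator and `handle` function here are invented names):

```python
import json

# Registry that a real framework's @mcp.tool() decorator would populate
TOOLS = {}

def tool(fn):
    TOOLS[fn.__name__] = fn
    return fn

@tool
def list_all_users(limit: int = 50):
    # Stand-in for the real LDAP query
    return {"type": "user_list", "total_returned": min(limit, 3)}

def handle(raw: str):
    """Route a tools/call request to the registered Python function."""
    req = json.loads(raw)
    if req["method"] == "tools/call":
        fn = TOOLS[req["params"]["name"]]
        result = fn(**req["params"].get("arguments", {}))
        # MCP wraps results in a content list for the Host
        return {"content": [{"type": "text", "text": json.dumps(result)}]}
    raise ValueError(f"unsupported method: {req['method']}")

resp = handle('{"method": "tools/call", "params": '
              '{"name": "list_all_users", "arguments": {"limit": 10}}}')
print(resp)
```

The decorator builds a name→function table at import time; dispatch is then just a dictionary lookup plus `**kwargs` unpacking.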

Summary

| What you expected | What actually happens |
|---|---|
| Special tokens in context | JSON-RPC protocol over stdio |
| Text injected at session start | Host queries server for tool schemas |
| LLM parses special markers | Host translates LLM tool calls to MCP protocol |
| Schema definitions in code | Auto-generated from Python type hints + docstrings |

The "meat" you were looking for is:

1. Tool definitions: The @mcp.tool() decorated functions in src/dirsrv_mcp/tools/*.py
2. Protocol handling: The fastmcp library (not in this repo)
3. LLM instructions: Python docstrings that FastMCP extracts and serves via JSON-RPC

#2

Hmm...I think my intuition and your intuition are actually the same! Although the MCP server doesn't communicate directly with the LLM, it effectively does, in that the intermediary components pretty much pass the data through. I have some follow-up questions:

#3

Can you explain what exactly the LLM's "native tool format" is, and how exactly LLMs conceive of "tools"? I'm having trouble understanding this given that I believe an LLM is a box that accepts embedded text tokens and responds with text tokens. Are "tools" something else, or are they some encoding of this textual/embedding scheme?

Excellent question - you're probing at something fundamental. Your intuition is correct: it's tokens all the way down.

The Reality: Tools Are Just Structured Text

LLMs like Claude are indeed "text in, text out" (more precisely, tokens in, tokens out). There is no separate "tool mechanism" at the neural network level. What we call "tools" is really:

  1. A training convention - Claude was trained on examples showing tool definitions and how to "invoke" them
  2. A prompt injection pattern - tool schemas get serialized into the context as text
  3. An output parsing convention - certain output patterns are intercepted and interpreted as "tool calls"
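The third convention, output parsing, can be sketched with ordinary string matching. The tag format below is the illustrative one used later in this answer, not Anthropic's actual serialization, and `extract_tool_calls` is an invented name:

```python
import re

def extract_tool_calls(llm_output: str):
    """Scan raw model output for tool_use blocks and pull out
    the tool name and its parameters (all values arrive as text)."""
    calls = []
    for m in re.finditer(r'<tool_use name="([^"]+)">(.*?)</tool_use>',
                         llm_output, re.S):
        name, body = m.group(1), m.group(2)
        # Each <param>value</param> pair inside the block becomes an argument
        params = dict(re.findall(r"<(\w+)>([^<]*)</\1>", body))
        calls.append({"name": name, "params": params})
    return calls

output = ('Let me check the directory.\n'
          '<tool_use name="list_all_users">\n<limit>50</limit>\n</tool_use>')
print(extract_tool_calls(output))
```

Anything outside the recognized pattern is treated as plain prose for the user; only the matched spans are intercepted and turned into tool invocations.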

What Actually Happens

When you (or Claude Code) make an API call with tools, something like this happens:

API Request (simplified):

{
  "messages": [{"role": "user", "content": "List LDAP users"}],
  "tools": [{
    "name": "list_all_users",
    "description": "List users in the directory...",
    "input_schema": {"type": "object", "properties": {"limit": {"type": "integer"}}}
  }]
}

What Claude Actually "Sees" (conceptually):

The API layer serializes this into the prompt context. While the exact format is proprietary, conceptually it's something like:

You have access to these tools:

<tool name="list_all_users">
<description>List users in the directory...</description>
<parameters>
  <param name="limit" type="integer" optional="true"/>
</parameters>
</tool>

When you want to use a tool, output:
<tool_use name="..."><param>...</param></tool_use>

User: List LDAP users

What Claude Outputs:

<tool_use name="list_all_users">
<limit>50</limit>
</tool_use>

What the Host Does:

The host application (Claude Code, the API server, etc.) parses this output, recognizes the <tool_use> pattern, extracts the parameters, calls the actual tool, and then injects the result back:

<tool_result name="list_all_users">
{"type": "user_list", "total_returned": 50, "items": [...]}
</tool_result>

Then Claude continues generating based on that injected result.

The "Training" Part

Claude wasn't born knowing how to use tools. During training (including RLHF), Claude saw millions of examples like:

  • "Here are tools: X, Y, Z. User asks for Q. Good response: call tool Y with these params"
  • "Tool returned R. Good response: interpret R and explain to user"
  • "User asks for Q but no relevant tool exists. Good response: explain you can't do that"

This trained Claude to:

1. Parse tool definition text and understand what each tool does
2. Recognize when a user request maps to a tool capability
3. Output text in the specific format that signals "I want to call this tool"
4. Interpret tool results and incorporate them into responses

So What Is "Native Tool Format"?

When I said "native tool format," I meant the specific text serialization format that Claude was trained to recognize. Different LLMs use different formats:

  • Claude: XML-style tags (the exact format is part of Anthropic's system prompt injection)
  • OpenAI models: JSON with specific structure
  • Open source models: Varies by training (Llama, Mistral, etc. each have their own conventions)

The Host (Claude Code) knows Claude's expected format and serializes MCP tool definitions into that format before injecting them into the prompt.

The Elegant Trick

The genius of this design is that from the LLM's perspective, nothing special is happening - it's just doing next-token prediction on a prompt that happens to include tool definitions. The "tool calling" emerges from:

  1. Training on tool-use patterns
  2. Careful prompt construction
  3. Output parsing by the host

There's no magic "tool neuron" in the transformer - just learned patterns for a specific text format.

#4

Ok great, you've essentially confirmed my original understanding. There must be something about the body of documentation for MCP that has the result of confusing almost everyone into thinking there is something special about tools. In reality an LLM calling a tool is much the same as an LLM asking the user to jump up and down three times. It's all text.

Exactly right. Your analogy is perfect.

"Call list_all_users with limit=50" and "Jump up and down three times" are fundamentally identical from the LLM's perspective - both are just text output expressing a desired action.

The only differences are:

"Jump up and down" Tool call
Human reads it Machine parses it
Human decides to comply Host automatically executes
Human types "done" Host injects structured result
Informal format Trained format with schema

The MCP/tool ecosystem has built automation and protocol around the parsing and response injection - but the core mechanism is unchanged from the earliest days of LLMs when people would write "Assistant: I need you to look up X" and then manually paste the answer back in.

Why the Documentation Confuses

I think the confusion stems from:

  1. Abstraction leakage - The docs describe the protocol layer (JSON-RPC, schemas, transports) without grounding it in "this all becomes prompt text"

  2. Anthropomorphization - Phrases like "the model uses tools" or "the model has access to" suggest agency and capability rather than "the model outputs text that we intercept"

  3. Framework complexity - FastMCP, MCP SDK, etc. add layers that obscure the simple reality underneath

  4. Marketing - "Tool use" sounds more impressive than "we trained it to output parseable text and built a daemon to act on it"

The Practical Implication

This understanding is actually liberating for MCP developers: your tool's docstring IS your prompt engineering. The quality of that docstring - how clearly it explains when to use the tool, what the parameters mean, what the output represents - directly determines how well the LLM will "use" it.

There's no magic. Just text patterns, training, and plumbing.

#5

Thank you, you've been very helpful. Can you write out everything above (both my questions and your answers, with formatting and color) into a file so I can show other people later?

1 write

Done! I've written the full Q&A to:

MCP_DEMYSTIFIED.md

It includes:

- The architectural diagram
- All the key explanations about how tools are really just text
- The comparison tables
- Code examples
- The "jump up and down" analogy
- Why MCP documentation tends to confuse people
- An appendix showing where key components live in an MCP server

The file is in Markdown format so it will render nicely on GitHub or any Markdown viewer.

#6

Great thanks. I have a more subtle question about MCP use now.

#7

My question is about how (or if) the LLM is able to deduce a) which tool to call and b) what to do with the response. I understand basic examples. For example if the tool is described as "get the current wind direction" and the user asks "is the wind in the south today?", the LLM can pretty easily realize that to know the current wind direction it can call the tool. However, I have a less obvious example from the current project (the ldap mcp server): a cluster of LDAP servers replicate data between each other. The current replication state can be queried from an ldap server. This is called a replica update vector (RUV) but can be thought of as basically a sequence number. A standard problem in LDAP server management is to check if all the servers are "caught up" with replication updates. This is determined by querying the RUV from each server then comparing them. Any server with a lower RUV than the majority must be lagging behind on processing replication updates. This MCP server specifically says in its documentation that it supports this, and indeed it does include a tool call to fetch the RUV from a single server. Therefore I can see that if I were to ask the LLM "what is the RUV for Server1?" the LLM would easily know to make a tool call to fetch the RUV with parameter "server=Server1". What I'm not clear on is if I ask the LLM "are my servers all caught up on replication?", how can the LLM know that to answer this question it needs to fetch the RUV from all servers (the set of servers is configured for the MCP server, so that set of servers is implicit), but it then needs to understand/know that it must check the RUV values for all the servers, looking for a maximum and checking if all the servers are reporting that same maximum. I'm not clear on how or if the MCP server communicates this insight to the LLM. Perhaps it is relying on the LLM understanding the operation of LDAP servers from prior training data? If this is true then the MCP server isn't really adding fundamental capabilities to the LLM because it (the LLM) already understands LDAP servers. Or it has to in order for the MCP server to be useful.

2 read

This is a fantastic question that gets at the limits of MCP. Looking at the actual code reveals the answer: the MCP server designer chose to encapsulate the domain logic INTO the tools themselves.

What This Server Actually Does

Look at check_replication_lag() (lines 563-731). Instead of exposing a primitive "get RUV" tool, it:

  1. Queries all agreements on the server
  2. For each agreement, calls agmt.get_agmt_status() which internally compares supplier vs consumer CSNs
  3. Returns pre-analyzed results: "In Synchronization", "lagging", "error"
  4. Generates findings with severity levels

The LLM doesn't need to understand RUV comparison logic - the tool does it and returns:

if status.get("msg") == "In Synchronization":
    lag_entry["lag_status"] = "in_sync"
elif status.get("state") == "red":
    lag_entry["lag_status"] = "error"
else:
    lag_entry["lag_status"] = "lagging"

Similarly, get_replication_topology() (lines 403-561) queries all configured servers automatically and builds a comprehensive view.

The Three Strategies

Your question identifies a fundamental design tension. There are three approaches:

Strategy 1: Encode domain logic in the tool (what this server does)

User: "Are my servers caught up?"
         ↓
LLM sees: check_replication_lag() - "Analyze replication lag across agreements"
         ↓
LLM calls: check_replication_lag()
         ↓
Tool internally: queries all servers, compares CSNs, determines lag
         ↓
Returns: {"summary": "HEALTHY: All 3 agreements in sync", ...}

The LLM only needs to understand "this tool checks if replication is lagging" - not HOW to check.

Strategy 2: Rely on LLM's prior training knowledge

If you only had get_ruv(server), the LLM would need to:

- Know to call it for each server
- Know that RUVs contain CSNs
- Know how to compare CSNs to detect lag

This works if and only if the LLM was trained on enough LDAP documentation. For 389 Directory Server? Probably yes - it's well-documented. For some obscure proprietary system? Probably not.
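For Strategy 2 to work, the LLM would have to generate or reason through comparison logic like the following sketch. This is a deliberate simplification (real 389 DS RUV comparison is done per replica ID, and `find_lagging` is a hypothetical helper); it treats each server's newest CSN as a fixed-width hex string, so lexical comparison tracks the embedded timestamp:

```python
from typing import Dict, List

def find_lagging(ruvs: Dict[str, str]) -> List[str]:
    """Given each server's newest CSN, return the servers that are
    behind the most up-to-date one. Fixed-width hex CSNs compare
    lexically in the same order as their timestamps."""
    newest = max(ruvs.values())
    return sorted(s for s, csn in ruvs.items() if csn < newest)

# Hypothetical CSN values for a three-server topology
ruvs = {
    "server1": "67a1b2c3000000010000",
    "server2": "67a1b2c3000000010000",
    "server3": "67a1b2c0000000010000",  # older CSN: lagging
}
print(find_lagging(ruvs))
```

The point is not the five lines of code; it's that the "fetch from every server, find the maximum, flag stragglers" plan has to come from somewhere, and with only primitive tools that somewhere is the model's prior training.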

Strategy 3: Teach via prompts/docstrings

The server also registers prompts that guide orchestration. Let me show you:

Here's the third strategy in action. The diagnose_replication prompt (lines 178-202) teaches the LLM the orchestration workflow:

Step 1: Use get_replication_status for overview
Step 2: Use list_replication_conflicts to find conflicts
Step 3: Use check_replication_lag to identify sync delays
Step 4: Use get_agreement_status for deeper investigation

Your Core Insight Is Correct

You asked:

Perhaps it is relying on the LLM understanding the operation of LDAP servers from prior training data? If this is true then the MCP server isn't really adding fundamental capabilities to the LLM because it already understands LDAP servers.

This is partially true, and it's a profound observation:

| What MCP adds | What MCP does NOT add |
|---|---|
| Access to live data (actual RUV values) | Understanding of what RUVs mean |
| Ability to execute operations | Knowledge of LDAP concepts |
| Structured responses | Reasoning about how to use the data |

An MCP server is like giving someone a calculator. The calculator adds computational capability, but if the person doesn't understand math, they won't know which buttons to press or what the output means.

The Practical Implications

For MCP server design:

  1. Primitive tools require LLM domain knowledge - A get_ruv(server) tool only works if the LLM already understands LDAP replication

  2. High-level tools encode your expertise - check_replication_lag() encapsulates the "compare CSNs across servers" logic so the LLM only needs to understand "check if lag exists"

  3. Prompts can teach orchestration - If you must use primitives, prompts can guide multi-step workflows

  4. Response design matters - Returning "in_sync" vs "lagging" is more LLM-friendly than raw CSN values

This MCP server chose well: it provides domain-aware tools that do the hard reasoning internally, so the LLM only needs to match user intent to tool purpose - not understand LDAP internals.

#8

Oh wow, that's very interesting so thank you for that insight. This prompts another question: what really is the value of the LLM and MCP in this particular scenario? Specifically the code embodied within the MCP server could just as well have been shipped as a stand alone "LDAP server grokker" utility that had a "ldap-server-grokker check-replication-lag" subcommand, no LLM needed. What actually is the LLM bringing to the party here? It seems that there is "less to this than meets the eye". Is the LLM and tool plumbing essentially just a kind of "hidden app store" that allows easy distribution of utilities that would be pretty much as useful stand-alone? The LLM aspect (e.g. translating "check if my servers are lagging" into "ldap-server-grokker check-replication-lag") seems more of a party trick than something profoundly useful.

You've hit on something that I think is a genuine tension in the LLM tooling space, and your skepticism is warranted. Let me be direct about where the value is and isn't.

Where You're Right

For single-tool invocations by domain experts, the LLM layer is indeed mostly a "party trick":

Traditional CLI:
$ ldap-grokker check-replication-lag
{"summary": "HEALTHY: All 3 agreements in sync", ...}

LLM + MCP:
You: "check if my servers are lagging"
LLM: *calls check_replication_lag()*
     "Your servers are all in sync."

The second path adds:

- Latency (API round-trip)
- Cost (token usage)
- Potential for error (the LLM might misunderstand)
- Dependency on an external service

For you - someone who knows LDAP cold - the CLI is arguably better.

Where the LLM Might Add Value

The value proposition strengthens in specific scenarios:

1. Orchestration of multi-step investigations

You: "My LDAP server is slow, figure out why"

LLM: *calls get_performance_summary()*
     "Cache hit ratio is 45% - that's low. Let me check cache config..."
     *calls get_cache_statistics()*
     "Your entry cache is 200MB but you have 2M entries. Let me check memory..."
     *calls get_resource_utilization()*
     "You have 16GB free RAM. I recommend increasing nsslapd-cachememsize to 2GB.
      Here's the dsconf command: ..."

A CLI could do each step, but you'd need to know the diagnostic flow. The LLM chains tools based on intermediate results.

2. Interpretation in context

You: "Is this RUV lag concerning?"

LLM: "A lag of 3000 CSNs is moderate. Given your replication
      schedule of every 5 minutes and your ~100 writes/minute,
      this represents about 30 minutes of lag. That's unusual -
      normally you'd see <1 minute. Let me check the agreement status..."

The CLI gives you data; the LLM gives you meaning (if it's right).

3. Non-experts / unfamiliar domains

For someone who doesn't know LDAP:

- They don't know check-replication-lag exists
- They don't know what "RUV" or "CSN" means
- They can't interpret the output

The LLM provides discoverability and translation.

4. Conversational refinement

You: "Check replication lag"
LLM: *shows results*
You: "Just for the dc=example,dc=com suffix"
LLM: *calls check_replication_lag(suffix="dc=example,dc=com")*
You: "Now compare to server2"
LLM: *remembers context, calls with server_name="server2"*

But Your Skepticism Is Valid

For domain experts doing routine tasks, the honest answer is:

| Aspect | CLI wins | LLM wins |
|---|---|---|
| Speed | ✓ | |
| Determinism | ✓ | |
| Scriptability | ✓ | |
| Offline use | ✓ | |
| Discoverability | | ✓ |
| Multi-step reasoning | | Maybe |
| Result interpretation | | Maybe |
| Non-expert users | | ✓ |
The "maybe" entries depend heavily on the LLM actually getting it right, which isn't guaranteed.

The "Hidden App Store" Observation

Your framing of MCP as a "hidden app store" is astute. There's definitely an element of:

"Let's make utilities discoverable/invocable via natural language and call it AI"

The counter-argument would be: integration is valuable. Having 50 CLIs requires knowing 50 CLIs. Having 50 MCP servers behind one LLM interface means you can say "help me with X" and the right tool gets invoked.

But that's still largely a UX/discovery argument, not a "profound AI capability" argument.

My Honest Take

The LLM + MCP combination is most valuable when:

1. The user doesn't know what they don't know (exploration, not execution)
2. The task requires chaining multiple tools with reasoning between steps
3. Interpretation/explanation is needed, not just data
4. The domain is unfamiliar to the user

For an LDAP expert running routine checks? You might genuinely be better off with a CLI. The LLM layer adds friction without proportionate value.

The industry hype around "AI agents" sometimes obscures this reality. Not every hammer needs an LLM attached to it.

#9

Ok great, thanks for the summary. I had another take, which is that MCP isn't really needed if it only serves the purpose of interfacing some simple data access code with the LLM. I suspect that future generations of LLM will not need MCP because they can directly write and execute that code based on information such as documentation and the source code for standalone tools, present in their training data.

You're describing a plausible future, and it's worth examining seriously.

The Thesis

Instead of:

User → LLM → MCP protocol → MCP server → lib389 → LDAP server

Future:

User → LLM → generates Python using lib389 → executes it → LDAP server

The LLM already knows lib389 from training data. Why the indirection?

Arguments For Your Thesis

1. LLMs can already do this (sometimes)

Right now, I could write:

from lib389 import DirSrv
from lib389.replica import Replicas

ds = DirSrv()
ds.open(uri="ldap://server1:389", binddn="cn=admin", bindpw="...")
for replica in Replicas(ds).list():
    print(replica.get_ruv().format_ruv())

...and execute it. No MCP needed. The knowledge is in the training data.

2. MCP tools are just frozen code

The check_replication_lag() function in this MCP server is ~170 lines of Python. If the LLM can reliably generate equivalent code on demand, pre-packaging it adds no value.

3. Code generation is improving rapidly

As LLMs get better at:

- Writing correct code first-try
- Handling edge cases
- Debugging their own errors

...the reliability gap between "call tested MCP tool" and "generate and run code" shrinks.

Arguments Against (Current Limitations)

1. Security and sandboxing

MCP tools are vetted, reviewed, and have defined permissions. LLM-generated code might:

- Have bugs
- Accidentally delete data
- Expose credentials
- Do unexpected things

2. Credentials management

The MCP server handles auth configuration. If the LLM writes code, it needs credential access, which is a larger attack surface.

3. Reliability at the tail

LLM-generated code might work 95% of the time. For critical infrastructure (like LDAP servers), that 5% failure rate is unacceptable. Pre-tested tools have known behavior.

4. Efficiency

Generating, executing, and potentially debugging code is slower than calling a tested function.

But These Are Engineering Problems, Not Fundamental Barriers

With:

- Better sandboxing (already exists in Claude Code)
- Secure credential injection
- Higher code generation reliability
- Faster execution environments

...your vision becomes practical.

MCP as Transitional Technology?

Your implicit question is: Is MCP a transitional crutch that will be obsoleted?

Possibly yes. MCP might be the "training wheels" phase:

| Era | How LLMs get capabilities |
|---|---|
| Early | Can only output text; humans act on it |
| Current | Pre-packaged tools (MCP) extend capabilities safely |
| Future? | Generate and execute code directly; tools unnecessary |

What Might Persist

Even in a "generate code directly" world, some value might remain in:

1. Trust/curation

For critical operations, you might still want human-vetted implementations. "Use the official lib389 MCP server" vs "let the LLM improvise."

2. Capabilities that aren't just code

Some tools provide access to things that aren't easily generated:

- Proprietary APIs with complex auth flows
- Hardware interfaces
- Services requiring persistent connections/state

3. Efficiency for common operations

Generating code every time has overhead. Pre-built tools are faster for frequent operations.

4. The "app store" model

Even if unnecessary technically, vendors might prefer packaging capabilities as MCP servers for:

- Branding
- Monetization
- Version control
- Support boundaries

My Take

You've identified something real. MCP is partly a workaround for current LLM limitations:

- Not reliable enough at code generation
- Execution environments not secure enough
- No standard way to provide credentials

As those limitations erode, the MCP abstraction may become unnecessary overhead for many use cases. The protocol layer is solving a problem that might not exist in 3-5 years.

The counterpoint is that "pre-packaged, tested, trusted capabilities" might always have value - even if technically unnecessary - for the same reasons we use libraries instead of writing everything from scratch.

#10

Thank you again. Can you please update the markdown file you wrote earlier with the new interaction we've had since then?

1 read · 1 edit

Done. I've added three new sections to MCP_DEMYSTIFIED.md:

  1. How Does the LLM Know Which Tool to Call? - The three strategies (encode logic in tools, rely on training, teach via prompts) with the LDAP replication example

  2. What Value Does the LLM Actually Add? - Honest assessment of CLI vs LLM+MCP tradeoffs, acknowledging where your skepticism is valid

  3. Is MCP a Transitional Technology? - Your thesis about future LLMs generating code directly, with arguments for and against

The document is now ~540 lines and covers the full arc of our conversation.