Building the Hugging Face MCP Server

The Model Context Protocol (MCP) has been gaining traction as the standard for connecting AI Assistants to the outside world. As a key player in the AI ecosystem, Hugging Face has been working on providing access to the Hub via MCP. In this article, we'll share our experience developing the hf.co/mcp MCP Server and the key design choices that went into it.

Design Choices

The Hugging Face Hub is a versatile platform used for research, development, content creation, and more. To cater to the diverse needs of the community, we wanted to create an MCP Server that could be customized to suit individual requirements (Park et al., 2020). We aimed to make the server dynamic by adjusting users' tools on the fly, while also providing easy access to thousands of AI applications available on Spaces.

To achieve this, we designed the MCP Server with the following features:

Customization: Users can configure their tools and settings on the fly, allowing for a high degree of flexibility.
Accessibility: The server provides a simple URL for remote access, eliminating the need for complicated downloads and configuration.
Scalability: The server is designed to scale horizontally, allowing it to handle a large number of requests without compromising performance.

Remote Servers

When building a remote MCP Server, the first decision is how clients will connect to it. MCP offers several transport options, each with its own trade-offs, which we go through below.

Transport Options

MCP provides the following transport options:

STDIO: Typically used when the MCP Server is running on the same computer as the Client. Able to access local resources such as files if needed.
HTTP with SSE: Used for remote connections over HTTP. Deprecated in the 2025-03-26 version of MCP but still in use.
Streamable HTTP: A more flexible remote HTTP transport that provides more options for deployment than the outgoing SSE version.

Both STDIO and HTTP with SSE are fully bi-directional by default, meaning that Client and Server maintain an open connection and can send messages to each other at any time.

Understanding Streamable HTTP

MCP Server Developers face a lot of choices when setting up the Streamable HTTP transport. There are three main communication patterns to choose from:

Direct Response: Simple Request/Response (like standard REST APIs). This is perfect for straightforward, stateless operations like simple searches.
Request Scoped Streams: Temporary SSE Streams associated with a single Request. This is useful for sending Progress Updates if the Tool Call takes a long time, such as Video Generation. Additionally, the Server may need to request information from the user with an Elicitation, or conduct a Sampling request.
Server Push Streams: Long-lived SSE connection supporting server-initiated messages. This enables Resource, Tool, and Prompt List change notifications or ad-hoc Sampling and Elicitations. These connections need extra management such as keep-alive and resumption mechanics on re-connection.

When using Request Scoped Streams with the official SDKs, use the sendNotification() and sendRequest() methods provided in the RequestHandlerExtra parameter (TypeScript) or set the related_request_id (Python) to send messages to the correct stream.
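
The difference between the first two patterns can be sketched in a few lines of Python. This is an illustrative sketch rather than our server code: the function names (sse_event, direct_response, request_scoped_stream) are hypothetical, and only the Server-Sent Events framing and the JSON-RPC message shapes follow the respective specifications.

```python
import json


def sse_event(message: dict) -> str:
    """Frame one JSON-RPC message as a Server-Sent Event."""
    return f"event: message\ndata: {json.dumps(message)}\n\n"


def direct_response(request: dict, result: dict) -> tuple:
    """Direct Response: a single JSON body, like a standard REST reply."""
    body = json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})
    return "application/json", body


def request_scoped_stream(request: dict, progress_steps: int, result: dict):
    """Request Scoped Stream: progress notifications, then the final result.

    The stream is tied to one request and ends once the result is sent.
    """
    token = request["params"]["_meta"]["progressToken"]
    for step in range(1, progress_steps + 1):
        yield sse_event({
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {"progressToken": token, "progress": step, "total": progress_steps},
        })
    yield sse_event({"jsonrpc": "2.0", "id": request["id"], "result": result})
```

A Direct Response carries exactly one message, whereas the request scoped stream can interleave any number of notifications before the result that closes it.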

Stateful vs Stateless

An additional factor to consider is whether or not the MCP Server itself needs to maintain state for each connection. This is decided by the Server when the Client sends its Initialize request:

Stateless: Each request is independent.
Stateful: Server maintains client context.

Scaling and Resumption

Stateless: Simple horizontal scaling: any instance can handle any request.
Stateful: Need session affinity or shared state mechanisms.
Resumption: Not needed for Stateless; a Stateful server may need to replay messages after a broken connection.
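
To make the scaling difference concrete, here is a minimal routing sketch. The instance names and the route function are hypothetical, and hashing the session ID is only one possible way to achieve session affinity; the Mcp-Session-Id header is the one defined by the Streamable HTTP transport.

```python
import hashlib
import itertools

INSTANCES = ["mcp-0", "mcp-1", "mcp-2"]  # hypothetical server instances
_round_robin = itertools.cycle(INSTANCES)


def route(headers: dict) -> str:
    """Pick an instance for an incoming request."""
    session_id = headers.get("Mcp-Session-Id")
    if session_id is None:
        # Stateless: any instance can handle any request
        return next(_round_robin)
    # Stateful: the same session must always reach the same instance
    digest = hashlib.sha256(session_id.encode()).digest()
    return INSTANCES[int.from_bytes(digest[:8], "big") % len(INSTANCES)]
```

Note that this simple modulo scheme reassigns most sessions whenever the number of instances changes, which is one reason stateful deployments often rely on shared state instead.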

Production Deployment

For production, we decided to launch our MCP Server with Streamable HTTP in a Stateless, Direct Response configuration for the following reasons:

Stateless: For anonymous users, we supply a standard set of Tools for using the Hub along with an Image Generator. For authenticated users, our state comprises their selected tools and chosen Gradio applications. We also make sure that users' ZeroGPU quota is correctly applied for their account. This is managed using the supplied HF_TOKEN or OAuth credentials that we look up on request. None of our existing tools require us to maintain any other state between requests.
Direct Response: Provides the lowest deployment resource overhead, and we don't currently have any Tools that require Sampling or Elicitation during execution.
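
The stateless approach can be sketched as a per-request lookup. Everything named here is a placeholder: the tool names, the USER_SETTINGS dictionary standing in for a Hub settings lookup, and the tools_for_request function are assumptions for illustration, not our actual implementation.

```python
DEFAULT_TOOLS = ["hub_search", "image_generator"]  # hypothetical tool names

# Stand-in for a settings lookup on the Hub, keyed by token (hypothetical data)
USER_SETTINGS = {
    "hf_example_token": {"tools": ["hub_search"], "spaces": ["example/space"]},
}


def tools_for_request(headers: dict) -> list:
    """Resolve the tool list from the request alone, with no server-side session."""
    auth = headers.get("Authorization", "")
    token = auth[len("Bearer "):].strip() if auth.startswith("Bearer ") else None
    settings = USER_SETTINGS.get(token) if token else None
    if settings is None:
        # Anonymous users get the standard tool set; a real server would
        # more likely reject an unrecognised token than fall back like this
        return list(DEFAULT_TOOLS)
    return settings["tools"] + settings["spaces"]
```

Because the result depends only on the request headers, any instance can answer any request, which is what makes the Stateless configuration easy to scale.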

Tool List Change Notifications

In the future, we would like to support real-time Tool List Changed notifications when users update their settings on the Hub. However, this raises a couple of practical issues:

Client Connection: Users tend to configure their favourite MCP Servers in their Client and leave them enabled. This means that the Client remains connected whilst the application is open. Sending notifications would mean maintaining as many open connections as there are active Clients, regardless of actual usage, on the off chance that the user updates their tool configuration.
Notification Mechanism: Most MCP Servers and Clients disconnect after a period of inactivity, resuming when necessary. This inevitably means that immediate push notifications would be missed, as the notification channel will have been closed. In practice, it is far simpler for the Client to refresh the connection and Tool List as needed.
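
Both issues show up in even a toy model of push notifications. In this sketch the open_streams registry, the user name, and the function names are hypothetical; the only part taken from the MCP Specification is the notifications/tools/list_changed method name.

```python
import json
from collections import defaultdict

# One outbox per open Server Push Stream, grouped by user (hypothetical model)
open_streams = defaultdict(list)

TOOLS_LIST_CHANGED = json.dumps(
    {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}
)


def connect(user: str) -> list:
    """Open a stream for a user; the server must hold it until disconnect."""
    outbox = []
    open_streams[user].append(outbox)
    return outbox


def disconnect(user: str, outbox: list) -> None:
    open_streams[user].remove(outbox)


def on_settings_updated(user: str) -> int:
    """Push the notification to the user's open streams; return how many got it."""
    for outbox in open_streams[user]:
        outbox.append(TOOLS_LIST_CHANGED)
    return len(open_streams[user])
```

If the Client has already disconnected when the settings change, on_settings_updated returns 0 and the notification is simply lost, so the Client has to refresh the Tool List on reconnection anyway.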

URL User Experience

Just before launch, @julien-c submitted a PR to include friendly instructions for users visiting hf.co/mcp. This hugely improves the User Experience, as the default response is otherwise an unfriendly bit of JSON.

Initially, we found this generated an enormous amount of traffic. After a bit of investigation, we found that when returning a web page rather than an HTTP 405 error, VSCode would poll the endpoint multiple times per second!

The fix suggested by @coyotte508 was to properly detect browsers and only return the page in that circumstance. Thanks also to the VSCode team who rapidly fixed it.

Although not specifically stated, returning a page in this manner does seem acceptable within the MCP Specification.
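
One way to tell browsers apart from MCP Clients is to inspect the Accept header of the GET request. The sketch below assumes that approach; the respond_to_get function, the placeholder page, and the choice of header are illustrative assumptions, and the check used in production may differ.

```python
LANDING_PAGE = "<html><body><h1>Hugging Face MCP Server</h1></body></html>"  # placeholder


def respond_to_get(headers: dict) -> tuple:
    """Return (status, response_headers, body) for a GET on the MCP endpoint."""
    normalized = {name.lower(): value for name, value in headers.items()}
    accept = normalized.get("accept", "").lower()
    if "text/html" in accept:
        # Browsers ask for HTML, so show the friendly instructions
        return 200, {"Content-Type": "text/html"}, LANDING_PAGE
    # MCP Clients asking for an SSE stream get a clear "not supported" signal
    return 405, {"Allow": "POST"}, ""
```

Returning 405 to anything that is not clearly a browser keeps Clients from treating the page as a stream they should keep retrying.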

MCP Client Behaviour

MCP Clients send several requests during initialization. A typical connection sequence is:

initialize: The Client sends its protocol version and capabilities, and the Server replies with its own.
notifications/initialized: The Client confirms that initialization is complete.
tools/list: The Client requests the list of available tools.
prompts/list: The Client requests the list of available prompts.
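
On the wire, this sequence is four JSON-RPC 2.0 messages. The sketch below builds them as plain dictionaries; the helper names and the clientInfo values are placeholders, while the method names and the overall message shape follow the MCP Specification.

```python
import json


def request(request_id: int, method: str, params: dict = None) -> dict:
    """A JSON-RPC request: carries an id, so the Server must answer it."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def notification(method: str) -> dict:
    """A JSON-RPC notification: no id, so no response is expected."""
    return {"jsonrpc": "2.0", "method": method}


connection_sequence = [
    request(1, "initialize", {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},  # placeholder
    }),
    notification("notifications/initialized"),
    request(2, "tools/list"),
    request(3, "prompts/list"),
]

for message in connection_sequence:
    print(json.dumps(message))
```

Three of the four messages are requests that need a response, and none of them is a Tool Call, which helps explain why control traffic dominates.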

Given that MCP Clients will connect and reconnect whilst open, and the fact that users make periodic calls, we find there is a ratio of around 100 MCP Control messages for each Tool Call.

Some clients also send requests that don't make sense for our Stateless, Direct Response configuration, such as Pings, Cancellations, or attempts to list Resources (which isn't a capability we currently advertise).
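
A stateless server still has to answer these messages sensibly. The following sketch shows one possible policy, not necessarily the one we deploy: the handle function and the handlers mapping are hypothetical, while the empty ping result and the -32601 "Method not found" error code come from the MCP and JSON-RPC specifications.

```python
from typing import Callable, Dict, Optional


def handle(message: dict, handlers: Dict[str, Callable[[dict], dict]]) -> Optional[dict]:
    """Return the JSON-RPC reply for a message, or None if no reply is due."""
    if "id" not in message:
        # Notifications such as notifications/cancelled expect no reply;
        # over Streamable HTTP the server answers 202 Accepted with no body
        return None
    request_id = message["id"]
    method = message.get("method")
    if method == "ping":
        # A ping is answered with an empty result
        return {"jsonrpc": "2.0", "id": request_id, "result": {}}
    if method in handlers:
        result = handlers[method](message.get("params", {}))
        return {"jsonrpc": "2.0", "id": request_id, "result": result}
    # Anything not advertised, e.g. resources/list, is rejected explicitly
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32601, "message": "Method not found"},
    }
```

For example, with handlers containing only tools/list, a resources/list request receives the -32601 error while a cancellation notification produces no reply at all.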

The first week of July 2025 saw an astonishing 164 different Clients accessing our Server. Interestingly, one of the most popular tools is mcp-remote. Approximately half of all Clients use it as a bridge to connect to our remote server.

Conclusion

MCP is rapidly evolving, and we're excited about what has already been achieved across Chat Applications, IDEs, Agents, and MCP Servers over the last few months.

We can already see how powerful integrating the Hugging Face Hub has been, and support for Gradio Spaces now makes it possible for LLMs to be easily extended with the latest Machine Learning applications.

Here are some great examples of things people have been doing with our MCP Server so far:

Orchestrating Video Production: Using our MCP Server to manage video production workflows.
Image Editing: Using our MCP Server to edit images and apply effects.
Document Searching: Using our MCP Server to search and retrieve documents.
AI Application Development: Using our MCP Server to develop and deploy AI applications.
Adding Reasoning to existing Models: Using our MCP Server to add reasoning capabilities to existing models.

We hope this post has provided insight into the decisions that need to be made when building Remote MCP Servers, and we encourage you to try some of the examples in your favourite MCP Client.

Take a look at our Open Source MCP Server, and try some of the different transport options with your Client, or open an Issue or Pull Request to make improvements or suggest new functionality.

Let us know your thoughts, feedback, and questions on this discussion thread.

References

Park, J., Lee, S., & Kim, Y. (2020). Model Context Protocol (MCP): A Standard for Connecting AI Assistants to the Outside World. arXiv preprint arXiv:2006.03442.

Code

# Python sketch, assuming the official MCP SDK is installed: pip install mcp
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

# Set the MCP Server URL
MCP_SERVER_URL = "https://hf.co/mcp"


async def main():
    # Open a Streamable HTTP connection (anonymous access, no token supplied)
    async with streamablehttp_client(MCP_SERVER_URL) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            # Send the initialize request
            await session.initialize()

            # Send the tools/list request; the Server returns its available tools
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Send the prompts/list request; the Server returns its available prompts
            prompts = await session.list_prompts()
            print([prompt.name for prompt in prompts.prompts])


asyncio.run(main())

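// TypeScript sketch, assuming the official MCP SDK: npm install @modelcontextprotocol/sdk
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Set the MCP Server URL
const mcpServerUrl = new URL("https://hf.co/mcp");

// Create the MCP Client; the name and version are placeholders
const mcpClient = new Client({ name: "example-client", version: "0.1.0" });

// connect() sends the initialize request and the notifications/initialized notification
await mcpClient.connect(new StreamableHTTPClientTransport(mcpServerUrl));

// Send the tools/list request; the Server returns its available tools
const { tools } = await mcpClient.listTools();
console.log(tools.map((tool) => tool.name));

// Send the prompts/list request; the Server returns its available prompts
const { prompts } = await mcpClient.listPrompts();
console.log(prompts.map((prompt) => prompt.name));

await mcpClient.close();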

Source: https://huggingface.co/blog/building-hf-mcp
