Overview
The W&B Kubernetes Operator is the recommended way to deploy W&B Server on Kubernetes (cloud or on-premises). For an overview of the operator, why W&B uses it, and how configuration hierarchy works, see Self-Managed.
Before you begin
Before deploying W&B with the Kubernetes Operator, ensure your infrastructure meets all requirements:
- Review infrastructure requirements: See the Self-Managed infrastructure requirements page for comprehensive details on:
- Software version requirements (Kubernetes, MySQL, Redis, Helm)
- Hardware requirements (CPU architecture, sizing recommendations)
- Kubernetes cluster configuration
- Networking, SSL/TLS, and DNS requirements
- Obtain a W&B Server license: See the License section on the Requirements page.
- Provision external services: Set up MySQL, Redis, and object storage before deployment.
MySQL Database
W&B requires an external MySQL database. For production, W&B strongly recommends managed database services, which provide automated backups, monitoring, high availability, and patching, and reduce operational overhead. See the reference architecture for complete MySQL requirements, including sizing recommendations and configuration parameters. For database creation SQL, see the bare-metal guide. For questions about your deployment’s database configuration, contact support or your AISE. For complete MySQL setup instructions, including configuration parameters and database creation, see the MySQL section of the requirements page.
Redis
W&B depends on a single-node Redis 7.x deployment used by W&B components for job queuing and data caching. For convenience during testing and proof-of-concept development, W&B Self-Managed includes a local Redis deployment that is not appropriate for production. For production deployments, W&B can connect to a Redis instance in the following environments:
- Amazon ElastiCache
- Google Cloud Memorystore
- Azure Cache for Redis
- A Redis deployment hosted in your cloud or on-premises infrastructure
Object storage
W&B requires object storage with pre-signed URL and CORS support. Recommended storage providers:
- Amazon S3: Object storage service offering industry-leading scalability, data availability, security, and performance.
- Google Cloud Storage: Managed service for storing unstructured data at scale.
- Azure Blob Storage: Cloud-based object storage solution for storing massive amounts of unstructured data.
- CoreWeave AI Object Storage: High-performance, S3-compatible object storage service optimized for AI workloads.
- Enterprise S3-compatible storage: MinIO Enterprise (AIStor), NetApp StorageGRID, or other enterprise-grade solutions
MinIO Open Source is in maintenance mode with no active development or pre-compiled binaries. For production deployments, W&B recommends using managed object storage services or enterprise S3-compatible solutions such as MinIO Enterprise (AIStor).
Provision your storage bucket
Before configuring W&B, provision your object storage bucket with proper IAM policies, CORS configuration, and access credentials. See the Bring Your Own Bucket (BYOB) guide for detailed step-by-step provisioning instructions for:
- Amazon S3 (including IAM policies and bucket policies)
- Google Cloud Storage (including PubSub notifications)
- Azure Blob Storage (including managed identities)
- CoreWeave AI Object Storage
- S3-compatible storage (MinIO Enterprise, NetApp StorageGRID, and other enterprise solutions)
OpenShift Kubernetes clusters
W&B supports deployment on OpenShift Kubernetes clusters in cloud, on-premises, and air-gapped environments. W&B recommends installing with the official W&B Helm chart.
Run the container as an un-privileged user
By default, containers use a $UID of 999. Specify a $UID >= 100000 and a $GID of 0 if your orchestrator requires that containers run as a non-root user.
W&B must start as the root group ($GID=0) for file system permissions to function properly. For details, see Custom security context.
Deploy W&B Server application
The W&B Kubernetes Operator with Helm is the recommended installation method for all W&B Self-Managed deployments, including cloud, on-premises, and air-gapped environments.
- Helm CLI
- Terraform
W&B provides a Helm Chart to deploy the W&B Kubernetes operator to a Kubernetes cluster. This approach allows you to deploy W&B Server with the Helm CLI or a continuous delivery tool like ArgoCD.
For deployment-specific considerations, see Environment-specific considerations and Deploy with Terraform on public cloud. For disconnected environments, see Deploy on Air-Gapped Kubernetes.
Follow these steps to install the W&B Kubernetes Operator with the Helm CLI:
- Add the W&B Helm repository. The W&B Helm chart is available in the W&B Helm repository:
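For example (the `wandb` alias is a local choice; the repository URL is assumed to be W&B's public Helm repo — verify against the current chart documentation):

```shell
helm repo add wandb https://charts.wandb.ai
helm repo update
```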
- Install the Operator on a Kubernetes cluster:
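A sketch of the install command (the release name and namespace are illustrative choices, not requirements):

```shell
helm upgrade --install operator wandb/operator \
  --namespace wandb-cr --create-namespace
```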
- Configure the W&B operator custom resource to trigger the W&B Server installation. Create a file named operator.yaml with your W&B deployment configuration. Refer to the Configuration Reference for all available options. Here’s a minimal example configuration:
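A hedged sketch of what operator.yaml can look like (the host, license, bucket, and MySQL values are placeholders; consult the Configuration Reference for the authoritative schema):

```yaml
apiVersion: apps.wandb.com/v1
kind: WeightsAndBiases
metadata:
  name: wandb
  namespace: default
spec:
  values:
    global:
      host: https://wandb.example.com   # placeholder FQDN
      license: eyJhbGc...               # your W&B license string
      bucket:
        name: s3://wandb-bucket         # placeholder bucket
      mysql:
        host: mysql.example.com         # placeholder external MySQL host
        database: wandb
        user: wandb
        password: password
        port: 3306
```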
- Start the Operator with your custom configuration so that it can install, configure, and manage the W&B Server application:
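For example, applying the custom resource with kubectl (file name from the previous step):

```shell
kubectl apply -f operator.yaml
```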
Wait until the deployment completes. This takes a few minutes.
- To verify the installation using the web UI, create the first admin user account, then follow the verification steps outlined in Verify the installation.
Verify the installation
To verify the installation, W&B recommends using the W&B CLI. The verify command executes several tests that check all components and configurations. This step assumes that the first admin user account was created with the browser.
- Install the W&B CLI:
- Log in to W&B:
- Verify the installation:
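The three steps above can be sketched as follows (the host URL is a placeholder; `wandb verify` runs against the host you logged in to):

```shell
pip install wandb
wandb login --host https://wandb.example.com
wandb verify
```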
Enable the MCP server
The W&B Model Context Protocol (MCP) server ships as an optional subchart in operator-wandb. When enabled, the operator deploys an in-cluster MCP server exposed through your existing ingress at <global.host>/mcp, so any MCP-compatible client (Cursor, VS Code, Claude Code, Gemini CLI, Claude Desktop, and others) can connect using a W&B API key. This is the same server W&B runs as the hosted offering at https://mcp.withwandb.com/mcp, just pointed at your deployment’s data.
For end-user client configuration and the tool catalog, see Use the W&B MCP server. This section only covers the operator-side enablement.
Prerequisites
Make sure your deployment meets the following requirements before you enable the MCP server:
- Chart version: operator-wandb 0.42.3 or later. The mcp-server subchart was introduced in 0.42.1. The example below uses Datadog and privacy settings added after that initial release.
- Weave Traces enabled: the MCP server depends on Weave Traces for trace tools and for the WF_TRACE_SERVER_URL default. Set weave-trace.install: true. If Weave Traces isn’t enabled, the Helm render fails with mcp-server requires weave-trace.install=true.
- Reachable ingress: global.host must already resolve and route to the W&B ingress. The MCP pod reads WANDB_BASE_URL from global.host and is available at <global.host>/mcp.
- Node capacity: the MCP pod requests 500m CPU and 1Gi memory by default (limits 2 CPU and 4Gi memory). Confirm your node pool has enough headroom before you enable the subchart.
Enable the subchart
Add the following to the spec.values block of your existing WeightsAndBiases custom resource (CR), alongside your existing global, ingress, and other overrides. The Datadog block is optional, but recommended when a Datadog Agent DaemonSet already collects pod logs and traces in your cluster.
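A sketch of the relevant values (key names are taken from the notes in this section; the exact nesting is an assumption to validate against the chart's values schema for your version):

```yaml
spec:
  values:
    weave-trace:
      install: true          # required by mcp-server
    mcp-server:
      install: true
      datadog:               # optional; agent mode needs no Datadog API key
        enabled: true
        mode: "agent"
        service: "wandb-mcp-server"
        env: "prod"
        deploymentType: "self-managed"
        customer: ""         # empty string omits the customer tag
      privacy:
        logLevel: "standard"
```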
Keep weave-trace.install: true unless you set mcp-server.env.WF_TRACE_SERVER_URL yourself. Use datadog.mode: "agent" for Kubernetes deployments where the Datadog Agent DaemonSet owns log and trace collection. In agent mode, the MCP pod doesn’t need a Datadog API key. Set service, env, deploymentType, customer, and extraTags to match your deployment’s observability naming conventions. Set customer to an empty string if you don’t want a customer tag.
Use privacy.logLevel: "standard" for most self-managed Kubernetes installations. This redacts free-text parameter values in logs while preserving deployment identifiers that operators commonly use for debugging. Use "strict" when entity, project, run, or user identifiers should not remain in plaintext logs. Use "off" only when you explicitly want plaintext logging for those values.
Apply the change to trigger a reconcile:
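For example, if your CR lives in wandb-cr.yaml (a placeholder file name):

```shell
kubectl apply -f wandb-cr.yaml
```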
The operator creates a wandb-mcp-server deployment and service in the release namespace, and extends the W&B ingress with a /mcp path.
Verify the MCP server
Wait for the pod to become Running, then check the health endpoint in-cluster and through the ingress:
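A sketch of both checks (the namespace, in-cluster service name, and health path are assumptions; the ingress URL follows the /mcp/health path used in Troubleshooting below):

```shell
# In-cluster: run a throwaway curl pod against the MCP service
kubectl -n default run curl-check --rm -it --image=curlimages/curl --restart=Never -- \
  curl -si http://wandb-mcp-server/health

# Through the ingress
curl -si https://<HOST_URI>/mcp/health
```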
Both checks should return 200 OK. The in-cluster check confirms the pod is healthy; the ingress check confirms routing. If you enabled Datadog, MCP server logs should also appear under the configured mcp-server.datadog.service and mcp-server.datadog.env values.
Connect a client
After the MCP server is running and healthy, point your MCP client at https://<HOST_URI>/mcp with a W&B API key as the bearer token. For IDE and agent configurations (Cursor, VS Code, Claude Code, and others), see Use the W&B MCP server.
Troubleshooting
| Symptom | Likely cause and fix |
|---|---|
| `helm` render fails with `mcp-server requires weave-trace.install=true` | Add `weave-trace.install: true` to `spec.values`. The MCP server depends on Weave Traces for trace tools. |
| `wandb-mcp-server` pod stuck in `Pending` with `Insufficient cpu` or `Insufficient memory` | Add node capacity, or lower `mcp-server.resources.requests` in your CR. Defaults are 500m CPU and 1Gi memory. |
| `curl https://<HOST_URI>/mcp/health` returns 404 | The `/mcp` ingress path is only rendered when `mcp-server.install: true`. Reapply the CR and wait for the ingress controller to propagate the new path. |
| MCP logs don’t appear in Datadog | Confirm `mcp-server.datadog.enabled: true`, `mcp-server.datadog.mode: "agent"`, and that the Datadog Agent DaemonSet collects pod stdout. Search Datadog with the configured `service` and `env` values. |
| MCP logs include more user-supplied text than expected | Set `mcp-server.privacy.logLevel` to `"standard"` or `"strict"`. Use `"strict"` when identifiers such as entity, project, run, or user names should not remain in plaintext logs. |
| `wandb-mcp-server` pod in `ImagePullBackOff` in an air-gapped or mirrored cluster | Mirror the image to your registry and override `mcp-server.image.repository` in your CR, the same pattern used for other W&B component images in air-gapped installs. See Deploy on Air-Gapped Kubernetes. |
Environment-specific considerations
Kubernetes is the same whether it runs on-premises or in the cloud. The main differences are in naming and managed services (for example, MySQL vs RDS, or S3 vs on-premises object storage). This section covers considerations that vary by environment.
On-premises and bare metal
When deploying on on-premises or bare-metal Kubernetes, pay attention to the following.
Load balancer configuration
On-premises Kubernetes clusters typically require manual load balancer configuration. Options include:
- External load balancer: Configure an existing hardware or software load balancer (F5, HAProxy, etc.)
- Nginx Ingress Controller: Deploy nginx-ingress-controller with NodePort or host networking
- MetalLB: For bare-metal Kubernetes clusters, MetalLB provides load balancer services
Persistent storage
Ensure your Kubernetes cluster has a StorageClass configured for persistent volumes. W&B components may require persistent storage for caching and temporary data. Common on-premises storage options:
- NFS-based storage classes
- Ceph/Rook storage
- Local persistent volumes
- Enterprise storage solutions (NetApp, Pure Storage, etc.)
DNS and certificate management
For on-premises deployments:
- Configure internal DNS records to point to your W&B hostname
- Provision SSL/TLS certificates from your internal Certificate Authority (CA)
- If using self-signed certificates, configure the operator to trust your CA certificate
OpenShift deployments
W&B fully supports deployment on OpenShift Kubernetes clusters. OpenShift deployments require additional security context configurations due to OpenShift’s stricter security policies. For OpenShift-specific configuration details, see OpenShift Kubernetes clusters above. For comprehensive OpenShift examples in air-gapped environments, see Deploy on Air-Gapped Kubernetes.
Object storage for on-premises and S3-compatible
After provisioning your object storage bucket (see Object storage provisioning), configure it in your W&B Custom Resource.
AWS S3 (on-premises)
For on-premises AWS S3 (via Outposts or compatible storage), append ?tls=true to the bucket path to enforce TLS:
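A hedged sketch of the bucket setting with a TLS-enforcing path (the field name and connection-string format are assumptions to confirm against the Configuration Reference; all bracketed values are placeholders):

```yaml
global:
  bucket:
    # S3-compatible connection path; ?tls=true enforces TLS
    name: s3://<ACCESS_KEY>:<SECRET_KEY>@<S3_ENDPOINT>/<BUCKET_NAME>?tls=true
```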
- Storage capacity and performance: Monitor disk capacity carefully. Average W&B usage results in tens to hundreds of gigabytes. Heavy usage could result in petabytes of storage consumption.
- Fault tolerance: At minimum, use RAID arrays for physical disks. For S3-compatible storage, use distributed or highly available configurations.
- Availability: Configure monitoring to ensure the storage remains available.
S3-compatible options include:
- Amazon S3 on Outposts
- NetApp StorageGRID
- MinIO Enterprise (AIStor)
- Dell ObjectScale
Public cloud with Terraform
For full infrastructure-plus-application deployment on AWS, Google Cloud, or Azure, see Deploy with Terraform on public cloud below.
Deploy with Terraform on public cloud
W&B recommends fully managed deployment options such as the W&B Multi-tenant Cloud or W&B Dedicated Cloud deployment types. W&B fully managed services are simple and secure to use, with minimal to no configuration required.
- AWS
- Google Cloud
- Azure
W&B recommends using the W&B Server AWS Terraform Module to deploy the platform on AWS.
The Terraform Module deploys the following mandatory components:
- Load Balancer
- AWS Identity & Access Management (IAM)
- AWS Key Management System (KMS)
- Amazon Aurora MySQL
- Amazon VPC
- Amazon S3
- Amazon Route53
- AWS Certificate Manager (ACM)
- Amazon Elastic Load Balancing (ALB)
- Amazon Secrets Manager
- Amazon ElastiCache for Redis
- SQS
Prerequisite permissions
The account that runs Terraform must be able to create all of the components described above, and must have permission to create IAM policies and IAM roles and to assign roles to resources.
General steps
The steps in this section are common for any deployment option.
- Prepare the development environment.
  - Install Terraform.
  - W&B recommends creating a Git repository for version control.
- Create the terraform.tfvars file. The tfvars file content can be customized according to the installation type, but the minimum recommended configuration will look like the example below. Be sure to define variables in your tfvars file before you deploy. The namespace variable is a string that prefixes all resources created by Terraform. The combination of subdomain and domain forms the FQDN for your W&B instance; in the example below, the W&B FQDN is wandb-aws.wandb.ml, and zone_id is the DNS zone where the FQDN record is created. Both allowed_inbound_cidr and allowed_inbound_ipv6_cidr also require setting; in the module, these are mandatory inputs. The example below permits access from any source to the W&B installation.
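A minimal sketch of terraform.tfvars matching the values discussed above (all values are placeholders; the open CIDRs permit access from any source):

```hcl
namespace                 = "wandb"
license                   = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
subdomain                 = "wandb-aws"
domain_name               = "wandb.ml"
zone_id                   = "xxxxxxxxxxxxxxxx"
allowed_inbound_cidr      = ["0.0.0.0/0"]
allowed_inbound_ipv6_cidr = ["::/0"]
```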
- Create the file versions.tf. This file will contain the Terraform and Terraform provider versions required to deploy W&B in AWS. Refer to the Terraform official documentation to configure the AWS provider. Optionally, but highly recommended, add the remote backend configuration mentioned at the beginning of this documentation.
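A sketch of versions.tf (the version constraints are illustrative; pin to the versions your W&B module release documents):

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}
```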
- Create the file variables.tf. For every option configured in terraform.tfvars, Terraform requires a corresponding variable declaration.
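A sketch of the matching declarations, one variable block per tfvars entry (descriptions are illustrative):

```hcl
variable "namespace" {
  type        = string
  description = "String prefixing all resources created by Terraform."
}

variable "license" {
  type        = string
  description = "W&B Server license."
}

variable "subdomain" {
  type        = string
  description = "Subdomain for the W&B FQDN."
}

variable "domain_name" {
  type        = string
  description = "Domain used to access the W&B instance."
}

variable "zone_id" {
  type        = string
  description = "Route53 hosted zone ID where the FQDN record is created."
}

variable "allowed_inbound_cidr" {
  type        = list(string)
  description = "CIDR ranges allowed to reach the W&B installation."
}

variable "allowed_inbound_ipv6_cidr" {
  type        = list(string)
  description = "IPv6 CIDR ranges allowed to reach the W&B installation."
}
```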
Recommended deployment
This is the most straightforward deployment configuration. It creates all mandatory components and installs the latest version of W&B in the Kubernetes cluster.
- Create main.tf. In the same directory where you created the files in the General steps, create a file main.tf with the following content:
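A hedged sketch of main.tf (the module source follows the public registry naming for the W&B AWS module; the version constraint is a placeholder — check the module's releases):

```hcl
module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "~> 2.0" # placeholder; pin to a tested release

  namespace   = var.namespace
  license     = var.license
  subdomain   = var.subdomain
  domain_name = var.domain_name
  zone_id     = var.zone_id

  allowed_inbound_cidr      = var.allowed_inbound_cidr
  allowed_inbound_ipv6_cidr = var.allowed_inbound_ipv6_cidr
}
```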
- Deploy W&B. To deploy W&B, execute the following commands:
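For example:

```shell
terraform init
terraform plan -var-file=terraform.tfvars
terraform apply -var-file=terraform.tfvars
```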
Enable Redis
To use Redis to cache SQL queries and speed up the application response when loading metrics, add the option create_elasticache_subnet = true to the main.tf file.
Enable message broker (queue)
To enable an external message broker using SQS, add the option use_internal_queue = false to the main.tf file. This is optional because W&B includes an embedded broker; this option does not bring a performance improvement.
Additional resources
Other deployment options
You can combine multiple deployment options by adding all configurations to the same file. Each Terraform module provides several options that can be combined with the standard options and the minimal configuration found in the recommended deployment section. Refer to the module documentation for your cloud provider for the full list of available options.
Access the W&B Management Console
The W&B Kubernetes operator comes with a management console. It is located at ${HOST_URI}/console, for example https://wandb.company-name.com/console.
There are two ways to log in to the management console:
- Option 1 (Recommended)
- Option 2
- Open the W&B application in the browser and log in at ${HOST_URI}/, for example https://wandb.company-name.com/.
- Access the console. Click the icon in the top-right corner, then click System console. Only users with admin privileges can see the System console entry.

Update the W&B Kubernetes operator
This section describes how to update the W&B Kubernetes operator.
- Updating the W&B Kubernetes operator does not update the W&B server application.
- If you use a Helm chart that does not use the W&B Kubernetes operator, see the instructions here before you follow these instructions to update the W&B operator.
- First, update the repo with helm repo update.
- Next, update the Helm chart with helm upgrade:
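The two steps above can be sketched as follows (the release name, chart name, and namespace are assumptions matching the Helm CLI installation earlier on this page):

```shell
helm repo update
helm upgrade operator wandb/operator --namespace wandb-cr --reuse-values
```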
Update the W&B Server application
You no longer need to update the W&B Server application if you use the W&B Kubernetes operator. The operator automatically updates your W&B Server application when a new version of W&B software is released.
Migrate Self-Managed instances to W&B Operator
The following sections describe how to migrate from self-managing your own W&B Server installation to using the W&B Operator to do this for you. The migration process depends on how you installed W&B Server. The W&B Operator is the default and recommended installation method for W&B Server. Reach out to Customer Support or your W&B team if you have any questions.
- If you used the official W&B Cloud Terraform Modules, navigate to the appropriate documentation and follow the steps there:
- If you used the W&B Non-Operator Helm chart, continue here.
- If you used the W&B Non-Operator Helm chart with Terraform, continue here.
- If you created the Kubernetes resources with manifests, continue here.
Migrate to Operator-based AWS Terraform Modules
For a detailed description of the migration process, continue here.
Migrate to Operator-based Google Cloud Terraform Modules
Reach out to Customer Support or your W&B team if you have any questions or need assistance.
Migrate to Operator-based Azure Terraform Modules
Reach out to Customer Support or your W&B team if you have any questions or need assistance.
Migrate to Operator-based Helm chart
Follow these steps to migrate to the Operator-based Helm chart:
- Get the current W&B configuration. If W&B was deployed with a non-operator-based version of the Helm chart, export the values like this:
If W&B was deployed with Kubernetes manifests, export the values like this:
You now have all the configuration values you need for the next step.
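Sketches of both export commands (the release, deployment, and output file names are placeholders for your environment):

```shell
# Helm-based install
helm get values wandb > values.yaml

# Manifest-based install
kubectl get deployment wandb -o yaml > wandb-deployment.yaml
```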
- Create a file called operator.yaml. Follow the format described in the Configuration Reference. Use the values from step 1.
- Scale the current deployment to 0 pods. This step stops the current deployment:
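For example (the deployment name and namespace are placeholders):

```shell
kubectl scale --replicas=0 deployment wandb -n default
```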
- Update the Helm chart repo:
- Install the new Helm chart:
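Sketches of both steps (the repository URL and release naming match the Helm CLI installation earlier on this page; treat them as assumptions for your environment):

```shell
# Update the Helm chart repo
helm repo add wandb https://charts.wandb.ai
helm repo update

# Install the new chart
helm upgrade --install operator wandb/operator \
  --namespace wandb-cr --create-namespace
```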
- Configure the new Helm chart and trigger the W&B application deployment. Apply the new configuration.
The deployment takes a few minutes to complete.
- Verify the installation. Make sure that everything works by following the steps in Verify the installation.
- Remove the old installation. Uninstall the old Helm chart or delete the resources that were created with manifests.
Migrate to Operator-based Terraform Helm chart
Follow these steps to migrate to the Operator-based Terraform Helm chart:
- Prepare the Terraform config. Replace the Terraform code from the old deployment in your Terraform config with the code described here. Set the same variables as before. Do not change the .tfvars file if you have one.
- Execute the Terraform run. Execute terraform init, terraform plan, and terraform apply.
- Verify the installation. Make sure that everything works by following the steps in Verify the installation.
- Remove the old installation. Uninstall the old Helm chart or delete the resources that were created with manifests.
Configuration Reference for W&B Server
This section describes the configuration options for the W&B Server application. The application receives its configuration as a custom resource named WeightsAndBiases. Some configuration options are exposed with the below configuration; some need to be set as environment variables. The documentation has two lists of environment variables: basic and advanced. Only use environment variables if the configuration option that you need is not exposed through the Helm chart.
Basic example
This example defines the minimum set of values required for W&B. For a more realistic production example, see Complete example. This YAML file defines the desired state of your W&B deployment, including the version, environment variables, external resources like databases, and other necessary settings.
Complete example
This example configuration deploys W&B to Google Cloud Anthos using Google Cloud Storage:
Host
Object storage (bucket)
AWS
For providers other than AWS, kmsKey must be null.
To reference accessKey and secretKey from a secret:
MySQL
To reference the password from a secret:
License
To reference the license from a secret:
Ingress
To identify the ingress class, see this FAQ entry.
Without TLS
Custom Kubernetes ServiceAccounts
Specify custom Kubernetes service accounts to run the W&B pods. The following snippet creates a service account as part of the deployment with the specified name. To use an existing service account, set create: false:
External Redis
To reference the password from a secret:
LDAP
Configure LDAP by setting environment variables in global.extraEnv:
OIDC SSO
authMethod is optional.
SMTP
Environment variables
Custom certificate authority
customCACerts is a list and can take many certificates. Certificate authorities specified in customCACerts only apply to the W&B Server application.
If using a ConfigMap, each key in the ConfigMap must end with .crt (for example, my-cert.crt or ca-cert1.crt). This naming convention is required for update-ca-certificates to parse and add each certificate to the system CA store.
Custom security context
Each W&B component supports custom security context configurations of the following form:
The only valid value for runAsGroup is 0; any other value is an error. For example, to configure the app component, add app to your configuration.
The same pattern applies to the other components: console, weave, weave-trace, and parquet.
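A hedged sketch of the form described above (the key names and nesting are assumptions to validate against the chart; runAsUser follows the >= 100000 guidance from the OpenShift section):

```yaml
app:
  pod:
    securityContext:
      runAsNonRoot: true
      runAsUser: 100000   # any UID >= 100000
      runAsGroup: 0       # must be 0; any other value is an error
```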
Configuration reference for W&B Operator
This section describes configuration options for W&B Kubernetes operator (wandb-controller-manager). The operator receives its configuration in the form of a YAML file.
By default, the W&B Kubernetes operator does not need a configuration file. Create a configuration file only if required, for example to specify custom certificate authorities or to deploy in an air-gapped environment.
Find the full list of spec customizations in the Helm repository.
Custom CA
A custom certificate authority list (customCACerts) can take many certificates. These certificate authorities apply only to the W&B Kubernetes operator (wandb-controller-manager).
Each key in the ConfigMap must end with .crt (for example, my-cert.crt or ca-cert1.crt). This naming convention is required for update-ca-certificates to parse and add each certificate to the system CA store.
FAQ
What is the purpose/role of each individual pod?
- wandb-app: the core of W&B, including the GraphQL API and frontend application. It powers most of the platform’s functionality.
- wandb-console: the administration console, accessed via /console.
- wandb-otel: the OpenTelemetry agent, which collects metrics and logs from resources at the Kubernetes layer for display in the administration console.
- wandb-prometheus: the Prometheus server, which captures metrics from various components for display in the administration console.
- wandb-parquet: a backend microservice separate from the wandb-app pod that exports database data to object storage in Parquet format.
- wandb-weave: another backend microservice that loads query tables in the UI and supports various core app features.
- wandb-weave-trace: a framework for tracking, experimenting with, evaluating, deploying, and improving LLM-based applications. The framework is accessed via the wandb-app pod.
How to get the W&B Operator Console password
See Accessing the W&B Kubernetes Operator Management Console.
How to access the W&B Operator Console if Ingress doesn’t work
Execute the following command on a host that can reach the Kubernetes cluster, then open https://localhost:8082/console in your browser.
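A sketch of the port-forward (the console service name is an assumption; adjust it to your release's service names):

```shell
kubectl port-forward svc/wandb-console 8082:8082
```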
See Accessing the W&B Kubernetes Operator Management Console on how to get the password (Option 2).
