Feature Requests

Memory/GPU Checkpointing

This will allow both CPU and GPU applications to start faster.

App environments

We would like to deploy our applications across multiple environments (e.g. staging, production and canary). These are all separate apps at the moment, each requiring its own configuration. It should be easy to promote applications across these environments.

Enable GPU-enabled builds

GPU access during build steps (shell_commands / Dockerfile).

Models that use torch.compile, Triton JIT, or CUDA graph capture need a GPU warmup pass to pre-compile kernels. Currently builds run on CPU-only machines, so this compilation has to happen at runtime on every cold start, adding 20-60s of latency before the first request can be served.

With GPU-enabled builds, users could run a warmup inference during shell_commands and bake the compiled artifacts (Inductor cache, Triton kernels, CUDA graphs) into the container image. Cold starts would then skip compilation entirely.

Please provide a manual payment option, or a UPI payment option, for when the automatic payment fails

Allow CORS preflight (OPTIONS) requests to pass through to application

Problem: The Envoy gateway intercepts CORS preflight (OPTIONS) requests and returns wildcard headers (Access-Control-Allow-Origin: *) before they reach our FastAPI application. This breaks credentialed requests because browsers reject Access-Control-Allow-Origin: * when credentials: 'include' is used. This is per the CORS specification.

Current behavior:
- Browser sends OPTIONS preflight with Access-Control-Request-Headers
- Gateway intercepts and returns Access-Control-Allow-Origin: *
- Browser blocks the actual request because wildcards can't be used with credentials

Expected behavior:
- OPTIONS requests should pass through to the application
- Application returns specific origin: Access-Control-Allow-Origin: https://mydomain.com
- Browser allows the credentialed request

Use case: We have a SaaS application with a custom domain. Our frontend (on a different domain) makes authenticated API calls with cookies/credentials. Without proper CORS handling, our custom domain is unusable for production.

Suggested solutions (any would work):
1. Option to disable gateway-level CORS handling entirely
2. Option to pass-through OPTIONS requests to the application
3. Configuration to specify allowed origins at the gateway level

This is blocking our production deployment on a custom domain.
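For illustration, here is a minimal sketch of the preflight response the application would want to send if OPTIONS requests reached it. It is plain Python rather than FastAPI-specific code; the function name, the allowlist and the method list are assumptions made for the example, not anything the platform or the post defines.

```python
from typing import Dict, Iterable, Optional

# Hypothetical allowlist; the post only names https://mydomain.com as an example.
ALLOWED_ORIGINS = {"https://mydomain.com"}


def preflight_headers(
    origin: Optional[str],
    requested_headers: Optional[str] = None,
    allowed_origins: Iterable[str] = ALLOWED_ORIGINS,
) -> Dict[str, str]:
    """Build response headers for a credentialed CORS preflight (OPTIONS) request.

    Returns an empty dict when the origin is not allowed, so the browser
    blocks the request.
    """
    if origin is None or origin not in set(allowed_origins):
        return {}
    headers = {
        # Echo the specific origin; "*" is rejected when credentials are sent.
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Credentials": "true",
        # Assumed method list, for illustration only.
        "Access-Control-Allow-Methods": "GET, POST, PUT, PATCH, DELETE, OPTIONS",
        # Tell caches that the response depends on the Origin request header.
        "Vary": "Origin",
    }
    if requested_headers:
        # Mirror what the browser asked for in Access-Control-Request-Headers.
        headers["Access-Control-Allow-Headers"] = requested_headers
    return headers
```

The key point is that the response names one concrete origin and sets Access-Control-Allow-Credentials, which a gateway-level wildcard response cannot do.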

Docker-style container management through the CLI

Ctrl+C (SIGINT) often fails to terminate a container. We need to be able to shut down a container manually by ID. In short, expose the main Docker commands (docker ps, docker stop <id>, docker kill <id>) in the CLI as cerebrium ps, cerebrium stop and cerebrium kill.

India region availability

Now that Cerebrium has gone multi-region, we'd like to improve the latency and compliance of our application by running in our region (Mumbai).

Scheduled application updates

We often have to change our scaling configurations based on peak and off-peak hours. We've been able to do this manually by calling your REST API. We'd like to be able to scale our apps up and down based on a cron schedule or similar.
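As a rough sketch of the logic we currently run by hand, the function below picks a scaling configuration from the time of day. The peak window, the values and the field names (min_replicas, max_replicas) are hypothetical and chosen only for illustration; the actual call to the REST API is deliberately left out.

```python
from datetime import datetime, time
from typing import Dict

# Hypothetical peak window and scaling values, chosen only for illustration.
PEAK_START = time(8, 0)
PEAK_END = time(20, 0)
PEAK = {"min_replicas": 2, "max_replicas": 10}
OFF_PEAK = {"min_replicas": 0, "max_replicas": 2}


def scaling_for(now: datetime) -> Dict[str, int]:
    """Return the scaling configuration that should be active at `now`."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_peak_hours = PEAK_START <= now.time() < PEAK_END
    return PEAK if (is_weekday and in_peak_hours) else OFF_PEAK
```

A built-in schedule would replace the external job that has to evaluate something like this and push the result on every change of window.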

Support direct uploads from S3 to project storage

I'd like to upload files from S3 buckets directly to /persistent-storage. Example use case: triggering deployments from a web server, which doesn't have the model files locally. I'd like to avoid having to download data from S3 only to upload it to /persistent-storage. Something like cerebrium cp s3://... model/ would be ideal, accepting presigned URLs.
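To make the request concrete, this is roughly what such a command would need to do on the platform side: stream the object behind a presigned URL straight to a destination path without holding it in memory. It is a sketch using only the Python standard library; fetch_to_storage is a hypothetical helper name, and it assumes it runs somewhere /persistent-storage is mounted.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_to_storage(presigned_url: str, dest: Path, chunk_size: int = 1024 * 1024) -> int:
    """Stream the object at `presigned_url` to `dest` and return the bytes written."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in fixed-size chunks so large model files are never fully buffered.
    with urllib.request.urlopen(presigned_url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, length=chunk_size)
    return dest.stat().st_size
```

For example, fetch_to_storage(url, Path("/persistent-storage/model/weights.bin")) would place the file under model/, where url is a presigned URL generated by the caller.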

Advanced billing alerts

Currently we are able to set an alert that gets sent to our email address when a threshold has been hit. This is limited to once per month. We would like to be able to configure these alerts:
- Set them as hourly, daily, weekly or monthly thresholds
- Be able to send these alerts to a communication tool of our choosing (Slack, email, or a webhook)

We would also like the option to terminate all running applications/containers within that space/project when thresholds are hit so that we don't overspend.
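One way to picture the configuration we have in mind is sketched below. All names (AlertRule, threshold_usd, terminate_on_breach) and the choice of USD are assumptions for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import Dict, List

PERIODS = ("hourly", "daily", "weekly", "monthly")
CHANNELS = ("slack", "email", "webhook")


@dataclass(frozen=True)
class AlertRule:
    period: str                        # one of PERIODS
    threshold_usd: float               # spend within one period that triggers the alert
    channel: str                       # one of CHANNELS
    terminate_on_breach: bool = False  # also stop running apps when the threshold is hit

    def __post_init__(self) -> None:
        if self.period not in PERIODS:
            raise ValueError(f"unknown period: {self.period}")
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")


def breached(rules: List[AlertRule], spend_by_period: Dict[str, float]) -> List[AlertRule]:
    """Return the rules whose threshold has been reached for their period."""
    return [r for r in rules if spend_by_period.get(r.period, 0.0) >= r.threshold_usd]
```

Each breached rule would send a notification to its channel, and rules with terminate_on_breach set would additionally stop the running applications in the project.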
