
What is Application Hub?

Application Hub is the central component of Cordatus for deploying, configuring, and managing AI applications and containerized workloads at scale.
It provides a unified interface to launch LLM inference engines, NVIDIA AI frameworks, and custom Docker containers across your devices with minimal configuration.

Application Hub transforms complex Docker deployments into simple, guided workflows — allowing you to run production-grade AI models without dealing with command-line complexity or infrastructure details.

Who is it for?

  • AI Engineers & Data Scientists who need to deploy and test LLM models across different hardware configurations
  • DevOps Teams managing AI infrastructure and container orchestration at scale
  • ML Operations Engineers who need to optimize GPU utilization and model performance
  • Organizations running distributed AI workloads across multiple devices and locations
  • Researchers & Developers experimenting with different models, quantizations, and inference engines

What can I do with Application Hub?

With Application Hub you can:

  • Deploy AI Applications
    Launch pre-configured applications such as:

    • vLLM, TensorRT-LLM, Ollama (LLM inference engines)
    • NVIDIA AI Dynamo (distributed LLM runtime)
    • NVIDIA VSS (Video Search & Summarization)
    • Custom Docker containers
  • Configure Advanced Settings

    • GPU Selection: Choose specific GPUs or allocate all available GPUs
    • Resource Limits: Set CPU core and RAM limits with Host Reserved protection
    • Model Selection: Use Cordatus Models, Custom Models, or User Models
    • Docker Options: Configure ports, volumes, networks, and environment variables (a Docker-level sketch follows this list)
    • Engine Arguments: Fine-tune inference parameters (batch size, quantization, etc.)
  • Calculate VRAM Requirements
    Use the VRAM Calculator to:

    • Predict GPU memory requirements before deployment
    • Test different configurations (quantization, sequence length, batch size)
    • Determine optimal hardware for your models
    • Plan multi-GPU deployments
  • Manage Models Across Devices
    With User Models:

    • Add models from your devices to Cordatus
    • Transfer models between devices on the same network
    • Use custom models not available on the internet
    • Automatically configure volume mappings for different inference engines
  • Monitor & Control Containers

    • View real-time container status and logs
    • Start, stop, or delete containers individually or in groups
    • Generate public URLs for deployed applications
    • Create Open Web UI interfaces for LLM models
    • Duplicate existing containers with modified configurations
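
Behind the guided workflow, every launch becomes a Docker container built from the settings above. As a rough sketch only, using the Docker SDK for Python, here is approximately what the GPU, resource-limit, and Docker options map to (the image, model, paths, and values are placeholders, and the exact configuration Cordatus generates may differ):

    import docker

    client = docker.from_env()

    # Roughly how "GPU Selection", "Resource Limits", and "Docker Options" translate.
    container = client.containers.run(
        "vllm/vllm-openai:latest",                       # placeholder image
        command=["--model", "facebook/opt-125m"],        # engine arguments (tiny placeholder model)
        detach=True,
        name="my-llm-environment",                       # environment name
        device_requests=[                                # GPU selection (GPU 0 only here)
            docker.types.DeviceRequest(device_ids=["0"], capabilities=[["gpu"]])
        ],
        nano_cpus=8 * 10**9,                             # CPU core limit (8 cores)
        mem_limit="32g",                                 # RAM limit
        ports={"8000/tcp": 8000},                        # port mapping
        volumes={"/data/models": {"bind": "/models", "mode": "rw"}},  # volume binding
        environment={"HF_TOKEN": "<your-token>"},        # environment variables
        restart_policy={"Name": "unless-stopped"},       # restart policy
    )

Application Hub assembles and runs the equivalent of this for you from the Advanced Settings form, so none of it has to be written by hand.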

Key concepts

  • Application — A pre-configured Docker image registered in Cordatus (e.g., vLLM, TensorRT-LLM, NVIDIA Dynamo, NVIDIA VSS).
    See details → Application Launch Guide | Standard Applications | NVIDIA VSS Guide | NVIDIA Dynamo Guide

  • Container — A running instance of an application with specific configuration (GPU assignment, model, parameters).
    See details → Container Management Guide

  • Container Group — Multiple containers deployed together as a single unit (e.g., VSS + VLM + LLM + Embed + Rerank).
    See details → Container Management Guide | NVIDIA VSS Guide

  • Model Types

    • Cordatus Models: Pre-tested models registered in the Cordatus system
    • Custom Models: Models specified by name or URL (downloaded during deployment)
    • User Models: Models you've added from your devices
  • Inference Engine — The runtime framework that executes LLM models:

    • vLLM: High-throughput inference with PagedAttention
    • TensorRT-LLM: NVIDIA's optimized inference engine
    • Ollama: Simple, local model deployment
    • NVIDIA NIM: Enterprise-grade microservices
  • Quantization — Model weight compression format (see the worked example after this list):

    • BF16/FP16: Full precision (16-bit)
    • INT8/FP8: Half memory usage (8-bit)
    • INT4/FP4: Quarter memory usage (4-bit)
  • Resource Limits

    • CPU Core Limit: Maximum CPU cores the container can use
    • RAM Limit: Maximum memory allocation
    • Host Reserved: Resources automatically reserved for system stability
    • No Limit: Overrides Host Reserved protection (use with caution)
  • VRAM Components

    • Model Weights: Memory occupied by model parameters
    • KV Cache: Key-Value cache for transformer models
    • Overhead/Activation: System overhead or activation memory
    • Free VRAM: Remaining available memory
      See details → VRAM Calculator User Guide
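
As a quick worked example of how quantization and the VRAM components fit together (the 7B model is hypothetical and the figures are rough approximations, not Cordatus output):

    # Approximate weight memory for a hypothetical 7B-parameter model.
    params = 7e9
    bytes_per_param = {"BF16/FP16": 2, "INT8/FP8": 1, "INT4/FP4": 0.5}

    for fmt, nbytes in bytes_per_param.items():
        gib = params * nbytes / 1024**3
        print(f"{fmt}: ~{gib:.1f} GiB of model weights")

    # BF16/FP16: ~13.0 GiB, INT8/FP8: ~6.5 GiB, INT4/FP4: ~3.3 GiB.
    # KV Cache and Overhead/Activation memory come on top of the weights,
    # so Free VRAM has to cover more than the weights alone.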

How does Application Hub work?

  1. Select an Application

    • Browse available applications from Containers > Applications
    • View application details, versions, and supported platforms
    • Check which Docker images are already downloaded on your device
      See details → Application Launch Guide
  2. Choose Device & Version

    • Select the target device for deployment
    • Choose the Docker image version
    • Cordatus shows whether the image needs to be downloaded
  3. Configure Advanced Settings

    General Settings:

    • Assign environment name
    • Select GPUs (or use "All GPU")
    • Set CPU core and RAM limits
    • Enable Open Web UI (for LLM applications)

    Model Selection (LLM Applications):

    • Choose from Cordatus Models, Custom Models, or User Models
    • Cordatus handles model transfer if needed
    • Automatic volume configuration based on inference engine

    Docker Options:

    • Configure port mappings (auto-assigned or manual)
    • Set up volume bindings with visual file explorer
    • Define network settings and restart policies

    Environment Variables:

    • Use pre-defined variables for the application
    • Add custom variables or select tokens from your account

    Engine Arguments:

    • Configure inference parameters (batch size, quantization, etc.; a vLLM example follows these steps)
    • For NVIDIA Dynamo: Configure processing mode, router, connector, workers
    • For NVIDIA VSS: Configure VLM, LLM, Embed, and Rerank components
  4. Launch & Monitor

    • Review configuration and click Start Environment
    • Enter sudo password for authorization
    • Monitor deployment progress and container status
    • Access containers via Containers page or Applications > Containers tab
      See details → Container Management Guide
  5. Manage Running Containers

    • View logs and parameters in real-time
    • Generate public URLs for external access
    • Start, stop, or delete containers
    • Create Open Web UI for LLM models
    • Duplicate containers with modified settings
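
For LLM engine applications, the Engine Arguments you enter in step 3 ultimately become command-line flags passed to the inference engine inside the container. A hedged illustration using vLLM's OpenAI-compatible server (the model name and values are placeholders; the flags below are standard vLLM options, not Cordatus-specific, and the arguments Cordatus exposes per engine may differ):

    import docker

    client = docker.from_env()

    # Example engine arguments for a vLLM deployment.
    vllm_args = [
        "--model", "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        "--quantization", "fp8",             # quantization format
        "--max-model-len", "8192",           # maximum sequence length
        "--max-num-seqs", "64",              # cap on concurrently batched sequences
        "--gpu-memory-utilization", "0.90",  # fraction of VRAM vLLM may claim
        "--tensor-parallel-size", "2",       # split the model across 2 GPUs
    ]

    client.containers.run(
        "vllm/vllm-openai:latest",
        command=vllm_args,
        detach=True,
        device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],  # all GPUs
        ports={"8000/tcp": 8000},
    )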

Application Types

Standard Applications

Simple Docker containers with basic GPU, CPU, and RAM configuration:

  • Single container deployment
  • Direct GPU assignment
  • Standard Docker options

See details → Standard Application Creation Guide

LLM Engine Applications

Advanced inference engines with model management:

  • Model selection (Cordatus, Custom, User Models)
  • Automatic volume configuration
  • Model transfer between devices
  • Optional Open Web UI creation
  • Engine-specific arguments

See details → Standard Application Creation Guide | User Models Guide

NVIDIA AI Dynamo

Distributed LLM runtime for multi-GPU deployments:

  • Processing modes (Aggregated/Disaggregated)
  • Router configuration (KV-Aware, Round Robin, Random)
  • Connector setup (KVBM, NIXL, LM Cache)
  • Worker creation and GPU assignment per worker
  • Multi-container orchestration

See details → NVIDIA AI Dynamo Creation Guide

NVIDIA VSS (Video Search & Summarization)

Complex multi-component pipeline:

  • Main VSS container with Event Reviewer option
  • VLM (Vision Language Model) component
  • LLM (Large Language Model) component
  • Embed (Embedding Model) component
  • Rerank (Rerank Model) component
  • Each component can be a new, existing, or remote application

See details → NVIDIA VSS Creation Guide


User Models & Model Transfer

Model Path Configuration

Define model paths on your devices for:

  • Huggingface models
  • Ollama models
  • NVIDIA NIM models
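
If you are unsure where your models live, these are common default cache locations on Linux (assumptions based on each tool's usual defaults; your installation may use different paths):

    ~/.cache/huggingface/hub    # Hugging Face models (transformers / huggingface_hub default cache)
    ~/.ollama/models            # Ollama models for a user-level install
    ~/.cache/nim                # NVIDIA NIM cache commonly used in NVIDIA's examples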

Adding Models

Two methods to add models:

  1. Start Scanning: Automatic detection of models in defined paths
  2. Add Manually: Select specific model directories

Model Transfer

  • Transfer models between devices on the same network
  • Resume interrupted transfers automatically
  • Track transfer progress in real-time
  • Automatic volume configuration after transfer

Model Deployment

  • Click Deploy next to any User Model
  • Select inference engine (vLLM, TensorRT-LLM, etc.)
  • System redirects to Application Launch interface
  • Model is automatically configured with correct paths

Learn more: User Models and Model Transfer Guide


VRAM Calculator

Calculate Before You Deploy

Learn more: VRAM Calculator User Guide

Avoid deployment failures by calculating VRAM requirements first:

  1. Select Model: Choose from registered models or search Hugging Face

  2. Choose GPU: Select GPU model or use device's actual GPUs

  3. Configure Parameters:

    • Quantization (BF16, FP16, INT8, INT4)
    • Sequence Length (1K - 256K tokens)
    • Batch Size (1 - 512+)
    • GPU Count (Standalone mode)
    • GPU Memory Utilization (0% - 100%)
    • Calculation Type (Overhead vs Activation)
  4. View Results:

    • Visual doughnut chart showing memory distribution
    • Detailed metrics for each component
    • Sufficient/Insufficient status indicator
    • Usage percentage bar
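
The calculation can be approximated by hand. A minimal sketch of the kind of estimate involved, assuming a hypothetical transformer with the dimensions below (the 15% overhead factor is an assumption, not the calculator's exact formula):

    # Rough VRAM estimate for a hypothetical 8B-parameter model served at FP16.
    params       = 8e9
    bytes_weight = 2        # BF16/FP16 weights
    n_layers     = 32
    n_kv_heads   = 8
    head_dim     = 128
    seq_len      = 8192
    batch_size   = 4
    bytes_kv     = 2        # KV cache kept in FP16

    weights_gib = params * bytes_weight / 1024**3
    # KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x bytes
    kv_gib = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_kv / 1024**3
    overhead_gib = 0.15 * (weights_gib + kv_gib)    # assumed overhead/activation share

    total_gib = weights_gib + kv_gib + overhead_gib
    usable_gib = 48 * 0.90                          # 48 GB GPU at 90% memory utilization

    print(f"Weights ~{weights_gib:.1f} GiB, KV cache ~{kv_gib:.1f} GiB, total ~{total_gib:.1f} GiB")
    print("Sufficient" if total_gib <= usable_gib else "Insufficient")

The calculator automates this kind of comparison against your device's actual GPUs, so prefer it over hand estimates when planning production deployments.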

Use Cases

  • Test hardware requirements before purchasing GPUs
  • Optimize quantization and batch size for existing hardware
  • Plan multi-GPU deployments
  • Compare different model configurations

Container Management

Container Operations

  • Start: Launch stopped containers individually or in groups
  • Stop: Stop running containers individually or in groups
  • Delete: Remove containers (type "DELETE" to confirm)
  • Duplicate: Create new container with same configuration
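
These operations map onto standard Docker lifecycle actions. If you ever need to inspect or manage a container directly on the device, a rough equivalent with the Docker SDK for Python looks like this (the container name is a placeholder; Cordatus normally performs all of this from the UI):

    import docker

    client = docker.from_env()
    c = client.containers.get("my-llm-environment")   # placeholder container name

    print(c.status)                     # running / exited / ...
    print(c.logs(tail=50).decode())     # last 50 log lines
    c.stop()                            # Stop
    c.start()                           # Start
    # c.remove(force=True)              # Delete (irreversible, like typing "DELETE" in Cordatus)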

Batch Operations

  • Select multiple containers using checkboxes
  • Delete multiple containers simultaneously
  • Apply operations to entire container groups

Container Information

View detailed information for any container:

  • Logs: Real-time log output
  • Parameters: All configuration settings
  • Ports: Local and public URLs

Public URL Generation

  • Generate publicly accessible URLs for any container port
  • Refresh, deactivate, or reassign ports
  • Share access to deployed applications

Open Web UI Creation

For LLM engines:

  • One-click Open Web UI deployment
  • Optional public URL generation
  • Direct chat interface with your model
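
Under the hood, Open Web UI is itself a container pointed at your model's OpenAI-compatible endpoint. A hedged sketch of what a manual equivalent could look like (the URL, key, and port are placeholders; Cordatus configures all of this automatically with the one-click option):

    import docker

    client = docker.from_env()

    # Open Web UI container connected to an OpenAI-compatible LLM endpoint.
    client.containers.run(
        "ghcr.io/open-webui/open-webui:main",
        detach=True,
        name="open-webui",
        ports={"8080/tcp": 3000},   # chat UI served on http://localhost:3000
        environment={
            "OPENAI_API_BASE_URL": "http://<device-ip>:8000/v1",  # your vLLM / TensorRT-LLM endpoint
            "OPENAI_API_KEY": "placeholder-key",
        },
        volumes={"open-webui": {"bind": "/app/backend/data", "mode": "rw"}},
        restart_policy={"Name": "always"},
    )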

Best Practices

Resource Management

  • Always maintain Host Reserved values for system stability
  • Leave 20-30% Free VRAM for unexpected loads
  • Use VRAM Calculator before production deployments
  • Set appropriate CPU and RAM limits based on workload

Model Management

  • Organize models in consistent directory structures
  • Define model paths on all devices before using User Models
  • Use meaningful names for custom models
  • Keep model metadata (quantization, parameters) up to date

Container Deployment

  • Test configurations with small models first
  • Use Activation calculation mode for accurate VRAM estimates
  • Monitor container logs during initial deployment
  • Create container groups for multi-component applications

Multi-GPU Deployments

  • Use NVIDIA Dynamo for distributed inference
  • Configure appropriate worker counts and GPU assignments
  • Choose router strategy based on workload (KV-Aware for efficiency)
  • Monitor GPU utilization across all workers

Getting Help

For detailed step-by-step instructions with screenshots and videos:

  • Application Launch Guide: Standard application deployment
  • NVIDIA AI Dynamo Guide: Distributed LLM runtime setup
  • NVIDIA VSS Guide: Video analysis pipeline deployment
  • User Models Guide: Model management and transfer
  • Container Management Guide: Container operations and monitoring
  • VRAM Calculator Guide: Memory requirement calculation