
What is Application Hub?

Application Hub is the central component of Cordatus for deploying, configuring, and managing AI applications and containerized workloads at scale.
It provides a unified interface to launch LLM inference engines, NVIDIA AI frameworks, and custom Docker containers across your devices with minimal configuration.

Application Hub transforms complex Docker deployments into simple, guided workflows — allowing you to run production-grade AI models without dealing with command-line complexity or infrastructure details.

Who is it for?

  • AI Engineers & Data Scientists who need to deploy and test LLM models across different hardware configurations
  • DevOps Teams managing AI infrastructure and container orchestration at scale
  • ML Operations Engineers who need to optimize GPU utilization and model performance
  • Organizations running distributed AI workloads across multiple devices and locations
  • Researchers & Developers experimenting with different models, quantizations, and inference engines

What can I do with Application Hub?

With Application Hub you can:

  • Deploy AI Applications
    Launch pre-configured applications such as:

    • vLLM, TensorRT-LLM, Ollama (LLM inference engines)
    • NVIDIA AI Dynamo (distributed LLM runtime)
    • NVIDIA VSS (Video Search & Summarization)
    • Custom Docker containers
  • Configure Advanced Settings

    • GPU Selection: Choose specific GPUs or allocate all available GPUs
    • Resource Limits: Set CPU core and RAM limits with Host Reserved protection
    • Model Selection: Use Cordatus Models, Custom Models, or User Models
    • Docker Options: Configure ports, volumes, networks, and environment variables (a Docker-level sketch follows this list)
    • Engine Arguments: Fine-tune inference parameters (batch size, quantization, etc.)
  • Calculate VRAM Requirements
    Use the VRAM Calculator to:

    • Predict GPU memory requirements before deployment
    • Test different configurations (quantization, sequence length, batch size)
    • Determine optimal hardware for your models
    • Plan multi-GPU deployments
  • Manage Models Across Devices
    With User Models:

    • Add models from your devices to Cordatus
    • Transfer models between devices on the same network
    • Use custom models not available on the internet
    • Automatically configure volume mappings for different inference engines
  • Monitor & Control Containers

    • View real-time container status and logs
    • Start, stop, or delete containers individually or in groups
    • Generate public URLs for deployed applications
    • Create Open Web UI interfaces for LLM models
    • Duplicate existing containers with modified configurations
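
Behind the guided workflow, every launch becomes a Docker container built from the settings above. As a rough sketch only, using the Docker SDK for Python, here is approximately what the GPU, resource-limit, and Docker options map to (the image, model, paths, and values are placeholders, and the exact configuration Cordatus generates may differ):

    import docker

    client = docker.from_env()

    # Roughly how "GPU Selection", "Resource Limits", and "Docker Options" translate.
    container = client.containers.run(
        "vllm/vllm-openai:latest",                       # placeholder image
        command=["--model", "facebook/opt-125m"],        # engine arguments (tiny placeholder model)
        detach=True,
        name="my-llm-environment",                       # environment name
        device_requests=[                                # GPU selection (GPU 0 only here)
            docker.types.DeviceRequest(device_ids=["0"], capabilities=[["gpu"]])
        ],
        nano_cpus=8 * 10**9,                             # CPU core limit (8 cores)
        mem_limit="32g",                                 # RAM limit
        ports={"8000/tcp": 8000},                        # port mapping
        volumes={"/data/models": {"bind": "/models", "mode": "rw"}},  # volume binding
        environment={"HF_TOKEN": "<your-token>"},        # environment variables
        restart_policy={"Name": "unless-stopped"},       # restart policy
    )

Application Hub assembles and runs the equivalent of this for you from the Advanced Settings form, so none of it has to be written by hand.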

Key concepts

  • Application — A pre-configured Docker image registered in Cordatus (e.g., vLLM, TensorRT-LLM, NVIDIA Dynamo, NVIDIA VSS).
    See details → Application Launch Guide | Standard Applications | NVIDIA VSS Guide | NVIDIA Dynamo Guide

  • Container — A running instance of an application with specific configuration (GPU assignment, model, parameters).
    See details → Container Management Guide

  • Container Group — Multiple containers deployed together as a single unit (e.g., VSS + VLM + LLM + Embed + Rerank).
    See details → Container Management Guide | NVIDIA VSS Guide

  • Model Types

    • Cordatus Models: Pre-tested models registered in the Cordatus system
    • Custom Models: Models specified by name or URL (downloaded during deployment)
    • User Models: Models you've added from your devices
  • Inference Engine — The runtime framework that executes LLM models:

    • vLLM: High-throughput inference with PagedAttention
    • TensorRT-LLM: NVIDIA's optimized inference engine
    • Ollama: Simple, local model deployment
    • NVIDIA NIM: Enterprise-grade microservices
  • Quantization — Model weight compression format (see the worked example after this list):

    • BF16/FP16: Full precision (16-bit)
    • INT8/FP8: Half memory usage (8-bit)
    • INT4/FP4: Quarter memory usage (4-bit)
  • Resource Limits

    • CPU Core Limit: Maximum CPU cores the container can use
    • RAM Limit: Maximum memory allocation
    • Host Reserved: Resources automatically reserved for system stability
    • No Limit: Overrides Host Reserved protection (use with caution)
  • VRAM Components

    • Model Weights: Memory occupied by model parameters
    • KV Cache: Key-Value cache for transformer models
    • Overhead/Activation: System overhead or activation memory
    • Free VRAM: Remaining available memory
      See details → VRAM Calculator User Guide
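
As a quick worked example of how quantization and the VRAM components fit together (the 7B model is hypothetical and the figures are rough approximations, not Cordatus output):

    # Approximate weight memory for a hypothetical 7B-parameter model.
    params = 7e9
    bytes_per_param = {"BF16/FP16": 2, "INT8/FP8": 1, "INT4/FP4": 0.5}

    for fmt, nbytes in bytes_per_param.items():
        gib = params * nbytes / 1024**3
        print(f"{fmt}: ~{gib:.1f} GiB of model weights")

    # BF16/FP16: ~13.0 GiB, INT8/FP8: ~6.5 GiB, INT4/FP4: ~3.3 GiB.
    # KV Cache and Overhead/Activation memory come on top of the weights,
    # so Free VRAM has to cover more than the weights alone.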

How does Application Hub work?

  1. Select an Application

    • Browse available applications from Containers > Applications
    • View application details, versions, and supported platforms
    • Check which Docker images are already downloaded on your device
      See details → Application Launch Guide
  2. Choose Device & Version

    • Select the target device for deployment
    • Choose the Docker image version
    • Cordatus shows whether the image needs to be downloaded
  3. Configure Advanced Settings

    General Settings:

    • Assign environment name
    • Select GPUs (or use "All GPU")
    • Set CPU core and RAM limits
    • Enable Open Web UI (for LLM applications)

    Model Selection (LLM Applications):

    • Choose from Cordatus Models, Custom Models, or User Models
    • Cordatus handles model transfer if needed
    • Automatic volume configuration based on inference engine

    Docker Options:

    • Configure port mappings (auto-assigned or manual)
    • Set up volume bindings with visual file explorer
    • Define network settings and restart policies

    Environment Variables:

    • Use pre-defined variables for the application
    • Add custom variables or select tokens from your account

    Engine Arguments:

    • Configure inference parameters (batch size, quantization, etc.; a vLLM example follows these steps)
    • For NVIDIA Dynamo: Configure processing mode, router, connector, workers
    • For NVIDIA VSS: Configure VLM, LLM, Embed, and Rerank components
  4. Launch & Monitor

    • Review configuration and click Start Environment
    • Enter sudo password for authorization
    • Monitor deployment progress and container status
    • Access containers via Containers page or Applications > Containers tab
      See details → Container Management Guide
  5. Manage Running Containers

    • View logs and parameters in real-time
    • Generate public URLs for external access
    • Start, stop, or delete containers
    • Create Open Web UI for LLM models
    • Duplicate containers with modified settings
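
For LLM engine applications, the Engine Arguments you enter in step 3 ultimately become command-line flags passed to the inference engine inside the container. A hedged illustration using vLLM's OpenAI-compatible server (the model name and values are placeholders; the flags below are standard vLLM options, not Cordatus-specific, and the arguments Cordatus exposes per engine may differ):

    import docker

    client = docker.from_env()

    # Example engine arguments for a vLLM deployment.
    vllm_args = [
        "--model", "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        "--quantization", "fp8",             # quantization format
        "--max-model-len", "8192",           # maximum sequence length
        "--max-num-seqs", "64",              # cap on concurrently batched sequences
        "--gpu-memory-utilization", "0.90",  # fraction of VRAM vLLM may claim
        "--tensor-parallel-size", "2",       # split the model across 2 GPUs
    ]

    client.containers.run(
        "vllm/vllm-openai:latest",
        command=vllm_args,
        detach=True,
        device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],  # all GPUs
        ports={"8000/tcp": 8000},
    )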

Application Types

Standard Applications

Simple Docker containers with basic GPU, CPU, and RAM configuration:

  • Single container deployment
  • Direct GPU assignment
  • Standard Docker options

See details → Standard Application Creation Guide

LLM Engine Applications

Advanced inference engines with model management:

  • Model selection (Cordatus, Custom, User Models)
  • Automatic volume configuration
  • Model transfer between devices
  • Optional Open Web UI creation
  • Engine-specific arguments

See details → Standard Application Creation Guide | User Models Guide

NVIDIA AI Dynamo

Distributed LLM runtime for multi-GPU deployments:

  • Processing modes (Aggregated/Disaggregated)
  • Router configuration (KV-Aware, Round Robin, Random)
  • Connector setup (KVBM, NIXL, LM Cache)
  • Worker creation and GPU assignment per worker
  • Multi-container orchestration

See details → NVIDIA AI Dynamo Creation Guide

NVIDIA VSS (Video Search & Summarization)

Complex multi-component pipeline:

  • Main VSS container with Event Reviewer option
  • VLM (Vision Language Model) component
  • LLM (Large Language Model) component
  • Embed (Embedding Model) component
  • Rerank (Rerank Model) component
  • Each component can be a new, existing, or remote application

See details → NVIDIA VSS Creation Guide


User Models & Model Transfer

Model Path Configuration

Define model paths on your devices for:

  • Huggingface models
  • Ollama models
  • NVIDIA NIM models
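
If you are unsure where your models live, these are common default cache locations on Linux (assumptions based on each tool's usual defaults; your installation may use different paths):

    ~/.cache/huggingface/hub    # Hugging Face models (transformers / huggingface_hub default cache)
    ~/.ollama/models            # Ollama models for a user-level install
    ~/.cache/nim                # NVIDIA NIM cache commonly used in NVIDIA's examples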

Adding Models

Two methods to add models:

  1. Start Scanning: Automatic detection of models in defined paths
  2. Add Manually: Select specific model directories

Model Transfer

  • Transfer models between devices on the same network
  • Resume interrupted transfers automatically
  • Track transfer progress in real-time
  • Automatic volume configuration after transfer

Model Deployment

  • Click Deploy next to any User Model
  • Select inference engine (vLLM, TensorRT-LLM, etc.)
  • System redirects to Application Launch interface
  • Model is automatically configured with correct paths

Learn more: User Models and Model Transfer Guide


VRAM Calculator

Calculate Before You Deploy

Learn more: VRAM Calculator User Guide

Avoid deployment failures by calculating VRAM requirements first:

  1. Select Model: Choose from registered models or search Hugging Face

  2. Choose GPU: Select GPU model or use device's actual GPUs

  3. Configure Parameters:

    • Quantization (BF16, FP16, INT8, INT4)
    • Sequence Length (1K - 256K tokens)
    • Batch Size (1 - 512+)
    • GPU Count (Standalone mode)
    • GPU Memory Utilization (0% - 100%)
    • Calculation Type (Overhead vs Activation)
  4. View Results:

    • Visual doughnut chart showing memory distribution
    • Detailed metrics for each component
    • Sufficient/Insufficient status indicator
    • Usage percentage bar
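
The calculation can be approximated by hand. A minimal sketch of the kind of estimate involved, assuming a hypothetical transformer with the dimensions below (the 15% overhead factor is an assumption, not the calculator's exact formula):

    # Rough VRAM estimate for a hypothetical 8B-parameter model served at FP16.
    params       = 8e9
    bytes_weight = 2        # BF16/FP16 weights
    n_layers     = 32
    n_kv_heads   = 8
    head_dim     = 128
    seq_len      = 8192
    batch_size   = 4
    bytes_kv     = 2        # KV cache kept in FP16

    weights_gib = params * bytes_weight / 1024**3
    # KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens x bytes
    kv_gib = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_kv / 1024**3
    overhead_gib = 0.15 * (weights_gib + kv_gib)    # assumed overhead/activation share

    total_gib = weights_gib + kv_gib + overhead_gib
    usable_gib = 48 * 0.90                          # 48 GB GPU at 90% memory utilization

    print(f"Weights ~{weights_gib:.1f} GiB, KV cache ~{kv_gib:.1f} GiB, total ~{total_gib:.1f} GiB")
    print("Sufficient" if total_gib <= usable_gib else "Insufficient")

The calculator automates this kind of comparison against your device's actual GPUs, so prefer it over hand estimates when planning production deployments.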

Use Cases

  • Test hardware requirements before purchasing GPUs
  • Optimize quantization and batch size for existing hardware
  • Plan multi-GPU deployments
  • Compare different model configurations

Container Management

Container Operations

  • Start: Launch stopped containers individually or in groups
  • Stop: Stop running containers individually or in groups
  • Delete: Remove containers (type "DELETE" to confirm)
  • Duplicate: Create new container with same configuration
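
These operations map onto standard Docker lifecycle actions. If you ever need to inspect or manage a container directly on the device, a rough equivalent with the Docker SDK for Python looks like this (the container name is a placeholder; Cordatus normally performs all of this from the UI):

    import docker

    client = docker.from_env()
    c = client.containers.get("my-llm-environment")   # placeholder container name

    print(c.status)                     # running / exited / ...
    print(c.logs(tail=50).decode())     # last 50 log lines
    c.stop()                            # Stop
    c.start()                           # Start
    # c.remove(force=True)              # Delete (irreversible, like typing "DELETE" in Cordatus)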

Batch Operations

  • Select multiple containers using checkboxes
  • Delete multiple containers simultaneously
  • Apply operations to entire container groups

Container Information

View detailed information for any container:

  • Logs: Real-time log output
  • Parameters: All configuration settings
  • Ports: Local and public URLs

Public URL Generation

  • Generate publicly accessible URLs for any container port
  • Refresh, deactivate, or reassign ports
  • Share access to deployed applications

Open Web UI Creation

For LLM engines:

  • One-click Open Web UI deployment
  • Optional public URL generation
  • Direct chat interface with your model
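
Under the hood, Open Web UI is itself a container pointed at your model's OpenAI-compatible endpoint. A hedged sketch of what a manual equivalent could look like (the URL, key, and port are placeholders; Cordatus configures all of this automatically with the one-click option):

    import docker

    client = docker.from_env()

    # Open Web UI container connected to an OpenAI-compatible LLM endpoint.
    client.containers.run(
        "ghcr.io/open-webui/open-webui:main",
        detach=True,
        name="open-webui",
        ports={"8080/tcp": 3000},   # chat UI served on http://localhost:3000
        environment={
            "OPENAI_API_BASE_URL": "http://<device-ip>:8000/v1",  # your vLLM / TensorRT-LLM endpoint
            "OPENAI_API_KEY": "placeholder-key",
        },
        volumes={"open-webui": {"bind": "/app/backend/data", "mode": "rw"}},
        restart_policy={"Name": "always"},
    )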

Best Practices

Resource Management

  • Always maintain Host Reserved values for system stability
  • Leave 20-30% Free VRAM for unexpected loads
  • Use VRAM Calculator before production deployments
  • Set appropriate CPU and RAM limits based on workload

Model Management

  • Organize models in consistent directory structures
  • Define model paths on all devices before using User Models
  • Use meaningful names for custom models
  • Keep model metadata (quantization, parameters) up to date

Container Deployment

  • Test configurations with small models first
  • Use Activation calculation mode for accurate VRAM estimates
  • Monitor container logs during initial deployment
  • Create container groups for multi-component applications

Multi-GPU Deployments

  • Use NVIDIA Dynamo for distributed inference
  • Configure appropriate worker counts and GPU assignments
  • Choose router strategy based on workload (KV-Aware for efficiency)
  • Monitor GPU utilization across all workers

Getting Help

For detailed step-by-step instructions with screenshots and videos:

  • Application Launch Guide: Standard application deployment
  • NVIDIA AI Dynamo Guide: Distributed LLM runtime setup
  • NVIDIA VSS Guide: Video analysis pipeline deployment
  • User Models Guide: Model management and transfer
  • Container Management Guide: Container operations and monitoring
  • VRAM Calculator Guide: Memory requirement calculation