DCE Product Roadmap Disclaimer
This roadmap reflects current planning directions. Features and timelines are subject to change. Refer to release notes for confirmed deliverables.
H1 2026

AI
- Inference runtime integration (vLLM / SGLang), domestic GPU support
- Model asset center MVP (user/project/repo management, model & dataset upload/download, CLI)
- Pre-integrated domestic model repos (Qwen / GLM / Baichuan)
- Inference acceleration: multi-level KV Cache, topology-aware scheduling (Kueue / Gang)
- Training-inference co-location basics
- AI fault diagnosis (multi-source log correlation + root cause analysis)
- Predictive alerting (time-series anomaly detection, resource exhaustion warnings)

Infra
- MetaX GPU onboarding (network topology, Lustre GDS)
- Ascend 910C NPU scheduling (CANN driver)
- Hygon DCU GPU scheduling
- AI high-performance storage (Lustre file system)
- Kueue / Gang Scheduling / LWS / DRA integration
- HAMi commercial edition integration [2]
- containerd enhancements (container disk limits)

Platform
- One-click install (Web UI + CLI, auto environment detection)
- Preflight check framework (plugin-based, network/storage/permission checks)
- Gateway API migration start (Ingress retirement)
- Log aggregation enhancements
- Compute cloud operations platform admin console
- Compute baseline review & billing model optimization
- Ghippo admin console UI
- CSP user two-factor authentication (2FA)

Ecosystem
- Kueue / LWS / Gang Scheduling K8s AI/ML SIG contributions
- Spiderpool DRA implementation, DRANet
- Spiderpool MetaX GPU support
- GAIE / NIXL / LMCache inference optimization project participation
- MatrixHub Sandbox
- unifabric 1.0 (network health check, disaster marking, KV Cache sync monitoring)
- metal-deployer engineering delivery

H2 2026

AI
- DCE AI Runtime GA
- Unified inference API (OpenAI API / Llama Stack compatible)
- Fine-tuning / LoRA support
- Multi-modal inference (text-image, audio-video)
- Model asset center enhancements (remote replication/sync, security scanning, pre-warming, i18n)
- MatrixHub CNCF submission [1]
- AI Agent infrastructure Beta (sandbox, memory & context, semantic routing)
- Fault self-healing (integrated training/inference framework auto-recovery)
- Alert noise reduction (automatic correlated alert grouping)
- LLM security (model access control, inference content safety policies)

Infra
- Domestic GPU full GA (MetaX / Ascend / Hygon / Biren)
- MetaX supernode release
- Supernode solution (8/16-card high-density, GPU sharing scheduler)
- GPU Operator hybrid scheduling (CPU + GPU + NPU), utilization → 80%+
- Distributed storage solution (cloud scenarios)

Platform
- Rolling upgrades (zero-downtime, canary + rollback)
- Gateway API migration complete
- Deployment time → 15 min (from ~2 hours)
- Compute cloud platform enhancements (tenant isolation, inventory management, billing conversion, GPU up/downgrade)
- Bare-metal deployer (cluster provisioning, automated testing, single-node troubleshooting)

Ecosystem
- GAIE / NIXL community seats
- unifabric Sandbox, InfiniBand support

2027+

AI
- Distributed inference
- Training-inference co-location optimization
- Full-stack AI automation (AutoML + Agent)

Infra
- DPU / NPU unified scheduling
- Computing network, multi-cluster compute federation
- InfiniBand topology discovery (via UFM)

Platform
- Lightweight kernel, edge-native
- Self-adaptive platform (auto-tuning + self-healing)

Ecosystem
- Low-code orchestration, natural language operations
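The predictive alerting item (resource exhaustion warnings) can be illustrated with a minimal trend-extrapolation sketch. The function name, data shape, and least-squares approach here are illustrative assumptions, not a DCE API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExhaustionForecast:
    slope: float                            # usage growth per sample interval
    steps_to_exhaustion: Optional[float]    # None if usage is flat or falling

def forecast_exhaustion(samples: list, capacity: float) -> ExhaustionForecast:
    """Fit a least-squares line to recent usage samples (at least two)
    and estimate how many sample intervals remain until capacity."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    if slope <= 0:
        # No upward trend: no exhaustion warning to raise.
        return ExhaustionForecast(slope, None)
    remaining = max(0.0, (capacity - samples[-1]) / slope)
    return ExhaustionForecast(slope, remaining)
```

An alerting loop would call this on a sliding window of disk or memory samples and fire when `steps_to_exhaustion` drops below a configured horizon.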
[1] MatrixHub — DaoCloud's open-source model asset center, aiming to be for AI models what Harbor is for container images. [2] HAMi — Heterogeneous AI computing middleware for GPU sharing and isolation.
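To illustrate the GPU-sharing idea behind middleware like HAMi [2], here is a toy fractional-GPU allocator. The class, the best-fit placement policy, and all names are illustrative assumptions, not HAMi's actual implementation:

```python
from typing import Optional

class FractionalGpuAllocator:
    """Toy bookkeeping for GPU sharing: each physical GPU can be split
    into fractional shares (e.g. 0.25 of a card) across workloads."""

    def __init__(self, gpus: dict):
        # Remaining free fraction per physical GPU id (1.0 == whole card free).
        self.free = dict(gpus)

    def allocate(self, fraction: float) -> Optional[str]:
        """Best-fit placement: choose the GPU that, after allocation,
        leaves the least fragmentation. Returns a GPU id or None."""
        candidates = [(free - fraction, gid)
                      for gid, free in self.free.items() if free >= fraction]
        if not candidates:
            return None
        leftover, gid = min(candidates)
        self.free[gid] = leftover
        return gid

    def release(self, gid: str, fraction: float) -> None:
        # Return a share to the pool, capped at one whole card.
        self.free[gid] = min(1.0, self.free[gid] + fraction)
```

Best-fit packing keeps larger contiguous shares available for future big requests; real middleware additionally enforces memory and compute isolation on the device itself.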
Strategic Direction DCE already includes AI Lab (training) and LLM Service Platform (model management & inference). In 2026, we focus on two priorities:
1. AI Deepening — complete enterprise inference scenarios, support domestic GPUs, and bridge training to inference.
2. Platform Deepening — improve the operations experience, deployment efficiency, and compute management to solidify existing capabilities.

DCE 5.0 Existing Capabilities

All modules can be upgraded independently without platform-wide downtime.
- Container Management: Multi-cluster management, cluster lifecycle, auto-scaling, Helm apps
- Workbench: CI/CD pipelines, GitOps, canary releases
- Multi-cloud Orchestration: Cross-cloud resource scheduling & app orchestration
- Microservice Engine: Spring Cloud / Dubbo management
- Service Mesh: Istio-based traffic governance & observability
- Cloud-native Networking: Multi-CNI, network policies, load balancing
- Cloud-native Storage: CSI standard, HwameiStor, multi-backend storage
- Observability: Metrics / logs / traces, multi-dimensional alerting
- Middleware: Redis / MySQL / Kafka / ES / PG lifecycle management
- Image Registry: Multi-instance management, Harbor compatible
- Global Management: Authentication, multi-tenancy, RBAC, audit
- Virtual Machines: KubeVirt, VM management, snapshots, live migration
- AI Lab: Training & inference, PyTorch / TensorFlow
- LLM Service: LLM deployment & operations, vLLM / SGLang
- Cloud-Edge Collaboration: Edge cluster & node management
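Canary releases, listed under Workbench above, typically split traffic by weight. A minimal sticky-routing sketch (the helper is hypothetical, not the Workbench API):

```python
import hashlib

def route_to_canary(request_id: str, canary_weight: int) -> bool:
    """Deterministic weighted routing: hash the request (or user) id into a
    bucket 0-99 and send it to the canary when the bucket falls below the
    weight. The same id always lands on the same version (sticky canary)."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_weight
```

Raising `canary_weight` from 0 to 100 gradually shifts traffic to the new version; hashing on a stable id avoids flip-flopping a single user between versions mid-session.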
Operational Assurance

- High Availability: Multi-replica control plane + etcd cluster, auto-recovery on node failure
- Data Backup: etcd snapshots, app-level backup (Velero), cross-cluster disaster recovery
- Offline Operation: Fully offline deployment and operation, no external network dependency
- Upgrade Rollback: One-click rollback for all version upgrades
- Security & Compliance: MLPS Level 3, audit logs, image scanning, model access control
- Identity: LDAP / OIDC / enterprise identity platform integration
- Technical Support: Documentation + training certification + TAM + 7×24 emergency response
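Snapshot-based backup (etcd snapshots, Velero) is usually paired with a retention policy. A minimal sketch, assuming a simple keep-newest-N rule rather than DCE's actual policy:

```python
from datetime import datetime, timedelta

def prune_snapshots(snapshots: dict, keep_last: int) -> list:
    """Given a mapping of snapshot name -> creation time, return the
    names to delete, keeping only the newest `keep_last` snapshots."""
    newest_first = sorted(snapshots, key=snapshots.get, reverse=True)
    return sorted(newest_first[keep_last:])
```

Production schedulers typically layer rules (e.g. keep hourly for a day, daily for a month) on the same idea; the core operation is still sorting by creation time and pruning the tail.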
Ecosystem & Partnerships Open Source Contributions: ranked No. 1 in China and top 3 globally for Kubernetes core repository contributions. Active in Istio / Cilium / Spiderpool / HwameiStor and other CNCF projects, and an active contributor to Kueue / LWS / Gang Scheduling and other K8s AI/ML SIG projects.
Partners

- Chips & Compute: Huawei Ascend, Hygon, Biren, MetaX, NVIDIA
- Operating Systems: Kylin, UnionTech UOS
- Databases & Middleware: DMDB, OceanBase, TiDB
Industry Coverage: Finance · Manufacturing · Energy · Telecom · Government — serving 500+ enterprise customers.