
DCE Product Roadmap

Disclaimer

This roadmap reflects current planning directions. Features and timelines are subject to change. Refer to release notes for confirmed deliverables.

Planning horizons: H1 2026 · H2 2026 · 2027+
AI
  • Inference runtime integration (vLLM / SGLang), domestic GPU support
  • Model asset center MVP (user/project/repo management, model & dataset upload/download, CLI)
  • Pre-integrated domestic model repos (Qwen / GLM / Baichuan)
  • Inference acceleration: multi-level KV Cache, topology-aware scheduling (Kueue / Gang)
  • Training-inference co-location basics
  • AI fault diagnosis (multi-source log correlation + root cause analysis)
  • Predictive alerting (time-series anomaly detection, resource exhaustion warnings)
  • DCE AI Runtime GA
  • Unified inference API (OpenAI API / Llama Stack compatible)
  • Fine-tuning / LoRA support
  • Multi-modal inference (text-image, audio-video)
  • Model asset center enhancements (remote replication/sync, security scanning, pre-warming, i18n)
  • MatrixHub CNCF submission [1]
  • AI Agent infrastructure Beta (sandbox, memory & context, semantic routing)
  • Fault self-healing (integrated training/inference framework auto-recovery)
  • Alert noise reduction (automatic correlated alert grouping)
  • LLM security (model access control, inference content safety policies)
  • Distributed inference
  • Training-inference co-location optimization
  • Full-stack AI automation (AutoML + Agent)
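The unified inference API above targets OpenAI API compatibility, so existing OpenAI-style clients should work unchanged. As a rough sketch (the endpoint URL and model name are placeholders, not confirmed DCE values), a client would build a standard chat completions payload:

```python
import json

# Placeholder endpoint; the roadmap promises OpenAI API compatibility, so any
# OpenAI-style client should work against the unified inference API once it ships.
DCE_INFERENCE_URL = "https://dce.example.com/v1/chat/completions"  # assumption

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Example payload for a pre-integrated domestic model (model name illustrative).
payload = build_chat_request("qwen2.5-7b-instruct", "Summarize this roadmap.")
print(json.dumps(payload, ensure_ascii=False))
```

Because the wire format is the standard one, the same payload should work whether the backend is vLLM or SGLang.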
Infra
  • MetaX GPU onboarding (network topology, Lustre GDS)
  • Ascend 910C NPU scheduling (CANN driver)
  • Hygon DCU GPU scheduling
  • AI high-performance storage (Lustre file system)
  • Kueue / Gang Scheduling / LWS / DRA integration
  • HAMi commercial edition integration [2]
  • containerd enhancements (container disk limits)
  • Domestic GPU full GA (MetaX / Ascend / Hygon / Biren)
  • MetaX supernode release
  • Supernode solution (8/16-card high-density, GPU sharing scheduler)
  • GPU Operator hybrid scheduling (CPU + GPU + NPU), utilization → 80%+
  • Distributed storage solution (cloud scenarios)
  • DPU / NPU unified scheduling
  • Computing network, multi-cluster compute federation
  • InfiniBand topology discovery (via UFM)
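The Kueue / Gang Scheduling items above boil down to queue-based admission of batch workloads. A minimal sketch of how a GPU training Job is routed through Kueue (the queue name, image, and GPU count are illustrative; the `kueue.x-k8s.io/queue-name` label and suspend-based admission are standard Kueue conventions):

```python
# Sketch: a batch Job handed to Kueue for admission. The queue name, image,
# and GPU count are placeholders, not DCE defaults.
def gpu_job_manifest(name: str, queue: str, gpus: int) -> dict:
    """Return a Kubernetes Job manifest that Kueue will admit from `queue`."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            # Kueue admits a suspended Job by unsuspending it once quota is free.
            "suspend": True,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "trainer",
                        "image": "trainer:latest",  # placeholder image
                        "resources": {"limits": {"nvidia.com/gpu": gpus}},
                    }],
                }
            },
        },
    }

manifest = gpu_job_manifest("llm-finetune", "team-ai", 8)
print(manifest["metadata"]["labels"])
```

Topology-aware and gang scheduling then operate on top of this admission flow, placing all of a job's pods together or not at all.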
Plat
  • One-click install (Web UI + CLI, auto environment detection)
  • Preflight check framework (plugin-based, network/storage/permission checks)
  • Gateway API migration start (Ingress retirement)
  • Log aggregation enhancements
  • Compute cloud operations platform admin console
  • Compute baseline review & billing model optimization
  • Ghippo admin console UI
  • CSP user two-factor authentication (2FA)
  • Rolling upgrades (zero-downtime, canary + rollback)
  • Gateway API migration complete
  • Deployment time → 15 min (from ~2 hours)
  • Compute cloud platform enhancements (tenant isolation, inventory management, billing conversion, GPU up/downgrade)
  • Bare-metal deployer (cluster provisioning, automated testing, single-node troubleshooting)
  • Lightweight kernel, edge-native
  • Self-adaptive platform (auto-tuning + self-healing)
Eco
  • Kueue / LWS / Gang Scheduling K8s AI/ML SIG contributions
  • Spiderpool DRA implementation, DRANet
  • Spiderpool MetaX GPU support
  • GAIE / NIXL / LMCache inference optimization project participation
  • MatrixHub Sandbox
  • unifabric 1.0 (network health check, disaster marking, KV Cache sync monitoring)
  • metal-deployer engineering delivery
  • GAIE / NIXL community seats
  • unifabric Sandbox, InfiniBand support
  • Low-code orchestration, natural language operations

[1] MatrixHub — DaoCloud's open-source model asset center, aiming to be for AI models what Harbor is for container images.
[2] HAMi — Heterogeneous AI computing middleware for GPU sharing and isolation.


Strategic Direction

DCE already includes AI Lab (training) and LLM Service Platform (model management & inference). In 2026, we focus on two priorities:

  1. AI Deepening — Round out enterprise inference scenarios, support domestic GPUs, and bridge training to inference
  2. Platform Deepening — Improve the operations experience, deployment efficiency, and compute management to solidify existing capabilities

DCE 5.0 Existing Capabilities

All modules can be upgraded independently without platform-wide downtime.

Module Capabilities
Container Management Multi-cluster management, cluster lifecycle, auto-scaling, Helm apps
Workbench CI/CD pipelines, GitOps, canary releases
Multi-cloud Orchestration Cross-cloud resource scheduling & app orchestration
Microservice Engine Spring Cloud / Dubbo management
Service Mesh Istio-based traffic governance & observability
Cloud-native Networking Multi-CNI, network policies, load balancing
Cloud-native Storage CSI standard, HwameiStor, multi-backend storage
Observability Metrics / logs / traces, multi-dimensional alerting
Middleware Redis / MySQL / Kafka / ES / PG lifecycle management
Image Registry Multi-instance management, Harbor compatible
Global Management Authentication, multi-tenancy, RBAC, audit
Virtual Machines KubeVirt, VM management, snapshots, live migration
AI Lab Training & inference, PyTorch / TensorFlow
LLM Service LLM deployment & operations, vLLM / SGLang
Cloud-Edge Collaboration Edge cluster & node management

Operational Assurance

Category Details
High Availability Multi-replica control plane + etcd cluster, auto-recovery on node failure
Data Backup etcd snapshots, app-level backup (Velero), cross-cluster disaster recovery
Offline Operation Fully offline deployment and operation, no external network dependency
Upgrade Rollback One-click rollback for all version upgrades
Security & Compliance MLPS Level 3, audit logs, image scanning, model access control
Identity LDAP / OIDC / enterprise identity platform integration
Technical Support Documentation + training certification + TAM + 7×24 emergency response

Ecosystem & Partnerships

Open Source Contributions: ranked first in China and among the top three globally for contributions to the Kubernetes core repository. Active in Istio / Cilium / Spiderpool / HwameiStor and other CNCF projects, and an active contributor to Kueue / LWS / Gang Scheduling and other Kubernetes AI/ML SIG projects.

Area Partners
Chips & Compute Huawei Ascend, Hygon, Biren, MetaX, NVIDIA
Operating Systems Kylin, UnionTech UOS
Databases & Middleware DMDB, OceanBase, TiDB

Industry Coverage: Finance · Manufacturing · Energy · Telecom · Government — serving 500+ enterprise customers.
