Small Language Models as Control Planes: Designing Cost-Efficient GenAI Orchestration Layers for Enterprise-Integrated Digital Workflows

Siva Hemanth Kolla

Published: 2026-02-03

Authors

Siva Hemanth Kolla, Appa Rao Nagubandi

Keywords

Generative AI Orchestration, Enterprise GenAI Control Planes, Cost-Efficient Model Serving, Small Language Models (SLMs), Large Language Model Optimization, Model Sparsity and Quantization, GenAI Workload Characterization, Intelligent Request Routing, Model Caching Strategies, Personalized GenAI Services, Enterprise AI Cost Governance, Control-Plane Decision Logic, Distributed GenAI Architectures, Model Selection and Policy Enforcement, AI-Driven Service Digitalization, Adaptive Model Configuration, Execution Cost Optimization, Latent Enterprise GenAI Use Cases, Scalable GenAI Infrastructure, Next-Generation AI Orchestration Frameworks

Abstract

Generative AI applications such as chatbots and text-to-image systems create demand for GenAI models that address the common information needs of diverse stakeholders in a responsive and personalized manner. Yet the effort to train, host, and serve these models can incur substantial cost and complexity. Many large language models can answer a wide range of questions, but risk being underutilized for specific enterprise workloads. At the same time, enterprise-integrated GenAI services are supporting the digitalization of business processes at an unprecedented scale, revealing latent use cases for specialized models or adjusted configurations of the same model that reflect the cost profiles of these systems. These factors suggest that deploying smaller models to manage the GenAI orchestration layer across an enterprise might yield significant cost savings. A control plane design based on the concept of GenAI orchestration is proposed, along with a set of cost-efficiency principles for implementing this functionality. Control planes are responsible for decision-making and policy enforcement across a distributed system. Making cost-effectiveness an explicit design goal when architecting an orchestration layer introduces additional considerations beyond those that typically inform the design of control planes. Model size, sparsity, quantization, caching, and workload characterization shape the trade-offs governing the overall cost of model execution, create opportunities for realizing cost savings, and identify workload patterns that can further inform cost-saving measures.
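
To make the control-plane idea concrete, the following is a minimal Python sketch of cost-aware request routing with a response cache and a spending cap. It is not the authors' design: the names `ModelTier` and `ControlPlane`, the `score` callback (a stand-in for where a small language model would judge request complexity), the cost figures, and the budget rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ModelTier:
    """A hypothetical serving tier; all numbers are illustrative."""
    name: str
    cost_per_1k_tokens: float  # assumed price in arbitrary units
    max_complexity: float      # highest complexity score this tier should handle


@dataclass
class ControlPlane:
    """Sketch of a control plane: cache lookup, policy check, cheapest-fit routing."""
    tiers: List[ModelTier]
    score: Callable[[str], float]  # placeholder for an SLM-based complexity estimate
    budget: float                  # assumed spending cap (policy enforcement)
    cache: Dict[str, str] = field(default_factory=dict)
    spent: float = 0.0

    def route(self, prompt: str) -> str:
        """Return 'cache', a tier name, or 'reject' for the given prompt."""
        if prompt in self.cache:
            return "cache"  # serving from cache incurs no model cost here
        complexity = self.score(prompt)
        tokens = max(1, len(prompt.split()))  # crude whitespace token count
        # Try tiers from cheapest to most expensive.
        for tier in sorted(self.tiers, key=lambda t: t.cost_per_1k_tokens):
            cost = tier.cost_per_1k_tokens * tokens / 1000
            if complexity <= tier.max_complexity and self.spent + cost <= self.budget:
                self.spent += cost
                return tier.name
        return "reject"  # no tier satisfies both capability and budget

    def record(self, prompt: str, response: str) -> None:
        """Store a response so that an identical prompt is served from cache."""
        self.cache[prompt] = response


# Usage with a toy length-based scorer (an assumption, not a real classifier).
tiers = [ModelTier("small", 0.1, 0.5), ModelTier("large", 2.0, 1.0)]
plane = ControlPlane(tiers, score=lambda p: min(1.0, len(p.split()) / 20), budget=1.0)
short_prompt = "reset my password"
long_prompt = " ".join(["word"] * 30)
first = plane.route(short_prompt)
second = plane.route(long_prompt)
plane.record(short_prompt, "Use the self-service portal.")
third = plane.route(short_prompt)
```

In this sketch the routing decision itself is cheap, and the expensive tier is used only when the complexity estimate exceeds what the small tier is configured to handle. A real deployment would replace the toy scorer and the whitespace token count with measured workload characteristics.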
