Medhavi – A Framework for Making Infrastructure Intelligent for HPC Workloads

What is the problem in infrastructure and why address it –

Tuning infrastructure optimally has proven extremely challenging, because the quest for faster, cheaper and better infrastructure depends on multiple key performance indicators (KPIs). The illusion of infinite capacity, created by virtualization and hidden from workloads, does not address the real constraints of existing and evolving infrastructure elements in the compute, network and storage domains. To overcome this, the constraints must be derived from what infrastructure is actually available on the rack and shelf, or on the network paths and storage elements, in a given domain (which can be cross-domain, as long as there is a sharing arrangement and a means to share it).

Thus, the first part of the problem is to state what the workload is.

Workloads are applications and functions that are executed with real-time constraints over any given infrastructure. Workload types include AI, ML, DL, MEC, NFVi, IoT, 5G, etc. A workload can have a typical profile, such as an ML profile for speech or image recognition. A few typical workload-defining parameters, such as processing cores and clock speeds, memory bus bandwidth, memory hierarchy, network throughput and latency, IO-MMU transfer bandwidth, and I/O operations per second (IOPS), need optimization to use the infrastructure at hand effectively.
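To make this concrete, these parameters could be captured in a small workload-profile structure. The sketch below is only illustrative; the field names and values are assumptions, not part of any Medhavi schema.

```python
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    """Illustrative workload-defining parameters; field names are hypothetical."""
    name: str                    # e.g. "ml-speech-recognition"
    cpu_cores: int               # processing cores
    cpu_clock_ghz: float         # clock speed
    memory_bus_gbps: float       # memory bus bandwidth
    memory_gb: int               # working set across the memory hierarchy
    network_gbps: float          # network throughput
    network_latency_ms: float    # acceptable network latency
    iommu_gbps: float            # IO-MMU transfer bandwidth
    iops: int                    # I/O operations per second

# Example ML profile for image recognition (all values made up for illustration)
ml_profile = WorkloadProfile(
    name="ml-image-recognition",
    cpu_cores=32, cpu_clock_ghz=2.4,
    memory_bus_gbps=200.0, memory_gb=256,
    network_gbps=25.0, network_latency_ms=1.0,
    iommu_gbps=32.0, iops=100_000,
)
```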

Firstly, in our industry most users do not have the right information to plan their infrastructure, and secondly, we do not use that infrastructure intelligently or efficiently enough to run workloads well. The need for infrastructure changes every day for every industry and domain: edge deployments need one kind of infrastructure, while AI/ML/big-data workloads require a different level and intensity of infrastructure. Although we define reference architectures for every workload, we still fail to match the needs of applications effectively, because application requirements fluctuate over time; traffic peaks demand more compute power, and some workloads need more memory, I/O, and so on. How can a user understand the needs of an application and configure the infrastructure intelligently enough to get roughly 80% of the way there, and then improve the rest using A/B testing, as Netflix does for its video-streaming workloads?
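The A/B idea can be sketched in a few lines: run the same workload on two candidate configurations, time each, and keep the faster one. The `run_workload` launcher below is a hypothetical stand-in for whatever interface the workload manager actually exposes.

```python
import time

def run_workload(config: dict) -> float:
    """Hypothetical launcher: submit the workload with `config` to the
    workload manager, wait for completion, and return elapsed seconds."""
    start = time.monotonic()
    # ... submit the job and block until it finishes ...
    return time.monotonic() - start

def ab_test(config_a: dict, config_b: dict, runs: int = 3) -> dict:
    """Time both candidate configurations and keep the faster one."""
    def avg(cfg: dict) -> float:
        return sum(run_workload(cfg) for _ in range(runs)) / runs
    return config_a if avg(config_a) <= avg(config_b) else config_b
```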

How to find a solution, and the approach Medhavi provides for multi-component infrastructure –

The problem is that we have a wide variety of infrastructure providers, and each defines the best-suited option for workloads based only on its own configurations, which costs end users more. This problem has existed for decades, and the process involved is complex, manual and not automated. We sometimes have overprovisioned cores in systems that go unutilized, or we lack visibility into which devices are accessing these back-end systems and which devices we actually need to execute a workload faster, and that is where Medhavi comes into the picture. The idea is that Medhavi provides the best-known configurations recommended for workloads, procures that infrastructure where available, and uses the compute workload manager to configure, execute and time the runs for best results. The additional component is A/B testing on live workloads to improve performance further.
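At a high level, that flow could look like the following sketch: look up a best-known configuration (BKC) for the workload type, hand it to the workload manager, and time the run. The dictionary contents and function names are placeholders, not Medhavi's real API.

```python
import time

# Hypothetical best-known configurations (BKCs) keyed by workload type;
# real BKCs would come from Medhavi's recommendation engine.
BEST_KNOWN_CONFIGS = {
    "ml-training":    {"cpu_cores": 64, "memory_gb": 512, "gpus": 4, "network_gbps": 100},
    "edge-inference": {"cpu_cores": 8,  "memory_gb": 32,  "gpus": 1, "network_gbps": 10},
}

def recommend_config(workload_type: str) -> dict:
    """Return the recommended configuration for a workload type, if one is known."""
    return BEST_KNOWN_CONFIGS.get(workload_type, {})

def configure_execute_time(workload_type: str, job) -> float:
    """Configure from the BKC, execute the job, and time the run."""
    config = recommend_config(workload_type)
    # ... ask the compute workload manager to provision `config` here ...
    start = time.monotonic()
    job(config)                  # the job callable receives its configuration
    return time.monotonic() - start

# Usage: time a dummy job under the ML-training BKC.
elapsed = configure_execute_time("ml-training", lambda cfg: print("running with", cfg))
```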

“Medhavi” is a Sanskrit term meaning “intelligent.” The Medhavi architecture revolves around simple compute workload managers and a simple concept of jobs, how they are placed and scheduled, using Open Infrastructure. While the requirements are still evolving, the following is the baseline architecture.

The modules of Medhavi include the API Controller, Device Engine Agent Manager, Engine Agent, Cluster Engine, ML Feedback Engine, and so on. Each of these engines is a separate module under Medhavi that provides a different feature set. All of the engines communicate with each other continuously via a message queue.
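A minimal way to picture engines exchanging messages over a queue is a small in-process bus, as in the sketch below. The engine names are taken from the list above, but the bus itself is purely illustrative and not how Medhavi is implemented.

```python
import queue

class MessageBus:
    """Toy in-process message bus: one queue per engine, keyed by engine name."""
    def __init__(self, engines):
        self.queues = {name: queue.Queue() for name in engines}

    def publish(self, target: str, message: dict) -> None:
        self.queues[target].put(message)

    def consume(self, target: str) -> dict:
        return self.queues[target].get()

ENGINES = ["api-controller", "device-engine-agent-manager",
           "engine-agent", "cluster-engine", "ml-feedback-engine"]
bus = MessageBus(ENGINES)

# The API Controller asks the Cluster Engine to evaluate a placement.
bus.publish("cluster-engine", {"from": "api-controller",
                               "action": "evaluate-placement",
                               "workload": "ml-image-recognition"})
print(bus.consume("cluster-engine"))
```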

Currently, the PoC is attempting to use the Kubernetes cluster API to evaluate intelligence at the cluster level against the best-known configuration. Once the Medhavi team is able to set up A/B testing for a given workload, we will have a better understanding of what the API should look like and how it will evolve; this is an attempt to innovate on capturing and mapping intelligence from workloads to emerging devices. We will discuss device requirements and emerging agents, along with how they are being conceived, in future blogs.
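Cluster-level intelligence first needs visibility into what capacity the cluster actually offers. Below is a minimal sketch using the official Kubernetes Python client, not Medhavi's own PoC code, that sums allocatable CPU and memory across nodes; comparing the result against a best-known configuration is left as a placeholder.

```python
from kubernetes import client, config   # pip install kubernetes

def cluster_capacity() -> dict:
    """Sum allocatable CPU and memory across nodes via the Kubernetes API."""
    config.load_kube_config()            # use load_incluster_config() inside a pod
    nodes = client.CoreV1Api().list_node().items
    cpu_cores, mem_ki = 0, 0
    for node in nodes:
        alloc = node.status.allocatable
        cpu_cores += int(alloc["cpu"])               # assumes whole-core values, e.g. "16"
        mem_ki += int(alloc["memory"].rstrip("Ki"))  # assumes Ki-suffixed values
    return {"nodes": len(nodes), "cpu_cores": cpu_cores,
            "memory_gb": round(mem_ki / (1024 ** 2), 1)}

if __name__ == "__main__":
    capacity = cluster_capacity()
    print(capacity)
    # Next step (not shown): compare `capacity` against the workload's
    # best-known configuration and flag any shortfall.
```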

We will also be presenting and showcasing a demo of the PoC, and releasing Medhavi as open source, at the upcoming Open Infrastructure Summit in Shanghai in 2019. My co-presenters are Prakash Ramchandra (Dell, Principal Architect – Cloud and Comm. Solutions), Sujata Tibrewala (Intel, Networking Technology Evangelist) & Jayanthi Gokhale (Intellysys ASP).

 