Bitfusion FlexDirect is a transparent virtualization layer combining multiple GPU and CPU systems into a single elastic compute cluster to support sharing, scaling, pooling and management of compute resources.
FlexDirect dramatically optimizes existing GPU solutions with 2-4X better utilization (which results in similar cost savings) and offers the ability to dynamically adjust compute resources from fractions of a GPU to many GPUs, with on-the-fly network attaching of GPUs from multiple systems.
FlexDirect Delivers 5 Dimensions of Innovation.
• FlexDirect connects any compute server remotely to GPU server pools over Ethernet, InfiniBand RDMA, or RoCE networks.
• FlexDirect attaches GPUs to workloads and detaches them in real time, offering unprecedented utilization of GPUs.
• FlexDirect slices a GPU into virtual GPUs of any size, allowing multiple workloads to run in parallel.
• FlexDirect runs in userspace and is proven to work in public clouds, private clouds, and on-premises hardware, with any hypervisor or container.
• FlexDirect has extensions to support FPGAs and ASICs (any OpenCL-compliant hardware).
Requiring no operating system, hardware, or code changes, Bitfusion FlexDirect enables existing bare-metal, virtual machine (VM), hypervisor, or containerized applications to take full advantage of advanced GPU virtualization.
FlexDirect uses a client-server architecture where servers provide the GPU resources for the cluster, and clients are where end-user applications are run.
- Bitfusion Application Instance (Client): The machines where end users run their applications. A client can be a GPU instance, but it does not have to be.
- Bitfusion GPU Instance (Server): The machines which provide GPU resources to the cluster.
FlexDirect supports many configurations. The most common are One-to-Many, Many-to-One, and Many-to-Many.
FlexDirect has several runtime optimizations that automatically select the best combination of transports (host CPU copies, PCIe, Ethernet, InfiniBand, and GPUDirect RDMA) to achieve superior results. In most cases, virtualized and remotely attached GPUs using Bitfusion FlexDirect match or exceed native GPU performance and efficiency across a variety of machine learning workloads.
GPU utilization in an organization usually follows a trend like below.
FlexDirect lets you use underutilized GPU compute cycles more efficiently by allowing real-time aggregation and disaggregation of GPUs. For instance, you can keep your workloads on CPU machines most of the time and remotely attach a GPU only when a workload needs one, increasing GPU utilization by 2-4x.
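The arithmetic behind a utilization gain of this kind can be sketched as follows. This is an idealized illustration, not a FlexDirect measurement or API: the function name `pooled_utilization`, the 25% duty cycle, and the assumption that workloads never contend for the same GPU at the same moment are all hypothetical.

```python
def pooled_utilization(duty_cycles, num_gpus):
    """Average utilization of a pool of GPUs shared by several workloads.

    duty_cycles: fraction of time each workload actually needs a GPU.
    num_gpus:    number of physical GPUs in the shared pool.
    Idealized: assumes demand can be spread perfectly across the pool.
    """
    demand = sum(duty_cycles)
    return min(demand / num_gpus, 1.0)


# Hypothetical numbers: four workloads, each needing a GPU 25% of the time.
duty_cycles = [0.25, 0.25, 0.25, 0.25]

# One dedicated GPU per workload: each GPU sits idle 75% of the time.
dedicated = pooled_utilization(duty_cycles, num_gpus=len(duty_cycles))

# The same workloads attaching on demand to a pool of two GPUs.
shared = pooled_utilization(duty_cycles, num_gpus=2)

print(f"dedicated: {dedicated:.0%}, shared: {shared:.0%}, gain: {shared / dedicated:.1f}x")
```

Under these assumed numbers, halving the GPU count doubles average utilization, and a single shared GPU would quadruple it. Real gains depend on how much the workloads' GPU phases overlap.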
FlexDirect not only lets you attach GPUs to any machine remotely, reducing total cost of ownership, but also lets you slice a single GPU into multiple virtual GPUs of any size. This increases both performance and utilization, because more workloads are packed to run in parallel on the same GPU.
FlexDirect improves the unit economics of use cases that may not take advantage of entire nodes and GPUs, such as early test and validation of machine learning algorithms. Fractional GPUs (as small as 1/20th of a GPU) can be assigned at runtime to support many more users than before on the same physical hardware. This affords fine-grained resource control without having to resort to a variety of lower-powered devices that would increase the scope and burden of infrastructure management. FlexDirect delivers high-performance GPU instances at significantly lower cost and enables users to “right size” spend and capacity to various stages of development and testing.
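To make the fractional-GPU idea concrete, the sketch below packs fractional requests onto physical GPUs. It is an illustration only: the first-fit policy and the name `pack_first_fit` are assumptions made for this example and do not describe FlexDirect's actual scheduler. Only the 1/20 slice size comes from the text above.

```python
from fractions import Fraction


def pack_first_fit(requests, num_gpus):
    """Place fractional GPU requests on physical GPUs using first-fit.

    requests: list of Fraction values, each the share of one GPU requested.
    Returns one entry per request: the GPU index it was placed on,
    or None if no GPU had enough free capacity left.
    """
    free = [Fraction(1)] * num_gpus  # remaining capacity per GPU
    placement = []
    for req in requests:
        for gpu, capacity in enumerate(free):
            if req <= capacity:
                free[gpu] = capacity - req
                placement.append(gpu)
                break
        else:
            placement.append(None)  # request does not fit anywhere
    return placement


# Twenty users on 1/20 slices, then two half-GPU jobs and one quarter-GPU job.
requests = [Fraction(1, 20)] * 20 + [Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)]
placement = pack_first_fit(requests, num_gpus=2)

print(placement)
```

Here the twenty small slices fill the first GPU exactly, the two half-GPU jobs share the second, and the last request is left unplaced because both GPUs are full. `Fraction` is used instead of floats so that twenty 1/20 slices sum to exactly one GPU.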