XPS Results

The past few years have seen a surge in the use of Machine Learning (ML) and Deep Learning (DL) algorithms for traditional HPC tasks such as feature detection, numerical analysis, and graph analytics. While ML and DL enable solving HPC tasks, their adoption has been hampered by a lack of understanding of how they utilize systems. Optimizing these algorithms requires characterizing their performance across the hardware/software (HW/SW) stack, but the lack of simple tools to automate the process, and the resulting reliance on manual characterization by researchers, is a bottleneck. To alleviate this, we propose an across-stack profiling scheme and integrate it within XPS, a hardware- and software-agnostic tool for evaluating and benchmarking ML/DL at scale. We demonstrate XPS's ability to characterize state-of-the-art ML/DL models and give insights that can only be obtained by performing across-stack profiling.

Every day, increasingly diverse Machine Learning (ML) and Deep Learning (DL) algorithms and workloads (collectively referred to as ML models) are introduced. These ML models appear at such a pace that researchers are hard-pressed to systematically analyze and study their performance. The major difficulty is the complex nature of these ML models, whose performance is impacted by the interplay between frameworks, system libraries, compilers, and hardware platforms (the HW/SW stack). We observe that the inability to rapidly characterize state-of-the-art model performance is partly due to the lack of tooling that allows researchers to introspect model performance across the HW/SW stack while still being agile enough to cope with the diverse and fast-paced nature of the ML landscape.

The current practice of measuring and profiling ML models is cumbersome. It involves a concoction of tools, each aimed at capturing ML model performance characteristics at a different level of the HW/SW stack. Full-stack profiling thus means running multiple tools and (hopefully automatically) stitching their outputs together. A profiling tool that captures ML model characteristics at different granularities, coupled with automated aggregation and summarization of the results, would boost the productivity of researchers, help them understand model/system performance, and identify bottlenecks. Furthermore, it would allow researchers to differentiate between ML models and choose the best one given their performance objectives.
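To make the stitching idea concrete, below is a minimal sketch in plain Python. It is not XPS's actual interface; the `Event` structure and function names are illustrative assumptions. It shows how events captured at different stack levels might be merged into a single timeline so that GPU kernels can be attributed to the layer that launched them by timestamp containment:

```python
# Minimal sketch (not XPS's actual API): merge per-level traces into one
# timeline and attribute GPU kernels to layers by timestamp containment.
from dataclasses import dataclass

@dataclass
class Event:
    level: str     # e.g. "model", "layer", or "gpu_kernel" (assumed labels)
    name: str      # event name, e.g. layer or kernel name
    start_us: int  # start timestamp in microseconds
    end_us: int    # end timestamp in microseconds

def stitch(traces):
    """Merge traces from different stack levels, sorted by start time."""
    timeline = [event for trace in traces for event in trace]
    timeline.sort(key=lambda e: e.start_us)
    return timeline

def kernels_within(layer, timeline):
    """Return the GPU kernels whose time span falls inside a layer's span."""
    return [e for e in timeline
            if e.level == "gpu_kernel"
            and layer.start_us <= e.start_us
            and e.end_us <= layer.end_us]
```

This containment-based correlation assumes the per-level traces share a common clock; in practice, aligning timestamps across profilers is part of what makes manual stitching tedious.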

This website shows results of the across-stack profiling scheme and its integration with XPS, a HW/SW-agnostic platform for evaluating ML models at scale. We couple the profiling capabilities with automatic analyses that reveal insights which cannot easily be obtained through other tools or methods. Using XPS, we characterized the model/layer/GPU-kernel performance of several state-of-the-art models. Through these characterizations, we demonstrate that with XPS researchers can easily introspect model performance at different levels of the HW/SW stack, identify bottlenecks, and systematically compare model or system offerings.
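As an illustration of the kind of automatic analysis described above, here is a hypothetical sketch (XPS's real output format may differ; the layer names, kernel names, and timings are invented) that summarizes stitched profile data into per-layer GPU-kernel time, the sort of breakdown that surfaces bottleneck layers at a glance:

```python
# Hypothetical summary over stitched profile data (illustrative numbers):
# total GPU-kernel time attributed to each layer, sorted by cost.
from collections import defaultdict

# (layer_name, kernel_name, kernel_duration_us) tuples as a stitched
# across-stack profile might provide them.
stitched = [
    ("conv1", "implicit_gemm", 820),
    ("conv1", "winograd",      310),
    ("fc1",   "sgemm",         640),
]

per_layer_us = defaultdict(int)
for layer, _kernel, duration_us in stitched:
    per_layer_us[layer] += duration_us

for layer, total_us in sorted(per_layer_us.items(), key=lambda kv: -kv[1]):
    print(f"{layer}: {total_us} us of GPU kernel time")
```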