Every day, an increasingly diverse set of Machine Learning (ML) and Deep Learning (DL) algorithms and workloads (collectively referred to as ML models) is introduced. These ML models are introduced at such a pace that researchers are hard-pressed to systematically analyze and study their performance. The major difficulty is the complex nature of these ML models, where performance is impacted by the interplay between frameworks, system libraries, compilers, and hardware platforms (the HW/SW stack). We observe that the inability to rapidly characterize state-of-the-art model performance is partly due to the lack of tooling that allows researchers to introspect model performance across the HW/SW stack — while still being agile enough to cope with the diverse and fast-paced nature of the ML landscape.
The current practice of measuring and profiling ML models is cumbersome. It involves a concoction of tools, each aimed at capturing ML model performance characteristics at a different level of the HW/SW stack. Full-stack profiling thus means running multiple tools and (hopefully automatically) stitching their outputs together. A profiling tool that captures ML model characteristics at different granularities — coupled with automated aggregation and summarization of the results — would boost the productivity of researchers, help them understand model/system performance, and identify bottlenecks. Furthermore, it would allow researchers to differentiate between ML models and choose the best one given their performance objectives.
This website shows the results of an across-stack profiling scheme and its integration with XPS — an HW/SW-agnostic platform for evaluating ML models at scale. We couple the profiling capabilities with automated analyses that reveal insights which cannot be obtained easily through other tools or methods. Using XPS, we characterized the model-, layer-, and GPU-kernel-level performance of several state-of-the-art models. Through these characterizations, we demonstrate that with XPS researchers can easily introspect model performance at different levels of the HW/SW stack, identify bottlenecks, and systematically compare model or system offerings.