Charmworks, Inc. was awarded funding as part of the Department of Energy’s Small Business Innovation Research (SBIR) program to develop and commercially support the ExORun software suite.
Load imbalance is a major obstacle to high performance in parallel applications, especially at large scale: because of dependences, a single overloaded node can hold back all other work. In addition, running at extreme scale requires tolerating faults, handling variable hardware speeds, and dealing with thermal and power issues.
ExORun will support across-node runtime adaptivity, including dynamic load balancing based on over-decomposition, for applications written using MPI, MPI+X, Charm++, AMPI, and other programming models. The project will enhance the dynamic load balancing capabilities of Charm++ to prepare them for extreme-scale computers and will provide interfaces for use in MPI applications. For MPI applications, the ExORun library will support user-defined chunks of work and data that can be moved across nodes as needed. The library will support varying degrees of automation and user control in its instrumentation and load balancing capabilities. The library is expected to be used by applications running on upcoming exascale systems.
Eric Bohm, PI for the project at Charmworks, said: “This is an exciting project that will allow us to streamline the scaling capabilities of the Charm++ adaptive runtime system, while at the same time increasing its reach to the broad community of MPI applications and programmers.”
Sanjay Kale, Founder of Charmworks, added: “While these capabilities are being developed with extreme scale machines in mind, they will directly benefit our customers on smaller clusters and cloud-based systems as well. Adaptive control at various levels, including that needed to deal with variability in cloud-based clusters, power and thermal issues, and fault tolerance, in addition to our traditional forte of dynamic load balancing, will all be enhanced by the capabilities developed in this project. Further, the common runtime system is the foundation of Charm++, Charm4Py and AMPI systems, and thus will be beneficial for users of these systems as well.”
Charm++ software was developed at the University of Illinois at Urbana-Champaign and is exclusively licensed to Charmworks, Inc. for commercialization.