Compaction of Schedules and a Two-Stage Approach for Duplication-Based DAG Scheduling (IEEE, 2009)

Abstract:

Many DAG scheduling algorithms generate schedules that require a prohibitively large number of processors. To address this problem, we propose a generic algorithm, SC, to minimize the processor requirement of any given valid schedule. SC preserves the schedule length of the original schedule and reduces the processor count by merging processor schedules and removing redundant duplicate tasks. To the best of our knowledge, this is the first algorithm to address this largely unexplored aspect of DAG scheduling. On average, SC reduced the processor requirement by 91, 82, and 72 percent for schedules generated by the PLW, TCSD, and CPFD algorithms, respectively. The SC algorithm has a low time complexity (O(|N|³)) compared to most duplication-based algorithms. Moreover, it decouples processor economization from the schedule-length minimization problem. To take advantage of these features of SC, we also propose a scheduling algorithm, SDS, with the same time complexity as SC. Our experiments demonstrate that schedules generated by SDS are only 3 percent longer than those of CPFD (O(|N|⁴)), one of the best algorithms in that respect. Together, SDS and SC form a two-stage scheduling algorithm that produces high-quality schedules with a low processor requirement, at lower complexity than comparable algorithms that produce similar results.
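To give a flavor of the compaction idea, the following is a minimal sketch, not the paper's SC algorithm: it greedily merges processor schedules whose busy intervals never overlap, so every task keeps its original start and finish time and the schedule length is preserved. The function names and the toy schedule data are invented for illustration, and the sketch omits SC's removal of redundant duplicate tasks.

```python
# Illustrative sketch only (not the paper's SC algorithm).
# A schedule is a list of (task, start, finish) tuples per processor.

def intervals_conflict(sched_a, sched_b):
    """True if any busy interval of sched_a overlaps one of sched_b."""
    for _, s1, f1 in sched_a:
        for _, s2, f2 in sched_b:
            if s1 < f2 and s2 < f1:  # the two time intervals overlap
                return True
    return False

def compact(schedules):
    """Greedily merge pairwise non-conflicting processor schedules."""
    merged = []
    for sched in schedules:
        for target in merged:
            if not intervals_conflict(target, sched):
                target.extend(sched)  # fold sched onto an existing processor
                break
        else:
            merged.append(list(sched))  # needs a processor of its own
    return merged

# Toy example: three processors, but P1's busy interval fits after P0's.
schedules = [
    [("a", 0, 2), ("b", 2, 5)],  # P0
    [("c", 5, 8)],               # P1
    [("d", 0, 3)],               # P2 (conflicts with P0 and the merged P0+P1)
]
print(len(compact(schedules)))  # → 2
```

Because tasks are never moved in time, only reassigned to processors, the makespan of the compacted schedule is identical to the original's, which is the invariant SC maintains.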

 Introduction

Enhanced performance and reduced cost of commodity hardware have boosted the use of distributed memory multiprocessor systems (DMMS) in supercomputing fields such as climate research, physical simulations, image processing, and database systems. In many applications, an efficient parallel version of the application program is not available. Therefore, parallelism is achieved by partitioning the program into smaller chunks, called tasks, and scheduling these tasks on the target DMMS to minimize the overall execution time, or schedule length.
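The schedule length mentioned above can be made concrete with a small worked example. The sketch below is a simplified model with invented task names and costs: each task's start is bounded by its predecessors' finish times, plus a communication delay that is paid only when producer and consumer sit on different processors. It assumes the given assignment already serializes same-processor tasks and ignores further processor contention.

```python
# Simplified schedule-length model; all names and numbers are illustrative.
comp = {"a": 2, "b": 3, "c": 2}          # task computation times
comm = {("a", "b"): 4, ("a", "c"): 1}    # inter-task communication times
preds = {"a": [], "b": ["a"], "c": ["a"]}
proc = {"a": 0, "b": 0, "c": 1}          # task-to-processor assignment

finish = {}
for t in ["a", "b", "c"]:  # tasks visited in topological order
    start = 0
    for p in preds[t]:
        # Communication cost is incurred only across processors.
        delay = 0 if proc[p] == proc[t] else comm[(p, t)]
        start = max(start, finish[p] + delay)
    finish[t] = start + comp[t]

print(max(finish.values()))  # schedule length → 5
```

Note how placing "b" on the same processor as "a" hides the expensive (a, b) communication entirely; duplication-based algorithms push this idea further by copying "a" onto other processors as well.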

The goal of partitioning is to represent the program in the form of a directed acyclic graph (DAG) consisting of tasks with appropriate grain size [1], [2]. Task computation and intertask communication times in the DAG are determined according to the target DMMS architecture via estimation and benchmarking techniques [3], [4], [5]. Dependencies among tasks incur inevitable communication overhead when tasks are assigned to different processors of the target DMMS.

  1. Bozdag is with the Department of Electrical and Computer Engineering, The Ohio State University, 333 W. 10th Avenue, 3190 Graves Hall, Columbus, OH 43210. E-mail: bozdag.1@osu.edu.
  2. Ozguner is with the Department of Electrical and Computer Engineering, The Ohio State University, 205 Dreese Laboratory, 2015 Neil Avenue, Columbus, OH 43210. E-mail: ozguner@ece.osu.edu.
  3. V. Catalyurek is with the Department of Biomedical Informatics, The Ohio State University, 333 W. 10th Avenue, 3172B Graves Hall, Columbus, OH 43210, and the Department of Electrical and Computer Engineering, The Ohio State University, 316 Dreese Laboratory, 2015 Neil Avenue, Columbus, OH 43210. E-mail: umit@bmi.osu.edu.

Conclusion

We developed a novel algorithm to minimize the processor requirement of schedules generated by any scheduling algorithm. When applied to a valid schedule, the proposed SC algorithm compacts the schedule onto a smaller number of processors without increasing the schedule length. Experiments on DAGs representing three applications, as well as on random DAGs, verified that the SC algorithm dramatically reduces the processor requirement of the schedules generated by the CPFD, PLW, and TCSD algorithms. For our test set, the reduction was, on average, 91 percent for PLW, 82 percent for TCSD, and 72 percent for CPFD. The SC algorithm has a time complexity of O(|N|³), which is lower than that of most duplication-based scheduling algorithms.

References:

  1. A. Gerasoulis and T. Yang, “On the Granularity and Clustering of Directed Acyclic Task Graphs,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 6, pp. 686-701, June 1993.
  2. B. Kruatrachue and T. Lewis, “Grain Size Determination for Parallel Processing,” IEEE Software, vol. 5, no. 1, pp. 23-32, Jan. 1988.
  3. M. Cosnard and M. Loi, “Automatic Task Graph Generation Techniques,” Parallel Processing Letters, vol. 5, no. 4, pp. 527-538, Dec. 1995.
  4. M.-Y. Wu and D. Gajski, “Hypertool: A Programming Aid for Message-Passing Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 1, no. 3, pp. 330-343, July 1990.
  5. M. Iverson, F. Ozgüner, and L. Potter, “Statistical Prediction of Task Execution Times Through Analytical Benchmarking for Scheduling in a Heterogeneous Environment,” IEEE Trans. Computers, vol. 48, no. 12, pp. 1374-1379, Dec. 1999.
  6. M. Garey and D. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., 1979.
  7. S. Darbha and D. Agrawal, “Optimal Scheduling Algorithm for Distributed Memory Machines,” IEEE Trans. Parallel and Distributed Systems, vol. 9, no. 1, pp. 87-95, Jan. 1998.
  8. C. Park and T. Choe, “An Optimal Scheduling Algorithm Based on Task Duplication,” IEEE Trans. Computers, vol. 51, no. 4, pp. 444-448, Apr. 2002.
  9. C. Papadimitriou and M. Yannakakis, “Towards an Architecture Independent Analysis of Parallel Algorithms,” SIAM J. Computing, vol. 19, pp. 322-328, Apr. 1990.
  10. I. Ahmad and Y.-K. Kwok, “On Exploiting Task Duplication in Parallel Program Scheduling,” IEEE Trans. Parallel and Distributed Systems.
