Parallel computing often needs to parallelise loop structures, which are a rich source of computing power. To do this, the partitioning phase splits loop iterations into independent tasks and the scheduling phase decides how they will be assigned to processors. The aim of loop partitioning and scheduling schemes is to transform a sequential program into segments of equal workloads. More specifically, compilers partition a number of parallel jobs into several blocks and map these blocks on processors for which the workloads on processors are as even as possible. Ideally, the amount of work on each processor is the same, in which case perfect load balance is achieved. If the distribution of workload among processors is not the same, load imbalance usually exists among processors. This thesis considers the load imbalance achieved by different loop partitioning and scheduling approaches with various classical benchmark applications. In addition, it proposes an extension with respect to the canonical loop scheduling scheme, called combined canonical loop scheduling scheme to reduce load imbalance when dealing with Gaussian type loop nests. First, a description of the loop scheduling problem and several important basic concepts are provided in the introduction. Then the load balancing problem is defined and static and dynamic loop scheduling schemes are analysed. Focusing on the poor performance of loop scheduling schemes when dealing with Gaussian loop nests, we propose a new loop scheduling strategy, called combined canonical loop scheduling, and provide a theoretical analysis about its load balance properties. Finally, a series of experiments is simulated to evaluate combined canonical loop scheduling and four other selected loop scheduling schemes by five different benchmark programs. The results show that the new proposed strategy achieves perfect load balance with respect to depth 2 rectangular, triangular, trapezoid loop nests and nearly perfect load balance with respect to Gaussian loop nests.