UQ Students should read the Disclaimer & Warning

Note: This page dates from 2005, and is kept for historical purposes.

COMP3300 – Assignment One – Threaded Merge Sort

My Result 70 out of 100
Comments
  • bonus early submission +5%
  • does not maximise parallelism -15%
  • cascading merge implementation (cap 80%)
Class Average 62
Class Median 70
Class Standard Deviation 28.7

Marks Distribution

Time constraints prevented me from implementing a parallel merge solution, as I was away for a large part of this assignment at a wedding. Then, due to an inadvertent design error, the loop that launches read and sort threads will wait for each read/sort thread pair to terminate before launching the next pair, thus entirely defeating the parallelism aspect of this assignment.

The purpose of this assignment is to show your understanding of threads and of C programming.

You are required to use existing code as a base for developing a multithreaded multifile sort. You have the option of developing a simplified version which sorts exactly 4 files, if you find the general case of an arbitrary number of files too hard.

Multifile here means that you should sort multiple files into one overall sorted result, and you should use the given printBuffer function to produce the output.

The notion is that because file reading is so slow, it you are sorting multiple files, it would be nice to start sorting as soon as the smallest file has been read, and to overlap as much work as possible with waiting for file reads.

The provided code shows how to read 1 file, sort it and print it, using threads. You need to work out how to merge multiple sorted files (in memory-based data structures), using threads.

Detailed Requirements

  1. Your program should use the same files as are supplied, and you should only modify the files you are told to modify.
  2. A command line is supplied to compile the program. Whatever development environment you use, your program must compile with the given command line on the student UNIX machines (Solaris). The provided code has been tested on Linux and Solaris, but your solution must run on Solaris.
  3. Your program should run with the following command:
    ./launch_parallel_sort data
    where the file name data can be replaced by an arbitrary number of files.
  4. The input data to your program should be any number of text files named on the command line.
  5. The output of your program should only consist of the output produced by calling printBuffer once as in the supplied code; the output should be the contents of all the files named on the command line, individually sorted then merged (i.e., the original data sorted into one data structure).
  6. Your program should be written as follows (alternative organizations will be accepted if they are justified as simpler or more efficient, but the basic principle of doing work as soon as the data is available by using multiple threads must apply):
    1. it should have one thread for each file read (using the supplied code for reading a file)
    2. it should have one thread for each sort (using the supplied sorting code)
    3. it should have one or more threads for merging sorted results: each of these threads should merge two sorted results into one; and merges may be cascaded to handle arbitrary numbers of files
  7. You need to write code to do merging and code to combine reading, sorting and merging, using the given thread primitives.
  8. You should only use the following threading primitives, and not other parts of Pthreads or other libraries or APIs:
    1. init_threads
    2. launch_thread
    3. join_thread
  9. You are supplied with 4 data files for testing, but your final code will be tested with other files.
  10. You should only modify the following files; all other files will be used unaltered when we test your code:
    launch_parallel_sort.c
    parallel-sort.c
    if you modify any other files or  submit files with different names to these your assignment will be graded as “doesn't  compile”.
  11. Your program should conform to the following formatting requirements:
    1. All code at the same level should be left-aligned; an increase of level by introducing “ { ” should result in an indentation of 2 - 4 spaces; opening a new “ { ” block should usually be done at the end of the previous line and the closing “ } ” should be on a line of its own unless readability dictates otherwise (for example, it is permissible to write “ } else { ”); if in doubt ask.
    2. Type names should start with an initial capital; all other names with an initial lowercase letter. For readability you may use underscores or capitals within a name. (Violations of these guidelines in the original code should not be altered.)
    3. Any types or variables specific to one compilable (“ .c ”) file should appear at the top of the file after the “ #includes ”.
    If in doubt about any formatting requirements, please ask: these guidelines can be extended.
  12. There will be a bonus for early completion: any assignment handed in 24 hours or more before the due date, which achieves at least 50% of available marks, will receive a 5% bonus. The maximum available for the assignment therefore is 105% of available marks. Recall that if you fail to achieve 50% overall for assignments, your exam alone will be used to determine your final grade, but with a maximum grade of 5.

launch_parallel_sort.c
parallel-sort.c

launch_parallel_sort.c

parallel-sort.c

Code © Copyright 2004 Ned Martin

23-Apr-2004