A PDF version is available here.
Friday, November 14, 10pm
See Assignment 7 on Canvas for the Github Classroom link.
This is a pair assignment, but you can work alone, if you so choose.
Submit the contents of your repository via Gradescope. See Deliverables below for what to submit. If you are working with a partner, do not forget to include their name with the submission.
There will be no autograder for this assignment. Read the requirements and run tests locally. We will evaluate the ability of your implementation to sort large enough inputs correctly using the correct number of threads.
In this assignment, we will turn a single-threaded naive implementation of merge sort into a multi-threaded one.
The starter code contains an implementation of merge sort in the file
msort.c. The implementation is based directly on the
top-down example from the above mentioned Wikipedia article.
Your first task is to make sure you understand how this algorithm works.
Do ask questions if you feel you are missing something. Note that the
implementation works with two arrays: the input array and a “target” or
“helper” array. The target array will contain the resulting sorted
array. However, this implementation also modifies the original
array.
The msort program takes a single argument which is the
number of long integers to be read from standard input. Once it reads in
the given number of elements and/or the standard input is closed, the
array is sorted and the result is printed to standard output.
Your main task is to use the POSIX
threads (pthread) library routines to turn the provided merge sort
implementation into a concurrent one that improves on the performance of
the single-threaded version. The Wikipedia article contains some
pseudocode to get you started under Merge sort with parallel
recursion. It shouldn’t take too much effort. Implement your
version in tmsort.c, which initially contains the same
implementation as msort.c.
The concurrent implementation is parametrized in the number of
threads it should use. This parameter is taken from the environment
variable MSORT_THREADS, which is read in by the starter
code (take a look!). Your implementation should not use more than the
given number of threads, but should use exactly that number on a large
enough input. E.g., when MSORT_THREADS=10 and the input
array has 1 000 elements, the
implementation should use exactly 10
threads (including the main thread). You can set environmental variables
globally for the current shell using export (see How
to Set and List Environment Variables in Linux, or you can just take
a look how we do it in the Examples below.
Your second task is to perform a few experiments on at least two different machines that are capable of compiling and running pthread-based code. These include:
Running directly on a physical machine available to you should give you the most impressive speed-ups.
In experiments.md, write a brief summary of your
experiments with large enough inputs. Use the provided file as a
template. Write for each machine:
Specs (CPU including the number of cores, memory, disk capacity, OS)
Input data size and how you created it (see [Hints & Tips]). Use a large enough number of elements, so that the single-threaded implementation takes at least 10 seconds to sort them.
The approximate number of processes running before you start each
experiment (obtainable by running ps aux | wc -l)
Run experiments for different thread counts (1, 2, number
of cores, double the number of cores, …), one after another. For each
thread count, run the sort at least 4 times with the same arguments/data
set. Note how much time the sorting portion took (look for the line
"Sorting completed in N seconds.").
Keep increasing the thread count and figure out at which point adding new threads does not translate to (significantly) improved performance. How does this number relate the number of cores on the machine you run the experiments on?
tmsort.c. Include any
additional .c and .h files your implementation
relies on. However, you shouldn’t need to implement many additional
functions and/or data structures that would imply separate compilation.
experiments.md.
Commit the code and the experiments to your repository. Do not
include any executables, .o files, or other binary,
temporary, or hidden files.
Once you are done, remember to submit a ZIP file with your solution to Gradescope and do not forget to include your partner.
MSORT_THREADS environment variableman is your friend. Check out pthreads,
pthread_create, pthread_join, …nproc command to get the number of
cores available on your machine. On Linux, reading the file
/proc/cpuinfo will give you detailed information about the
processor on your system.pthread_create only takes
a single void *
argument. To pass custom arguments, create a “helper” struct and pass a
pointer to it. Depending on where you are waiting for the thread to
join, you might need to allocate it on the heap.pthread_mutex_lock and
pthread_mutex_unlock to ensure consistency.numbers, to generate
M - N random numbers between N and
M (inclusive) by running ./numbers N M. E.g.
./numbers 1 1000 > thousand.txt generates 1, 000 random numbers and writes them to
thousand.txt.tmsort consistently returns the same
result as msort on the same input. You can check that by
running
diff <(./msort thousand.txt) <(./tmsort thousand.txt).make diff-N, where N is the size of input.
Example: make diff-1000.make clean-temp regularly to attempt and clean these files.
Worst comes to worst, a restart should clean them up too.tmsort is consistent with
msort, you might want to redirect stdout to
/dev/null to avoid printing the result when doing timing
experiments: ./tmsort input.txt > /dev/null. Timings are
printed to stderr so they will still be visible.assert to check that your
assumptions about state are valid.asserts to check expected results. Use our tests for
queue/vector from Assignment 4 as an example.if-else if-else or a multi-case
switch should
be the only reason to go beyond 40-50 lines per function. Even so, the
body of each branch/case should be at most 3-5 lines long.Here are some examples of running our implementation of
./tmsort. The first line of input is the number of elements
to follow.
Basic usage, input from terminal.
$ ./tmsort
Running with 1 thread(s). Reading input.
5
5
4
3
2
1
Read 5 items
Array read in 3.983371 seconds, beginning sort.
Sorting completed in 0.000003 seconds.
1
2
3
4
5
Array printed in 0.000061 seconds.Pre-generating an input file using the supplied
numbers script (works on Linux and macOS).
$ /numbers 1 10 > ten.txt
$ cat ten.txt
10
9
2
8
4
5
6
1
7
10
3
$ ./tmsort ten.txt
Running with 1 thread(s). Reading input.
Read 10 items
Array read in 0.000064 seconds, beginning sort.
Sorting completed in 0.000002 seconds.
1
2
3
4
5
6
7
8
9
10
Array printed in 0.000051 seconds.Varying the number of threads.
$ ./numbers 1 100000000 > hundred-million.txt
$ wc -l hundred-million.txt
100000001 hundred-million.txt
$ MSORT_THREADS=1 ./tmsort hundred-million.txt > /dev/null
Running with 1 thread(s). Reading input.
Read 100000000 items
Array read in 8.398152 seconds, beginning sort.
Sorting completed in 19.341773 seconds.
Array printed in 5.918733 seconds.
$ MSORT_THREADS=2 ./tmsort hundred-million.txt > /dev/null
Running with 2 thread(s). Reading input.
Read 100000000 items
Array read in 8.312380 seconds, beginning sort.
Sorting completed in 10.252641 seconds.
Array printed in 6.111131 seconds.
$ MSORT_THREADS=4 ./tmsort hundred-million.txt > /dev/null
Running with 4 thread(s). Reading input.
Read 100000000 items
Array read in 8.326250 seconds, beginning sort.
Sorting completed in 6.216509 seconds.
Array printed in 5.979610 seconds.