A PDF version is available here.
Friday, October 31, 10pm
See Assignment 5 on Canvas for the Github Classroom link.
ISO C11 w/ POSIX estensions (-std=gnu11)
We recommend developing on a Linux system with GCC (i.e., not macOS with Clang).
This is a pair assignment, but you can work alone, if you so choose.
Submit the contents of your repository via Gradescope. See Deliverables below for what to submit. If you are working with a partner, do not forget to include their name with the submission.
It is unlikely that an autograder will be available for this assignment. Read the requirements and run tests locally.
In class, we cover how the operating system manages memory to create
per-process virtual address spaces. Unix-like operating systems expose
system calls like brk, sbrk, and mmap/munmap to
request virtual address space of the calling process. In most
situations, these system calls are not the most efficient (or
convenient) way to dynamically allocate memory for our programs.
Instead, we often rely on user-space allocators, such as malloc form the C standard library.
A user-space memory allocator (aka malloc) is a library (a set of functions and associated data structures) that handles programmer requests for memory. Most of the work of a memory allocator is to keep track of free memory, give out memory blocks that are available, and to request more memory from the OS when the allocator runs out of its managed memory.
For this assignment, you will be writing your own memory allocator. Writing a custom memory allocator is something you might do if you work on performance sensitive systems (games, graphics, quantitative finance, embedded devices or any application you want to run fast!). Malloc and free are general purpose functions written to manage memory in the average use case quite well, but they can always be optimized for a given workload. That said, a lot of smart people have worked on making malloc/free quite performant over a wide range of workloads. Optimization aside, you might write an allocator to add in cool debugging features, and swap it in as needed.
For this assignment, you will implement a portion of a custom memory allocator for the C language. You will write your own versions of:
This assignment will be the first of two memory allocators you will create this term.
Read Chapter 17 of OSTEP. This covers most of what you need to know about free space management and the workings of a memory allocator conceptually. With the correct understanding, this assignment is relatively quick and easy. Ask questions.
Your objective will be to create three functions in mymalloc.c
mymallocmycallocmyfreePlease read through the following design decisions to help guide you. This is some of the thought process a designer of such a system may go through. There are more concrete specifications following.
Remember that malloc, calloc, and
free are all working with memory allocated on the heap. We
can request memory from the operating system using a system call such as
sbrk. There exist other
ways to request memory, such as mmap, which you will use for
your next memory allocator assignment.
For this assignment, all memory requests MUST use the
sbrk system call.
Once you have retrieved memory, we need to keep track of it. That
means that every time a user uses your malloc or
calloc functions, you will want to know where that memory
exists and how big it is. Thus, we want to keep track of all of the
memory we request in a convenient data structure that can dynamically
expand.
So think about: What data structure could I use?
You may define any helping data structures and functions that assist you in this task. This means you might even have a global variable or two to assist with your implementation. Depending on what data structure you decide to store all of the requested memory in, it may be useful to have additional helper functions for traversing your data structure and finding free blocks/requesting blocks for example.
Programs may frequently allocate and then free memory throughout the
program’s execution. It can thus become very inefficient to keep
expanding the heap segment of our process. Instead, we try to reuse
blocks as efficiently as possible. That is, if I have allocated a block
of memory of 8 bytes, and that 8 bytes gets freed, the next time I call
malloc I can use the previous 8 bytes without having to
make another call to sbrk. There exist several strategies
for searching free memory. The most straightforward ones are first-fit
and best-fit. Both are described in detail, along with other very useful
supporting information in OSTEP
Chapter 17.
In this assignment, we will use the first-fit strategy, which is the simplest to implement.
Here are the default specifications to put everyone on equal footing. You are welcome to diverge if you think you can build something more optimal, but get this basic allocator with the specifications below to work first!
Do not modify malloc.h.
Use an embedded linked list data structure. See OSTEP Chapter 17.
This data structure will keep track of the blocks that you have
pooled within the mymalloc function.
The data structure will need to be accessible by the functions in your library.
You should have some notion of a ‘block’ of memory in your program.
An example is provided here with some fields that may be useful:
typedef struct block {
size_t size; // How many bytes beyond this metadata have been
// allocated for this block
struct block *next; // Where is the next block in the free list
int debug; // (optional) Perhaps you can embed other
// information--remember, you are the boss!
} block_t;Use the sbrk
system call. Your version of malloc (mymalloc
or its helper functions) shall use sbrk. Understand what,
e.g., sbrk(0) and sbrk(10) do before you
start.
The myfree function sets a block of memory to be
free, by setting the free flag in block_t to
“true”. Consider how memory is laid out in the heap and make sure you
are only accessing your struct. Here is a simple diagram:
|block|---actual memory---|block|------actual memory-----|block|--actual memory--|
^ Here is where your struct lives, this is what you want to update.The mymalloc function is returning the actual
memory
|block|---actual memory---|block|------actual memory-----|block|--actual memory--|
^ Here is what you'll return the the programmer as their memory.The first-fit memory allocator looks for the first block available in the linked list of memory. Remind yourself what the trade-off is between the other allocators (e.g. compare to a ‘best-fit’ allocator).
When called, mymalloc must print out "Malloc %zu bytes\n"
using the provided debug_printf function (which is used
just like printf; see the file
debug.h)
When called, mycalloc must print out "Calloc %zu bytes\n"
using the provided debug_printf function (Yes, a proper
implementation will print “Malloc …” followed by “Calloc …”)
When called, myfree must print out "Freed %zu bytes\n"
(using debug_printf)
The malloc.h header defines aliases for these
functions, so you will actually see the tests using malloc,
calloc, and free.
With these print-outs, you can see if they match the original programs.
We will examine your code to confirm you do not use the C library
malloc/calloc/free from
stdlib.h. You should only be using syscalls such as
sbrk to request memory from the operating system.
We have included a Makefile to make your life easier. Here’s a quick overview of the possible targets:
make - compile mymalloc.c to object file,
mymalloc.omake test - compile and run tests in the tests
directory with mymalloc.make demo - compile and run tests in the tests
directory with standard malloc.make help - print available targetsWe have provided some “tests” for you, which exercise your allocator. Most of the tests provide an explanation of what behavior to look for. Passing all the tests without crashing does not guarantee a perfect assignment, but it will give you some confidence your implementation is working. It would be wise to write additional tests to exercise your implementation.
Implement your memory allocator in mymalloc.c and
include any additional .c and .h files your
implementation relies on. For example, you might want to compile your
helper data structure separately.
Commit the code to your repository. Do not include any executables,
.o files, or other binary, temporary, or hidden files.
Once you are done, remember to submit your solution to Gradescope and do not forget to include your partner.
assert() used to test invariantsmymalloc/mycalloc/myfree)
sbrk only called as neededQ: Do I have to reduce the heap ever?
A: No, you do not need to ever make calls to
sbrk(-10) for example.
Q: Can I use valgrind to check for memory
leaks in my program?
A: Likely not, because you are implementing your own
allocator. By default, valgrind is not able to reliably
track calls to sbrk (or mmap). This is not to
say that using valgrind gives no information, but it won’t give you the
same level of detail as if the code had used the system allocator.
Q: So if I cannot use valgrind, what can I
do?
A1: One suggestion is to keep track of how many total
bytes you allocate and how many you mark as free as global
variables.
A2: A second suggestion is that you could use some of
gcc’s compile-time interpositioning tricks to add a function that is
automatically called at the end of the program. This function would
traverse your linked list structure and check to see if any of the
memory in your ‘block’ structure are marked as unfreed.
Q: How will I know my tool is working?
A1: This is a pretty general question, but a tool like
strace can be helpful. strace can be run on
the tests once you have compiled them. strace reports how
many calls you have made to sbrk for example, and you can
test your assumptions to see if that system call is being made too often
(i.e. not reusing blocks that have been previously allocated and could
be used). strace will also confirm that you are NOT using
the system malloc or free.
A2: You could also run your allocator with your linked
list or other data structures and see if it passes the given unit
tests.
A3: Write unit tests to exercise your implementation,
e.g., mallocs of random sizes, interleaved mallocs and frees, large
numbers of mallocs and frees, walking the free list, edge cases…
man is your friend. Check out sbrk,
malloc, calloc, free,
realloc, …assert to check that your
assumptions about state are valid.asserts to check expected results. Use our tests for
queue/vector from Assignment 4 as an example.if-else if-else or a multi-case
switch should
be the only reason to go beyond 40-50 lines per function. Even so, the
body of each branch/case should be at most 3-5 lines long.malloc implementations
(the GNU C library is just one), so you can take a look at it if you are
curious.