GLibc malloc internal (1): arena, bin, chunk and sub heap

To understand how GLibc manages memory internally, a few key concepts/data structures should be introduced.

Arena: Arena is the top level memory management entity. There are two types of arenas. Main arena covers the traditional heap area: the space between start_brk and brk for a process from kernel point of view, only one main arena exists for a process.  Non-main arena manages the memory fetched from kernel via mmap() system call,  there could  be 0 to 2*(number of cpu cores) such arenas based on process threads usage.

chunk: chunk is the bottom entity in the management hierarchy. It could have different sizes. Roughly there are 4 types of chunk: regular chunk under bin, top chunk directly controlled by arena, last remainder chunk which is actually stored in unordered bin and is a technique to improve cache hit rate,  and mmapped chunk which bypass bin and arena and is standalone.

Bin: For regular chunks in free state,  they are stored in linked list and managed via bin. We’ll explore how fast bins, unsorted bin, small bins and large bins are used in more detail a little later.

main_arena

 

Sub heap: For non-main arena, each time it gets memory from kernel in block of 1M bytes on 32bit system (HEAP_MAX_SIZE), and the block is called sub heap. Note that one non-main arena could have many number of sub heaps.

non-main_arena

With the management hierarchy in mind,  let’s first check how arena data structure is defined:

malloc_state

As expected, it needs to support multi-threaded application, and has “mutex” to serialize the access.  The “flags” indicates whether this arena is contiguous or not and the availability of fast bin chunks, non-main arena is composed of sub-heaps and  always has the NONCONTIGUOUS_BIT set.  References to top and last_remainder chunks and fast/normal bins could be found in the structure.  It is worth mentioning that with “next” pointer, all arenas are linked together, while “next_free” is for linking all currently idle arenas together so a thread in need may quickly fetch one.

Chunk:

chunks

Chunk data structure itself looks simple, but it is probably the most flexible one in malloc function and has a very subtle design in the representation of each field under different states.

Chunk is at least 8 bytes aligned,  it leaves us the three lowest significant bits in “size” field as states flag.  Bit 0 (P) indicates whether the previous chunk is in use.  If bit 1 (M)is set,  this chunk is allocated from mmap region directly instead of arena. Bit 2 (A) indicates which type of arena the chunk belongs, 1: non-main arena, 0:main arena.

`boundary tag’ method is being used by chunk structure.  Example:  the prev_size is only meaningful when the previous chunk is free (bit 0 unset) and the prev_size field is located at the end of previous chunk memory region.  The prev_size field should be accessed by chunk structure when bit 0 is set, it could already have real user data owned by the previous chunk at that address.  Similar arrangement for fd and bk fields, they are valid only when current chunk is in free state. While size is the only field in chunk structure that matters when chunk is in allocated state.

Bin:  The free chunks are stored with bins which are pointers to malloc_chunk linked lists.  Top chunk is the only exception which is under arena directly.   Fast bins woks as cache, it has default max size of 64 bytes starting at 16 bytes with stride of 8 bytes, and the max size could extend to 88 bytes in maximum.  Whenever a free memory api is called and the memory block size is within that range, the memory block is put into fast bin.  For malloc, fast bins are also the first choice.   Fast bins works in last in first out mode and CAS atomic operation is used for the linked list push/pop processing.  unsorted bin is the second one to be checked for free and malloc processing, it doesn’t have size constraint, memory blocks are put there and rearranged into other normal sized bins or distributed to application directly in different scenarios.

sub-heap:  In malloc function, it is represented by heap_info data structure.  The comments with it should be clear enough.

heap_info

 

 

Advertisements

One thought on “GLibc malloc internal (1): arena, bin, chunk and sub heap

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s