malt

MALT is a MALloc Tracker to find where and how you made your memory allocations in C/C++/Fortran applications.

MALT : Malloc Tracker

What is it

MALT is a memory tool to find where you allocate your memory. It also provides
statistics about memory usage and helps to find memory leaks.

It is designed for the following languages : C, C++, Fortran, Rust.

Python is also supported, but currently only as a prototype which needs to be enabled with --enable-python
at build time.

MALT GUI

Dependencies

MALT depends on the presence of :

binutils (nm and addr2line) to extract symbols. Tested versions are 2.24 - 2.38.

It optionally depends on :

nodejs (http://nodejs.org/) to run the webview GUI. Tested version is 0.10.30 - 12.22.9.
libelf (http://www.mr511.de/software/english.html) to extract global variable list from executables and libs. Tested version is 0.128 - 0.183.
libunwind (http://www.nongnu.org/libunwind/) as an alternative implementation of glibc backtrace method. Tested version is 1.1 - 1.3.2.

Supported system (known):

Linux (Gentoo / Debian / Ubuntu / Centos / RedHat)

How to install

MALT uses CMake as its build system but provides a simple configure wrapper for users
familiar with autotools packaging, so you can install it with the following procedure :

mkdir build
cd build
../configure --prefix={YOUR_PREFIX}
make
make test
make install

For more advanced usage, you need to call cmake yourself, so you can install it
with the following procedure :

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX={YOUR_PREFIX}
make
make test
make install

If you are a user of spack you can also install it easily by using :

spack install malt

Into distributions

Gentoo

I provide an overlay containing both MALT & NUMAPROF, you can use it by calling :

# MALT using gentoo overlay memtt :
sudo eselect repository add memtt git https://github.com/memtt/gentoo-memtt-overlay.git
sudo eselect repository enable memtt
sudo emerge -a malt numaprof

Debian / Ubuntu / Centos / RedHat / Fedora / Arch

You can have a look at packaging/README.md if you want to build packages yourself
for those distributions with the embedded scripts.

Build options

The MALT build supports several options, defined with the -D option of CMake :

-DENABLE_CODE_TIMING={yes|no} : Enable quick and dirty function to measure MALT internal
performances.
-DENABLE_TESTS={yes|no} : Enable build of unit tests.
-DJUNIT_OUTPUT={yes|no} : Enable generation of junit files for jenkins integration.
-DENABLE_VALGRIND={yes|no} : Run unit tests inside valgrind memcheck and generate XML report.
-DPORTABILITY_OS={UNIX} : Set portability build options to fix OS specific calls.
-DPORTABILITY_MUTEX={PTHREAD} : Set portability build option to select mutex implementation.
-DENABLE_JEMALLOC={yes|no} : Enable or disable usage of jemalloc internally to MALT.

Note about Intel Compiler

MALT is written in C++, so you might encounter issues if you build it with GCC and
profile applications built with the Intel compiler. In most cases it should work out of the box without
any issues.

However, I once received an error report about this. In that case, try compiling MALT with the Intel compiler as well,
instead of GCC, to match the application :

../configure CC=icc CXX=icpc
make

How to use

MALT currently provides a dynamic library you need to preload in your application to
wrap the default memory allocator. It provides two basic instrumentation modes.

By default MALT uses backtrace to reconstruct your stack on malloc/free/… calls :

{YOUR_PREFIX}/bin/malt {YOUR_PROGRAM} [OPTIONS]

You can get better performance, but less detailed stacks, by compiling your program with
-finstrument-functions or the equivalent option of your compiler. Then, you need to tell MALT to use
the “enter-exit” stack mode :

{YOUR_PREFIX}/bin/malt -s=enter-exit {YOUR_PROGRAM} [OPTIONS]

The malt script is only a wrapper that automatically preloads a dynamic library
into the executable. You can also do it by hand in case of issues :

LD_PRELOAD={YOUR_PREFIX}/lib/libmalt.so {YOUR_PROGRAM} [OPTIONS]

Options to compile your program

MALT works out of the box with your program, but it requires you to compile your program with
debug options (-g) to get access to the source code location attached to each call site.

It might also be better to use -O0 or -fno-inline to disable inlining, which might
give you more accurate call stacks.
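To illustrate why inlining matters, consider the sketch below (the function names are hypothetical). When built with optimizations, the compiler is free to inline makeRow into buildTable, in which case a stack unwinder only sees the caller's physical frame for the allocation. With -O0 or -fno-inline the two frames stay distinct.

```cpp
#include <cstddef>
#include <vector>

// Small helper that performs the actual allocation.
std::vector<double> makeRow(std::size_t columns)
{
    return std::vector<double>(columns, 0.0);
}

// Caller: if makeRow is inlined here, the allocation appears to come
// directly from buildTable in a backtrace-based call stack.
std::vector<std::vector<double>> buildTable(std::size_t rows, std::size_t columns)
{
    std::vector<std::vector<double>> table;
    table.reserve(rows);
    for (std::size_t i = 0; i < rows; ++i)
        table.push_back(makeRow(columns));
    return table;
}
```

Calling buildTable from your main and building with something like g++ -g -O0 (or -g -O2 -fno-inline) should keep both frames visible in the profile.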

How to use with MPI

MALT also provides a lightweight support of MPI to generate profile files named with MPI rank ID instead of process ID.
In order to support this you first need to compile the MPI interface on top of your MPI. It will generate a
small library in your home directory.

{YOUR_PREFIX}/bin/malt --prep-mpi [mpicxx]

Caution : this links MALT to the MPI version you are currently using. If you want to switch to another one, you will need to
run the previous command again.

Then, to profile your MPI application, proceed as follows :

mpirun -np X {YOUR_PREFIX}/bin/malt --mpi {YOUR_PROGRAM} [OPTIONS]

Using webview

You can use the webview by calling command malt-webview as :

malt-webview [-p PORT] [--no-auth] -i malt-YOUR_PROGRAM-1234.json

It will open a server listening locally on port 8080 so you can open your web browser
to connect to the web interface via http://localhost:8080.

On first use, malt-webview will create the password file $HOME/.malt/passwd and ask you for a
password to protect access through HTTP authentication. You can change it at any time with

malt-passwd {USER}

If you are running the view remotely through SSH, you can redirect the ports by using :

ssh -L 8080:localhost:8080 user@ssh-server

To use the webview you need to install the nodeJS package on your system : http://nodejs.org/.

Alternatively, you can use a unix socket on the server side and forward it through SSH. This avoids
exposing port 8080 to anyone on the server, as the socket is protected by the user access rights.

# remote
malt-webview -p /home/myuser/malt.sock -i malt-PROFILE.json
# on your workstation
ssh -L 8080:/home/myuser/malt.sock user@ssh-server

Config

You can provide a config file to MALT to setup some features. This file uses the INI
format. With the malt script :

{YOUR_PREFIX}/bin/malt -c=config.ini {YOUR_PROGRAM} [OPTIONS]

By hand :

MALT_CONFIG="config.ini" LD_PRELOAD=libmalt.so {YOUR_PROGRAM} [OPTIONS]

Example of config file :

[time]
enabled=true          ; enable time profiles
points=1000           ; keep 1000 points
linar-index=false     ; use action ID instead of time

[stack]
enabled=true          ; enable stack profiles
mode=backtrace        ; select stack tracing mode (backtrace|enter-exit)
resolve=true          ; Automatically resolve symbols with addr2line at exit.
libunwind=false       ; Enable or disable usage of libunwind to backtrace.
skip=4                ; Number of stack frames to skip in order to cut at malloc level
sampling=false        ; Sample and instrument only some stacks.
samplingBw=4093       ; Instrument the stack each time 4K-3 bytes of alloc requests have been seen.

[output]
name=malt-%1-%2.%3    ; base name for output, %1 = exe, %2 = PID, %3 = extension
lua=true              ; enable LUA output
json=true             ; enable json output
callgrind=true        ; enable callgrind output
indent=false          ; indent the output profile files
config=true           ; dump current config
verbosity=default     ; malt verbosity level (silent, default, verbose)
stack-tree=false       ; store the call tree as a tree (smaller file, but need conversion)
loop-suppress=false    ; Simplify recursive loop calls to get smaller profile file if too big

[max-stack]
enabled=true          ; enable or disable stack size tracking (requires -finstrument-functions)

[distr]
alloc-size=true       ; generate distribution of allocation size
realloc-jump=true     ; generate distribution of realloc jumps

[trace]
enable=false          ; enable dumping allocation event tracing (not yet used by GUI)

[info]
hidden=false          ; try to hide possibly sensitive names from profile (exe, hostname...)

[filter]
exe=                  ; Only apply malt on given exe (empty for all)
childs=true           ; Instrument child processes or not
enabled=true          ; Enable or disable MALT when threads start

[dump]
on-signal=             ; Dump on signal. Can be comma separated list from SIGINT, SIGUSR1,
                       ; SIGUSR2... help, avail (limited to only one dump)
after-seconds=0        ; Dump after X seconds (limited to only one time)
on-sys-full-at=        ; Dump when system memory become full at x%, xG, xM, xK, x  (empty to disable).
on-app-using-rss=      ; Dump when RSS of the app reach the given limit in %, G, M, K (empty to disable).
on-app-using-virt=     ; Dump when Virtual Memory of the app reach limit in %, G, M, K (empty to disable).
on-app-using-req=      ; Dump when Requested Memory of the app reach limit in %, G, M, K (empty to disable).
on-thread-stack-using= ; Dump when one stack reach limit in %, G, M, K (empty to disable).
on-alloc-count=        ; Dump when number of allocations reach limit in G, M, K (empty to disable).
watch-dog=false        ; Run an active thread continuously watching the memory of the app, not only sometimes.

[python]
instru=true            ; Enable or disable python instrumentation.
stack=enter-exit       ; Select the Python stack instrumentation mode (backtrace, enter-exit, none).
mix=false              ; Mix C stack with the python ones to get a unique tree instead of two distinct ones
                       ; (note this adds overhead).
obj=true               ; Instrument or not the OBJECT allocator domain of python.
mem=true               ; Instrument or not the MEM allocator domain of python.
raw=true               ; Instrument or not the RAW allocator domain of python.

[tools]
nm=true                ; Enable usage of NM to find the source location of the global variables.
nmMaxSize=50M           ; Do not call nm on .so larger than 50 MB to limit the profile dump overhead.

Option values can be overridden on the fly with command :

{YOUR_PREFIX}/bin/malt -o "stack:enabled=true;output:indent=true;" {YOUR_PROGRAM} [OPTIONS]

Environment variables

If you do not use the malt wrapper and use LD_PRELOAD directly, you can use the following environment variables :

MALT_OPTIONS="stack:enabled=true;output:indent=true;"
MALT_CONFIG="config.ini"
MALT_STACK="libunwind"

Analysing sub-parts

If you run a really big program doing millions of allocations, you might get a big overhead, and you may
only be interested in a sub-part of the program. You can handle this by including malt/malt.h in
your files and using maltEnable() and maltDisable() to control MALT on each thread. It is also a nice
way to detect leaks in sub-parts of your code.

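#include <stdlib.h>
#include <malt/controler.h>

int main()
{
    // MALT is disabled on this thread: this allocation is ignored
    maltDisable();
    malloc(16);

    // MALT is enabled again: this allocation is tracked
    maltEnable();
    malloc(16);

    return 0;
}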

You will need to link against libmalt-controler.so to get the default fake symbols when not running under MALT.
You can also simply provide the two empty functions in your own dynamic library (not a static one).

If you have some allocation not under your control before your first call you can disable MALT by default
on threads using the filter:enabled option, then enable it by hand.
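One way to keep the enable/disable calls balanced is a small RAII guard. The sketch below is only an illustration and not part of MALT : MaltScope is a hypothetical helper, and the two malt functions are local stand-ins (assumed to take no argument and return nothing, as in the call sites above) so that the block compiles without MALT installed. In a real build you would drop the stand-ins and include the MALT header instead.

```cpp
#include <cstdlib>

// Local stand-ins so this sketch builds without MALT (assumed signatures).
// In a real program, remove them and include the MALT header instead.
static int g_enableCalls = 0;
static int g_disableCalls = 0;
extern "C" void maltEnable(void)  { ++g_enableCalls; }
extern "C" void maltDisable(void) { ++g_disableCalls; }

// Hypothetical helper: tracking is on for the lifetime of the object
// and switched off again when it goes out of scope, even on exceptions.
class MaltScope
{
public:
    MaltScope() { maltEnable(); }
    ~MaltScope() { maltDisable(); }
    MaltScope(const MaltScope &) = delete;
    MaltScope &operator=(const MaltScope &) = delete;
};

// Only the allocation inside the guarded block is meant to be tracked.
void runWorkload()
{
    maltDisable();                     // start with tracking off on this thread
    void *ignored = std::malloc(16);

    {
        MaltScope scope;               // tracking on
        void *tracked = std::malloc(16);
        std::free(tracked);
    }                                  // tracking off again

    std::free(ignored);
}
```

Since the control is per thread, such a guard would have to be created on each thread whose allocations you want to track.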

About stacks

MALT uses two ways to rebuild stacks. The default one relies on glibc backtrace, but we observed several
segfaults with some Intel tools such as Intel OpenMP and Intel MPI, so we also provide a more robust
approach based on libunwind if it is present on your system at build time. You can provide it with :

../configure --with-libunwind=PREFIX

or on cmake :

cmake -DLIBUNWIND_PREFIX=PREFIX ..

You now can use it with malt by using :

malt -s libunwind {PROGRAM}

The alternative relies on function instrumentation, adding probes at the start and end of each function.
It can be done by using -finstrument-functions with your compiler, as described in the “How to use” section,
or by using binary instrumentation tools, as explained at the end of this document.

If you want to use the source instrumentation approach, you need to recompile your program
and the libraries of interest with :

gcc -finstrument-functions

Then run malt with :

{YOUR_PREFIX}/bin/malt -s enter-exit {YOUR_PROGRAM}
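To make the mechanism more concrete : with GCC, -finstrument-functions inserts calls to two hooks, __cyg_profile_func_enter and __cyg_profile_func_exit, around every instrumented function. The sketch below is not MALT's implementation. It only shows how a tool could maintain a per-thread call depth from those hooks, and the counter and accessor names are hypothetical.

```cpp
#include <cstddef>

// Per-thread bookkeeping (hypothetical names, for illustration only).
static thread_local std::size_t t_depth = 0;
static thread_local std::size_t t_maxDepth = 0;

extern "C" {

// GCC calls this on entry to every function compiled with
// -finstrument-functions. The hook itself must not be instrumented.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    ++t_depth;
    if (t_depth > t_maxDepth)
        t_maxDepth = t_depth;
}

// GCC calls this just before an instrumented function returns.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    if (t_depth > 0)
        --t_depth;
}

} // extern "C"

// Accessors are excluded from instrumentation so that reading the
// counters does not change them.
__attribute__((no_instrument_function))
std::size_t currentDepth() { return t_depth; }

__attribute__((no_instrument_function))
std::size_t maxDepth() { return t_maxDepth; }
```

A real tool would also record this_fn on each entry to rebuild the call stack. This is cheaper than unwinding at every allocation, but it only sees the functions that were compiled with the flag.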

Tracking stack size

MALT can also track the memory used by stacks over time, but this requires
enabling a compiler flag :

gcc -finstrument-functions {YOUR FILES}

Wrapping a custom allocator

If your application uses a custom allocator with a different namespace than the default malloc, free…
you can use the --wrap or --wrap-prefix options.

You can select the functions in detail with :

malt --wrap malloc:je_malloc ./prgm
malt --wrap malloc:je_malloc,free:je_free,calloc:je_calloc,malloc:another_custom_malloc ./prgm

You can also simply use a common prefix for all of them (typically useful if you embed jemalloc
with a custom symbol prefix) :

malt --wrap-prefix je_ ./prgm
malt --wrap-prefix je_,another_custom_ ./prgm
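For reference, a prefixed allocator of the kind these options target could look like the sketch below. The my_ prefix and both functions are hypothetical, and they only forward to the standard allocator to keep the example short. A real embedded allocator would manage its own memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical allocator entry points exported with a custom "my_" prefix.
// They forward to the C allocator here only to keep the sketch short.
extern "C" void *my_malloc(std::size_t size)
{
    return std::malloc(size);
}

extern "C" void my_free(void *ptr)
{
    std::free(ptr);
}

// Application code that bypasses malloc/free and calls the prefixed names.
std::size_t copyAndMeasure(const char *text)
{
    const std::size_t length = std::strlen(text);
    char *buffer = static_cast<char *>(my_malloc(length + 1));
    assert(buffer != nullptr);
    std::memcpy(buffer, text, length + 1);   // copy including the terminator
    const std::size_t copied = std::strlen(buffer);
    my_free(buffer);
    return copied;
}
```

Following the option syntax shown above, the matching invocations for such a program would be malt --wrap-prefix my_ ./prgm or malt --wrap malloc:my_malloc,free:my_free ./prgm.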

Experimental pintool mode

MALT can also use binary instrumentation mode through pintool
(http://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool)

Please check the usage notes in the src/pintool directory.

Experimental maqao mode

MALT can also use binary instrumentation with MAQAO (http://maqao.org/).

Please check the usage notes in the src/maqao directory.

Dealing with big files

In some cases you might get really big files; I got up to 600 MB on one code. The issue is that you
cannot load this kind of file into nodejs because of limits on the string used to read the file
in the JSON parser functions.

The first alternative is to generate a more compact file by enabling the stack-tree output
option, which stores the stacks as a tree in the file. It is more efficient in terms of space (in the 600 MB
case it lowers the file to 200 MB) but needs an on-the-fly conversion by the server to get back the supported format.

malt -o "output:stack-tree=true" ./PROGRAM

Currently you can still find cases where you cannot load the file into nodejs; I’m working on a workaround.
Please send me your files if it happens. Once compressed with gzip, they take less than 30-40 MB.

As of 25/07/2024, the JSON files are read and processed using streams, which bypasses the internal hard limit of NodeJS requiring strings to be < 512 MB.
However, keep in mind that such big files make the web interface a bit less responsive. This was tested with files up to 1 GB.

Due to another limitation, you may encounter the following error : FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory. In that case, raise the heap size limit of NodeJS with the option NODE_OPTIONS="--max-old-space-size=<SIZE>", with SIZE in megabytes.

Packaging

You can find packaging instructions inside packaging/README.md.
For quicker use, you can run the dev/packagin.sh script, which does
the steps automatically.

Installation in non-standard directory

If you install MALT in a directory other than /usr and /usr/local, e.g. in your home, you might
want to set some environment variables to integrate it into your shell :

export PATH=${PREFIX}/bin:$PATH
export MANPATH=${PREFIX}/share/man:$MANPATH

LD_LIBRARY_PATH is not required, as the malt command uses the full path to access the
internal .so file.

Profiling python

Note: This is currently experimental.

First you need to build MALT with python support enabled (--enable-python), and you will need to have
the python headers (package python3-dev or libpython3-dev or python3-devel) on your platform.

In practice, once built, MALT is able to run over various versions of python without being
rebuilt, as long as they follow the standard API, which is currently stable. It should also work with the
python delivered by Anaconda.

Supported versions are currently Python 3.11 and later.

Due to the large number of memory allocations in python, MALT currently has a large overhead on python programs.
There are consequently several ways to instrument your app, which I sort in increasing order of overhead.

# Use default mode (python-only)
malt-python ./script.py
# profile without stacks
malt-python --profile python-no-stack ./script.py
# An approximate method, sampling instead of tracking each stack (faster but not exact)
malt-python --profile python-sampling ./script.py
# Similar but with fewer samples
malt-python --profile python-sampling-10M ./script.py
# Similar but with even fewer samples
malt-python --profile python-sampling-20M ./script.py
# profile considering only python stacks (C is mapped under python)
malt-python --profile python-only ./script.py
# Full instrumentation of Python + C
malt-python --profile python-full ./script.py

Note: The malt-python command is just a wrapper over the malt command providing a different default
profile specific to python. You can also use the malt command directly.

Note: The malt-python command is a temporary workaround; it might disappear in the future.

Similar tools

If you search similar tools all over the web you might find:

Heaptrack: A Heap Memory Profiler for Linux: KDE/heaptrack: https://github.com/KDE/heaptrack
Memoro: A detailed Heap Profiler : https://epfl-vlsc.github.io/memoro/
Memtrail: https://github.com/jrfonseca/memtrail
MTuner: https://milostosic.github.io/MTuner/
Profiler provided with google allocator: Google Heap Profiler
Valgrind memcheck
Valgrind massif: Valgrind massif with Massif visualizer
Dr. Memory
Commercial tool, Parasoft Insure++
Commercial tool, Unicom PurifyPlus (previously IBM)
Tau is more a communication profiling tool for HPC apps, but it offers a memory module
Similar approach to MALT for the backend: IgProf
A debug malloc library: Dmalloc
Profiling and leak detection: MemProf
Malloc count
mpatrol
Tracing tool for parallel programs: EZTrace
Find Obsolete Memory: FOM Tools
Memray: A memory profiler supporting C & python. https://bloomberg.github.io/memray/
Scalene: A perf and memory profiler for C & python : https://pypi.org/project/scalene/

In case I missed new ones, you can also look at the repository of this person, who keeps an up-to-date list:
https://github.com/MattPD/cpplinks/blob/master/performance.tools.md

Parallel allocators

If you are looking for parallel memory allocators, you can find these on the net:

Jemalloc (facebook, firefox)
TCMalloc (google)
Hoard
Lockless allocator
MPC memory allocator (look into mpcframework/MPC_Allocator)
mimalloc

License

MALT is distributed under CeCILL-C license (LGPL compatible).

To cite

If you publish about MALT, please cite this research paper as reference :

Sébastien Valat, Andres S. Charif-Rubial, and William Jalby. 2017. MALT: a Malloc tracker.
In Proceedings of the 4th ACM SIGPLAN International Workshop on Software Engineering for
Parallel Systems (SEPS 2017). Association for Computing Machinery, New York, NY, USA,
1–10. https://doi.org/10.1145/3141865.3141867

Discussion

You can join the google group to exchange ideas and ask questions : https://groups.google.com/forum/#!forum/memtt-malt.