
Cheat sheet: understanding the pmap(1) output

Posted on Friday, July 03 2015 at 09:17 | Category: Linux | 2 Comment(s)

pmap(1) can be used to list the individual address areas which are mapped into a process. It essentially reads the /proc/$pid/smaps file and presents the data in a more readable format. On Linux, the command line option -XX displays all information the kernel provides (the output might change when newer kernels provide different metrics).
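
To see the raw data pmap works from, you can read the smaps file directly. A minimal sketch, assuming the shell variable $pid holds the process id of interest and a kernel >= 3.8 (where each smaps entry ends with a VmFlags line):

$ sed -n '1,/^VmFlags:/p' /proc/$pid/smaps

This prints the first mapping entry together with all of its per-mapping counters.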

Let's consider the following simple C program:

#include <stdio.h>

int main(int argc, char *argv[]) {
    getchar();
    return 0;
}

We can now launch this application (called "minimal") and call pmap with its process id:

$ ./minimal & pmap -XX $!

As a result, we get the output as shown in this cheat sheet (click on the image for a larger, readable version):

One interesting metric is the PSS value, also known as "Proportional Set Size". This is the amount of memory which is private to the mapping, plus the process's proportional share of its shared mappings. For example, if a process has mapped 100 KiB of private memory, another 200 KiB which is shared between two processes, and another 150 KiB which is shared between three processes, its PSS is calculated as 100 KiB + 200 KiB / 2 + 150 KiB / 3 = 250 KiB.

This can be directly observed when starting the "minimal" application more than once. Let's examine the first mapping which is reported by pmap when the application is started once (this is the code section of the executable; only the first few columns are shown here):

$ ./minimal & pmap -XX $!
         Address Perm   Offset Device   Inode Size Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty 
        00400000 r-xp 00000000  08:01 3948320    4   4   4            0            0             4             0

The application now continues to run, and we can simply start it once again and dump the mappings of this new process:

$ ./minimal & pmap -XX $!
         Address Perm   Offset Device   Inode Size Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty
        00400000 r-xp 00000000  08:01 3948320    4   4   2            4            0             0             0

Here, we can observe two things:

  • The Pss value for the new process is 2, not 4 - this is because the code section is shared between two processes, the one which we started first and the one which we started second
  • The Private values are now 0, while the Shared values are now 4 - we are now sharing one page (4 KiB) between the two processes.

Furthermore, if we use pmap to examine the mappings of the first process again (assuming its process ID was 13944), we see that the values have changed for this first process as well:

$ pmap -XX 13944
13944:   ./minimal
         Address Perm   Offset Device   Inode Size Rss Pss Shared_Clean Shared_Dirty Private_Clean Private_Dirty
        00400000 r-xp 00000000  08:01 3948320    4   4   2            4            0             0             0

Note that memory which could be shared (e.g. code from a shared library) but which is currently mapped into only one process is counted as private, until it is mapped into at least one additional process.
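
As a quick cross-check, the total PSS of a process can be summed up directly from the smaps file; a minimal sketch, again assuming the shell variable $pid holds the process id:

$ awk '/^Pss:/ { sum += $2 } END { print sum " kB total PSS" }' /proc/$pid/smaps

Running this for both instances of "minimal" shows how the shared pages are split between them.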

Understanding free and top memory statistics [Update]

Posted on Friday, June 12 2015 at 12:15 | Category: Linux | 1 Comment(s)

Both top and free can be used to gather basic information about memory usage, but each of them reports the statistics in a slightly different way which might not be immediately obvious. An example output of free, using the -m switch to report numbers in MiB instead of KiB, looks as follows:

Let's ignore the last line (Swap: it simply shows the total swap space, how much of it is allocated and how much is still free) and focus on physical memory:

The first three numbers in the Mem: line are straightforward: the "total" column shows the total physical memory available (most likely, this system has 8 GiB installed and uses part of it for its graphics device, hence the "total" column shows less than 8 GiB). The "used" column shows the amount of memory which is currently in use, and the "free" column shows the amount which is still available.

Then, there are the "buffers" and "cached" columns: they show how much of the "used" memory is actually used for buffers and caches. Buffer and cache memory is memory which the kernel uses for temporary data; if an application requires more memory and there is no "free" memory left, the kernel can reclaim this temporary memory and assign it to application processes (probably resulting in lower I/O performance, since less cache memory is available then).

Finally, there is the "+/- buffers/cache" line: this might look strange at first, but it simply reports the "used" and "free" memory without the buffers and caches. As said above, buffer and cache memory is dynamic and can be assigned to an application process if required. Hence, the "+/- buffers/cache" line shows the memory which is actually used by and available for processes.
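
The same numbers can be derived from /proc/meminfo, which is where free takes its input from. A rough sketch (the result may differ slightly from free's output, since some versions of free additionally account for shared memory and reclaimable slab):

$ awk '/^(MemTotal|MemFree|Buffers|Cached):/ { m[$1] = $2 }
      END {
        used = m["MemTotal:"] - m["MemFree:"]
        print "-/+ buffers/cache used:", used - m["Buffers:"] - m["Cached:"], "kB"
        print "-/+ buffers/cache free:", m["MemFree:"] + m["Buffers:"] + m["Cached:"], "kB"
      }' /proc/meminfo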

The following diagram shows the memory allocation from the sample above:

top shows almost the same information in a slightly different layout (note that the numbers differ somewhat, since some time has elapsed between the execution of the two commands):

The main difference is that it does not directly show the "used" and "free" memory without the buffers - but this can be easily calculated.

Another thing which looks strange is that the amount of "cached" memory is shown in the "Swap" line. However, it has nothing to do with swap; it has probably been put there to use the available screen area as efficiently as possible.

Update: procps >= 3.3.10

Starting with procps 3.3.10, the output of free has changed, which might cause some confusion. I came across this through a question on StackOverflow: Linux "free -m": Total, used and free memory values don't add up. Essentially, free does not show the "+/- buffers/cache" line anymore, but instead shows an "available" column which is taken from the MemAvailable metric introduced in kernel 3.14. See https://www.kernel.org/doc/Documentation/filesystems/proc.txt for a complete description:

MemAvailable: An estimate of how much memory is available for starting new applications, without swapping. Calculated from MemFree, SReclaimable, the size of the file LRU lists, and the low watermarks in each zone. The estimate takes into account that the system needs some page cache to function well, and that not all reclaimable slab will be reclaimable, due to items being in use. The impact of those factors will vary from system to system.

The new format of the free output looks like this:

The main difference is that the "buff/cache" values are not part of "Used" anymore, but counted separately. Hence, the total memory is calculated as "used + buff/cache + free":
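
On a running system, this relation can be verified with a rough sketch like the following (assuming the procps >= 3.3.10 column order total, used, free, shared, buff/cache, available; the sum may be off by a few MiB due to rounding):

$ free -m | awk '/^Mem:/ { print "used + buff/cache + free =", $3 + $6 + $4, "MiB, total =", $2, "MiB" }'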

Since the "available" value is an estimate which takes some system specific factors into account, it cannot be calculated directly from the other values shown by free.
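
The underlying kernel estimate can, however, be inspected directly on kernels >= 3.14:

$ grep MemAvailable /proc/meminfo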


Tracing shared library calls with ltrace and latrace

Posted on Tuesday, May 05 2015 at 13:13 | Category: C, Linux | 3 Comment(s)

Tracing shared library calls with ltrace

Similar to strace, which can be used to trace system calls, ltrace can be used to trace shared library calls. This can be very useful to get insight into the program flow of applications while analyzing an issue.

Consider the following simple sample:

hello.c:

#include <stdio.h>

void helloFunc(const char* s) {
   printf(s);
}

hello.h:


void helloFunc(const char* s);

helloMain.c:

#include "hello.h"

int main() {
   helloFunc("Hello World, how are you?");
   return 0;
}

We create a shared library from hello.c:

$ gcc -fPIC -shared -o libhello.so hello.c

And we create the executable from helloMain.c, which links against the library created before:

$ gcc -o hello helloMain.c -lhello -L.

We can now use ltrace to see which library calls are executed:

$ export LD_LIBRARY_PATH=.
$ ltrace ./hello 
__libc_start_main(0x40069d, 1, 0x7fffe2f3b778, 0x4006c0 <unfinished ...>
helloFunc(0x400744, 0x7fffe2f3b778, 0x7fffe2f3b788, 0)              = 25
Hello World, how are you?+++ exited (status 0) +++

One drawback with ltrace is that it only traces calls from the executable to the libraries the executable is linked against - it does not trace calls between libraries! Hence, the call to the printf() function (which itself resides in the libc shared library) is not shown in the output. Also, there is no option to include the library name in the output for each of the called functions.

Tracing shared library calls with latrace

Especially for larger applications, a better alternative to ltrace is latrace, which uses the LD_AUDIT feature of the glibc dynamic linker (available since glibc 2.4). On Ubuntu, it can be installed with

$ sudo apt-get install latrace

When using latrace with the sample program from above, there are two important differences:
  • First, latrace also traces library calls between shared libraries, so the output includes the printf call executed from our own shared library.
  • Second, the shared library name which contains the symbol is printed after each function call:
$ latrace ./hello
10180     _dl_find_dso_for_object [/lib64/ld-linux-x86-64.so.2]  
10180     __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]  
10180       helloFunc [./libhello.so]  
10180         printf [/lib/x86_64-linux-gnu/libc.so.6]  
10180       __tls_get_addr [/lib64/ld-linux-x86-64.so.2]  
10180       __cxa_finalize [/lib/x86_64-linux-gnu/libc.so.6]  
Hello World, how are you?
./hello finished - exited, status=0

By default, function parameters are not shown, but this can be enabled with the -A option:

$ latrace -A ./hello
10190     _dl_find_dso_for_object [/lib64/ld-linux-x86-64.so.2]  
10190     __libc_start_main(main = 0x40069d, argc = 1, ubp_av = 0x7fffccd35248, auxvec = 0x4006c0, init = 0x400730, fini = 0x7f3d16b21560, rtld_fini = 0x7fffccd35238) [/lib/x86_64-linux-gnu/libc.so.6] {
10190       helloFunc [./libhello.so]  
10190         printf(format = "Hello World, how are you?") [/lib/x86_64-linux-gnu/libc.so.6] {
Hello World, how are you?10190         } printf = 25
10190       __tls_get_addr [/lib64/ld-linux-x86-64.so.2]  
10190       __cxa_finalize(ptr = 0x7f3d1650b030) [/lib/x86_64-linux-gnu/libc.so.6] {
10190       } __cxa_finalize = void

With -A, latrace uses configuration files under /etc/latrace.d to define the parameter format for a set of well-known functions such as printf. We can see in the output that, even though helloFunc() takes a parameter, this parameter is not shown (since it is not defined in the configuration files).

We can use the -a option to specify our own argument definition file. An argument definition file is essentially a header file which defines the prototypes of all functions whose arguments should be displayed in the output. On Ubuntu, the default argument definition files are stored under /etc/latrace.d/headers/, and there is a master file /etc/latrace.d/headers/latrace.h which includes the other header files. We can use the same approach in our own definition file: first include the master file /etc/latrace.d/headers/latrace.h, then add the prototypes for each function we want to trace. For our sample above, the file could look like

#include "/etc/latrace.d/headers/latrace.h"
void helloFunc(const char* s);

Assuming this file is called mylatrace.h, we can now use the -a option to pass it to latrace:

$ latrace -a mylatrace.h -A ./hello
10801     _dl_find_dso_for_object [/lib64/ld-linux-x86-64.so.2]  
10801     __libc_start_main(main = 0x40069d, argc = 1, ubp_av = 0x7fff4b89fb58, auxvec = 0x4006c0, init = 0x400730, fini = 0x7f1927ea1560, rtld_fini = 0x7fff4b89fb48) [/lib/x86_64-linux-gnu/libc.so.6] {
10801       helloFunc(s = "Hello World, how are you?") [./libhello.so] {
10801         printf(format = "Hello World, how are you?") [/lib/x86_64-linux-gnu/libc.so.6] {
10801         } printf = 25
10801       } helloFunc = void
10801       __tls_get_addr [/lib64/ld-linux-x86-64.so.2]  
Hello World, how are you?10801       __cxa_finalize(ptr = 0x7f192788b030) [/lib/x86_64-linux-gnu/libc.so.6] {
10801       } __cxa_finalize = void
./hello finished - exited, status=0

As you can see in the output, we now also see that helloFunc() is called with one parameter s which is set to "Hello World, how are you?".


Defining a custom core file handler

Posted on Tuesday, April 14 2015 at 13:09 | Category: Linux, C | 0 Comment(s)

I recently was wondering how apport can intercept core files written by the Linux kernel:

Essentially, there is a kernel interface which allows executing arbitrary commands whenever the kernel generates a core file. Originally, this was used to fine-tune the filename of the core file, for example by adding a timestamp or the user id of the crashing process instead of writing just plain core. The file name pattern can be defined through a special file located at /proc/sys/kernel/core_pattern.
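
For example, a classic (non-pipe) use of this file is to add a few % specifiers to the core file name. A small sketch which names core files after the executable name, the process id and a timestamp (run as root):

# echo 'core.%e.%p.%t' > /proc/sys/kernel/core_pattern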

Since kernel 2.6.19, /proc/sys/kernel/core_pattern also supports a pipe mechanism. This allows sending the whole core file to stdin of an arbitrary program, which can then handle the core file generation further. Additional parameters, such as the process id, can be passed to the program as command line arguments by using percent specifiers. On Ubuntu, /proc/sys/kernel/core_pattern contains the following string by default:

|/usr/share/apport/apport %p %s %c %P

This tells the kernel to send the core file to stdin of /usr/share/apport/apport and to pass additional parameters, such as the process id, on the command line. See http://man7.org/linux/man-pages/man5/core.5.html for more information about the supported % specifiers.
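
The currently active pattern can simply be read back from the same file:

$ cat /proc/sys/kernel/core_pattern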

Example: automatically launching a debugger

It is also possible to execute a shell script, which makes it very easy to perform specific actions whenever a core file is generated. Let's assume we want to launch the gdb debugger each time a core file is created, load the crashed program together with the core file, and automatically show the call stack where the program crashed. This can be achieved with the following script:

#!/bin/bash

# Get parameters passed from the kernel
EXE=`echo $1 | sed -e "s,!,/,g"`
EXEWD=`dirname ${EXE}`
TSTAMP=$8

# Read core file from stdin
COREFILE=/tmp/core_${TSTAMP}
cat > ${COREFILE}

# Launch xterm with debugger session
xterm -display :1 -e "gdb ${EXE} -c ${COREFILE} -ex \"where\"" &

Now, all we need to do is register the script in /proc/sys/kernel/core_pattern (we need to do this as root, of course). Assuming the script is stored as /tmp/handler.sh, we can use the following command to have the kernel execute it whenever a core file is to be written:

# echo '|/tmp/handler.sh %E %p %s %c %P %u %g %t %h %e' > /proc/sys/kernel/core_pattern

For the script above we would only need the %E and %t specifiers, but by passing all available parameters we can adjust the script later without having to modify /proc/sys/kernel/core_pattern again. From now on, whenever a core dump is generated, an xterm window will open, gdb will be launched, the crashed executable together with the core dump will be loaded into the debugger, and the where command will be executed to show the call stack up to the location where the program crashed. The following screenshot shows the execution of the stack smashing sample I wrote about earlier.

Note: the xterm and all programs within it will be run as root user, so be careful with what you do inside the xterm!
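
To try the handler out, it needs to be executable, and we can then provoke a core dump in a throw-away process. A sketch (depending on the kernel, a core size limit of 0 may also suppress piped core dumps, so we raise it to be safe):

$ chmod +x /tmp/handler.sh
$ ulimit -c unlimited
$ sleep 60 &
$ kill -SEGV %1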

