 	             Using the Linux Kernel Markers

			    Mathieu Desnoyers


	This document introduces the Linux Kernel Markers and discusses their
purpose. It shows some usage examples: how to insert markers within the kernel
and how to connect probes to a marker. Finally, it gives an example probe
module; a probe is the function that connects to a marker.


* Purpose of markers

A marker placed in your code provides a hook to a function (probe) that
you can provide at runtime. A marker can be on (a probe is connected to it)
or off (no probe is attached). An "off" marker has no effect. When turned on,
the function you provide is called, in the execution context of the caller,
each time the marker is reached. When that function finishes, execution
returns to the caller (the probe site).
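The on/off behaviour can be modelled in a few lines of userspace C. This is a
simplified sketch, not the kernel implementation: marker_site, marker_hit()
and sum_probe() are hypothetical names, and a plain function-pointer test
stands in for whatever the real marker code uses to decide whether it is
enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe signature: receives the event argument. */
typedef void (*probe_fn)(int value);

/* Hypothetical marker site: the marker is "off" while probe is NULL. */
struct marker_site {
	const char *name;
	probe_fn probe;		/* NULL means the marker is off */
};

/* What the instrumented code does each time it reaches the marker. */
static void marker_hit(const struct marker_site *site, int value)
{
	assert(site != NULL);
	if (site->probe)		/* off: no effect at all */
		site->probe(value);	/* on: runs in the caller's context */
	/* either way, execution continues here, at the probe site */
}

/* Example probe: accumulates the values it is given. */
static int probe_sum;

static void sum_probe(int value)
{
	probe_sum += value;
}
```

Connecting a probe is then just storing sum_probe in site->probe; storing
NULL turns the marker off again.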

You can put markers at important locations in the code. They act as
lightweight hooks that can pass an arbitrary number of parameters,
described in a printk-like format string, to a function whenever the marker
code is reached.

They can be used for tracing (LTTng, LKET over SystemTAP) and for overall
performance accounting (SystemTAP). They could also be used to implement
efficient hooks for SELinux or any other subsystem that needs this kind of
hook.

Using the markers for system audit (SELinux) would require passing a variable
by address, which the marked routine would later check.
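The pass-by-address pattern can be sketched in userspace C as follows. All
names are hypothetical, and the choice to default to "allowed" when no probe
is connected is an assumption of this sketch; a real security hook might
prefer to default to "denied".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical audit probe: may veto the operation through 'allowed'. */
typedef void (*audit_probe_fn)(int uid, int *allowed);

static audit_probe_fn audit_probe;	/* NULL: marker off */

/* Marked routine: passes &allowed to the probe, then checks it. */
static int do_privileged_op(int uid)
{
	int allowed = 1;	/* assumed default when no probe is connected */

	if (audit_probe)
		audit_probe(uid, &allowed);
	if (!allowed)
		return -1;	/* denied by the probe */
	return 0;		/* operation performed */
}

/* Example policy probe: only uid 0 may perform the operation. */
static void root_only_probe(int uid, int *allowed)
{
	assert(allowed != NULL);
	if (uid != 0)
		*allowed = 0;
}
```

The probe never performs the check itself; it only writes a verdict that the
marked routine acts on after the marker returns.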


* Usage

In order to use the macro MARK, you should include linux/marker.h.

#include <linux/marker.h>

Add, in your code :

MARK(subsystem_event, "%d %s %p[struct task_struct]",
  someint, somestring, current);
Where :
- subsystem_event is an identifier unique to your event
    - subsystem is the name of your subsystem.
    - event is the name of the event to mark.
- "%d %s %p[struct task_struct]" is the format string (printk-style).
- someint is an integer.
- somestring is a char pointer.
- current is a pointer to a struct task_struct.

The expression %p[struct task_struct] is a suggested marker definition
standard that could eventually be used for pointer type checking in
sparse. The brackets contain the type to which the pointer refers.
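To show what such a checker would have to extract, here is a userspace sketch
that collects the type named inside each "%p[...]" annotation. The function
name and MAX_TYPE_LEN are hypothetical, and this is not sparse's code; it
also ignores "%%" escapes, so "%%p[...]" would be misread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPE_LEN 64

/*
 * Copy the type named in each "%p[...]" annotation of 'fmt' into out[].
 * Returns the total number of annotations found; only the first 'max'
 * are stored, each truncated to MAX_TYPE_LEN - 1 characters.
 */
static size_t annotated_pointer_types(const char *fmt,
		char out[][MAX_TYPE_LEN], size_t max)
{
	size_t n = 0;
	const char *p = fmt;

	assert(fmt != NULL && out != NULL);
	while ((p = strstr(p, "%p[")) != NULL) {
		const char *type = p + 3;	/* first char after '[' */
		const char *end = strchr(type, ']');
		size_t len;

		if (!end)
			break;		/* unterminated annotation: stop */
		len = (size_t)(end - type);
		if (n < max) {
			if (len >= MAX_TYPE_LEN)
				len = MAX_TYPE_LEN - 1;
			memcpy(out[n], type, len);
			out[n][len] = '\0';
		}
		n++;
		p = end + 1;
	}
	return n;
}
```

A type checker could then compare each extracted name with the static type
of the matching MARK() argument.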

Connecting a function (probe) to a marker is done by providing a probe
(function to call) for the specific marker through marker_set_probe(). It will
automatically connect the function and enable the marker site. Removing a probe
is done through marker_remove_probe(). Probe removal is preempt safe because
preemption is disabled around the probe call. See the "Probe example" section
below for a sample probe module.

The marker mechanism supports multiple instances of the same marker.
Markers can be put in inline functions, inlined static functions and
unrolled loops.
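The connect/remove flow and the handling of several instances of one marker
can be modelled with a small site table. This is a single-threaded sketch
under stated assumptions: model_set_probe(), model_remove_probe() and the
table are hypothetical stand-ins for marker_set_probe() and
marker_remove_probe(), the exact-string format comparison is an assumed
check, and returning the number of sites armed is a choice of this model.
It also omits the preemption handling described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*probe_fn)(const char *format, ...);

struct marker_site {
	const char *name;
	const char *format;
	probe_fn probe;		/* NULL: site disabled */
};

/*
 * Hypothetical site table. "subsystem_event" appears twice, as it would
 * if the marker sat in an inline function expanded at two call sites.
 */
static struct marker_site sites[] = {
	{ "subsystem_event", "%d %s %p[struct task_struct]", NULL },
	{ "subsystem_event", "%d %s %p[struct task_struct]", NULL },
	{ "other_event",     "%d",                            NULL },
};

#define NSITES (sizeof(sites) / sizeof(sites[0]))

/* Arm every site named 'name' whose format matches; return the count. */
static int model_set_probe(const char *name, const char *format,
		probe_fn probe)
{
	int armed = 0;
	size_t i;

	assert(name && format && probe);
	for (i = 0; i < NSITES; i++) {
		if (strcmp(sites[i].name, name) != 0)
			continue;
		if (strcmp(sites[i].format, format) != 0)
			continue;	/* format mismatch: leave site alone */
		sites[i].probe = probe;
		armed++;
	}
	return armed;
}

/* Disarm every site 'probe' is connected to; return the count. */
static int model_remove_probe(probe_fn probe)
{
	int removed = 0;
	size_t i;

	for (i = 0; i < NSITES; i++) {
		if (sites[i].probe == probe) {
			sites[i].probe = NULL;
			removed++;
		}
	}
	return removed;
}

static void dummy_probe(const char *format, ...)
{
	(void)format;
}
```

Because lookup is by name, one call arms every instance of the marker, and
removal by probe pointer disarms all of them at once.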

Note : It is safe to put markers within preempt-safe code : preempt_enable()
will not call the scheduler due to the tests in preempt_schedule().


* Optimization for a given architecture

You will find, in asm-*/marker.h, optimisations for given architectures
(currently i386 and powerpc). They use a load immediate instead of a data read,
which saves a data cache hit, but also requires cross CPU code modification. In
order to support embedded systems which use read-only memory for their code, the
optimization can be disabled through menu options.

The MF_* flags can be used to control the type of marker. See the
include/linux/marker.h header for the list of flags. They can be specified as
the first parameter of the _MARK() macro, such as the following example which
is safe wrt lockdep.c (useful for marking lockdep.c functions).

_MARK(_MF_DEFAULT & ~_MF_LOCKDEP, subsystem_eventb,
  MARK_NOARGS);

Another example is to specify that a specific marker must never call printk :
_MARK(_MF_DEFAULT & ~_MF_PRINTK, subsystem_eventc,
  "%d %s %p[struct task_struct]",
  someint, somestring, current);

Flag compatibility is checked before connecting the probe to the marker.
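The exact compatibility rule is not spelled out here, so the following
userspace sketch assumes one: a probe may be connected only if every
capability it may use (lockdep, printk) is permitted by the marker. The F_*
names are hypothetical stand-ins for the _MF_* flags. The sketch also shows
why a flag must be cleared with "& ~" rather than "| ~".

```c
#include <assert.h>

/* Hypothetical flag bits, modelled on the _MF_* flags. */
#define F_OPTIMIZED	(1u << 0)	/* use the optimized marker variant */
#define F_LOCKDEP	(1u << 1)	/* probe may end up calling lockdep */
#define F_PRINTK	(1u << 2)	/* probe may call printk */
#define F_DEFAULT	(F_OPTIMIZED | F_LOCKDEP | F_PRINTK)

/* Capability bits that take part in the (assumed) compatibility check. */
#define F_CAPS		(F_LOCKDEP | F_PRINTK)

/* Clear 'bit' from 'flags'. "flags | ~bit" would leave it set. */
static unsigned int flags_without(unsigned int flags, unsigned int bit)
{
	unsigned int r = flags & ~bit;

	assert((r & bit) == 0);
	return r;
}

/* Assumed rule: the probe's capabilities must be a subset of the marker's. */
static int flags_compatible(unsigned int marker_flags,
		unsigned int probe_flags)
{
	return ((probe_flags & F_CAPS) & ~marker_flags) == 0;
}
```

Under this rule a probe registered with the default flags is refused by a
marker that forbids lockdep, while a probe that also drops F_LOCKDEP is
accepted.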


* Probe example

------------------------------ CUT -------------------------------------
/* probe-example.c
 *
 * Loads a function at a marker call site.
 *
 * (C) Copyright 2007 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
 *
 * This file is released under the GPLv2.
 * See the file COPYING for more details.
 */

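#include <linux/init.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/marker.h>

#define SUBSYSTEM_EVENT_FORMAT "%d %s %p[struct task_struct]"

/* Hypothetical placeholder for a tracer entry point: replace with yours. */
static void trace_subsystem_event(int value, const char *mystr,
		struct task_struct *task)
{
}

void probe_subsystem_event(const char *format, ...)
{
	va_list ap, ap_printk;
	/* Declare args, with types matching SUBSYSTEM_EVENT_FORMAT */
	int value;
	const char *mystr;
	struct task_struct *task;

	va_start(ap, format);
	/* Keep an untouched copy: decoding the args below consumes ap. */
	va_copy(ap_printk, ap);

	/* Assign args */
	value = va_arg(ap, typeof(value));
	mystr = va_arg(ap, typeof(mystr));
	task = va_arg(ap, typeof(task));

	/* Call tracer */
	trace_subsystem_event(value, mystr, task);

	/* Or call printk, using the copy that still starts at the first arg */
	vprintk(format, ap_printk);

	/* or count, check rights... */

	va_end(ap_printk);
	va_end(ap);
}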

#define SUBSYSTEM_EVENTB_FORMAT MARK_NOARGS
void probe_subsystem_eventb(const char *format, ...)
{
	/* Increment counters, trace, ... but _never_ generate a call to
	 * lockdep.c ! */
}

static int __init probe_init(void)
{
	int result;
	result = marker_set_probe("subsystem_event",
			SUBSYSTEM_EVENT_FORMAT,
			probe_subsystem_event);
	if (!result)
		goto cleanup;
	result = _marker_set_probe(_MF_DEFAULT & ~_MF_LOCKDEP,
			"subsystem_eventb",
			SUBSYSTEM_EVENTB_FORMAT,
			probe_subsystem_eventb);
	if (!result)
		goto cleanup;
	return 0;

cleanup:
	marker_remove_probe(probe_subsystem_event);
	marker_remove_probe(probe_subsystem_eventb);
	return -EPERM;
}

static void __exit probe_fini(void)
{
	marker_remove_probe(probe_subsystem_event);
	marker_remove_probe(probe_subsystem_eventb);
}

module_init(probe_init);
module_exit(probe_fini);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Mathieu Desnoyers");
MODULE_DESCRIPTION("SUBSYSTEM Probe");
------------------------------ CUT -------------------------------------
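The argument-decoding pattern used in probe_subsystem_event() can be tried in
userspace. In this sketch vsnprintf() stands in for vprintk(), the names are
hypothetical, and the format is reduced to "%d %s" because userspace printf
has no "%p[type]" annotation. va_copy() gives the formatter its own cursor,
since the va_arg() calls advance the original va_list.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define EVENT_FORMAT "%d %s"

/* Where this hypothetical probe records what it saw. */
static int last_value;
static const char *last_str;
static char last_line[128];

static void probe_event(const char *format, ...)
{
	va_list ap, ap_fmt;

	assert(format != NULL);
	va_start(ap, format);
	va_copy(ap_fmt, ap);		/* second cursor for the formatter */

	/* Decode in the order and with the types the format promises. */
	last_value = va_arg(ap, int);
	last_str = va_arg(ap, char *);

	/* Forward the untouched copy, as vprintk(format, ...) would use it. */
	vsnprintf(last_line, sizeof(last_line), format, ap_fmt);

	va_end(ap_fmt);
	va_end(ap);
}
```

For example, probe_event(EVENT_FORMAT, 42, "hello") records 42 and "hello"
and formats the line "42 hello".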

