Adam Leventhal's blog

Month: June 2004

I noticed the following Usenet post the other day:

Hi, I have a fairly busy mailserver which also has a simple iptables
ruleset (blocking some IP’s) running 2.6.7 with the deadline i/o
scheduler. vmstat was reporting that system time was around 80%. I did
the following

readprofile -r ; sleep 240 ; readprofile -n -m /boot/`uname -r` | sort -rn -k 1,1 | head -22


I am trying to determine where the system time is going and don’t have
much zen to begin with. Any assistance would be appreciated ?

Seems like a tricky problem, and there were some responses on the thread proposing some theories on the source of the problem and requesting more data:

This doesn’t look like very intense context switching in either case. 2.6.7
appears to be doing less context switching. I don’t see a significant
difference in system time, either.

Could you please send me complete profiles?


How many context switches do you get in vmstat?
Most likely you just have far too many of them. readprofile will attribute
most of the cost to finish_task_switch, because that one reenables the
interrupts (and the profiling only works with interrupts on)

Too many context switches are usually caused by user space.

This is exactly the type of problem that DTrace was designed for — my system is slow; why? Rather than just having the output from readprofile, you could find out exactly what applications are being forced off CPU while they still have work to do, or what system calls are accounting for the most time on the box, or whatever. And not only could you get the answers to these questions, you could do so quickly and then move on to the next question or revise your initial hypothesis. Interestingly, someone brought this up:
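To make that concrete, here's a minimal D sketch of the second of those questions, which system calls account for the most time on the box (the script is mine, not from the thread):

```d
#!/usr/sbin/dtrace -s

/* Record when each thread enters a system call... */
syscall:::entry
{
	self->ts = timestamp;
}

/* ...and on return, sum the elapsed nanoseconds per system call */
syscall:::return
/self->ts/
{
	@nanos[probefunc] = sum(timestamp - self->ts);
	self->ts = 0;
}
```

Let it run for a minute, interrupt it, and the aggregation prints with the most expensive system calls at the bottom.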

Hmm, Is there a way to determine which syscall would be the culprit. I
guess this is where something like DTrace would be invaluable

Which got this reply:

Sounds like a inferior clone of dprobes to me. But I doubt it
would help tracking this down.

For the uninitiated, DProbes is a Linux kernel patch that provides some dynamic tracing facilities. It fails to meet a number of the requirements of DTrace: it’s not safe to use on production systems, it can silently lose data, and it lacks many of the data aggregation and filtering facilities of DTrace; perhaps most importantly, it’s not in any Linux distro by default (our USENIX paper on DTrace has a more complete discussion of DProbes). The frustrating thing about this post is that DTrace would solve this problem, and yet this particular member of the Linux community is too myopic to recognize innovation when it doesn’t have a penguin on it.

On the other hand, it’s great that DTrace was mentioned, and that someone has noticed that this DTrace thing might actually be the tool they’ve been needing.

Thanks to a suggestion from Alan Hargreaves, I’ve moved the DTrace Solaris Express schedule here where it will continue to live and be updated.

As a member of the Solaris Kernel Group, I’ve obviously developed an affinity for using Solaris. There are tools like truss(1) and pstack(1) that come out of my fingers before I know what I’m typing, and now DTrace has taken such a central role in how I develop software, administer boxes, and chase down problems that I can’t imagine doing without it.

Except sometimes I do have to do without it. As much as I love Solaris, I still own a PowerBook G4 from a few years ago which has survived thousands of miles and dozens of drops. It’s a great laptop, but I have no idea how anyone does any serious work on it. I think DTrace must have made me lazy, because even finding out simple information is incredibly arduous: tonight the VPN application was being cranky, so I wanted to look at what was going on in the kernel; how do I do that? When Safari’s soaking up all my CPU time, what’s it doing? There are some tools I can use, but they’re clunky and much more cumbersome than DTrace.

When I was shopping for a car a few years ago, I looked at one with a very reasonable 190-horsepower engine and a pricier one with a 250-horsepower bi-turbo. From the moment I test drove the latter on the freeway, dropped it into 4th, and felt it leap up to 75mph in an instant, I knew I’d never be satisfied with the former. Be warned: you should only test drive DTrace if you think you’ll have the chance to buy it — I doubt you’ll be able to settle for puttering your way up to 65mph again. Luckily, Solaris won’t put such a big ding in your wallet 😉

When we first wrote DTrace, we needed to make sure it satisfied our fundamental goals: stable, safe, extensive, available in production, zero probe effect when disabled. By extensive, we meant that every corner of the system had to be covered, from kernel function calls and kstats through system calls to any instruction in any process.

What we quickly discovered (both from our own use of DTrace and from lots of great feedback from the DTrace community) was that while there were certainly enough probes, finding the right probes was often difficult or required specific knowledge of one or more Solaris sub-systems. In the kernel we’ve been working on addressing that with the stable providers: proc, sched, io and soon more. These providers present the stable kernel abstractions in ways that are well documented, comprehensible, and maintained from release to release (meaning that your scripts won’t break on Solaris 11).
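The mailserver question that opened this post could be phrased directly against one of these stable providers; a sketch:

```d
#!/usr/sbin/dtrace -s

/* Count, by application, how often threads are switched off CPU */
sched:::off-cpu
{
	@[execname] = count();
}
```

No knowledge of the dispatcher's implementation is required, and the script keeps working from release to release.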

I’m working on the next logical extension to this idea: stable user-level providers. The idea here is that executables and shared libraries will be able to publish their stable abstractions through stable probes. For example, the first stable user-level provider I plan to add is the plockstat provider. This will provide probes for the user-level synchronization primitives — each time a thread locks a mutex by calling mutex_lock(3c) or pthread_mutex_lock(3c), the plockstat:::mutex-acquire probe will fire. Just as the lockstat(1m) command has let Solaris users investigate kernel-level lock contention, the plockstat provider will bring the same investigative lens to user-land. Very exciting.
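Once the plockstat provider exists, finding heavily used user-level locks in a process might look something like this; treat the probe and argument details as tentative, since I'm still building it:

```d
#!/usr/sbin/dtrace -s

/* Count mutex acquisitions in the target process by user stack */
plockstat$target:::mutex-acquire
{
	@[ustack()] = count();
}
```

You'd run it with -p <pid> to ply a running process.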

With user-level stable providers, there’s also an opportunity for application developers to build stable hooks that their customers, or support engineers, or sales engineers, or developers, or whoever can use. Consider databases. Databases, by custom or necessity, seem to have a bunch of knobs to turn, knobs that need experts to turn them properly. Solaris similarly has some knobs that need to be tweaked to get your database to run just so. Now imagine if that database included stable probes for even coarse indicators of what’s going on internally. It could then be possible to build DTrace scripts that enable probes in the database and the kernel to get a truly systemic view of database performance. Rather than requiring a database administrator versed in the oral tradition of database tuning, some of that knowledge could be condensed into these DTrace scripts whose output could be advice on how to turn which knobs.

A quick aside: this example highlights one of the coolest things about DTrace — its systemic scope. There are lots of specialized tools for examining a particular part of the system, but correlating and integrating the data from those tools can be difficult or impossible. With DTrace there’s one consistent data stream, and instrumentation from every corner of the system can easily be tied together in coherent ways.

Back to user-level stable probes. A developer will add a new probe by invoking a macro:

void
my_func(my_struct_t *arg)
{
	DTRACE_PROBE1(my_provider, my_probe, arg->a);
	/* ... */
}

You then build the object file and, before linking, use dtrace(1m) to post-process all the object files; the Solaris Dynamic Tracing Guide will describe this in excruciating specificity once I work out the details. This will create the probe my_provider<pid>:<object name>:my_func:my_probe, where <pid> is the process ID of the process that mapped this load object (executable or shared object) and <object name> is the name of that load object.
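The anticipated build flow, sketched as shell commands (the exact dtrace(1m) invocation is among the details I'm still working out, so treat this as a sketch):

```
cc -c my_func.c                  # compile as usual
dtrace -G -o probes.o my_func.o  # post-process the objects: rewrite the
                                 #   probe sites and emit probe metadata
cc -o my_app probes.o my_func.o  # link the post-processed objects
```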

With user-level stable providers, applications and shared objects will be able to describe their own probes which should lead to simpler administration and support. Later it might be possible to leave in all those debugging printfs as DTrace probes. I’d love to hear about any other interesting ideas for user-level stable providers. Now back to the code…

I’m a Solaris kernel engineer at Sun working on DTrace, the new dynamic instrumentation framework in Solaris 10, alongside my co-conspirators Bryan Cantrill and Mike Shapiro. In addition to spending a large amount of my time improving DTrace, I have worked, and continue to work, on observability and debugging tools in Solaris — mdb(1), the p-tools, the /proc file system — stuff like that.

My goal with this weblog is to write to no one in particular about what we have cooking for DTrace in the future. I’m sure I’ll occasionally degenerate into unbridled rants, as is the vogue for weblogs, but I’ll try to keep it vaguely interesting…

Most of my work on DTrace has been directed towards tracing user-level applications. My first contribution, over two years ago, was the ustack() action to let you take an application stack backtrace from DTrace. Next I made the pid provider that lets you trace not only any user-level function entry and return, but every single instruction in a function. So you can do stuff like this:

#!/usr/sbin/dtrace -s

pid$1::$2:entry
{
	self->spec = speculation();
}

pid$1::$2:
/self->spec/
{
	speculate(self->spec);
	printf("%s+%s", probefunc, probename);
}

pid$1::$2:return
/self->spec && arg1 == -1/
{
	commit(self->spec);
	self->spec = 0;
}

pid$1::$2:return
/self->spec/
{
	discard(self->spec);
	self->spec = 0;
}

Run this with two arguments: the process ID and the function name. The script traces every instruction a function executes, but only when the function ends up returning -1. Some functions work properly 1,000 times and then fail once; stepping through one of those with a debugger is brutal if not impossible, but this really simple D script makes understanding the problem a snap.

Anyway, I think every user of Solaris probably has a need for DTrace even if he or she doesn’t know it. I’ve loved working on DTrace, and look forward to sharing some of the future directions and the nitty-gritty, inside-the-sausage-factory stuff.
