October 2004 – Adam Leventhal's blog

DTrace time

October 29, 2004

The other day I spent some time with a customer tuning their app using DTrace. I hadn’t done much work on apps outside of Sun, so I was excited to see what I could do with DTrace other people’s code. I’m certainly not the first person to use DTrace on a real-world app (I’m probably the 10,000th), so I had some expectations to live up to.

The app basically processes transactions: message comes in, the app does some work and sends out another message. There’s a lot of computation for each message, but also a lot of bookkeeping and busy work receiving, processing and generating the messages.

When sprintf(3C) attacks…

One of the first things we looked at was what functions the app calls the most. Nothing fancy; just a DTrace one-liner:

# dtrace -n pid123:::entry'{ @[probefunc] = count(); }'

Most of it made sense, but one thing we noticed was that sprintf(3C) was getting called a ton, so the question was “what are those sprintfs doing?” Another one-liner had the answer:

# dtrace -n pid123::sprintf:entry'{ @[copyinstr(arg0)] = count(); }'

There were about four different format strings being used, two of them farily complex and then these two: “%ld” and “%f”. In about 5 seconds, the app was calling sprintf(“%ld”, ) several thousand times. It turns out that each transactions contains an identifier — represented by a decimal integer — so this wasn’t unexpected per se, but we speculated that using lltostr(3C) or hand rolling a int->string function might yield better results.

I had just assumed that lltostr(3C) would perform better — it’s a specialized function that doesn’t involve all the (extensive and messy) machinery of sprintf(3C). I wrote a little microbenchmark that just converted the number 1000000 to a string with both functions a million times and ran it on an x86 machine; the results were surprising:

$ ./test
sprintf(3C) took 272512920ns
lltostr(3C) took 523507925ns

What? I checked my test, made sure I was compiling everything properly, had called the function once before I started timing (to avoid any first-call overhead from the dynamic linker), but the results were dead repeatable: lltostr(3C) was about half as fast. I looked at the implementation and while I can’t post it here (yet — I can’t wait for OpenSolaris), suffice it to say that it did the obvious thing. The strange thing was that sprintf(3C) had basically the same alogorithm. Just for kicks, I decided to build it amd64 native and run it on the same opteron box; here were the results:

sprintf(3C) took 140706282ns
lltostr(3C) took 38804963ns

Ah much better (I love opteron). It turns out the problem was that we were doing 64-bit math in 32-bit mode — hence ell-ell-to-str — and that is slooooooow. Luckily the app we were looking at was compiled 64-bit native so it wouldn’t have had this problem, but there are still plenty of 32-bit apps out there that shouldn’t have to pay the 64-bit math tax in this case. I made a new version of lltostr(3C) that checks the top 32-bits of the 64-bit input value and does 32-bit math if those bits are clear. Here’s how that performed (on 32-bit x86):

sprintf(3C) took 251953795ns
lltostr(3C) took 459720586ns
new lltostr took  32907444ns

Much better. For random numbers between 2³² and 2^⁶³-1 the change hurt performance by about 1-2%, but it’s probably worth the hit for the 1300% improvement with smaller numbers.

In any case, that was just a long way of saying that for those of you using lltostr(3C) or sprintf(%ld), there are wins to be had.

Timing transactions

Our first use of DTrace was really to discover things that the developers of that app didn’t know about. This sprintf(3c) stuff was one issues, and there were a couple of others, but, on the whole, the app worked as intended. And that’s actually no small feat — many many programs are doing lots of work that wasn’t really intended by the developers or spend their CPU time in places that are completely surprising to the folks who wrote it. The next thing the customers wanted to find was the source of certain latency bubbles. So we used DTrace to time each transaction as it went through multiple processes on the system. While each step was fairly simple, the sum total was by far the largest D script I had written, and in the end we were able to record the time a transaction spent over each leg of the trip through the system.

This is something that could have been done without DTrace, but it would have involved modifying the app, and streaming the data out to some huge file to be post-processed later. Not only is that a huge pain in the ass, it can also have a huge performance impact and is inflexible in terms of the data gathered. With DTrace, the performance impact can be mitigated by instrumenting less, and you can gather arbitrary data so when you get the answer to your first question, you don’t have to rebuild and rerun the app to dive deeper to your next question.

I had been telling people for a while about the virtues on real-world applications, I was happy to see first hand that they were all true, and — perhaps more importantly — I convinced the customer.

breaking with tradition

October 20, 2004

In this weblog, I’ve tried to stick to the facts, talk about the things I know about DTrace, Solaris and the industry, and not stray into the excrutiating minutia of the rest of my life. But:

The Red Sox Are Going To The World Series!!

Post World Series Update: It has been an amazing and unexpected elation I’ve carried with me this past week since the Sox won their first world series in my and my father’s lifetimes. My grandfather was born in 1919 — we would have loved to have seen this. Some of my earliest memories are of watching the Red Sox on my dad’s lap and the 1986 world series is the only time I can remember him shedding a tear. When the Sox were down 3-0 to the hated Yankees in the ALCS, I was crushed. I pledged not to watch games 4 and 5 because I was so emotionally invested in the game that watching them lose would be too painful and watching them win would just be a reminder of the impossibly high mountain they still had left to climb. But they I didn’t live up to my pledge and Ortiz won games 4 and 5 in dramatic fashion (with some help from Dave Roberts). When Lowe pitched a brilliant game 7 to seal their historic come back, I was amazing and delighted (and that’s to say nothing of the heroic efforts of Shilling and Pedro), but nothing compared to that moment with Foulke underhanded Renteria’s gounder to Doug Mientkiewicz and the entirety of Red Sox nation began the celebration.

DTrace case study and contest

October 18, 2004

Jay Danielson has posted a great case study of using DTrace to solve a problem porting a linux driver to Solaris. It’s not an incredibly complicated example, but I find this kind of tracking a problem from symptoms to root cause to be very helpful at understanding how to use DTrace.

Speaking of using DTrace on sun.com, Sun is sponsoring a contest to come up with the most innovative or creative use of DTrace (or Zones). If you’re sitting on some D program that helped you out when no other tool could, or if you found some big performance win, send those in. Or if you want a new HDTV, Opteron laptop or iPod, get cracking (I’m tempted to invent a pseudonym with those kind of goodies on the line). My bet for a sure fire way to finish in the money would be to use DTrace in concert with some GUI or software visualization to get some wholly new look at software, or to find some big performance win in an app that everyone thought had been tuned to perfection with existing tools.

more on gcore

October 13, 2004

Trawling through b.s.c I noticed Fintan Ryan talking about gcore(1), and I realized that I hadn’t sufficently promoted this cool utility. As part of my work adding variable core file content, I rewote gcore from scratch (it used to be a real pile) to add a few new features and to make it use libproc (i.e. make it slightly less of a pile).

You use gcore to take a core dump of a live running process without actually causing the process to crash. It’s not completely uninvasive because gcore stops the process you’re taking the core of to ensure a consistent snapshot, but unless the process is huge or it’s really cranky about timing the perturbation isn’t noticeable. There are a lot of places where taking a snapshot with gcore is plenty useful. Let’s say a process is behaving strangely, but you can’t attach a debugger because you don’t want to take down the service, or you want to have a core file to send to someone who can debug it when you yourself can’t — gcore is perfect. I use to it to take cores of mozilla when it’s chugging away on the processor, but not making any visible progress.

I mentioned that big processes can take a while to gcore — not surprising because we have to dump that whole image out to disk. One of the cool uses of variable core file content is the ability to take faster core dumps by only dumping the sections you care about. Let’s say there’s some big ISM segment or a big shared memory segment: exclude it and gcore will go faster:

hedge /home/ahl -> gcore -c default-ism 256755
gcore: core.256755 dumped

Pretty handy, but the coolest I’ve been making of gcore lately is by mixing it with DTrace and the new(ish) system() action. This script snapshots my process once every ten seconds and names the files according to the time they were produced:

# cat gcore.d
#pragma D option destructive
#pragma D option quiet
tick-10s
{
doit = 1;
}
syscall:::
/doit && pid == $1/
{
stop();
system("gcore -o core.%%t %d", pid);
system("prun %d", pid);
doit = 0;
}
# dtrace -s gcore.d  256755
gcore: core.1097724567.256755 dumped
gcore: core.1097724577.256755 dumped
gcore: core.1097724600.256755 dumped
^C

WARNING! When you specify destructive in DTrace, it means destructive. The system() and stop() actions can be absolutely brutal (I’ve rendered at least one machine unusable my indelicate use of that Ramirez-Ortiz-ian one-two combo. That said, if you screw something up, you can break into the debugger and set dtrace_destructive_disallow to 1.

OK, so be careful, but that script can give you some pretty neat results. Maybe you have some application that seems to be taking a turn for the worse around 2 a.m. — put together a DTrace script that detects the problem and use gcore to take a snapshot so you can figure out what was going on when to get to the office in the morning. Take a couple of snapshots to see how things are changing. You do like debugging from core dumps, right?

back from the DTrace road show

October 5, 2004

A number of factors have conspired to keep me away from blogging, not the least of which being that I’ve been on a coast-to-coast DTrace road show. Now that I’m back, I’ve got some news to report from the road.

Step right up!

At times it felt a bit like a medicine road show: “Step right up to see the amazing DTraaaaace! The demystifying optimizing tantalizing reenergizing tracing frameworrrrrrrk!” I stopped in the Midwest, and D.C. (Eric helped during that leg as my fellow huckster) then I went back to San Francisco and then back accross the country to New York for the big Wall Street to do.

I admit that it got a little repetitive — at one point I ran into A/V problems and was able to run through the presentation from memory — but the people who I talked to, and their questions kept it interesting.

Can DTrace do X?

I was impressed by the number of people who had not only heard of DTrace, but who had already started playing around with it. It used to be that the questions were all of the form “Can DTrace do X?” On this trip, more and more I was asked about specific things people were trying to accomplish with DTrace and told about problems they’d found using DTrace. I thought I’d repeat some of the best questions and insights from the trip:

I’m sure this won’t come as a huge shock to anyone who’s tried to heft a printed copy of the DTrace docs, but some people were a little daunted by the size and complexity of DTrace. To address that, I’d call on an analogy to perl — no one picks up the O’Reilly books on perl, reads them cover to cover and declares mastery of perl. The way most people learn perl is by calling on the knowledge they have from similar languages, finding examples that make sense and then modifying them. When they need to do something just beyond their grasp, they go back to the documentation and find the nugget they need. We’ve tried to follow a similar model with DTrace — find an example that makes sense and work from there; call on your knowledge of perl, java, c, or whatever, and see how it can apply to D. We’ve tried to design DTrace so things pretty much just work as you (a programmer, sysadmin, whatever) would expect them to, and so that a little time invested with the documentation goes a long way. Seeing too much data? Predicates are easy. Sill not getting just the data you want? Spend fifteen minutes to wrap your head around speculations.

And speaking of perl, a lot of people asked about DTrace’s visibility into perl. Right now the only non-natively executed languate DTrace lets you observe is java, but now that we realize how much need there is for visibility into perl, we’re going to be working aggressively on making DTrace work well with perl. We’ve got some neat ideas, but if there are things you’d like to see with DTrace and perl, we’d love to hear about it as we think about what sorts of problems we need to solve.

A lot of people in industry use tools like Intel’s VTune and IBM’s Purify and Quantify, so I got a lot of questions about how DTrace compares to those tools. Which led to the inevitable question of “Where’s the GUI?” First, DTrace by definition can do more than those tools even discounting its systemic scope simply by the ability users have to customize their tracing with D. VTune, Purify, Quantify and other tools present a fairly static view, and I’m sure that users of those tools have always had just one more question, one next step that those tools weren’t solve. Because DTrace doesn’t present a canned, static view, it’s not so clear on what kind of GUI you’d want. Clearly, it’s not just a pull down menu with 40,000 probes to choose from, so we’re actively working on ways to engage the eyes and the visual cortex, but without strapping DTrace into a static framework, bounded by the same constraints of those traditional tools.

Back

Whew. It was a good trip though a bit exhausting. I think I convinced a bunch of people about the utility of DTrace in general, but also to their specific problems. But I also learned about some of DTrace’s shortcomings which we are now working to address. It’s good to be back to coding — I’m putting the finishing touches on DTrace’s AMD64 support which has been a lot of fun. In the next few weeks I’ll be writing about the work going on in the kernel group as we put the final coat of polish on Solaris 10 as it gets ready for its release.

Adam Leventhal's blog

Month: October 2004

DTrace time

When sprintf(3C) attacks…

Timing transactions

breaking with tradition

The Red Sox Are Going To The World Series!!

DTrace case study and contest

more on gcore

back from the DTrace road show

Step right up!

Can DTrace do X?

Back

Recent Posts

Austin API Summit Wrap-up

Rust and JSON Schema: odd couple or perfect strangers

Oxide and Friends Season 4

DTrace probes in Rust

From Prometheus to Sisyphus

DTrace at Home

Archives

Archives