Adam Leventhal's blog

Month: February 2012

A few months ago I took DTrace on OEL for a spin after Oracle announced it. The results were ugly; as one of the authors of DTrace, I admit to being shocked by shoddiness of the effort. Yesterday, Oracle dropped an updated beta so I wanted to see how far they’ve come in the 4+ months since that initial false start.

Whither the probes?

Back in October there were 574 functional probes (and 13 more that didn’t work). Here’s the quantitative state of DTrace for OEL today:

[root@screven drivers]# dtrace -l | wc -l

Okay. Steady improvement. By way of unfair comparison, here’s what it looks like on my Mac OS X laptop:

qadi /Users/ahl # dtrace -l | wc -l

What’s new?

Back in October, I tried enabling all system call probes (i.e. all functional probes); the result was that ssh started failing mysteriously. It was a gross violation of the core principles — it would be unacceptable for DTrace to cause harm to the production systems on which it operates. Good to see that Oracle fixed it.

Previously, profile provider probes weren’t working. The profile probes have been removed — you can’t do arbitrary resolution timer-based profiling — but the simple, tick probes are there:

[root@screven drivers]# dtrace -l -n profile:::
ID   PROVIDER            MODULE                          FUNCTION NAME
612    profile                                                     tick-1
613    profile                                                     tick-10
614    profile                                                     tick-100
615    profile                                                     tick-500
616    profile                                                     tick-1000
617    profile                                                     tick-5000

… and seem to work:

[root@screven ~]# dtrace -n 'tick-1{ printf("%Y", walltimestamp); }'
dtrace: description 'tick-1' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    612                          :tick-1 2012 Feb 23 04:31:27
  0    612                          :tick-1 2012 Feb 23 04:31:28
  0    612                          :tick-1 2012 Feb 23 04:31:29

They’ve also added some inscrutable SDT (statically defined tracing) probes:

[root@screven ~]# dtrace -l -n sdt:::
   ID   PROVIDER            MODULE                          FUNCTION NAME
  597        sdt           vmlinux                    __handle_sysrq -handle_sysrq
  601        sdt           vmlinux                  oom_kill_process oom_kill_process
  602        sdt           vmlinux                   check_hung_task check_hung_task
  603        sdt           vmlinux                   sys_init_module init_module
  604        sdt           vmlinux                 sys_delete_module delete_module
  611        sdt           vmlinux                      signal_fault signal_fault

More usefully, the beta includes a partially implemented proc provider; the proc provider traces high level process activity (check the docs).

[root@screven ~]# dtrace -l -n proc:::
   ID   PROVIDER            MODULE                          FUNCTION NAME
  598       proc           vmlinux                  do_execve_common exec-success
  599       proc           vmlinux                  do_execve_common exec-failure
  600       proc           vmlinux                  do_execve_common exec
  605       proc           vmlinux             get_signal_to_deliver signal-handle
  606       proc           vmlinux                     __send_signal signal-send
  607       proc           vmlinux                           do_exit exit
  608       proc           vmlinux                           do_exit lwp-exit
  609       proc           vmlinux                           do_fork create
  610       proc           vmlinux                           do_fork lwp-create

For reference, here’s what it looks like on DelphixOS, an illumos derivative (which of course includes DTrace):

root@argos:~# dtrace -l -n proc:::
   ID   PROVIDER            MODULE                          FUNCTION NAME
10589       proc              unix                   lwp_rtt_initial lwp-start
10629       proc              unix                   lwp_rtt_initial start
10631       proc              unix                              trap fault
10761       proc           genunix                      sigtimedwait signal-clear
10762       proc           genunix                              psig signal-handle
10763       proc           genunix                         sigtoproc signal-discard
10764       proc           genunix                         sigtoproc signal-send
10831       proc           genunix                        lwp_create lwp-create
10868       proc           genunix                             cfork create
10870       proc           genunix                         proc_exit exit
10871       proc           genunix                          lwp_exit lwp-exit
10872       proc           genunix                         proc_exit lwp-exit
10873       proc           genunix                       exec_common exec-success
10874       proc           genunix                       exec_common exec-failure
10875       proc           genunix                       exec_common exec

Each DTrace probe has arguments that convey information about the activity that caused the probe to fire. For example, with the kernel function boundary tracing (fbt) provider (not yet implemented in OEL), the arguments for the function entry probe correspond to the arguments passed to the function. With static providers such as the proc provider, the parameters include useful information… but I can never seem to remember the types and order. Fortunately, DTrace lets you add in the -v option to get more information about a probe. Unfortunately, this hasn’t been hooked up in Oracle’s port (just an bug, I’m sure):

[root@screven ~]# dtrace -lv -n proc:::signal-send
   ID   PROVIDER            MODULE                          FUNCTION NAME
  606       proc           vmlinux                     __send_signal signal-send

	Probe Description Attributes
		Identifier Names: Private
		Data Semantics:   Private
		Dependency Class: Unknown

	Argument Attributes
		Identifier Names: Evolving
		Data Semantics:   Evolving
		Dependency Class: ISA

	Argument Types
		args[0]: (unknown)
		args[1]: (unknown)
		args[2]: (unknown)
		args[3]: (unknown)
		args[4]: (unknown)
		args[5]: (unknown)
		args[6]: (unknown)
		args[7]: (unknown)
		args[8]: (unknown)
		args[9]: (unknown)
		args[10]: (unknown)
		args[11]: (unknown)
		args[12]: (unknown)
		args[13]: (unknown)
		args[14]: (unknown)
		args[15]: (unknown)
		args[16]: (unknown)
		args[17]: (unknown)
		args[18]: (unknown)
		args[19]: (unknown)
		args[20]: (unknown)
		args[21]: (unknown)
		args[22]: (unknown)
		args[23]: (unknown)
		args[24]: (unknown)
		args[25]: (unknown)
		args[26]: (unknown)
		args[27]: (unknown)
		args[28]: (unknown)
		args[29]: (unknown)
		args[30]: (unknown)
		args[31]: (unknown)

Here’s what it looks like on DelphixOS:

root@argos:~# dtrace -lv -n proc:::signal-send
   ID   PROVIDER            MODULE                          FUNCTION NAME
10764       proc           genunix                         sigtoproc signal-send

        Probe Description Attributes
                Identifier Names: Private
                Data Semantics:   Private
                Dependency Class: Unknown

        Argument Attributes
                Identifier Names: Evolving
                Data Semantics:   Evolving
                Dependency Class: ISA

        Argument Types
                args[0]: lwpsinfo_t *
                args[1]: psinfo_t *
                args[2]: int

Even without the type system being hooked up, you can definitely do some useful work with this beta. For example, I can use the proc provider to look at what commands are executing on my system:

[root@screven ~]# dtrace -n proc:::exec'{ trace(stringof(arg0)); }'
dtrace: description 'proc:::exec' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    600            do_execve_common:exec   /usr/bin/staprun
  0    600            do_execve_common:exec   /usr/sbin/perf
  0    600            do_execve_common:exec   /bin/uname
  0    600            do_execve_common:exec   /usr/libexec/perf.2.6.39-101.0.1.el6uek.x86_64

On his blog, Wim Coekaerts showed some examples of use of the proc provider that included this common idiom:

        this->pid = *((int *)arg0 + 171);

It’s hard to know where that 171 constant came from or how a user would figure that out. I assume that this is because OEL doesn’t yet have proper types and it’s a hardcoded offset into some structure. Here’s what that would look like on completed DTrace implementations:

        this->pid = args[0]->pr_pid;


There’s a long way to go, but it looks like the folks at Oracle are making progress. It will be interesting to see the source code that goes along with this updated beta — as of this writing, the git repository has not been updated. Personally, I’m eager to see what user-land tracing looks like in the form of the pid provider and USDT. In the tradition of other ports such as Apple’s and FreeBSD’s, I’d invite the Oracle team to present their work at the upcoming DTrace conference, dtrace.conf.

Tonight, my Delphix colleague Zubair Khan and I presented the integration we’ve done with git at the SF Bay Area Large-Scale Production Engineering meetup. When I started at Delphix, we were using Subversion — my ire for which the margins of this blog are too narrow to contain. We switched to git, and in the process I became an unabashed git fanboy.

Git is a powerful tool generally, but in particular has some powerful hook points that we use to enforce our code integration criteria and to do some handy things after we integrate. For this, we wrote some custom bash scripts, and python integrations with Bugzilla and Review Board. You can check out the slides, and we’ve open sourced it all on github with the hope that it might help people with their own integrations.

Back at Fishworks, my colleagues had a nickname for me: Adam Leventhal, Hardware Engineer. I wasn’t designing hardware; I wasn’t even particularly more involved with hardware specs. The name referred to my preternatural ability to fit round pegs into square holes, to know when parts would bend but not break (or if they broke how to clean up the evidence), and when a tight fit necessitated a running start.

I first earned the nickname when we got the prototype hardware for what would eventually becomes the Sun Storage 7410 — part of our initial product line, and our first product to support clustering. When the system arrived, and I tried to install a SAS HBA, I encountered my first hardware bug. In the Solaris kernel group I had hit microprocessor bugs, but this was pretty different: the actual sheet metal was designed for cards to drop in horizontally, and the designers hadn’t considered connectors that protruded from a PCI card’s faceplate.

To solve the problem, I had to (carefully) bend back the retaining metal supports, drop in the card, and then try to bend them back. I think my colleagues were just impressed that I didn’t break anything.

The hardware team took our feedback and designed a different mechanism for inserting PCI cards.

Science Experiments with SSDs

Another task that fell to Adam Leventhal, Hardware Engineer was conducting the science experiments we needed to verify if something was a stupid idea or merely a crazy one. Often this took the form of trying to make something fit somewhere it wasn’t supposed to fit. For example, we often had 2.5″ SSDs that we wanted to stick into 3.5″ drive bays to eliminate as many variables as possible when baking off a 2.5″ SSD versus as 3.5″ one. Here are some examples:

some SSDs in a Thumper (SS7200)
an SSD in a Riverwalk (J4400)

The Ice-Cream Sandwich

Another favorite experiment involving SSDs came when we were first investigating Readzilla candidates. We wanted to get as much capacity as we could in the 2.5″ drive bay. The prototypes of the Intel X25-E were only 7mm high so we speculated that we could make an “ice-cream sandwich” with some sort of chip to present them as a single SATA device. Well, we found such a chip, and so I ran the experiment to see what the hardware would look like to our OS and what the performance characteristics would be.

You can see the two Intel SSDs duct-taped together, and connected to a power supply in the background and the test board on the right. The test board has another SATA cable that snakes into the box and connects where the drive connector is at the back of the drive bay. Yes, it was a huge pain to connect that final cable; not pictured is the duct tape in the drive bay to keep the SATA cable in place.

The thing worked, but the performance was lousy, and we determined that two drives and some sort of interposer might fit, but it would be like sticking a potato up the tailpipe — all airflow would be blocked.

Conjoined Twins

By far my favorite science project was the conjoined twin Iwashis (SS7110). Iwashi was a stand-alone storage box with an internal SAS HBA that connected to a 16-disk backplane. It turned out though that only one of the two SAS connections was needed to see all the disks. Sitting around at lunch one day we had an idea: could we provide high availability for user data by getting a pair of Iwashis and cross-wiring their HBAs to connect to each others’ backplanes. We would then mirror the data (or something) between the two boxes.

Note that that two systems needed to be placed head-to-toe in order to let the cables reach; take note of a few features in the picture above:

  1. The SAS HBA in the right system…
  2. connects up to the right system’s own backplane…
  3. and to the backplane on the left (note that running between the fan trays was the only option)…
  4. which also connects up to the SAS HBA in the left system.

This all required running with the lid off. Those systems contained a magnetic kill switch — if you removed the lid, the power would shut off. This was — wisely — to ensure proper airflow and to avoid overheating. But this interfered with this (and many other) experiments, so I just unscrewed the magnet from the lid and let it connect directly with its main chassis mate.

I could get the lid onto the left system, but I didn’t want the fan tray lid pinching the SAS cable that ran between the two boxes. To this day, I think that propping up the fan tray lid is the best use of those discarded PCI faceplate fillers.

We scrapped this idea for a variety of concerns both mundane (we needed both SAS connectors to drive the LEDs for each drive), and fundamental (it was pretty clearly goofy).

Still Hardware Engineering

At Delphix, we’re selling a virtual appliance so the opportunities for Adam Leventhal, Hardware Engineer to shine are fewer and farther between. But hardware engineering has always been more of a state of mind… and there’s still the occasional opportunity to stab at a jumper with a bread knife from the kitchen to generate an NMI and initiate a kernel panic!

Recent Posts

February 12, 2017
December 18, 2016
August 9, 2016
August 2, 2016