

A few months ago I took DTrace on OEL for a spin after Oracle announced it. The results were ugly; as one of the authors of DTrace, I admit to being shocked by the shoddiness of the effort. Yesterday, Oracle dropped an updated beta, so I wanted to see how far they’ve come in the 4+ months since that initial false start.

Whither the probes?

Back in October there were 574 functional probes (and 13 more that didn’t work). Here’s the quantitative state of DTrace for OEL today:

[root@screven drivers]# dtrace -l | wc -l
618

Okay. Steady improvement. By way of unfair comparison, here’s what it looks like on my Mac OS X laptop:

qadi /Users/ahl # dtrace -l | wc -l
  578044

What’s new?

Back in October, I tried enabling all system call probes (i.e. all functional probes); the result was that ssh started failing mysteriously. That was a gross violation of the core principles of DTrace — it is unacceptable for DTrace to cause harm to the production systems on which it operates. Good to see that Oracle fixed it.
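
For reference, "enabling all system call probes" amounts to a one-liner along these lines (a sketch, not a transcript from that beta), turning on every syscall probe with an empty action:

# dtrace -n 'syscall::: {}'

On a sound DTrace implementation this sort of blanket enabling is harmless on a live system, which is precisely the guarantee the October beta violated.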

Previously, profile provider probes weren’t working. The profile probes have been removed — you can’t do arbitrary-resolution timer-based profiling — but the simple tick probes are there:

[root@screven drivers]# dtrace -l -n profile:::
ID   PROVIDER            MODULE                          FUNCTION NAME
612    profile                                                     tick-1
613    profile                                                     tick-10
614    profile                                                     tick-100
615    profile                                                     tick-500
616    profile                                                     tick-1000
617    profile                                                     tick-5000

… and seem to work:

[root@screven ~]# dtrace -n 'tick-1{ printf("%Y", walltimestamp); }'
dtrace: description 'tick-1' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    612                          :tick-1 2012 Feb 23 04:31:27
  0    612                          :tick-1 2012 Feb 23 04:31:28
  0    612                          :tick-1 2012 Feb 23 04:31:29
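
For the record, here is the sort of one-liner the missing profile-n probes would support on a complete implementation, a sketch that samples kernel stacks at 997Hz (it won't run against this beta):

# dtrace -n 'profile-997 { @[stack()] = count(); }'

The 997Hz rate is the customary choice because it avoids sampling in lockstep with periodic system activity.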

They’ve also added some inscrutable SDT (statically defined tracing) probes:

[root@screven ~]# dtrace -l -n sdt:::
   ID   PROVIDER            MODULE                          FUNCTION NAME
  597        sdt           vmlinux                    __handle_sysrq -handle_sysrq
  601        sdt           vmlinux                  oom_kill_process oom_kill_process
  602        sdt           vmlinux                   check_hung_task check_hung_task
  603        sdt           vmlinux                   sys_init_module init_module
  604        sdt           vmlinux                 sys_delete_module delete_module
  611        sdt           vmlinux                      signal_fault signal_fault

More usefully, the beta includes a partially implemented proc provider; the proc provider traces high-level process activity (check the docs).

[root@screven ~]# dtrace -l -n proc:::
   ID   PROVIDER            MODULE                          FUNCTION NAME
  598       proc           vmlinux                  do_execve_common exec-success
  599       proc           vmlinux                  do_execve_common exec-failure
  600       proc           vmlinux                  do_execve_common exec
  605       proc           vmlinux             get_signal_to_deliver signal-handle
  606       proc           vmlinux                     __send_signal signal-send
  607       proc           vmlinux                           do_exit exit
  608       proc           vmlinux                           do_exit lwp-exit
  609       proc           vmlinux                           do_fork create
  610       proc           vmlinux                           do_fork lwp-create

For reference, here’s what it looks like on DelphixOS, an illumos derivative (which of course includes DTrace):

root@argos:~# dtrace -l -n proc:::
   ID   PROVIDER            MODULE                          FUNCTION NAME
10589       proc              unix                   lwp_rtt_initial lwp-start
10629       proc              unix                   lwp_rtt_initial start
10631       proc              unix                              trap fault
10761       proc           genunix                      sigtimedwait signal-clear
10762       proc           genunix                              psig signal-handle
10763       proc           genunix                         sigtoproc signal-discard
10764       proc           genunix                         sigtoproc signal-send
10831       proc           genunix                        lwp_create lwp-create
10868       proc           genunix                             cfork create
10870       proc           genunix                         proc_exit exit
10871       proc           genunix                          lwp_exit lwp-exit
10872       proc           genunix                         proc_exit lwp-exit
10873       proc           genunix                       exec_common exec-success
10874       proc           genunix                       exec_common exec-failure
10875       proc           genunix                       exec_common exec

Each DTrace probe has arguments that convey information about the activity that caused the probe to fire. For example, with the kernel function boundary tracing (fbt) provider (not yet implemented in OEL), the arguments for the function entry probe correspond to the arguments passed to the function. With static providers such as the proc provider, the parameters include useful information… but I can never seem to remember the types and order. Fortunately, DTrace lets you add the -v option to get more information about a probe. Unfortunately, this hasn’t been hooked up in Oracle’s port (just a bug, I’m sure):

[root@screven ~]# dtrace -lv -n proc:::signal-send
   ID   PROVIDER            MODULE                          FUNCTION NAME
  606       proc           vmlinux                     __send_signal signal-send

	Probe Description Attributes
		Identifier Names: Private
		Data Semantics:   Private
		Dependency Class: Unknown

	Argument Attributes
		Identifier Names: Evolving
		Data Semantics:   Evolving
		Dependency Class: ISA

	Argument Types
		args[0]: (unknown)
		args[1]: (unknown)
		args[2]: (unknown)
		args[3]: (unknown)
		args[4]: (unknown)
		args[5]: (unknown)
		args[6]: (unknown)
		args[7]: (unknown)
		args[8]: (unknown)
		args[9]: (unknown)
		args[10]: (unknown)
		args[11]: (unknown)
		args[12]: (unknown)
		args[13]: (unknown)
		args[14]: (unknown)
		args[15]: (unknown)
		args[16]: (unknown)
		args[17]: (unknown)
		args[18]: (unknown)
		args[19]: (unknown)
		args[20]: (unknown)
		args[21]: (unknown)
		args[22]: (unknown)
		args[23]: (unknown)
		args[24]: (unknown)
		args[25]: (unknown)
		args[26]: (unknown)
		args[27]: (unknown)
		args[28]: (unknown)
		args[29]: (unknown)
		args[30]: (unknown)
		args[31]: (unknown)

Here’s what it looks like on DelphixOS:

root@argos:~# dtrace -lv -n proc:::signal-send
   ID   PROVIDER            MODULE                          FUNCTION NAME
10764       proc           genunix                         sigtoproc signal-send

        Probe Description Attributes
                Identifier Names: Private
                Data Semantics:   Private
                Dependency Class: Unknown

        Argument Attributes
                Identifier Names: Evolving
                Data Semantics:   Evolving
                Dependency Class: ISA

        Argument Types
                args[0]: lwpsinfo_t *
                args[1]: psinfo_t *
                args[2]: int

Even without the type system being hooked up, you can definitely do some useful work with this beta. For example, I can use the proc provider to look at what commands are executing on my system:

[root@screven ~]# dtrace -n proc:::exec'{ trace(stringof(arg0)); }'
dtrace: description 'proc:::exec' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0    600            do_execve_common:exec   /usr/bin/staprun
  0    600            do_execve_common:exec   /usr/sbin/perf
  0    600            do_execve_common:exec   /bin/uname
  0    600            do_execve_common:exec   /usr/libexec/perf.2.6.39-101.0.1.el6uek.x86_64

On his blog, Wim Coekaerts showed some examples of use of the proc provider that included this common idiom:

proc:::create
{
        this->pid = *((int *)arg0 + 171);
        ...

It’s hard to know where that 171 constant came from or how a user would figure it out. I assume this is because OEL doesn’t yet have proper types, so the script resorts to a hardcoded offset into some structure. Here’s what the same thing looks like on complete DTrace implementations:

proc:::create
{
        this->pid = args[0]->pr_pid;
        ...

Progress

There’s a long way to go, but it looks like the folks at Oracle are making progress. It will be interesting to see the source code that goes along with this updated beta — as of this writing, the git repository has not been updated. Personally, I’m eager to see what user-land tracing looks like in the form of the pid provider and USDT. In the tradition of other ports such as Apple’s and FreeBSD’s, I’d invite the Oracle team to present their work at the upcoming DTrace conference, dtrace.conf.

After writing about Oracle’s port of DTrace to OEL, I wanted to take it for a spin. Following the directions that Wim Coekaerts spelled out, I installed and configured a VM to run OEL with Oracle’s nascent DTrace port. Setting up the system was relatively painless.

Here’s my first DTrace invocation on OEL:

[root@screven ~]# uname -a
Linux screven 2.6.32-201.0.4.el6uek.x86_64 #1 SMP Tue Oct 4 16:47:00 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@screven ~]# dtrace -n 'BEGIN{ trace("howdy from linux"); }'
dtrace: description 'BEGIN' matched 1 probe
CPU     ID                    FUNCTION:NAME
0      1                           :BEGIN   howdy from linux
^C

Then I wanted to see what was on the system:

[root@screven ~]# dtrace -l | wc -l
574

Are you kidding me? For comparison, my Mac has 154,918 probes available and our illumos-derived DelphixOS has 77,320 (Mac OS X has many probes pre-created for each process). It looks like this beta only has the syscall provider, but digging around I can see that Wim didn’t mention that the profile provider is also there:

[root@screven ~]# modprobe profile
[root@screven ~]# dtrace -l | wc -l
587

Sweet.

[root@screven ~]# dtrace -n profile:::profile-997
dtrace: failed to enable 'profile:::profile-997': Failed to enable probe

Not that sweet.

At least I can run my favorite DTrace script:

[root@screven ~]# dtrace -n syscall:::entry'{ @[execname] = count(); }'
dtrace: description 'syscall:::entry' matched 285 probes
^C
pickup                                                            9
abrtd                                                            11
qmgr                                                             17
rsyslogd                                                         25
rs:main Q:Reg                                                    35
master                                                           52
tty                                                              60
dircolors                                                        80
hostname                                                         92
tput                                                             92
id                                                              198
unix_chkpwd                                                     550
auditd                                                          599
dtrace                                                          760
bash                                                           1515
sshd                                                           8327

I wanted to trace activity when I connected to the system using ssh… but ssh logins fail with all probes enabled. To repeat: ssh logins fail with DTrace probes enabled. I’d try to debug it, but I’m too dejected.

Evaluation

While I’d like to give this obviously nascent port the benefit of the doubt, its current state is frankly embarrassing. It’s very clear now why Oracle wasn’t demonstrating this at OpenWorld last week: it doesn’t stand up to the mildest level of scrutiny. It’s fine that Oracle has embarked on a port of DTrace to the so-called unbreakable kernel, but this is months away from being usable. Announcing a product of such low quality calls into question Oracle’s credibility as a technology provider. Further, this was entirely avoidable; there were other DTrace ports to Linux that Oracle could have used as a starting point to produce something much closer to functional.

This is not DTrace

So, OEL users, know that this is not DTrace. This is no better than one of the DTrace knockoffs and in many ways much worse. What Oracle released is worse than worthless by violating perhaps the most fundamental tenet of DTrace: don’t damage the system. And, to the OEL folks, I’m sure you’ll get there, but how about you take down your beta until it’s ready? As it is, people might get the wrong impression about what DTrace is.

Yesterday (October 4, 2011) Oracle made the surprising announcement that they would be porting some key Solaris features, DTrace and Zones, to Oracle Enterprise Linux. As one of the original authors of DTrace, I found the news particularly interesting, so I started digging.

I should note that this isn’t the first time I’ve written about DTrace for Linux. Back in 2005, I worked on Linux-branded Zones, Solaris containers that contained a Linux user environment. I wrote a coyly-titled blog post about examining Linux applications using DTrace. The subject was honest — we used precisely the same techniques to bring the benefits of DTrace to Linux applications — but the title wasn’t completely accurate. That wasn’t exactly “DTrace for Linux”; it was more precisely “the Linux user-land for Solaris where users can reap the benefits of DTrace”. I chose the snappier title.

I also wrote about DTrace knockoffs in 2007 to examine the Linux counter-effort. While that project is still in development, it hasn’t achieved the functionality or traction of DTrace. Suggesting that Linux was inferior brought out the usual NIH reactions, which led me to write a subsequent blog post about a theoretical port of DTrace to Linux. While Paul Fox started exactly such a port a year later, my assumption at the time was that the primary copyright holder of DTrace wouldn’t be the one porting DTrace to Linux. Now that Oracle is claiming a port, the calculus may change a bit.

What is Oracle doing?

Even among Oracle employees, there’s uncertainty about what was announced. Ed Screven gave us just a couple of bullet points in his keynote; Sergio Leunissen, the product manager for OEL, didn’t have further details in his OpenWorld talk beyond it being a beta of limited functionality; and the entire Solaris team seemed completely taken by surprise.

What is in the port?

Leunissen stated that only the kernel components of DTrace are part of the port. It’s unclear whether that means just fbt or includes sdt and the related providers. It sounds certain, though, that it won’t pass the DTrace test suite, which is the deciding criterion between a DTrace port and some sort of work in progress.

What is the license?

While I abhor GPL v. CDDL discussions, this is a pretty interesting case. According to the release manager for OEL, some small kernel components and header files will be dual-licensed, while the bulk of DTrace — the kernel modules, libraries, and commands — will use the CDDL as they had under (the now defunct) OpenSolaris (and to the consternation of Linux die-hards, I’m sure). Oracle already faces an interesting conundrum with their CDDL-licensed files: they can’t take the fixes that others have made to, for example, ZFS without needing to release their own fixes. The DTrace port to Linux is interesting in that Oracle apparently thinks that the CDDL license will make DTrace too toxic for other Linux vendors to touch.

Conclusion

Regardless of how Oracle brings DTrace to Linux, it will be good for DTrace and good for its users — and perhaps best of all for the author of the DTrace book. I’m cautiously optimistic about what this means for the DTrace development community if Oracle does, in fact, release DTrace under the CDDL. While this won’t mean much for the broader Linux community, we in the illumos community will happily accept anything of value Oracle adds. The Solaris lover in me was worried when it appeared that OEL was raiding the Solaris pantry, but if this is Oracle’s model for porting, then I — and the entire illumos community I’m sure — hope that more and more of Solaris is open sourced under the aegis of OEL differentiation.

10/10/2011 follow-up post, Oracle’s port: this is not DTrace.

For a short while, I ran the flash memory strategy at Sun and then Oracle, so I still keep my ear to the ground regarding flash news. That news is often frustratingly light — journalists in the space who are fully capable of providing analysis end up brushing the surface. With a tip of the hat to the FJM crew, here’s my commentary on a recent article.

NetApp has Hybrid Aggregate drives coming, with data moved automatically in real time between flash located next to the spinning disks. The company now says that this is a better technology than PCIe flash approaches.

Sounds interesting. NetApp had previously stacked its chips on a PCIe approach for flash called the performance acceleration module (PAM); I read about it in the same publication. This apparent change of strategy is significant, and I wish the article had explored the issue, but it was never mentioned.

NetApp, presenting at an Analyst Day event in New York on 30 June, said that having networked storage move as it were into the host server environment was disadvantageous. This was according to Stifel Nicolaus analyst Aaron Rakers.

1. So is this a quote from NetApp or a quote from an analyst or a quote from NetApp quoting an analyst? I’m confused.

2. This is a dense and interesting statement so allow me to unpack it. Moving storage to the host server is code for Fusion-io. These guys make a flash-laden PCIe card that you put in your compute node for super-fast local data access, and they connect a bunch of them together with an IB backplane to share the contents of different cards between hosts. They recently went public, and customers love the performance they offer over traditional SANs. I assume the term “disadvantageous” was left intentionally vague as those being disadvantaged may be NTAP shareholders rather than customers implementing such a solution.

Manish Goel, NetApp’s product ops EVP, said SSDs used as hard disk drive replacements were not as interesting as using flash at the disk layer in a Hybrid Aggregate drive approach – and this was coming.

An Aggregate is the term NetApp uses for a collection of drives. A Hybrid Aggregate — presumably — is some new thing that mixes HDDs and SSDs. Maybe it’s like Sun’s hybrid storage pool. I would have liked to see Manish Goel’s statement vetted or explained, but that’s all we get.

Flash Cache in the controller is a straightforward array read I/O accelerator. PCIe flash in host servers is a complementary technology but will not decentralise the storage market and move networked storage back into the host servers.

Is this still the NetApp announcement or is this back to the journalism? It’s a new paragraph so I guess it’s the latter. Fusion-io will be happy to learn that it only took a couple of lines to be upgraded from “disadvantageous” to “complementary”. And you may be interested to know why NetApp says that host-based flash is complementary. There’s a vendor out there working with NetApp on a host-based flash PCIe card that NetApp will treat as part of its caching tier, pushing data to the card for fast access by the host. I’d need to dig up my notes from the many vendor roadmaps I saw to recall who is building this, but in the context of a public blog post it’s probably better that I don’t.

NetApp has a patent in this Hybrid Aggregate disk drive area called “Mechanisms for moving data in a Hybrid Aggregate”.

I won’t bore you by reposting the excerpt from the patent, but the broad language of the patent does recall to mind the many recently invalidated NetApp patents…

Surely this is what we all understand as auto-placement of data in a virtual storage pool comprising SSD and fast disk tiers, such as Compellent’s block-level Data Progression? Not so, according to a person close to the situation: “It’s much more automatic, real-time and granular. Compellent needs policies and is not real-time. [NetApp] will be automatic and always move data real-time, rather than retroactively.”

What could have followed this — but didn’t — was a response from a representative from Compellent or someone familiar with their technology. Compellent, EMC, Oracle, and others all have strategies that involve mixing flash memory with conventional hard drives. It’s the rare article that discusses those types of connections. Oracle’s ZFS products use flash as a caching tier, automatically populating it with useful data. Compellent has a clever technique of moving data between storage tiers seamlessly — and customers seem to love it. EMC just hucks a bunch of SSDs into an array — and customers seem to grin and bear it. NetApp’s approach? It’s hard to decipher what it would mean to “move data in real-time, rather than retroactively.” Does that mean that data is moved when it’s written and then never moved again? That doesn’t sound better. My guess is that NetApp’s approach is very much like Compellent’s — something they should be touting rather than parrying. And I’d love to read that article.

In 2005, Sun released the source code to Solaris, described then as the company’s crown jewel. Why do this? The simplest answer is that Solaris had been losing ground to an open source competitor in Linux. Losing ground was a symptom of economics. Students who had once been raised on Solaris were being inculcated with Linux knowledge. The combination of Linux and x86 was good enough and significantly cheaper; new companies for whom the default had once been Sun/Solaris/SPARC were instead building on x86/Linux. OpenSolaris, along with x86 support, was specifically intended to address this trend. Indeed, the codename for OpenSolaris was “tonic” — the tonic for Solaris’ problems.

To that end, OpenSolaris was on reasonably stable footing: open source had become expected for an operating system,  source code availability was a benefit to traditional enterprise users (especially with the advent of DTrace), and the community would attract new users. But then Solaris lost the plot. Users chose Solaris because it is a — or perhaps the — enterprise operating system. OpenSolaris was intended to broaden the appeal, but that notion was taken to such extremes as to lose sight of the traditional customers of Solaris, and, indeed, the focus that makes Solaris both unique and great.

OpenSolaris: June 14, 2005 – August 13, 2010

We launched Solaris 10 in 2004 with an impressive list of features — ZFS, DTrace, Zones, SMF, FMA, Fire Engine — all highly relevant for enterprise users. You can find a company that has bet its business on the success of each of those features. In the wake of OpenSolaris, the decision was made (and here I can no longer use the active voice because by then I had left to start Fishworks elsewhere at Sun) to focus explicitly on building an operating system for developers — which is to say, for their laptops. This was an error, but a predictable one. Once Solaris was free to download and use, revenue recognition for the Solaris organization, which had always been difficult to measure, became even more indirect. The metrics were changed: the targets for management bonuses became not revenue, or enterprise users, but downloads. Directly or indirectly, much of the focus of the Solaris organization shifted to address that straightforward goal. The mistake was that OpenSolaris didn’t need to find users; users found Solaris. In trying to build a community, the new direction for OpenSolaris weakened the very principles upon which a thriving community would have been based.

The very name “OpenSolaris” got confused, diluted, and polluted. OpenSolaris was a source repository, a community, and a distro (although purists still insist that Indiana is the appropriate name for that part) intended to “close the familiarity gap” with Linux. Moreover, new projects that shifted effort away from enterprise uses (read: paying customers) to focus on the laptop also rallied under the banner of “OpenSolaris”. In a way, Oracle’s acquisition of Sun saved Solaris from itself; the marching orders became much clearer: address enterprise users, ship Solaris 11 (something that should not have taken 6 years). As for OpenSolaris, that decision too was likely simple for Oracle, never an overt fan of open source. Had “OpenSolaris” simply meant a code base and user community, I think there’s a good chance it would have been allowed to live. Burdened, however, with the baggage of the Indiana distro and sundry projects incomprehensible to Oracle management, OpenSolaris was in a politically untenable position. Mike’s “Friday the 13th memo” merely made it official — Solaris was to be closed source once more.

Sun’s efforts with OpenSolaris were, at best, a mixed success. Quietly, however, an ecosystem of companies grew out of the technologies in OpenSolaris. Notably, Joyent uses Zones and DTrace as significant differentiators; Nexenta builds very heavily on ZFS; as I’ve mentioned, Delphix, my new employer, builds on OpenSolaris as well. There are many more that I know about, and still more that I don’t. These companies chose OpenSolaris so they could use the innovative technologies that simply aren’t available anywhere else. And they did so in spite of a common trend towards Linux with its familiarity and broad compatibility — the innovation in Solaris was more valuable and, in some cases, enabling for their businesses.

illumos: August 3, 2010 –

The danger for those companies has long been that Oracle would pull the rug out from under them; only the foolish had no contingency plan. The options were to give up on Solaris or maintain a fork. Happily illumos has stepped in to offer a third path. Garrett D’Amore and Nexenta graciously started the illumos project to carry the OpenSolaris torch. It is an ostensible fork of OpenSolaris (can you fork a dead project?), but more importantly a mechanism by which companies building on those component technologies can pool their resources, amortize their costs, and build a community by and for the downstream users who are investing in those same technologies. Rather than being operated by a single corporate interest, its steward will be a 501(c)(3) non-profit in the model of the Mozilla Foundation.

I was pleased to announce at tonight’s SVOSUG meeting that I’ll be joining the illumos developer council; I was delighted to accept when Garrett offered me the position. My bias for illumos is that the main repository will focus on reliability, performance, and compatibility while taking a conservative approach to new features and functionality. As much as possible, I’d like the downstream users — the distributions, appliances, and platforms — to make the decisions appropriate to their uses and only adopt large-scale changes into the trunk when there’s broad consensus among them. The goal must be to build a project that is readily useful to everyone and to allow our collective efforts to be shared as easily as possible.

What’s the future of Solaris? For many it will be Solaris 11 in late 2011. But for others, it will be illumos, either as the firmware for an appliance (not unlike what we built at Fishworks in the 7000 series), as the platform for your web applications, or as a general-purpose operating system. The innovation in Solaris has always flowed from the creative individuals working on the project. Keep your eyes on illumos; Oracle ending OpenSolaris is no surprise, but in doing so they have broken their own monopoly on Solaris and Solaris talent.

I joined the Solaris Kernel Group in 2001 at what turned out to be a remarkable place and time for the industry. More by luck and intuition than by premonition, I found myself surrounded by superlative engineers working on revolutionary technologies that were the products of their own experience and imagination rather than managerial fiat. I feel very lucky to have worked with Bryan and Mike on DTrace; it was amazing that just down the hall our colleagues reinvented the operating system with Zones, ZFS, FMA, SMF and other innovations.

With Solaris 10 behind us, lauded by customers and pundits, I was looking for that next remarkable place and time, and found it with Fishworks. The core dozen or so are some of the finest engineers I could hope to work with, but there were so many who contributed to the success of the 7000 series: from the executives who enabled our fledgling skunkworks in its nascent days, to our Solaris colleagues building fundamental technologies like ZFS, COMSTAR, SMF, networking, I/O, and IPS, to the OpenStorage team who toiled to bring a product to market, educating us without deflating our optimism in the process.

I would not trade the last 9 years for anything. There are many engineers who never experience a single such confluence of talent, organizational will, and success; I’m grateful to my colleagues and to Sun for those two opportunities. Now I’m off to look for my next remarkable place and time beyond the walls of Oracle. My last day will be August 20th, 2010.

Thank you to the many readers of this blog. After six years and 130 posts I’d never think of giving it up. You’ll be able to find my new blog at dtrace.org/blogs/ahl (comments to this post are open there); I can’t wait to begin chronicling my next endeavors. You can reach me by email here: my initials at alumni dot brown dot edu. I look forward to your continued comments and emails. Thanks again!
