Adam Leventhal's blog

Tag: illumos

I’ve been watching ZFS from moments after its inception at the hands of Matt Ahrens and Jeff Bonwick, so I’m excited to see it enter its newest phase of development in OpenZFS. While ZFS has long been regarded as the hottest filesystem on 128 bits, and has shipped in many different products, what’s been most impressive to me about ZFS development has been the constant iteration and reinvention.

Before shipping in Solaris 10 update 2, major components of ZFS had already advanced to “2.0” and “3.0”. I’ve been involved with several ZFS-related products: Solaris 10, the ZFS Storage Appliance (nee Sun Storage 7000), and the Delphix Engine. Each new product and each new use has stressed ZFS in new ways, but also brought renewed focus to development. I’ve come to realize that ZFS will never be completed. I thought I’d use this post to cover the many ways that ZFS had failed in the products I’ve worked on over the years — and it has failed spectacularly at time — but this distracted from the most important aspect of ZFS. For each new failure in each new product with each new use and each new workload ZFS has adapted and improved.

OpenZFS doesn’t need a caretaker community for a finished project; if that were the case, porting OpenZFS to Linux, FreeBSD, and Mac OS X would have been the end. Instead, it was the beginning. The need for the OpenZFS community grew out of the porting efforts who wanted the world’s most advanced filesystem on their platforms and in their products. I wouldn’t trust my customers’ data to a filesystem that hadn’t been through those trials and triumphs over more than a decade. I can’t wait to see the next phase of evolution that OpenZFS brings.

 

If you’re at LinuxCon today, stop by the talk by Matt Ahrens and Brian Behlendor for more on OpenZFS; follow @OpenZFS for all OpenZFS news.

Back in October I was pleased to attend — and my employer, Delphix, was pleased to sponsor — illumos day and ZFS day, run concurrently with Oracle Open World. Inspired by the success of dtrace.conf(12) in the Spring, the goal was to assemble developers, practitioners, and users of ZFS and illumos-derived distributions to educate, share information, and discuss the future.

illumos day

The week started with the developer-centric illumos day. While illumos picked up the torch when Oracle re-closed OpenSolaris, each project began with a very different focus. Sun and the OpenSolaris community obsessed with inclusion, and developer adoption — often counterproductively. The illumos community is led by those building products based on the unique technologies in illumos — DTrace, ZFS, Zones, COMSTAR, etc. While all are welcome, it’s those who contribute the most whose voices are most relevant.

I was asked to give a talk about technologies unique to illumos that are unlikely to appear in Oracle Solaris. It was only when I started to prepare the talk that the difference in focuses of illumos and Oracle Solaris fell into sharp focus. In the illumos community, we’ve advanced technologies such as ZFS in ways that would benefit Oracle Solaris greatly, but Oracle has made it clear that open source is anathema for its version of Solaris. For example, at Delphix we’ve recently been fixing bugs, asking ourselves, “how has Oracle never seen this?”.

Yet the differences between illumos and Oracle Solaris are far deeper. In illumos we’re building products that rely on innovation and differentiation in the operating system, and it’s those higher-level products that our various customers use. At Oracle, the priorities are more traditional: support for proprietary SPARC platforms, packaging and updating for administrators, and ease-of-use. In my talk, rather than focusing on the sundry contributions to illumos, I picked a few of my favorites. The slides are more or less incomprehensible on their own; many thanks to Deirdre Straughan for posting the video (and for putting together the event!) — check out 40:30 for a photo of Jean-Luc Picard attending the DTrace talk at OOW.

[youtube_sc url=”https://www.youtube.com/watch?v=7YN6_eRIWWc&t=0m19s”]

ZFS day

While illumos day was for developers, ZFS day was for users of ZFS to learn from each others’ experiences, and hear from ZFS developers. I had the ignominous task of presenting an update on the Hybrid Storage Pool (HSP). We developed the HSP at Fishworks as the first enterprise storage system to add flash memory into the storage hierarchy to accelerate reads and writes. Since then, economics and physics have thrown up some obstacles: DRAM has gotten cheaper, and flash memory is getting harder and harder to turn into a viable enterprise solution. In addition, the L2ARC that adds flash as a ZFS read cache, has languished; it has serious problems that no one has been motivated or proficient enough to address.

I’ll warn you that after the explanation of the HSP, it’s mostly doom and gloom (also I was sick as a dog when I prepared and gave the talk), but check out the slides and video for more on the promise and shortcomings of the HSP.

[youtube_sc url=”http://www.youtube.com/watch?v=P77HEEgdnqE&feature=youtu.be”]

Community

For both illumos day and ZFS day, it was a mostly full house. Reuniting with the folks I already knew was fun as always, but even better was to meet so many who I had no idea were building on illumos or using ZFS. The events highlighted that we need to facilitate more collaboration — especially around ZFS — between the illumos distros, FreeBSD, and Linux — hell, even Oracle Solaris would be welcome.

At the illumos hackathon last week, Robert Mustacchi and I prototyped better support for manipulating user-land structures. As anyone who’s used it knows, DTrace is currently very kernel-centric — this both reflects the reality of how operating systems and DTrace are constructed, and the origins of DTrace itself in the Solaris Kernel Group. Discussions at dtrace.conf(12) this spring prompted me to chart a path to better user-land support. This prototype of copyin-automagic was a first step.

What we implemented was a new ‘user’ keyword to denote that a type is a user-land structure. For example, let’s say we had the address of a 4-byte integer; today we’d access its value using copyin():

this->i = *(int *)copyin(this->addr, sizeof (int));

With our prototype, this gets simpler and more intuitive:

this->i = *(user int *)addr;

The impact is even more apparent when it comes to pointer chasing through structures. Today if we need to get to the third element of a linked list, the D code would look like this:

this->p = (node_t *)copyin(this->addr, sizeof (node_t));
this->p = (node_t *)copyin((uintptr_t)this->p->next, sizeof (node_t));
this->p = (node_t *)copyin((uintptr_t)this->p->next, sizeof (node_t));
trace(this->p->value);

Again, it’s much simpler with the user keyword:

this->p = (user node_t *)this->addr;
trace(this->p->next->next->value);

D programs are compiled into a series of instructions — DIF — that are executed by the code DTrace framework when probes fire. We use the new keyword to generate instructions that load from the address space of the currently executing process rather than that of the kernel.

Adding a new keyword feels a little clunky, and I’m not sure if it’s the right way forward, but it does demonstrate a simpler model for accessing user-land structures, and was a critical first step. We already have three main sources of user-land values; the next steps are to make use of those.

Types for system calls

Arguments to system calls (mostly) have well-known types. Indeed those types are encoded in truss in excruciating and exotic detail. We should educate DTrace about those types. What I’d propose is that we create a single repository of all system call metadata. This could be, for example, and XML file that listed every system call, its name, code, subcode, types, etc. Of course we could use that as the source of type information for the syscall provider, but we could also use that to generate everything from the decoding tables in truss to the libc and kernel stubs for system calls.

As an aside, there are a couple of system calls whose parameter types — ioctl(2) is the obvious example. It would be interesting to assess the utility of an ioctl provider whose probes would be the various codes that are passed as the second parameter. Truss already has this information; why not DTrace?

Types for the pid provider

Another obvious source of type information is the process being traced. When a user specifies the -p or -c option to dtrace(1M) we’re examining a particular process, and that process can have embedded type information. We could use those types and implicitly identify them as belonging to user-space rather than the kernel. Pid provider probes correspond to the entry and return from user-land functions; we could identify the appropriate types for those parameters. We could simplify it further by doing all type handling in libdtrace (in user-land) rather than pushing the types into the kernel.

Types for USDT

User-land statically defined tracing — tracepoints explicitly inserted into code — can already have types associated with them. A first step would be to implicitly identify those types as belonging to user-land. I believe that this could even be done without adversely affecting existing scripts.

Thorny issues

While there are some clear paths forward, there are some tricky issues that remain. In particular that processes can have different data models — 32-bit v. 64-bit — presents a real challenge. Both the width of a load and offsets into structures change depending on the process that’s running. There might be some shortcuts for system calls, and we might be able to constrain the problem for the pid provider by requiring -p or -c, or we might have to compile our D program twice and then choose which version to run based on the data model of the process. In the spirit of the hackathon, Robert and I punted for our ‘user’ keyword prototype, but these problems need to be well understood and sufficiently solved.

Next steps

I’ll be working on some of these problems on the back burner; I’m especially interested in the Grand Unified Syscall Project — an idea I’ve been touting for more years than I care to relate — to bring types to the syscall provider. If you have ideas for user-land tracing with DTrace, or want to work on anything I’ve mention, leave a comment or drop me a note.

Exactly 10 years ago today, Jeff Bonwick and Matt Ahrens got their first ZFS prototype working in user-land. Jeff had scrapped his previous attempt at reinventing filesystems, working through the established filesystem management and engineering channels at Sun, and this time started with a clean sheet of paper. Matt had joined Sun that June shortly after graduating from Brown University. Both prodigious coders, the duo, in remarkably short order, showed us a glint of what ZFS would be. A year later, the master and apprentice had ZFS working in the kernel, moving data from end to end. Three years after that, standing in front of a team of a dozen engineers, Matt typed ‘putback’ to integrate ZFS into Solaris. The distance ZFS has traveled these past 10 years has been monumental, and ZFS has indelibly impacted the industry. ZFS is one of the load-bearing pillars here at Delphix; without it, our task would have been too ambitious to even begin. Congratulations to our own Matt Ahrens on this milestone, as well as to Jeff, and everyone else who has contributed to ZFS over the last 10 years including the growing community building new products around ZFS and illumos.

Update: Check out Matt’s blog post on the subject.

Yesterday (October 4, 2011) Oracle made the surprising announcement that they would be porting some key Solaris features, DTrace and Zones, to Oracle Enterprise Linux. As one of the original authors, the news about DTrace was particularly interesting to me, so I started digging.

I should note that this isn’t the first time I’ve written about DTrace for Linux. Back in 2005, I worked on Linux-branded Zones, Solaris containers that contained a Linux user environment. I wrote a coyly-titled blog post about examining Linux applications using DTrace. The subject was honest — we used precisely the same techniques to bring the benefits of DTrace to Linux applications — but the title wasn’t completely accurate. That wasn’t exactly “DTrace for Linux”, it was more precisely “The Linux user-land for Solaris where users can reap the benefits of DTrace”; I chose the snappier title.

I also wrote about DTrace knockoffs in 2007 to examine the Linux counter-effort. While the project is still in development, it hasn’t achieved the functionality or traction of DTrace. Suggesting that Linux was inferior brought out the usual NIH reactions which led me to write a subsequent blog post about a theoretical port of DTrace to Linux. While a year later Paul Fox started exactly such a port, my assumption at the time was that the primary copyright holder of DTrace wouldn’t be the one porting DTrace to Linux. Now that Oracle is claiming a port, the calculus may change a bit.

What is Oracle doing?

Even among Oracle employees, there’s uncertainty about what was announced. Ed Screven gave us just a couple of bullet points in his keynote; Sergio Leunissen, the product manager for OEL, didn’t have further details in his OpenWorld talk beyond it being a beta of limited functionality; and the entire Solaris team seemed completely taken by surprise.

What is in the port?

Leunissen stated that only the kernel components of DTrace are part of the port. It’s unclear whether that means just fbt or includes sdt and the related providers. It sounds certain, though, that it won’t pass the DTrace test suite which is the deciding criterion between a DTrace port and some sort of work in progress.

What is the license?

While I abhor GPL v. CDDL discussions, this is a pretty interesting case. According to the release manager for OEL, some small kernel components and header files will be dual-licensed while the bulk of DTrace — the kernel modules, libraries, and commands — will use the CDDL as they had under (the now defunct) OpenSolaris (and to the consernation of Linux die-hards I’m sure). Oracle already faces an interesting conundum with their CDDL-licensed files: they can’t take the fixes that others have made to, for example, ZFS without needing to release their own fixes. The DTrace port to Linux is interesting in that Oracle apparently thinks that the CDDL license will make DTrace too toxic for other Linux vendors to touch.

Conclusion

Regardless of how Oracle brings DTrace to Linux, it will be good for DTrace and good for its users — and perhaps best of all for the author of the DTrace book. I’m cautiously optimistic about what this means for the DTrace development community if Oracle does, in fact, release DTrace under the CDDL. While this won’t mean much for the broader Linux community, we in the illumos community will happily accept anything of value Oracle adds. The Solaris lover in me was worried when it appeared that OEL was raiding the Solaris pantry, but if this is Oracle’s model for porting, then I — and the entire illumos community I’m sure — hope that more and more of Solaris is open sourced under the aegis of OEL differentiation.

10/10/2011 follow-up post, Oracle’s port: this is not DTrace.

In 2005, Sun released the source code to Solaris,  described then as the company’s crown jewel. Why do this? The simplest answer is that Solaris had been losing ground to an open source competitor in Linux. Losing ground was a symptom of  economics. Students who had once been raised on Solaris were being inculcated with Linux knowlege. The combination of Linux and x86 were good enough and significantly cheaper; new companies for whom the default had once been Sun/Solaris/SPARC were instead building on x86/Linux. OpenSolaris along with x86 support were specifically intended to address this trend. Indeed, the codename for OpenSolaris was “tonic” — the tonic for Solaris’ problems.

To that end, OpenSolaris was on reasonably stable footing: open source had become expected for an operating system,  source code availability was a benefit to traditional enterprise users (especially with the advent of DTrace), and the community would attract new users. But then Solaris lost the plot. Users chose Solaris because it is a — or perhaps the — enterprise operating system. OpenSolaris was intended to broaden the appeal, but that notion was taken to such extremes as to lose sight of the traditional customers of Solaris, and, indeed, the focus that makes Solaris both unique and great.

OpenSolaris  June 14, 2005 – August 13, 2010

We launched Solaris 10 in 2004 with an impressive list of features — ZFS, DTrace, Zones, SMF, FMA, Fire Engine — all highly relevant for enterprise users. You can find a company that has bet its business on the success of each of those features. In the wake of OpenSolaris, the decision was made (and here I can no longer use the active voice because by then I had left to start Fishworks elsewhere at Sun) to have an explicit focus on building an operating system for developers — which is to say, for their laptops. This was an error, but a predictable one. Once Solaris was free to download and use, revenue recognition for the Solaris organization which has always been difficult to measure became even more indirect. The metrics were changed: the targets for management bonuses became not revenue, or enterprise users, but downloads. Directly or indirectly much of the focus for the Solaris organization shifted to address that straightforward goal. The mistake was that OpenSolaris didn’t need to find users, they found Solaris. In trying to build a community, the new direction for OpenSolaris weakened the very principles upon which a thriving community would have been based.

The very name “OpenSolaris” got confused, diluted, and poluted. OpenSolaris was a source repository, a community, and a distro (although purists still insist that Indiana is the appropriate name for that part) intended to “close the familiarity gap” with Linux. Moreover, new projects that shifted efforts away from enterprise uses (read: paying customers) to focus on the laptop also rallied under the banner of “OpenSolaris”. In a way Oracle’s acquisition of Sun saved Solaris from itself; the marching orders became much clearer: address enterprise users, ship Solaris 11 (something that should not have taken 6 years). As for OpenSolaris, that decision too was likely simple for Oracle, never an overt fan of open source. Had “OpenSolaris” simply meant a code base and user community, I think there’s a good chance it would have been allowed to live. Burdened, however, with the baggage of the Indiana distro and sundry projects incomprehensible to Oracle management, OpenSolaris was in a politically untenable position. Mike‘s “Friday the 13th memo” merely made it official — Solaris was to be closed source once more.

Sun’s efforts with OpenSolaris  were, at best, a mixed success. Quietly, however, an ecosystem of companies grew out of the technologies in OpenSolaris. Notably Joyent uses Zones and DTrace as significant differentiators; Nexenta builds very heavily on ZFS; as I’ve mentioned, Delphix, my new employer, builds on OpenSolaris as well. There are many more that I know about, and still more that I don’t. These companies chose OpenSolaris so they could use the innovative technologies that simply aren’t available anywhere else. And they did so in spite of a common trend towards Linux with its familiarity, and broad compatibility — the innovation in Solaris was more valuable and, in some cases, enabling for the company’s business.

illumos  August 3, 2010 –

The danger for those companies has long been that Oracle would pull the rug out from under them; only the foolish had no contingency plan. The options were to give up on Solaris or maintain a fork. Happily illumos has stepped in to offer a third path. Garrett D’Amore and Nexenta graciously started the illumos project to carry the OpenSolaris torch. It is an ostensible fork of OpenSolaris (can you fork a dead project?), but more importantly a mechanism by which companies building on those component technologies can pool their resources, amortize their costs, and build a community by and for the downstream users who are investing in those same technologies. Rather than being operated by a single corporate interest, its steward will be a 501(c)(3) non-profit in the model of the Mozilla Foundation.

I was pleased to announce at tonight’s SVOSUG meeting that I’ll be joining the illumos developer council, I was delighted to accept when Garrett offered me the position. My bias for illumos is that the main repository will focus on reliability, performance, and compatibility while taking a conservative approach to new features and functionality. As much as possible, I’d like the downstream users — the distributions, appliances, and platforms — to make the decisions appropriate to their uses and only adopt large-scale changes into the trunk when there’s broad consensus among them. The goal must be to build a project that is readily useful to everyone and to allow our collective efforts to be shared as easily as possible.

What’s the future of Solaris? For many it will be Solaris 11 in late 2011. But for others, it will be illumos either as the firmware for an appliance (not unlike what we built at Fishworks in the 7000 series), the platform for your web applications, or as a general purpose operating system. The innovation in Solaris has always flowed from the creative individuals working on the project. Keep your eyes on illumos; Oracle ending OpenSolaris is no surprise, but in doing so they have broken their own monopoly on Solaris and Solaris talent.

Recent Posts

February 12, 2017
December 18, 2016
August 9, 2016
August 2, 2016

Archives

Archives