Adam Leventhal's blog

Category: illumos

A couple of weeks ago, Joyent hosted A Midsummer Night’s Systems meetup, a fun event with talks ranging from Node.js fatwas to big data for Mario Kart 64. My colleague Jeremy Jones had recently done some amazing work, perfect for the meetup, but with his first child less than a day old, Jeremy allowed me to present in his stead. In this short video (16 minutes) I talk about Jeremy’s investigation of a nasty bug that necessitated the creation of two awesome post-mortem tools. The first is what I call jdump, a Volatility plugin that takes a VMware snapshot and produces an illumos kernel crash dump. The second is ::gcore, an mdb command that can extract a fully functioning core file from a kernel crash dump. Together, they let us at Delphix scoop up all the state we’d need for a diagnosis with minimal interruption even when there’s no hard failure. Jeremy’s tools are close to magic, and without them the problem was close to undebuggable.

[youtube_sc url=”OLFqOBxUhfM”]

Thanks, Jeremy for letting me present on your great work. And thanks to Deirdre Straughan and Joyent for the great event!

Back in October I was pleased to attend — and my employer, Delphix, was pleased to sponsor — illumos day and ZFS day, run concurrently with Oracle Open World. Inspired by the success of dtrace.conf(12) in the Spring, the goal was to assemble developers, practitioners, and users of ZFS and illumos-derived distributions to educate, share information, and discuss the future.

illumos day

The week started with the developer-centric illumos day. While illumos picked up the torch when Oracle re-closed OpenSolaris, each project began with a very different focus. Sun and the OpenSolaris community obsessed with inclusion, and developer adoption — often counterproductively. The illumos community is led by those building products based on the unique technologies in illumos — DTrace, ZFS, Zones, COMSTAR, etc. While all are welcome, it’s those who contribute the most whose voices are most relevant.

I was asked to give a talk about technologies unique to illumos that are unlikely to appear in Oracle Solaris. It was only when I started to prepare the talk that the difference in focuses of illumos and Oracle Solaris fell into sharp focus. In the illumos community, we’ve advanced technologies such as ZFS in ways that would benefit Oracle Solaris greatly, but Oracle has made it clear that open source is anathema for its version of Solaris. For example, at Delphix we’ve recently been fixing bugs, asking ourselves, “how has Oracle never seen this?”.

Yet the differences between illumos and Oracle Solaris are far deeper. In illumos we’re building products that rely on innovation and differentiation in the operating system, and it’s those higher-level products that our various customers use. At Oracle, the priorities are more traditional: support for proprietary SPARC platforms, packaging and updating for administrators, and ease-of-use. In my talk, rather than focusing on the sundry contributions to illumos, I picked a few of my favorites. The slides are more or less incomprehensible on their own; many thanks to Deirdre Straughan for posting the video (and for putting together the event!) — check out 40:30 for a photo of Jean-Luc Picard attending the DTrace talk at OOW.

[youtube_sc url=”https://www.youtube.com/watch?v=7YN6_eRIWWc&t=0m19s”]

ZFS day

While illumos day was for developers, ZFS day was for users of ZFS to learn from each others’ experiences, and hear from ZFS developers. I had the ignominous task of presenting an update on the Hybrid Storage Pool (HSP). We developed the HSP at Fishworks as the first enterprise storage system to add flash memory into the storage hierarchy to accelerate reads and writes. Since then, economics and physics have thrown up some obstacles: DRAM has gotten cheaper, and flash memory is getting harder and harder to turn into a viable enterprise solution. In addition, the L2ARC that adds flash as a ZFS read cache, has languished; it has serious problems that no one has been motivated or proficient enough to address.

I’ll warn you that after the explanation of the HSP, it’s mostly doom and gloom (also I was sick as a dog when I prepared and gave the talk), but check out the slides and video for more on the promise and shortcomings of the HSP.

[youtube_sc url=”http://www.youtube.com/watch?v=P77HEEgdnqE&feature=youtu.be”]

Community

For both illumos day and ZFS day, it was a mostly full house. Reuniting with the folks I already knew was fun as always, but even better was to meet so many who I had no idea were building on illumos or using ZFS. The events highlighted that we need to facilitate more collaboration — especially around ZFS — between the illumos distros, FreeBSD, and Linux — hell, even Oracle Solaris would be welcome.

ZFS recently celebrated its informal 10th anniversary; to mark the occasion, Delphix hosted a ZFS-themed meetup for the illumos community (sponsored generously by Joyent). Many thanks to Deirdre Straughan, the new illumos community manager, for helping to organize and for filming the event. Three of my colleagues at Delphix presented work they’ve been doing in the ZFS ecosystem.

Matt Ahrens, who (with Jeff Bonwick) invented ZFS back in 2001, started the program with a discussion of a new stable interface for ZFS. Initially libzfs had been designed as a set of helper functions in support of the zfs(1M) and zpool(1M) commands; since then, it has outgrown those humble ambitions and a new, simple, stable interface is needed for programmatic consumers of ZFS. In Matt’s talk and blog post, he lays out a series of guiding principles for the new libzfs_core library; he’s already started to implement these ideas for new ZFS features in development at Delphix.

John Kennedy has been working on a relatively neglected part of illumos: automated testing. At the meetup John spoke about the work he’s been doing to revitalize the ZFS test suite, and to build a unit testing framework for illumos at large. I found the questions and enthusiasm from the people in the room particularly encouraging — everyone knows that we need to be doing more testing, but until John stepped up, no one was leading the charge. The ZFS test suite is available on github. Take a look at John’s blog post to see how to execute the ZFS test suite, and how you can contribute to illumos by helping him diagnose and fix the 60+ outstanding failures.

Chris Siden has been at Delphix just since he graduated from Brown University this past spring, but he’s already made a tremendous impact on ZFS. Chris presented both the work he’s done to finish the work started by Basil Crow (also of Brown, and soon full-time at Delphix) on ZFS feature flags (originally presented to the ZFS community by Matt back in May). Previously, ZFS features followed a single, linear versioning; with Chris and Basil’s work it’s not a land-grab for the next version, rather each feature can be enabled discretely. Chris also implemented the world’s first flagged ZFS feature, Async Destroy (also known to ZFS feature flags as com.delphix:async_destroy) which allows datasets to be destroyed asynchronously in the background — a huge boon when destroying gigantic ZFS datasets. Chris also presented some work he’s been doing on backward compatibility testing for ZFS; check out his blog post on both subjects.

The illumos meetup was a great success. Thank you to everyone who attended in person or on the web. To get involved with the ZFS community, join the illumos ZFS mailing list, and for information on the next illumos meetup, join the group.

On Monday, the Delphix systems crew is rolling down the 101 to the illumos hackathon in San Jose. Anyone who’s working on illumos, developing illumos-derived technologies like ZFS or DTrace, or who wants to cut some OS code, should drop by. Here’s the sign up.

What’s a hackathon? Not exactly sure, but we’re hoping to cut a bunch of code, and hopefully build some neat stuff in a short period of time. The basic plan is that we’ll meet at the Wyndam at 10am — if you can’t track us down, message me on twitter. Bring your ideas for cool, fun projects, that a few people might be able to bang out in a marathon day. At Delphix, we’ve been thinking about time-correlated DTrace, libzfs v2, tab-completion for mdb, some build system improvements, and something goofy we call tarfs. We’ll throw the ideas at the whiteboard, break into project teams, and hack them up until we have something to demo (and Joyent has offered up build servers if you need one).

For now, think about what you’d want to build (think fun but small — we only have the day), pull down the illumos source, and hoard your favorite source of caffeine. Got a good idea? Post it in the comments. And I hope to see you Monday.

In 2005, Sun released the source code to Solaris,  described then as the company’s crown jewel. Why do this? The simplest answer is that Solaris had been losing ground to an open source competitor in Linux. Losing ground was a symptom of  economics. Students who had once been raised on Solaris were being inculcated with Linux knowlege. The combination of Linux and x86 were good enough and significantly cheaper; new companies for whom the default had once been Sun/Solaris/SPARC were instead building on x86/Linux. OpenSolaris along with x86 support were specifically intended to address this trend. Indeed, the codename for OpenSolaris was “tonic” — the tonic for Solaris’ problems.

To that end, OpenSolaris was on reasonably stable footing: open source had become expected for an operating system,  source code availability was a benefit to traditional enterprise users (especially with the advent of DTrace), and the community would attract new users. But then Solaris lost the plot. Users chose Solaris because it is a — or perhaps the — enterprise operating system. OpenSolaris was intended to broaden the appeal, but that notion was taken to such extremes as to lose sight of the traditional customers of Solaris, and, indeed, the focus that makes Solaris both unique and great.

OpenSolaris  June 14, 2005 – August 13, 2010

We launched Solaris 10 in 2004 with an impressive list of features — ZFS, DTrace, Zones, SMF, FMA, Fire Engine — all highly relevant for enterprise users. You can find a company that has bet its business on the success of each of those features. In the wake of OpenSolaris, the decision was made (and here I can no longer use the active voice because by then I had left to start Fishworks elsewhere at Sun) to have an explicit focus on building an operating system for developers — which is to say, for their laptops. This was an error, but a predictable one. Once Solaris was free to download and use, revenue recognition for the Solaris organization which has always been difficult to measure became even more indirect. The metrics were changed: the targets for management bonuses became not revenue, or enterprise users, but downloads. Directly or indirectly much of the focus for the Solaris organization shifted to address that straightforward goal. The mistake was that OpenSolaris didn’t need to find users, they found Solaris. In trying to build a community, the new direction for OpenSolaris weakened the very principles upon which a thriving community would have been based.

The very name “OpenSolaris” got confused, diluted, and poluted. OpenSolaris was a source repository, a community, and a distro (although purists still insist that Indiana is the appropriate name for that part) intended to “close the familiarity gap” with Linux. Moreover, new projects that shifted efforts away from enterprise uses (read: paying customers) to focus on the laptop also rallied under the banner of “OpenSolaris”. In a way Oracle’s acquisition of Sun saved Solaris from itself; the marching orders became much clearer: address enterprise users, ship Solaris 11 (something that should not have taken 6 years). As for OpenSolaris, that decision too was likely simple for Oracle, never an overt fan of open source. Had “OpenSolaris” simply meant a code base and user community, I think there’s a good chance it would have been allowed to live. Burdened, however, with the baggage of the Indiana distro and sundry projects incomprehensible to Oracle management, OpenSolaris was in a politically untenable position. Mike‘s “Friday the 13th memo” merely made it official — Solaris was to be closed source once more.

Sun’s efforts with OpenSolaris  were, at best, a mixed success. Quietly, however, an ecosystem of companies grew out of the technologies in OpenSolaris. Notably Joyent uses Zones and DTrace as significant differentiators; Nexenta builds very heavily on ZFS; as I’ve mentioned, Delphix, my new employer, builds on OpenSolaris as well. There are many more that I know about, and still more that I don’t. These companies chose OpenSolaris so they could use the innovative technologies that simply aren’t available anywhere else. And they did so in spite of a common trend towards Linux with its familiarity, and broad compatibility — the innovation in Solaris was more valuable and, in some cases, enabling for the company’s business.

illumos  August 3, 2010 –

The danger for those companies has long been that Oracle would pull the rug out from under them; only the foolish had no contingency plan. The options were to give up on Solaris or maintain a fork. Happily illumos has stepped in to offer a third path. Garrett D’Amore and Nexenta graciously started the illumos project to carry the OpenSolaris torch. It is an ostensible fork of OpenSolaris (can you fork a dead project?), but more importantly a mechanism by which companies building on those component technologies can pool their resources, amortize their costs, and build a community by and for the downstream users who are investing in those same technologies. Rather than being operated by a single corporate interest, its steward will be a 501(c)(3) non-profit in the model of the Mozilla Foundation.

I was pleased to announce at tonight’s SVOSUG meeting that I’ll be joining the illumos developer council, I was delighted to accept when Garrett offered me the position. My bias for illumos is that the main repository will focus on reliability, performance, and compatibility while taking a conservative approach to new features and functionality. As much as possible, I’d like the downstream users — the distributions, appliances, and platforms — to make the decisions appropriate to their uses and only adopt large-scale changes into the trunk when there’s broad consensus among them. The goal must be to build a project that is readily useful to everyone and to allow our collective efforts to be shared as easily as possible.

What’s the future of Solaris? For many it will be Solaris 11 in late 2011. But for others, it will be illumos either as the firmware for an appliance (not unlike what we built at Fishworks in the 7000 series), the platform for your web applications, or as a general purpose operating system. The innovation in Solaris has always flowed from the creative individuals working on the project. Keep your eyes on illumos; Oracle ending OpenSolaris is no surprise, but in doing so they have broken their own monopoly on Solaris and Solaris talent.

Recent Posts

February 12, 2017
December 18, 2016
August 9, 2016
August 2, 2016

Archives

Archives