Jumbo and .Net/Mono

This is the first in a series of articles about Jumbo.

The recently released Jumbo comes in two versions: a modern version, and the original. That original version was written in C#, primarily targeting Mono, supporting the official Microsoft .Net Framework mainly for development and testing.

Because it depended on rather old versions of Mono (I never tested it on newer versions), and support for running on Windows was limited, it became harder to run over time, which is partly why I eventually decided to port it to .Net Core.

But why is Jumbo written in .Net anyway? Hadoop is written in Java, so if I wanted to learn more about Hadoop, wouldn't it make sense for me to use Java too? Perhaps, but I wasn't super familiar with Java and its tooling, and I also felt it would steer me too much toward being exactly like Hadoop, which I didn't want either.

So, I felt I had basically two choices given my skills at the time: C++ or .Net, both of which I was very proficient in.

C++ would've given me performance. But it also would've increased the amount of work. I knew there were a bunch of things I would need to solve that aren't straightforward with C++: reading configuration, RPC across a network, dynamically loading code to run jobs, text processing, robust networking, and more. All of those would require me to either find libraries or roll my own. Possible, but time-consuming.

Also keep in mind that this was 2008. C++11 was still called C++0x and not finished yet, with basically no compiler support. A lot of the niceties of modern C++ weren't there yet. And the ecosystem for finding, building, and using libraries in C++ wasn't exactly friendly either, especially if you wanted to work cross-platform.

And I did want to be cross-platform: I had to run this thing on university-owned clusters, which ran Linux, but I was doing my development on a Windows machine. And this was long before WSL made such a thing easy to do; my best options at the time were VMs or Cygwin. The idea of using Visual Studio for Linux development would've sounded ludicrous at the time, and VSCode wasn't even a flicker in anyone's imagination yet.

.Net would be slower, but, I reasoned, probably not any worse than Java. And running it on Linux would be possible with Mono, I believed. It would give me most of what I needed for free (like remoting and reflection), and I was kind of interested to see how some of .Net's features over Java (such as real generics) would alter the design.

My next choice would've been to use Rust, but unfortunately that hadn't been invented yet.

Yeah, I guess part of the point of this post is that this whole endeavour would've been much easier with modern tools. WSL, .Net Core, VSCode, maybe even Rust... it would've been interesting, to be sure. I also used Subversion for source control; git was around, but not so prevalent yet. Only much later did I move the repository over to git.

Anyway, given my choices at the time, I picked .Net. I think it was a good choice, as it allowed me to develop something working reasonably quickly. It also brought its challenges, in particular when dealing with Mono.

At first, I just started in Visual Studio and tried to get some basic DFS parts running before worrying too much about Linux. When I did try to run it on Linux with Mono, it worked surprisingly well.

Building it on Mono was another matter, however. I'm not 100% sure, but I seem to recall that originally, the Mono C# compiler was lacking some features that prevented it. Even after it became possible, I had to keep a separate build system; the Visual Studio (MSBuild) project files were used on Windows, and for Mono I settled on NAnt. I had to keep the two in sync manually. Before that, I would just build on Windows, and run those binaries on Linux.

Mono also limited me in what features I could use. .Net Framework 4.0 was in beta at the time, but I couldn't adopt any fancy new features (like LINQ) until those were supported on Mono, which took a while. Until then, I was basically stuck with mostly .Net Framework 2.0, as most of the 3.x features weren't super relevant for what I was doing.

One of the biggest hurdles with Mono turned out to be garbage collection, as Mono's GC at the time was much less advanced than the one in the .Net Framework (and presumably Java's). Mono's GC was non-generational and used a stop-the-world approach (all threads are paused during collection). It did eventually get a better GC, called sgen, but it wasn't stable in time for me to benefit. Meanwhile, the old GC often accounted for more than 10% of task execution time, which necessitated the development of record reuse (reusing existing record objects rather than allocating a new one for every record) to cut down on allocations.

Another big challenge was some problems with .Net Remoting, which I'll cover separately in a future post.

Still, Mono was what made this whole thing possible. Without it, I wouldn't have been able to use .Net at all, and any other choice for me at the time would've made development more complicated, slower, and probably less fun. Big thanks to Miguel de Icaza and others who contributed to Mono.

Categories: Software, Programming
Posted on: 2022-10-08 20:35 UTC.
