Posts for category 'Software'

Jumbo and .Net/Mono

This is the first in a series of articles about Jumbo.

The recently released Jumbo comes in two versions: a modern version, and the original. That original version was written in C#, primarily targeting Mono, supporting the official Microsoft .Net Framework mainly for development and testing.

Because it depended on rather old versions of Mono (I never tested it on newer versions), and support for running on Windows was limited, it became harder to run over time, which is partly why I eventually decided to port it to .Net Core.

But, why is Jumbo written in .Net anyway? Hadoop is written in Java, so if I wanted to learn more about Hadoop, wouldn't it make sense for me to use Java too? Perhaps, but I wasn't super familiar with Java and its tooling, and I also felt it would steer me too much to be exactly like Hadoop, which I didn't want either.

So, I felt I had basically two choices given my skills at the time: C++ or .Net, both of which I was very proficient in.

C++ would've given me performance. But, it also would've increased the amount of work. I knew there were a bunch of things I would need to solve that aren't straightforward with C++: reading configuration, RPC across a network, dynamically loading code to run jobs, text processing, robust networking, and more. All of those would require me to either find libraries, or roll my own. Possible, but time-consuming.

Also keep in mind that this was 2008. C++11 was still called C++0x and not finished yet, with basically no compiler support. A lot of the niceties of modern C++ weren't there yet. And the ecosystem for finding, building, and using libraries in C++ wasn't exactly friendly either, especially if you wanted to work cross-platform.

And I did want to be cross-platform: I had to run this thing on university-owned clusters, which ran Linux, but I was doing my development on a Windows machine. And this was long before WSL made such a thing easy to do; my best options at the time were VMs or Cygwin. The idea of using Visual Studio for Linux development would've sounded ludicrous at the time, and VSCode wasn't even a flicker in anyone's imagination yet.

.Net would be slower, but, I reasoned, probably not any worse than Java. And running it on Linux would be possible with Mono, I believed. It would give me most of what I needed for free (like remoting and reflection), and I was kind of interested to see how some of .Net's features over Java (such as real generics) would alter the design.

My next choice would've been to use Rust, but unfortunately that hadn't been invented yet.

Yeah, I guess part of the point of this post is that this whole endeavour would've been much easier with modern tools. WSL, .Net Core, VSCode, maybe even Rust... it would've been interesting, to be sure. I also used Subversion for source control; git was around, but not so prevalent yet. Only much later did I move the repository over to git.

Anyway, given my choices at the time, I picked .Net. I think it was a good choice, as it allowed me to develop something working reasonably quickly. It also brought its challenges, in particular when dealing with Mono.

At first, I just started in Visual Studio and tried to get some basic DFS parts running before worrying too much about Linux. When I did try to run it on Linux with Mono, it worked surprisingly well.

Building it on Mono was another matter, however. I'm not 100% sure, but I seem to recall that originally, the Mono C# compiler was lacking some features that prevented it. Even after it became possible, I had to keep a separate build system; the Visual Studio (MSBuild) project files were used on Windows, and for Mono I settled on NAnt. I had to keep the two in sync manually. Before that, I would just build on Windows, and run those binaries on Linux.

Mono also limited me in what features I could use. .Net Framework 4.0 was in beta at the time, but I couldn't adopt any fancy new features (like LINQ) until those were supported on Mono, which took a while. Until then, I was basically stuck with mostly .Net Framework 2.0, as most of the 3.x features weren't super relevant for what I was doing.

One of the biggest hurdles with Mono turned out to be garbage collection, as Mono's GC at the time was much less advanced than the .Net Framework's (and presumably Java's). Mono's GC was non-generational, and used a stop-the-world approach (all threads are paused during collection). It did eventually get a better GC, called sgen, but it wasn't stable in time for me to benefit. Meanwhile, the old GC was often taking more than 10% of task execution time, which necessitated the development of record reuse.
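To give a rough idea of what record reuse means in general, here's a minimal sketch. It is not Jumbo's actual code: the KeyValueRecord type, the RecordReuseSketch class, and the binary layout are all made up for illustration. The idea is that the reader fills in one mutable record object over and over, instead of allocating a new object for every record, so the GC has far less garbage to deal with.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record type (not from Jumbo); mutable on purpose so that
// a single instance can be refilled for every record.
public sealed class KeyValueRecord
{
    public int Key;
    public long Value;
}

public static class RecordReuseSketch
{
    // Writes 'count' (key, value) pairs in a simple binary layout for the demo:
    // a 4-byte key followed by an 8-byte value.
    public static byte[] WriteRecords(int count)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            for (int i = 0; i < count; i++)
            {
                writer.Write(i);
                writer.Write((long)i * 10);
            }

            writer.Flush();
            return stream.ToArray();
        }
    }

    // Reads the records back, yielding the SAME object instance every time.
    // Only one KeyValueRecord is allocated, no matter how many records there are.
    public static IEnumerable<KeyValueRecord> ReadReusing(byte[] data)
    {
        var record = new KeyValueRecord();
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                record.Key = reader.ReadInt32();
                record.Value = reader.ReadInt64();
                yield return record;
            }
        }
    }

    // A consumer that only looks at each record while it is current,
    // which is the usage pattern record reuse requires.
    public static long SumValues(IEnumerable<KeyValueRecord> records)
    {
        long sum = 0;
        foreach (var record in records)
        {
            sum += record.Value;
        }

        return sum;
    }
}
```

Calling RecordReuseSketch.SumValues(RecordReuseSketch.ReadReusing(data)) works fine with this approach. The catch is that a consumer must not hold on to a record after moving to the next one: putting the results of ReadReusing in a list would give you a list full of references to the same object, so anything that needs to keep a record has to copy it first.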

Another big challenge was a set of problems with .Net Remoting, which I'll cover separately in a future post.

Still, Mono was what made this whole thing possible. Without it, I wouldn't have been able to use .Net at all, and any other choice for me at the time would've made development more complicated, slower, and probably less fun. Big thanks to Miguel de Icaza and others who contributed to Mono.

Categories: Software, Programming
Posted on: 2022-10-08 20:35 UTC.

Jumbo

Today, I'm releasing something that I've wanted to release for a very long time. It's a project that I worked on during my Ph.D., and while I don't think it'll be terribly useful to anyone, a lot of work went into it that I want to preserve, even if just for myself.

That project is Jumbo, and it's now available on GitHub in two flavors: Jumbo for .Net 6+, and the original for .Net Framework and Mono. If you want to play around with it or learn more about it, you probably want the former.

Jumbo is an experimental large-scale distributed data processing system, inspired by MapReduce and in particular Hadoop 1.0. Jumbo was created as a way for me to learn about these systems, and should be treated as such. It's not production quality code, and you probably shouldn't entrust important data to it.

Basically, back when I was getting started with my Ph.D. in 2008, I found myself staring at the code of Hadoop (which wasn't even at version 1.0 yet at the time), and finding I wasn't really getting a good feel for how the whole thing fit together, and what really goes into designing a system like that.

So, some people at my lab suggested I should try building something for myself, which I did. I built, from the ground up, a distributed file system and data processing system, which is Jumbo. It was heavily inspired by Hadoop, and definitely borrows from its design (although no actual code was borrowed). In some respects, Jumbo deviates from Hadoop quite a lot (especially since it isn't constrained to only using MapReduce).

Building Jumbo taught me a lot: about software design, about distributed processing, about decisions that affect scalability, and more. It's my hope that maybe, someone else interested in these topics might want to look at it and find what I did interesting. If nothing else, I just want to preserve this massive project that I did (still the biggest project I've done where I'm the sole contributor), and have its history available.

I did end up using Jumbo for some research efforts, which you can read about in a few papers as well as my dissertation under the University section of my site.

Jumbo is also the origin of one of my most widely used libraries, Ookii.CommandLine, so it's significant in that respect as well.

Like I said, I've wanted to release Jumbo for a long time. If you look through the original project's commit history you can see a bunch of work done in early 2013 (as I was nearing the end of my Ph.D.) like cleaning stuff up and adding documentation, but I never quite reached a level where I was comfortable releasing it. The project, which primarily targeted Mono to run on Linux, wasn't that easy to set up and run.

In 2019, I ported the project to .Net Core, just to see if I could. That version was easier to play around with, and I wanted to release it then too, but I never quite got around to finishing it, until now.

So now, you can look at Jumbo and play around with it on .Net 6+, thanks to this new version. I've also expanded the documentation significantly, so it should be easy to get started and to learn more about how it works. The original Jumbo project for Mono and .Net Framework is only provided to preserve the original history of the project (the new repository only contains the history of the port). You probably shouldn't try and run it (though I obviously can't stop you).

If you want to comment on Jumbo or ask any questions, please use the discussions page on GitHub.

Additional posts about Jumbo

Categories: University, Software, Programming
Posted on: 2022-09-20 23:54 UTC.

Ookii.FormatC 2.3

After I recently released an updated version of Ookii.CommandLine, I figured Ookii.FormatC could also use some love.

This version comes with an optional new dark mode stylesheet, nullable reference types enabled for the library, the ability to write directly to a TextWriter, C# 10.0 keyword support, and a few other minor features and fixes.

Thanks to the ability to write to a TextWriter, you can now do stuff like this:

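// Create a formatter that highlights C# code.
var formatter = new CodeFormatter()
{
    FormattingInfo = new CSharpFormattingInfo()
};

// SampleCode is a string holding the source to format; the HTML output goes straight to the console.
formatter.FormatCode(SampleCode, Console.Out);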

Okay, writing HTML to the console is maybe not the most useful example, but you get the idea.

You can try it out with .Net Fiddle, or look at a sample that also shows the new dark mode in action. The online syntax highlighter has also been updated, and now supports PSParser-based PowerShell formatting again.

And yeah, the NuGet package is version 2.3.1, rather than 2.3.0. That's because somehow the package for 2.3.0 ended up with an outdated binary in it. Not sure how that's possible, but it happened.

Categories: Software, Programming
Posted on: 2022-09-14 05:40 UTC.

Ookii.CommandLine 2.4

I've released an update to Ookii.CommandLine, my library for parsing command line arguments for .Net.

This new version comes with nullable reference type support (for .Net 6+), a new helper to make parsing easier, more customizability, an easier way to make -Help style arguments, and some bug fixes.

See the full list of changes here.

With the new helper method, you can now just do the following to parse the arguments and write errors and usage to the console if parsing failed:

var parsed = CommandLineParser.Parse<MyArguments>(args);

And if you want to customize parsing behavior, you can still do so with this method:

var options = new ParseOptions()
{
    NameValueSeparator = '='
};

var parsed = CommandLineParser.Parse<MyArguments>(args, options);

Of course, existing code to parse arguments that manually creates an instance of CommandLineParser will continue to work.

Check it out on NuGet or GitHub, or try it out online!

Also, the Visual Studio code snippets (which previously required manual installation) are now available on the Visual Studio marketplace.

Categories: Software, Programming
Posted on: 2022-09-06 03:05 UTC.

.Net Configuration Section Documentation Generator

Today I'm releasing another small tool that I created for personal use and that I believe others might find useful. The tool is the .Net Configuration Section Documentation Generator (more proof that I should not be put in charge of naming things), which does what it says on the tin: it generates documentation for configuration sections defined using .Net's ConfigurationSection class.

Basically, it can generate an XSD schema from a ConfigurationSection, to which you can then add annotations to document the elements and attributes of the section, and then generate an HTML documentation file from that schema. The tool can be used with Microsoft .Net and Mono.

As an example, check out the schema for the documentation generator's own configuration section as generated by the documentation generator, and with added annotations. And finally, the documentation file generated from the annotated schema. This is only a very trivial configuration section, but it gives you an idea of what the output of the documentation generator looks like.

More information and downloads here.

Categories: Software, Programming
Posted on: 2013-06-08 10:42 UTC.
