Philipp, what do you do as a computer science student?

Over my years at university there is one thing I have really disliked: doing work that almost nobody will read, and that almost nobody even has the chance to read. To be fair, as a computer science student there is not much I could have published so far, but sometimes it still feels like a bummer. I think every student agrees with me here. Therefore, I've decided to put such things on this blog from time to time. Today I will start with two items.

The first is about the complexity of the equivalence problem for regular expressions. It recalls the absolutely beautiful proof by Larry Joseph Stockmeyer that this problem is PSPACE-complete. Unfortunately, I've written it in German, but if you are able to read it, it is worth it!

The second is more practical. This semester I spent time working on a bare-bones operating system written in C++. At the end of the module, every participant had to do a final project, and I wrote an ext2 filesystem implementation for that operating system. The result is a generic, header-only C++14 ext2 library of roughly 2k lines, plus about 1k lines of test code. To show its adaptability I did three things:

  • Integrated it into our bare bone operating system
  • Wrote a tool similar to e2tools
  • Wrote a tool which can mount ext2 images on OSX with osxfuse

I cannot publish any code of our bare-bones operating system, but the rest is on GitHub, including further implementation details.

OK, with this blog post there is at least a theoretical chance that someone will read my work. That makes me feel better.


From time to time I stumble into a conversation with someone who really cares about photography. In those moments, I often would like to show some of my photographs, to talk about them and to get some feedback. Unfortunately, I do not always carry a portfolio with me, nor have I published any photographs online, and to be honest, I am quite scared to do so.

The photographs are taken with analog cameras, and I have the feeling they should be printed on well-selected paper. I am not sure a monitor is the best medium to view my work. But in the end, it is better to show it here and have the opportunity to talk about it.

I've already begun to work on some projects, but none of them is finished yet. So I leave this collection of unrelated photographs below, in the hope that it conveys the union of humanity and form I am trying to achieve.




Make C++ Smart Pointers A Bit Smarter

Last month, I was back at my company and worked on scalable TCP-based software written in modern C++. I discovered that the allocator would become a problem sooner or later, and here I will describe how I made the heap allocations scalable without replacing the standard allocator. The result is a replacement for std::unique_ptr<> as well as std::shared_ptr<>, and it might be a general optimization approach for some use cases in multi-threaded software.

The Case

The software in question is an HTTP proxy that uses Boost.Asio for its I/O operations. The overall idea is that each function call gives control back to the boost::asio::io_service as fast as possible. I accepted some overhead to gain scalability, for example by creating functors. Nevertheless, the proxy was able to handle a lot of bandwidth with only one thread. I was happy with that solution and didn't check how much scalability the program really offered.

But sadly, for the upcoming task it's likely that a few methods will be quite busy at times. Therefore, I added more threads to the party, channeling the heavy CPU load to one thread and distributing the remaining load over the others. After that, I profiled the application a bit to find out where the scalability problems were and what I could do about them. It turned out the software scales well enough at the moment, and I actually haven't invested time in optimizing it yet. Nevertheless, I identified two points:

  1. There are too many heap allocations during the HTTP header parsing.
  2. There are too many heap allocations triggered by smart pointers, e.g. one for each incoming connection.

The first point is the bigger issue, but I looked at the code and most of that work could be done on the stack. Besides, there are helpful tools available for it, such as custom allocators and memory pools. So I did not want to go any further into that.

But the second point really got me thinking. I was a bit surprised by how slow memory allocations are, and I wondered whether it is possible to avoid them. It turns out this is easier said than done, because sometimes dynamic behavior is necessary. Memory pools like Boost.Pool seem to be the right answer, but they blow up the complexity even more when different threads may work on the same objects or when ownership moves from one thread to another. In my tests, memory pools are very useful and perform well when objects are owned by a single thread; otherwise I wouldn't go with them. So, is there a way to improve smart pointers in scalable software?

Memory Recycling

What if we collected our memory waste in order to recycle it? And what if we let the smart pointers do that for us? Before we get to the answer, I will give a small introduction to what is meant by memory recycling.

In the following lines you will find an example of memory recycling, where the memory of one object is reused for another object. There are two important points to notice. First, the objects behave normally: the constructor is called at the beginning of each object's lifetime and the destructor is called at the end. Second, between the lifetimes of these objects the memory of the first object is not given back to the system. So we have two objects but only one memory allocation and one memory deallocation.

First, an object is created with a regular allocation. To reuse the same address for another object, the destructor of the old object is called before the constructor of the new one runs.

The important thing is that we force the constructor to create the new object at the same address, which is exactly what placement new does. Only after the destructor of the second object has been called is the memory cleaned up, i.e. given back to the system.

As you can imagine, this works not only for two objects; the same memory can be reused several times. What matters is that the objects have the same type or, depending on the implementation, the same size. Bottom line: with this technique we have only one memory allocation for multiple objects.
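The whole cycle can be sketched in a few lines. This is a minimal illustration with an invented Widget type, not code from the actual project:

```cpp
#include <cassert>
#include <new>

struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
};

int recycle_demo() {
    // One allocation serves two object lifetimes.
    void* raw = ::operator new(sizeof(Widget));

    // First object: constructed into the raw memory via placement new.
    Widget* a = new (raw) Widget(1);
    int first = a->value;

    // End the first lifetime explicitly, but keep the memory.
    a->~Widget();

    // Second object: reuse the same address for a new construction.
    Widget* b = new (raw) Widget(2);
    int second = b->value;

    // End the second lifetime and only now release the memory.
    b->~Widget();
    ::operator delete(raw);

    return first + second;
}
```

Two constructors and two destructors run, but ::operator new and ::operator delete are each called exactly once.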

Memory Recycling on Smart Pointers

C++11 introduced std::make_shared, and C++14 completes the picture with std::make_unique; together they are the preferred way to create smart pointers. For the rest of this article it is assumed that these functions are the only way smart pointers are created in a program, i.e. the constructors and the reset() function are not used explicitly. In my test I wrote my own versions of these functions.

These functions have the same behavior as the standard ones, except that they attach a custom delete function to the smart pointer. This delete function destroys the object but does not deallocate the memory; instead it stores the address in a type-specific dustbin, which is nothing more than an array of pointers per type. The next time recycle::make_unique or recycle::make_shared is called to create a smart pointer of that type, the memory in the dustbin for this specific type is recycled. To avoid global state in the program, each thread has its own dustbins. Before I discuss the pitfalls, let's have a look at the best case.
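As an illustration, a minimal recycle::make_unique along these lines could look as follows. The recycler and dustbin names and the growable std::vector bin are my own simplifications; the real implementation (e.g. the fixed-size dustbins and the shared_ptr variant) differs in its details:

```cpp
#include <memory>
#include <new>
#include <utility>
#include <vector>

namespace recycle {

// Per-type, per-thread dustbin: destroyed objects leave their memory here.
// thread_local keeps the bins free of cross-thread synchronization.
template <typename T>
std::vector<void*>& dustbin() {
    thread_local std::vector<void*> bin;
    return bin;
}

// Custom deleter: run the destructor, then keep the memory for reuse
// instead of handing it back to the allocator.
template <typename T>
struct recycler {
    void operator()(T* p) const {
        p->~T();
        dustbin<T>().push_back(p);
    }
};

// Drop-in make_unique that recycles memory of previously destroyed objects.
// Note: this sketch ignores exception safety and never shrinks the bins.
template <typename T, typename... Args>
std::unique_ptr<T, recycler<T>> make_unique(Args&&... args) {
    auto& bin = dustbin<T>();
    void* mem;
    if (!bin.empty()) {              // recycle an old address if possible
        mem = bin.back();
        bin.pop_back();
    } else {                         // otherwise fall back to a real allocation
        mem = ::operator new(sizeof(T));
    }
    T* obj = new (mem) T(std::forward<Args>(args)...);
    return std::unique_ptr<T, recycler<T>>(obj);
}

} // namespace recycle
```

Destroying such a pointer and creating a new one of the same type on the same thread then reuses the old address instead of hitting the allocator again.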


OK, what can we see here? First of all, I only tested up to 4 threads because my current machine has only 4 cores. Of course I also ran it with more threads, and you can see Hyper-Threading effects and so on, but that is not what we want to measure here. Apart from that, we can observe two major points:

  1. It scales better.
  2. If we have a stable thread environment, and each smart pointer is destroyed by the same thread that created it, we get a speedup of about 2.5x even on a single thread.

The first point is what I was looking for. It scales well, though not perfectly, because the standard allocator was used in the test; a scalable allocator should fix that.
Note: If you take a look at my implementation, you can see that the allocator is interchangeable.

As mentioned before, the second point surprised me a bit. If a program performed only a single smart pointer allocation during its whole runtime, this technique would be pure overhead; but already the second construction of a smart pointer of the same type, after the first one has been destroyed, puts us in the payback zone.

The best-case scenario is a stable thread environment, e.g. as many threads as cores, all running the whole time and doing the whole job. The software then acquires as much memory as it needs for the highest load peak and recycles it afterwards; in effect, it has something like a warm-up phase.

The worst-case scenario is one thread creating smart pointers all the time while another thread destroys them. In that case we have only created overhead, because the dustbins are bound to threads: a thread has to destroy smart pointers before it has memory to recycle. To me this looks like a rare special case. A thread does not have to destroy exactly the objects it created itself; it only has to destroy some smart pointers to obtain memory to recycle. So everything is fine as long as each thread performs roughly the same number of smart pointer constructions as destructions.

Another downside is the constant memory overhead of the dustbins. Each type handled by a smart pointer has its own dustbin, and in addition each thread has its own set of dustbins. In my implementation the dustbins live on the stack and have a default size of 128 pointers, so the maximum overhead is (number of types handled by smart pointers) * dustbin_size * sizeof(void*) * thread_count. For example, 10 types * 128 pointers * 8 bytes * 4 threads is only 40 KiB. It would be possible to reduce this overhead by using a dynamically growing queue or stack for the dustbins, but that implies more memory allocations. Bottom line: I checked how many different types I use this way in my existing projects, and it turns out I can count them on two hands.

Finally, we have to discuss the elephant in the room: why not write a full allocator with this idea in mind? To be honest, that is how I arrived at this approach. But standard containers like std::vector and std::string allocate memory chunks of many different sizes, so we would need many more dustbins and would pay much more memory overhead. For me, the downsides are just too big to go down that rabbit hole.

You can find the source on GitHub.






Debugging In Vim

A year ago, I switched to Vim for my C++ development. At the beginning it was hard to manage the new environment, but after some uncomfortable weeks I fell in love with Vim. Today I wouldn't go back and deal with all those tedious cursor operations. But until last night there was one tiny thing that bothered me: no debugging in Vim.

Okay, debugging C++ programs is a bit different from other programming languages. I would say 80% of bugs can be found by static analysis, so the debugger isn't that important in most cases. In the last year I used gdb or lldb on the command line and was pretty satisfied with the results, except for the procedure necessary to set breakpoints. A simple click into the source code would be so much easier. Yesterday, after a long, long debugging session, my frustration exploded and I was determined to change something. This was harder than expected, so I decided to write some notes on my fancy and successful blog.

Before we start, some notes about my environment: I'm using MacVim 7.3 on OSX 10.8 and compile all my programs with LLVM/Clang. I have found four options for debugger integration in Vim: vim-lldb, vimgdb, clewn and pyclewn. vim-lldb makes use of lldb and therefore was my first choice, but after some tests I realized it is not the right thing for me; I guess lldb is much better without the vim-lldb shebang. vimgdb, clewn and pyclewn are frontends for gdb. According to the table on the pyclewn website, surprise surprise, pyclewn provides the most functionality. Hence, I decided to go with pyclewn.
I found out that pyclewn is not compatible with the preinstalled version of gdb on OSX 10.8, so we need to install another version of gdb. This can be done with Homebrew.
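Assuming a working Homebrew setup, installing the second gdb is a one-liner:

```shell
# Install a recent gdb alongside Apple's preinstalled one.
brew install gdb

# Afterwards both binaries should show up.
which -a gdb
```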

Now we have two versions of gdb on the system: Apple's version at /usr/bin/gdb and Homebrew's at /usr/local/bin/gdb. To guarantee that /usr/local/bin/gdb is selected by default, we have to change the /etc/paths file.

Make sure that /usr/local/bin is listed above /usr/bin. In my case I ended up with this:
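For reference, after the edit an OSX 10.8 /etc/paths looks roughly like this (the exact entries may differ on your machine; only the order of /usr/local/bin and /usr/bin matters here):

```
/usr/local/bin
/usr/bin
/bin
/usr/sbin
/sbin
```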

gdb needs to be code-signed to get the privilege to attach to other processes. I have found a good how-to here; quoting from it:

Start Keychain Access application (/Applications/Utilities/Keychain Access.app). Open menu /Keychain Access/Certificate Assistant/Create a Certificate… Choose a name (gdb-cert in the example), set Identity Type to Self Signed Root, set Certificate Type to Code Signing and select the Let me override defaults option. Click several times on Continue until you get to the Specify a Location For The Certificate screen, then set Keychain to System. […] Finally, using the contextual menu for the certificate, select Get Info, open the Trust item, and set Code Signing to Always Trust. You must quit Keychain Access application in order to use the certificate […].

Now we can sign the new gdb executable:
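With the certificate from the quote above (called gdb-cert there), signing is a single codesign invocation:

```shell
# Sign the Homebrew gdb with the self-signed code-signing certificate.
codesign -s gdb-cert /usr/local/bin/gdb

# Verify the signature.
codesign -v /usr/local/bin/gdb
```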

A reboot was necessary before the code signing worked for me.

Vim starts pyclewn as a standalone program connected to Vim through a netbeans socket. Therefore your Vim needs the netbeans integration feature; another dependency is autocmd. Your Vim must support these features, but that should already be the case if you are using the preinstalled Vim on OSX 10.8 or MacVim 7.3 like me. To be sure, you should find +autocmd and +netbeans_intg in the version info.
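A quick way to check for both features, assuming vim is on your path:

```shell
# Each grep must print a match; otherwise your Vim build lacks the feature.
vim --version | grep -o '+netbeans_intg'
vim --version | grep -o '+autocmd'
```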

Congratulations! Now we can download pyclewn and install it as described on its website with the following commands. I'm not sure, but I believe OSX has Python 2.7 preinstalled; hence the *.py2.tar.gz archive should be the right one.
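The install steps from the pyclewn website boil down to unpacking the archive and running its setup script. The version number below is a placeholder for whatever the current release is, and the exact setup.py invocation may differ from the website's instructions:

```shell
# Unpack the downloaded release (X.Y stands for the current version).
tar xzf pyclewn-X.Y.py2.tar.gz
cd pyclewn-X.Y

# Build and install pyclewn together with its Vim runtime files.
python setup.py install
```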

Finished! Now you can open Vim and start pyclewn with :Pyclewn. If you came across this blog post, I suppose you already know that pyclewn will not turn your Vim into Eclipse or anything like that. It doesn't shelter you from grappling with gdb's commands. I would say my environment is still expert-friendly, which means I'm not really sure whether my frustration level will actually go down. No, I'm just kidding. Pyclewn provides the gdb command line directly in Vim and makes it possible to bind gdb commands to keys and so on. After decades of configuring, it will fit perfectly. Hopefully.

Please find my current Vim configuration here.