Thursday, November 24, 2011

New sticker-art sightings

This time it's the Pokémon Mienfoo (Kungfouine in French) and Muttley (aka Diabolo) that have been stickerized.


Mienfoo is a standard conversion of an official bitmap.


For Muttley, on the other hand, I doctored the original image and used a hexagonal grid to obtain a "crisp" result, with 7,200 stickers.


He's now on my office's window, making fun of the neighbors who are still doing lo-def drawings with Post-It notes.


Friday, November 18, 2011

The making of Web Desktop Dungeons

As promised in the previous post, I will explain how I made the web version of Desktop Dungeons. I already explained why I embarked on that project; here is the how.

When I first got the idea to port the game to the browser, I looked at what kind of game libraries were available. Back in April, nothing seemed to fit what I was looking for. Lots of libraries and frameworks, both free and proprietary, only appeared or blossomed last summer. And most of the ones I browsed focused on fast-paced shooters and platformers, with animations, collision detection, particle systems, physics simulation and other features that were clearly not needed for Desktop Dungeons. So I quickly decided that I would code everything myself. It did not seem too big to handle.

The other thing I decided right at the start (I don't even know if I should call it a decision) was to use only HTML and CSS for rendering: no canvas drawing. I thought that I would be able to ensure compatibility with older browsers that way. And since the game is tile-based, it was a very natural choice. I did not need real drawing capabilities. I only wanted to show images at predefined positions, rectangular buttons and areas, all things possible with <img>s, <div>s, <a>s and a bit of CSS styling. I abandoned compatibility with older browsers (yes, IE 7 and 8, I'm looking at you) along the way, but for other reasons.
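To give a concrete idea of what "HTML and CSS only" means here, below is a minimal sketch of the approach: each tile is just an absolutely positioned <img> inside the board container. It is an illustration, not the actual game code, and all names (TILE_SIZE, placeTile, the image path) are made up.

  // Illustrative only: one absolutely positioned <img> per tile.
  var TILE_SIZE = 32; // hypothetical tile size in pixels

  function placeTile(board, x, y, imageUrl) {
    var img = document.createElement('img');
    img.src = imageUrl;
    img.style.position = 'absolute';
    img.style.left = (x * TILE_SIZE) + 'px';
    img.style.top = (y * TILE_SIZE) + 'px';
    img.style.width = TILE_SIZE + 'px';
    img.style.height = TILE_SIZE + 'px';
    board.appendChild(img);
    return img;
  }

  // Example: draw a wall at column 3, row 5 of the 20x20 board
  // placeTile(document.getElementById('board'), 3, 5, 'img/wall.png');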

The early days

On the first day, I was able to display the 20x20 board with a predefined maze, a yellow placeholder for all the status information, a hero sprite that I could control with the keyboard and move even through walls, and a mouse-following 'cursor'. And all that with one HTML file, two JavaScript files (one for a utilities/compatibility layer and one for the game itself) and all the bitmaps from the original game. I used git immediately to track my changes and back up my work. I made five revisions on that first day.

Web Desktop Dungeons at one day

After three weeks of (after hours) work, the game started to look a bit better: the info panel at the right was a bit more fleshed out, moving the hero worked both with the keyboard and the mouse, thanks to the pathfinding algorithm, the maze (still fixed) was filled with lots of interesting stuff and the combat system was in place, somewhat. Except it was not yet possible to die. The JavaScript code had now expanded to four files, totaling 1,200 lines. Three refactorings had already taken place, one of them labelled as "big" in the change log.

Web Desktop Dungeons at three weeks

Then I reorganized the code in true MVC fashion, began to develop individual widgets (buttons, gauges, images, tooltips, dialogs), continued to improve support for game mechanics (magic spells, special states, regeneration, power-ups) and added the basic logic of the menu screen. In mid-June, I contacted QCF Design to present my creation. Danny (Day, aka Dislekcia) answered me and has always been very positive and supportive. At that time, I wanted to open-source the whole thing and publish it on GitHub, with permission. Danny convinced me it was not a good idea: the project was still far from finished or even playable. Had I gone public at that time, I would probably have received only negative reactions.

So I resumed work on implementing all the "content": hero classes and races with their unique abilities, monsters, bosses, gods (oh boy!) and items. I was guided by the content of the wiki and by personal experiments with the desktop version. I contributed a few elements back to the wiki. But after three months, I began to realize that play-testing was not going to be enough to ensure correct implementation of all the details and prevent regressions. I was now at 4,000 lines of JavaScript and I needed an automated test system, because it was impossible to manually reproduce rare situations, like finding an altar to Jehora Jeheyu while being immune to poison.

Growing up, gathering tools

I chose JsTestDriver as my unit testing library. It's mature. It can be added to an automated build. It runs the tests in real browsers, not in some kind of virtual JavaScript environment. It can drive several browsers at once. It produces reports that are compatible with JUnit. Sometimes the server gets stuck and needs a restart, but that's really bearable. I started with version 1.3.2 and later migrated to 1.3.3 when it became available. I chose to only write unit tests with it for the game mechanics. I did not invest in automated UI testing. I thought the return on investment for those kinds of tests was not worth it. Breaking something at the UI level would quickly become obvious with some play tests; that could not be said of subtle game logic regressions. In hindsight, it was a good choice.
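To give an idea of what such a test looks like, here is a hypothetical JsTestDriver test case. Only the TestCase/assert API is JsTestDriver's; the Hero class, its methods and the expected behaviour are invented for the example, not the real game code.

  var HeroTest = TestCase("HeroTest");

  HeroTest.prototype.testLevelUpRestoresFullHealth = function() {
    var hero = new Hero("Fighter");
    hero.takeDamage(5);
    hero.gainLevel();
    assertEquals(hero.maxHealth(), hero.health());
  };

  HeroTest.prototype.testPoisonPreventsRegeneration = function() {
    var hero = new Hero("Fighter");
    hero.takeDamage(3);
    var before = hero.health();
    hero.poison();
    hero.explore(10); // exploring normally regenerates health
    assertEquals(before, hero.health());
  };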

The test suite started with 75 lines of JavaScript in July. At the beginning of August, the code base was 5,600 lines and the test suite more than 600, mainly focusing on hero classes and beasts. In September, 7,800 lines of game code, 1,700 lines of test code, more than half for all the exquisitely detailed behaviour of divine entities.

Yeah! Tests ok across five browsers!

Mid-August I got interested in JavaScript minification or compilation. I found two possible tools: Doug Crockford's jsmin minifier and Google's Closure Compiler. Closure, being written in Java, was a better fit for my environment and promised more than just minification, with its advanced compilation mode. However, trying to apply it to the project at that time, with several thousand lines of existing code, proved too hard. The advanced compiler made assumptions that broke the game; working around them would have required explaining everything to the compiler and possibly applying deep refactorings, ultimately ruining some of my JavaScript techniques. I gave up on the advanced compilation mode and kept only the basic one. It's still very useful, for two reasons. The first is that it packs all the JavaScript code into a single file. It means I can structure my code any way I want for easy development and still end up with one compact file, better for deployment. The second advantage of compilation is size reduction: Closure manages to shave off a good third of the initial size with whitespace and comment elimination and local variable renaming.
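For the curious, here is the classic kind of assumption that bites with advanced mode (a generic illustration, not necessarily the exact problem in the game code): the compiler renames properties, so mixing dotted access with string-keyed access silently breaks.

  // The compiler may rename 'attack' to something short like 'a'.
  var monster = { attack: 5, health: 10 };
  console.log(monster.attack);  // dotted access is renamed consistently: still works

  // ...but string-based access keeps the original name and now misses:
  var stat = 'attack';
  console.log(monster[stat]);   // undefined after advanced compilation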

On my next project, still in a very early phase, I used Closure from the start and took precautions to keep advanced compilation working. I don't know if it improves runtime performance, but the size reduction is even more impressive: my 50 kB of JavaScript becomes 22 kB with standard compilation and a mere 13 kB with advanced mode. After all, maybe it's a useless optimization when the complete JavaScript code is smaller than the image for the splash screen...

Original sources and HTML5

In early September, probably tired of answering my nitpicking questions, QCF Design agreed to send me the GameMaker project of the original game. I discovered both the tool and the game source with mixed reactions. First, GameMaker is primarily targeted at hobbyists, not at programmers or software engineers. Versioning source code is impossible up to version 8.1. Code is scattered all over the place: sometimes as real code, sometimes as graphical constructions. The design screens have somewhat inconsistent interaction conventions. From a software engineer's point of view, it's ugly.
But after doing the tutorials, I began to appreciate its philosophy. I am now convinced that it's a great tool for hobbyists and for rapid prototyping. In both cases, it makes it possible to focus on the game itself and not on technical issues (How do I display an animation? How do I play a sound? How do I code collision detection?). It's also probably a great way to introduce children to programming. It's easy and quick to have something working. Afterwards, it's open to experimentation.

Is that code??

Overall, the game source became my source of answers, and I shamelessly copied the maze generation algorithm.

During the summer, I added a few things that required HTML5 features and definitively lost any hope of supporting older browsers. The first one is anecdotal: I need to use a <canvas> element behind the scenes in Chrome and Opera to scale up bitmaps and keep the pixelated look. The second one is central to the game: saving progress using localStorage. You could argue that it would be better if progress was saved on the server side, but that was not the point. I wanted to provide saved games on the client side, to allow deploying the game anywhere, without server support, over simple HTTP transfer, much like Flash games do. The third HTML5 feature was added later: sound support. It was a bit tricky (different browsers support different encodings, Safari requires QuickTime before it can play anything, Firefox 3.6 requires <audio> tags to be part of the DOM tree) but otherwise quite easy to add.
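The save part really is as simple as it sounds. Here is a rough sketch of the idea (the key name and the shape of the saved state are illustrative, not the game's real save format):

  function saveProgress(state) {
    try {
      localStorage.setItem('dd-save', JSON.stringify(state));
    } catch (e) {
      // Storage full or disabled: the game still runs, progress just isn't kept
    }
  }

  function loadProgress() {
    var raw = localStorage.getItem('dd-save');
    return raw ? JSON.parse(raw) : null;
  }

  // saveProgress({unlockedClasses: ['Fighter', 'Thief'], completedDungeons: 3});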

Finishing touches

That's definitely the hard part. At least for me. Most of the interesting bits of code were done. Most of the tools had lost their shine. What helped me was my moral commitment towards Danny and QCF Design. Some psychological studies - yes, I read some of them - suggest that great achievers always have a mental picture of the goal they are targeting and focus on it, whereas under-achievers keep looking back at all the work they have already done... and generally stop their effort before finishing, when they consider they have done a lot. I am often guilty of that sort of behaviour. That time at least I finished. The fact that it was a port of an existing game, with a clearly defined goal that I could use to estimate the path left, must have played a role in helping me succeed. I must keep that in mind for my future projects.

On the 2nd of October, after exactly six months of work, 9,800 lines of game code and 3,100 lines of unit tests, I told Danny that I was ready for deployment. It still took about a month to add a splash screen to monitor the loading progress over slow connections and iron out a few bugs. And on the 5th of November the game was finally uploaded to the official Desktop Dungeons web site and announced in the news. I took two weeks of vacation in the Bahamas to celebrate and write this account of the complete journey ;)

Bonus

I made a visualization of the development history, if you're curious. It represents the tree structure of the project at each commit. Each type of file has a distinct color. The size of each file is shown as the width of its segment (on a logarithmic scale). Enjoy! And don't forget to play the game.

The final project structure


Sunday, November 6, 2011

Desktop Dungeons in your browser!

Here it is! My HTML port of the Desktop Dungeons alpha is done and available on the site of the original Desktop Dungeons. It's free, fully playable up to the Lothlorien campaign, and works on all recent browsers (Chrome, Firefox 3.6 and above, Safari, Opera and even IE9).

Desktop Dungeons on the web
I started working on porting the free alpha version in April. At that time, only that version was available. QCF Design, the creators, were busy working on their own port (to Unity3D), which they showed briefly during E3. They then published the new version as a private beta, open to people pre-ordering the full game. Of course, their goal is to make a polished game, exploiting all the possibilities of the design, that can be played on a large array of devices. The alpha was done with GameMaker and could only run on Windows and Mac OS X. And that's why I did the port: I wanted to make it available to a wider audience (for instance, Linux users).

After a few months of after-work coding, I showed my progress to the nice people of QCF Design, and they agreed to send me the GameMaker code, so that I could make the port as faithful as possible. And they even agreed to deploy it on their site once finished. How cool is that!

I did all the coding with JavaScript and mostly HTML4 (using the DOM for graphics), but a few features need HTML5: audio support and localStorage to save your progress across levels and classes. For the record, the whole game is about 10k lines of JavaScript and took me six months to code. It might seem long for a simple game like that, but as I said, I could only put a few hours into it now and then. And the mechanics and content are quite rich: almost twenty hero classes with strange powers, ten gods with different tempers, ten spells and lots of possible items available in shops.

Go play it and tell me or the forums what you think!

In a future post, I'll cover my development tools, just in case someone is interested.

Friday, November 4, 2011

Cool kids are doing it

Yes, seven- and nine-year-old kids can make really nice pieces. The size must be carefully selected: not too small, for a nice-looking drawing, and not too big, so that it can be done in an hour or two. And here are the results:
The cute Pichu


The cunning Riolu

Wednesday, October 26, 2011

Does it scale?

After my small experiment with the Firefox logo (about 800 stickers), I wanted to see if the technique could scale. The first step was to order a bunch of 25,000 stickers, because the office supply fell far short of my ambitions. I got all those packs for about 75 euros, VAT and shipping included, which amounts to 0.003 euros per sticker.

Then I searched for an appropriate image to render. I wanted something that made good use of the five basic colors I had (red, green, blue, yellow and black). I settled on an old image of Lucky Luke, ready to draw his guns. It's easily recognizable (at least in France), has nice primary colors and is quite tall and thin.

Using my JS tool, I made the template I needed and computed that it would require just under 5,000 stickers and would be 1 meter wide and 1.6 meters high. Almost life-size. It's a big step up from my previous creation, but it still felt manageable. I also prepared a "grid" at the correct scale to put under the transparent film, to help me place the stickers in their right positions. Since my roll of film is only 50 cm wide, I had to split the drawing in two.

And I started the sticking phase. All in all, it probably took me about ten hours to complete, and I was quite happy with the result. Here it is, hung on a door.
Lucky Luke on door

After that, I rolled it up and brought it to my workplace, and put it on a window for the whole world to see. Or at least for the whole parking lot. You can see that our windows are a bit small: the boots have been lost. And you can also see that I have a pretty bad camera...
Lucky Luke in situ
With the blinds closed, it looks ok. Without, the background looks black and the parts that should appear white (the hat, the hems of the trousers, the cigarette) don't look good, because I have no white stickers. A possible solution is to stick any kind of stickers on the other side of the transparent film: their backs are white! I don't know yet if I will try to fix Lucky Luke or not. The solution with the blinds works nicely and I don't want to waste more stickers. I'd rather keep them for my next opus...

Wednesday, October 12, 2011

Real world sticker art

Here it is! The first real-world implementation of my sticker-art concept. It's quite a bit more work than Post-It™ art, but the resolution is better.
Firefox logo with stickers on white background
Those stickers being really sticky, I did not want to put them directly on a window or on office furniture. So I chose to use transparent book-covering film, the kind used to protect children's books and notebooks. It's cheap, readily available, and I was able to put a grid model underneath to help me not deviate too much. I'll see tomorrow if I can find a place for it in my office.

This relatively small piece used the standard 16x16 pixel Firefox logo as a model. I did the "polychromation" with white background, using the 7-color palette. It needed a total of 788 stickers: about one and a half pack of multi-colored stickers (8 sheets of 11 stickers of the 7 different colors per pack). I did not use a stopwatch, but I would say it took me between one and two hours of meticulous work to finish.

In fact, it's only after doing it and trying to gauge the result in different places that I added the "background" option to the application. I realized that the result looked quite different when viewed on a dark background than on a light one. From the start, I had assumed that the missing dots would represent white, but that's not always the case. If you put the piece on a window and look at it from the outside, everything around the stickers will probably look darker (at least in daylight). On the contrary, looking at it from the inside, the outdoors will make the background appear light. On the Firefox logo, it does not make a big difference, but on other images, with large white or black regions, it can change everything.

Firefox logo with stickers on black background
You might also have noticed that I added a "Sticker unit price" input box. This will help you estimate your cost, in case you want a quote beforehand. I found a provider in France that sells monochromatic packs of 462 stickers for 1.20€, VAT excluded, when buying more than five packs. That makes the VAT-included unit price 0.0031€. So my Firefox logo cost a bit less than 2.50€ in stickers.

That's all for the cold hard facts. I'm still pondering what I'll do next. If you want to join the new office-supply-abuse-revolution, you're all welcome!

Tuesday, October 4, 2011

Improvements on the sticker art creator

The previous version of polychrome, my nifty tool for wasting your company's stickers, only supported a limited set of embedded images. And even though those images had been carefully chosen, there comes a time when you want to experiment with more personal pictures. Enter the new version: you can now upload your images and have them converted.

You might have noticed that I emphasized the word upload. That's because when you do that, you don't really upload anything anywhere. I'm just using that particular interface to allow the browser to get access to a local file that you choose. If you don't trust me, just disconnect from the network after loading the page: the application will run unaffected.

Also, you will now see the number of stickers that will be needed for your creation (lots of them) and you can automatically apply a zoom to shrink your image to more realistic dimensions.


Oh, and sorry Internet Explorer and Safari users, but the File API is not supported by your favorite browser: you will still be limited to the selection of embedded images. Everything works fine with Chrome, Firefox and Opera on Windows. If you have tested other combinations, either working or not, just let me know.


Technical details


The reason I used the upload/FileReader API from HTML5 is simple. I first thought that just allowing the user to type in a URL would be enough. The application would load the image into the page, draw it, do some computation, and display a nice sticker chart. Except that does not work, because of security limitations: the same-origin policy makes it impossible. When you try it, you end up with an image that you can display, but JavaScript cannot get access to its individual pixels and cannot do anything with them. It's essentially a write-only image. Useless.

So I crawled the web, looking for an online HTML5 Photoshop-like application where it was possible to upload images, hoping to learn how that magic was possible. I finally found http://evanw.github.com/webgl-filter/. It's basically a WebGL effects demo, but what interested me was the upload function. It turned out to be quite easy: just an old <input type="file">, then a FileReader plugged into the selected file; call readAsDataURL(), set the result as the source of my image, and everything works perfectly.
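In condensed form, the trick looks like this (the element id is made up for the example): read the local file as a data: URL, load it into an image, and the pixels become readable from a canvas, with no same-origin problem.

  document.getElementById('file-input').addEventListener('change', function(evt) {
    var reader = new FileReader();
    reader.onload = function() {
      var img = new Image();
      img.onload = function() {
        var canvas = document.createElement('canvas');
        canvas.width = img.width;
        canvas.height = img.height;
        var ctx = canvas.getContext('2d');
        ctx.drawImage(img, 0, 0);
        // Would throw on a cross-origin image, but works on a data: URL
        var pixels = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
        console.log('First pixel RGBA:', pixels[0], pixels[1], pixels[2], pixels[3]);
      };
      img.src = reader.result; // the data: URL produced by readAsDataURL()
    };
    reader.readAsDataURL(evt.target.files[0]);
  }, false);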

All in all, this small experiment taught me a few new tricks of HTML5. It's really becoming a nice platform for all sorts of applications.

If you are interested in the code, have a look at the source on GitHub.

Friday, September 23, 2011

Post-it art is sooo old!

Why not waste your company's money on stickers instead? You know, the small round ones universally used by any self-respecting "agile" team:

Stickers (gommettes in French) about to be misused


Well, now it's easy! Just go to polychrome and try the options. The application is a small and quick hack and does not look like much for now, but it will improve. The next things to be built are image upload, and parallel processing using web workers.

In the meantime, you can still display a few sample models, ready to be used.

Model for the Firefox logo made from 7-color stickers

Monday, August 15, 2011

Grunt work going on

For the last four months, I've been spending almost all my free time writing a web port of a free game. That's why I have not posted any new inflammatory benchmark or computer-aided game solution.

It might seem strange, but the game I'm porting, whose name I will not reveal yet, is only available as a Windows or OS X binary. I like the game very much and wanted to share it with some of my Linuxy friends, so I thought: "Rebuilding the game with HTML/JS should not be that hard, and then everybody with Internet access and a modern browser could enjoy the game, without OS dependency or installation."

I'm making good progress on the game itself. It's not a graphics-heavy game, but more of a content-rich game, and although the main game mechanics have been in place for a while now, I'm still adding all the content, and that takes time. It takes twice the time, actually, because I have to do research on the original game to discover what needs to be implemented, and after that, I have to replicate it in the new game.

So what's the game? Sorry, but I will not reveal it just now. I have been in touch with the original developers because it's a free game, but not an open source one. I don't want to steal anything from them, and I hope we can reach some kind of agreement. They are currently trying to transform the original free game into a commercial one, far richer both in terms of graphics and content. I only want to replicate the free one, without killing their sales. Maybe it could even help their sales: like the current free version, it acts as a free appetizer.

Sunday, May 29, 2011

Web Workers faster than (naive) C++ [edited]


[EDIT] The main problem with the C++ version was probably the platform (Cygwin). So I re-ran the same program on Linux inside VirtualBox on the same machine and the results are not so catastrophic. I also ran the Python program with PyPy and it's twice as fast as CPython on that problem (Thanks to Alex Gaynor for trying it first).

I don't mean to stir up controversy with this. If someone with great C++ skills could explain what I did wrong, I'd be glad to learn. So here is the story.

Multi-language parallel processing comparisons

It all began last week with Google Code Jam problem B of round 1A. The problem is the following: two guys are playing hangman. They have a common dictionary of allowed words. The guy who tries to guess has a very simple strategy: he has a fixed list of letters that he will propose in order, but he will skip letters that cannot appear in the word in the current state of the game. Each time he proposes a letter that does not appear in the word, the other guy scores one point. Your goal is to find the word that will score the most points, knowing the dictionary and the ordered list of letters of the guy who guesses.
I have written a solution with the same algorithm in four languages: Python, Java, C++ and JavaScript. I also wanted to get some knowledge of the parallel processing capabilities of those languages. In Python, I used the multiprocessing module; in Java, the java.util.concurrent package with ExecutorService; in C++, OpenMP; and in JavaScript, HTML5 web workers. Here are the running times.

             Single    Multi
                       1 worker               4 workers   6 workers
Java         26"       27"                    18.5"       17.5"
C++ Linux    33"       33"                    20"         21"
C++ Cygwin   58"       59"                    30'00"      28'00"
PyPy         1'04"     1'08"                  27"         26.5"
Python       2'30"     2'35"                  50"         44"
JS Chrome    44"       45"                    15.3"       14"
JS Opera     1'17"     1'17"                  1'21"       1'35"
JS IE        1'30"     No web workers
JS Firefox   1'34"     crash (1'14" for 9)    crash       crash

Best times are highlighted in green in each column, worst times in red, crashing or unavailable solutions in black.

The code is available on GitHub, with extensive explanations on how to build and run it. The HTML5 web workers solution can be tested right in your browser. To run the program in non-web worker mode, just click the "Run without workers" button. Be aware that the browser will probably tell you every few seconds that a script is making the page unresponsive. You need to allow it to continue if you want it to finish.
To run the program with web workers, you need to create a pool first. When you do that, a row of green squares will appear, to track the activity of allocated workers. Then, launch the computation with "Run". The "Queue size" will move briefly to 10, workers will turn red when busy. The result will appear once the queue is empty and all workers have finished.

Observations

About the C++ program

My first surprise was that I could not make my C++ program run as fast as the Java solution. My C++ skills are rusty, for sure, but when I wrote some numerical things not too long ago, I had no problem making them go two or three times faster than Java. This time, there is a lot of string and map handling and I guess my performance problems might come from too much memory allocation. I used gprof to locate potential hotspots, but nothing meaningful appeared. I would genuinely be interested if some C++ wizard could explain how to fix that. [At least on Linux it's better than with Cygwin]
C++ 4-workers CPU profile

The second surprise is that using parallel processing actually made things worse, much, much worse! Read the numbers: 30 minutes for the four-worker C++ when Chrome takes 15 seconds. Actually, it reinforces the hypothesis that my performance problems come from memory allocation. The multiple threads of execution are fighting each other for memory and cause contention. That would explain the fact that the program only consumes 25% CPU, a big part of which is system time. [Yet again, on Linux the multi-threaded version sees a good increase in performance. Cygwin probably has to do much more work.]

About Firefox 4

The sequential solution, using only the main JavaScript thread, has a very curious memory profile.
Firefox non-worker solution

It looks like memory is never reclaimed during the execution, only after the function has finished running.

When running the 4-worker solution, Firefox CPU profile is similar to the C++ one: only 40% used and at least three quarters of that is system time. Using Process Explorer, I would conclude that Firefox uses threads for workers and probably meets memory contention too.

What is more troublesome is that Firefox cannot reliably compute the solution using web workers. Firebug catches some exceptions during execution. But those exceptions never occur in other browsers, or even in Firefox when running without workers. Among the ten tasks that workers have to process, there are generally one or two that produce an exception. It is sometimes possible to solve nine of the ten cases with one worker, but not always. And with four workers, it cannot even finish four cases. I really have no idea what is wrong.

About Opera 11

Opera performs quite well without workers. With workers, it performs the same. In fact, the more workers you add, the worse it performs. I guess that it uses "green threads", not native ones. No new threads appear in Process Explorer when launching workers and CPU use is always around 25%. With more workers, there is more work for the internal scheduler and some more serialization of data to do. That's what explains the slight degradation in running times.

About Internet Explorer 9

IE9 does not support web workers, so I only tested the standard solution. The performance is reasonable. It is worse than Chrome and Opera, but a bit better than Firefox.

About Chrome 13

Chrome is impressive, no doubt about that. The standard JavaScript solution is already the fastest of the browser pack (twice as fast as Firefox and IE), and it benefits greatly from added workers. Using separate OS processes (like Python's multiprocessing module) makes it immune to memory contention.

About Java

Java is still a bit faster than Chrome's V8 JavaScript in single-threaded mode. However, the gain when using multiple workers is not as big as expected: only 1.5 times faster with four workers, whereas Python or Chrome go three times faster with four processes. I don't know why this is so and I don't have a precise idea of how to diagnose it. It might be a kind of memory contention, as with other multi-threaded implementations, except it does not appear as system time, because the JVM manages memory by itself.

About Python

I initially wrote my solution in Python, because the language feels so good. When run with PyPy (1.5.0-alpha0) it performs twice as fast as with CPython. That's great news!

Conclusions and open questions

I welcome all comments, on why my C++ solution is so slow, why Firefox fails with workers, and anything else. In the meantime, I will finish and release my micro-library to provide a Python-like Pool API above web workers, knowing that today, only Chrome can use it for heavy lifting.

Thursday, May 26, 2011

Taking advantage of multiple cores with JavaScript web workers

I'm at it again! 

This time, I wanted to test the performance of JavaScript for solving the same problem I described previously. And I jumped at the chance to perfect my knowledge of HTML5 web workers.

My main resource for learning how to use web workers was this introduction on HTML5Rocks. I used Chrome v13 for my tests. If you try this at home and point to local filesystem files, don't forget to launch a fresh Chrome instance with the --allow-file-access-from-files command-line switch, otherwise you will receive some strange security errors. The other possibility is to run from a local web server.

At first, I couldn't seem to start a worker correctly. Or at least, my worker did not seem to receive messages. I was calling:
self.addEventListener('message', handleMessageFunction, false);

I had to change it to the following form to make it work:
self.addEventListener('message', function(e) { slave.handleMessage(e); }, false);

So, passing just a function reference as the second argument did not seem to work, even though the function existed, was valid, was not a prototype method and had the right signature.

Once I had overcome that "glitch", everything went smoothly. I re-implemented my Python algorithm in pure JavaScript. I tried it directly (without workers). I had to acknowledge a few times that, indeed, a script on the page was taking some time to complete. And it finished in about 45". Nice performance for JavaScript!

To the core(s)

Having been spoilt by the nice Pool API from Python's multiprocessing package, I embarked upon a journey to implement something similar with web workers. Basically, I created a protocol of messages to be exchanged between the workers and the pool driver (a rough sketch of the idea follows the list below). It is not polished, but it works well already. You can see the result here. Quick explanation:
  • "Create pool" creates the worker pool with the requested number of workers. Workers are materialized by colored squares with a number. When the square is red, the worker is busy; when green, it's idle.
  • "Run" sends all problem cases to the pool. The size of the queue should increase. It is the number of tasks not yet taken by a worker.
  • "Kill" kills all workers and drops the pool, so that you can create a new one with a different number of workers.

Web worker pool demo
The input field is preloaded with the large problem set. On Chrome with four workers, the runtime is divided by three compared to the single-threaded version: the problem is solved in about 15". Chrome does not seem to impose limitations on the number of workers that can be started. I tried with ten, and the result was almost identical. My machine was, of course, 100% busy.

Opera performs nicely on this one too, with the same code. Single-threaded solution in 75", four-worker solution in 75". Opera correctly creates the requested number of workers, but it probably does not use OS-level threads. They might only be "green threads", with an internal scheduler, which would explain why only 25% of the CPU is consumed.

Firefox 4 gives me some problems, maybe because of memory limits or the size of the input value. The program runs unmodified, but you have to change the first input line from 10 to 8. I don't know why it chokes on the ninth and tenth cases. The single-worker version runs in 76". The four-worker one does not consume all available CPU. It's stuck at about 50% and consumes a lot of system time. Only three workers are busy simultaneously. And it runs in 230". So much for the performance gain.

And IE9, with its native HTML5 support... just kidding.

tl;dr

Chrome is fast, has great support for web workers. Demo here.

Monday, May 23, 2011

C++ slower than Java and Python!

... when badly programmed

I was still trying to squeeze the most performance out of my solution to the hangman problem from round 1A of Google Code Jam 2011, so I decided to give C++ a try. C++ is like a long-forgotten friend to me. I started my paid career with it, almost twenty years ago. I learned some tricks with the STL, template methods and other fun things. I already had a humble background in C, so pointers, memory manipulation and the like were still fresh in my head.

Today I simply tried to port my Python algorithm to C++, as straightforwardly as possible - I don't know if "straightforwardly" is a very common adverb, but I'm French, so please excuse me if it isn't. With some vague memories about memory allocation, I tried to use reference types as much as possible, to avoid unnecessary copying. I used string, vector<> and map<> quite easily. I wrote very simple input and output with cin and cout. I had a valid C++ program in no time. I ran it on the small input set. It was immediate and gave the right result. "Everything's under control," I thought.

Then I ran it over the large set, and it took 3'30": one minute longer than the Python version! Can you imagine?!

gprof to the rescue. I tweaked one heavily used string manipulation function. 3'. Still not fantastic. I realized that map<> was based on ordering and offered log(n) access time, so I switched to unordered_map<> to get a hashmap implementation. Around 2'30". Now it was as fast as Python (but still five times slower than the Java version). And now gprof did not give me anything interesting. A few mangled symbols appeared on top, especially with -O3 compilation. I tried to reduce the number of string allocations, using string const & as much as possible, but to no avail. I started to think that C++ was definitely not good at string handling. I thought that maybe using Cygwin g++ was suboptimal. I tried g++ on Linux. Still the same order of performance.

And then I saw the light

In the middle of my main algorithm loop, I was swapping big maps. Real map objects, not references. So each iteration of the loop would do a complete copy of the previous map. Arghh! C++ memory management feels quirky when you come from managed languages like Java, Python or JavaScript. Replacing those maps with references gave the top rank back to C++: 17" for the single-threaded version, running on a single core and not even compiled with optimization. Honor is saved.

Lessons learned

  • It's far less difficult to work with C++ and the STL than I expected, for problems like those from Code Jam.
  • However, manual memory management is something you always have to keep in mind.
  • My newbie usage of gprof did not allow me to track superfluous data structure copying. If you have any suggestions in that domain, I'll be glad to hear them.

Sunday, May 22, 2011

Taking advantage of multiple cores with Python and Java

The first round of Google Code Jam 2011 has just ended. I participated in round 1A, waking up at 3 AM on a weekend (that tells you how motivated I was). The first problem was not too difficult, but it still took me a good 45 minutes to solve. Then came the second one, The Killer Word: the one where you have to find the hardest word for your opponent to guess in a game of hangman. I implemented the solution the way it was described, but my first attempt at the small input failed.

I dug into the code again, and found a rule I had not implemented. Alas, my next attempt was also a failure. I had to try harder. Another bug fix later, my third submission was still not right. Time was running low. About twenty minutes before the end my rank was around 900th and falling. If I could not submit another valid solution, I would probably fall behind the qualifying 1000th rank.

After another pass over the code, I found yet another subtlety that I had not taken into account. Clickety-clickety-click (I have a "Das Keyboard" that really clicks, if you want to know). Another submission. Fingers crossed. And... yes! Now it was right. The small problem set was now solved. My rank rose to 500th. I was saved. Time to download the large problem set and run the program over it. Eight minutes should be plenty. The dictionary size was multiplied by 100 and the list of letters by 10, but it should be ok... Except it was not. Not at all, by a long shot.

Sleeping over it

Knowing I had qualified, but with the problem still in my head, I went to bed. When I woke up, I thought that perhaps a multithreaded implementation could have saved my attempt at solving the large problem set. As a first step, to speed up the program, I rewrote it from Python to Java. Straightforward, if a little dull. It ran faster, but it was still impossible for that algorithm to succeed in less than eight minutes on the large set. So: back to thinking.

I implemented a solution another way: instead of computing the score of each dictionary word in turn, I processed all words at once, keeping only the best one at the end. I programmed that new algorithm in Python again, because I really love its "runnable pseudo-code" feeling: no need to write types, literal syntax for everyday structures (tuples, arrays, dictionaries). I ended up with a valid solution that managed to solve the large set in about 2'30". Not too bad for Python.

Going multicore

The Python solution was now adequate, but the idea of taking advantage of the multiple cores of my machine was still in my head. So I decided to crawl the web and the excellent Python reference documentation. My first reflex was to look at the thread and threading modules. But, as you probably know, the standard CPython implementation relies on a Global Interpreter Lock (the GIL) that practically makes multithreading useless, except for handling concurrent blocking I/O. Fortunately, the documentation pointed me to the multiprocessing module. The basic principle is that instead of creating threads inside the Python VM, that module lets you create OS-level processes, each one running a distinct Python VM, but with communication channels established between them, so that you don't have to leave the comfy Python environment. And of course, it's as platform-independent as possible.

For my needs, I looked at the Process class, but rapidly learned about the Pool class. The API is perfect, almost magic. You create a Pool object (Python guesses the number of CPUs for you). You give it all the functions you want to run asynchronously. You close it, which means you have no more work for it to do. You join it (you wait for all workers to process the functions). And you only have to get the results and sort them. That's all. In code, it looks like this:

  def solve_with_pool(cases):
      from multiprocessing import Pool

      # One worker process per CPU by default
      pool = Pool()
      # Dispatch every case asynchronously and keep the AsyncResult handles
      async_results = [pool.apply_async(solve_case, args=(case,))
                       for case in cases]
      pool.close()   # no more work will be submitted
      pool.join()    # wait for all worker processes to finish

      results = [ar.get() for ar in async_results]
      results.sort()
      return results

It's difficult to get easier than that. solve_case() is the function that, as the name implies, solves a case and returns the result as a tuple whose first element is the case index. That's why the simple call to results.sort() works: it sorts all results by index. In fact, it's probably useless, because the results list is built in the right order anyway... With that in place, the program now runs in 50", which is three times faster than the single-threaded version on my quad-core machine. The speedup factor is not four, because the division of work I have made is coarse-grained and there are only ten cases in the large set. So one of the processes probably got to work on two large cases and finished some time after the others. Still, not too bad for ten lines of code.

And Java?

If you read the title of this post thoroughly, you've seen Java in there. So. I remembered that the Java 1.6 standard library included part of Doug Lea's fork-join framework, previously distributed as a separate package. Except it does not: it's planned for inclusion in the upcoming JDK 1.7. So in the meantime, I'll have to resort to simpler constructs. But at least, I think java.util.concurrent will give me a higher-level interface than direct use of Thread and synchronization.

The principles are the same as in the Python version, but of course, Java is a bit more verbose, mainly because of explicit typing and also because it lacks first-class functions. So I have to create a FutureTask, built from an anonymous class derived from Callable. Otherwise, the API is at the same level: a pool of executors, each task is executed, the pool is shut down, we wait for all computations to end (for one day, because we are forced to provide a timeout value), and then we collect all results. The checked exceptions also add a bit to the verbosity: both awaitTermination() and get() can raise exceptions that are of no interest in our case.


  public static Result[] solve_with_pool(Case[] cases) {
    // Fixed pool of four worker threads, one per core
    ExecutorService pool = Executors.newFixedThreadPool(4);

    FutureTask<Result>[] fresults = new FutureTask[cases.length];
    for (int i = 0; i < cases.length; ++i) {
      final Case c = cases[i];
      // Wrap each case in a Callable and hand it to the pool
      fresults[i] = new FutureTask<Result>(new Callable<Result>() {
        public Result call() {
          return solve_case(c);
        }
      });
      pool.execute(fresults[i]);
    }
    pool.shutdown();
    try {
      // A timeout is mandatory; one day is "forever" for our purposes
      pool.awaitTermination(1, TimeUnit.DAYS);
      Result[] results = new Result[cases.length];
      for (int i = 0; i < cases.length; ++i) {
        results[i] = fresults[i].get();
      }
      return results;

    } catch (InterruptedException e) {
    } catch (ExecutionException e) {
    }
    return null;
  }

The net gain from going multithreaded in my case is less impressive than in Python. The single-threaded program solved the large input set in 25" while the multi-threaded one took 18". I have not investigated why.

Conclusion

Well, turning a single-threaded problem-solving program into a multi-process or multi-threaded version is not too difficult. The available APIs sit at the right level of abstraction, both in Python and Java. The runtime gains are significant. Of note, the Java program, when running full steam on all four cores, tended to make the whole PC sluggish, with the mouse pointer moving less smoothly than it should.

Now I know. And hopefully you do too.

Thursday, March 31, 2011

Genetically engineered Qwop (part 1)

You might have heard about QWOP. I hadn't. A comment on Reddit made me discover "the hardest game ever." I believe the original home page for the game is at foddy.net. Go try to play it before reading on. You might end up very frustrated, or alternatively rolling on the floor laughing (while Qwop is in fact creeping along the ground, racking up negative scores).

Wrong way...

After some time trying hopelessly to master the limbs of the athlete, I wondered what the best way to make a playing bot would be. That's a game without an obvious solution. There is not even a way to know what to do to run better. But at least, the inputs are quite simple. That's the perfect test case for some genetic programming!


The idea

Ok, so what is needed for genetic programming? A form of DNA encoding, from which individual entities can be built, and a fitness function telling us if the algorithm is going in the right direction. Generally, there is also a simulated environment where the entities "live."
In my case, the simulated environment is the Flash game itself. I will only drive it with synthesized keystrokes. The advantage is that it's the real thing. But there are two main drawbacks. The first is that the simulation is real-time, which is quite slow and will have to run for long hours. The second is that while it's running, I cannot use the computer, because the simulator is moving the mouse and generating keystrokes. Fortunately, virtual machines solve the second one. I used a VirtualBox Ubuntu GNU/Linux environment to drive the game. On a standard quad-core desktop machine, it appears to run at the same speed as the non-virtual game.

The fitness function is easy to find: it's the distance reached by Qwop before falling down, possibly combined with the time it took, to get a rough idea of speed. The interesting part is finding a DNA encoding that generates a population vast and rich enough to host a possible winner. So what's the idea? I think that Qwop can run correctly with a repeating sequence of keystrokes, at least once he reaches his cruising speed. So my DNA should produce a series of key events at given time intervals, plus a repetition period. Let's try to formalize that.

I will write an uppercase letter for a key down, a lowercase letter for a key up, and a plus sign for a unit pause. So for instance the string "Q+qW+wO+oP+p" means "Press Q for one unit of time, then simultaneously release Q and press W for one unit of time, release W and press O for one unit of time, and the same again for P." By the way, that sequence is not so bad: Qwop can reach 100 metres with it, but quite slowly, so we'll try to find something better.
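To make the encoding concrete, here is the parsing idea sketched in JavaScript for illustration (the actual bot is written in Java, as explained below): the string is turned into a list of steps, one per time unit, each listing the keys to press and release at that moment.

  // Turns "Q+qW+wO+oP+p" into steps of {down: [...], up: [...]}, one per time unit.
  function parseDna(dna) {
    var steps = [];
    var current = {down: [], up: []};
    for (var i = 0; i < dna.length; ++i) {
      var c = dna.charAt(i);
      if (c === '+') {                      // '+' closes the current time unit
        steps.push(current);
        current = {down: [], up: []};
      } else if (c === c.toUpperCase()) {
        current.down.push(c);               // uppercase: press the key
      } else {
        current.up.push(c.toUpperCase());   // lowercase: release the key
      }
    }
    if (current.down.length || current.up.length) {
      steps.push(current);
    }
    return steps;
  }

  // parseDna("Q+qW+w") -> [{down:['Q'],up:[]}, {down:['W'],up:['Q']}, {down:[],up:['W']}]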


Down to the code

Last time I chose to program in Python, because I love the language. But I used some Windows-specific libraries and some people were frustrated they could not see the program run on their machine. So this time I'm going to use Java. It's arguably less expressive than Python, but I know that the GUI and windowing interface is quite portable: my code for capturing the screen and sending synthesized key events should run unmodified on Windows, Linux, Mac OS X and probably Solaris. I could even run Jython on top of the low-level calls. Or Clojure, for the Lisp lovers around here... For now I will start with pure Java. I will also include a functional, if rudimentary, GUI to pilot the simulation.

The main class for interacting with the desktop is java.awt.Robot. It does everything I need: capture one pixel or a rectangle of the screen, and send mouse and key events. Let's start with screen capture. I will do as with Papa's Pizzeria: take some reference screenshots and search the screen for the reference patterns. I'll try to locate the game area by finding the light blue border around the message that appears on the opening screen. It has a homogeneous color (#9dbcd0), is four pixels wide, starts at 124,103 and goes to 505,306. But as you might have noticed, almost the same message box appears when a run is finished, except it's not exactly at the same position: it's at 129,99. So, in order to make it as easy as possible to run the program, and not force you to reload the game each time, I compute the real location of the game depending on which message box is shown. And how do I know which one is shown? Easy! I try to find the two bright yellow ribbons that appear on the finishing screen. That's also how I detect when the run has ended.


Going the distance

The other thing I need is a way to read the current distance reached. That's the tricky part. I first thought I could apply some mathematical morphology to detect shapes, lines and characteristic points (line ends, crossings...). That's certainly a valuable approach and I should work a little more on the theoretical part, but I finally decided to go the easy route. I grab the part of the screen where the distance is displayed. I apply a very simple thresholding function to turn the image black and white. Then I segment the text by finding fully empty pixel columns. And once each box is found, I match it against the reference images I have taken for all digits. The comparison sums the number of equal pixels and computes the percentage of identical pixels over the comparison area, so that small glyphs (like 1 or the decimal point) don't give overly low scores. When the result of a comparison is over 90%, I consider it a match. Below that value, I drop the result. That way, I don't have to make a special case for letters, which don't match anything: they are naturally eliminated.
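The scoring itself is nothing fancy. Here is the idea sketched in JavaScript for illustration (the bot itself is Java, and the exact pixel representation is an assumption): both images are same-sized arrays of {r, g, b} pixels, and a pixel counts as identical when each channel is within a small tolerance.

  function matchScore(candidate, reference, tolerance) {
    var equal = 0;
    for (var i = 0; i < reference.length; ++i) {
      if (Math.abs(candidate[i].r - reference[i].r) <= tolerance &&
          Math.abs(candidate[i].g - reference[i].g) <= tolerance &&
          Math.abs(candidate[i].b - reference[i].b) <= tolerance) {
        equal++;
      }
    }
    return equal / reference.length;  // fraction of identical pixels
  }

  function recognizeDigit(candidate, references, tolerance) {
    var best = null, bestScore = 0;
    for (var digit in references) {
      var score = matchScore(candidate, references[digit], tolerance);
      if (score > bestScore) { bestScore = score; best = digit; }
    }
    // Below 90% the result is dropped: letters naturally match nothing well enough
    return bestScore >= 0.9 ? best : null;
  }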

You can see from the screenshots that my segmentation algorithm is not perfect: the letters e and t are stuck in the same box, because there is no clear column between them. Fortunately, I don't need to read the text, and digits seem to always be cleanly separated on screen. If I needed to read letters, I would probably try to segment using connected components. That would probably work here because the threshold function has done a good job, but it's close...


Hurdles

If you have played the game long enough, you have seen the hurdle at 50 meters. It's the only one in the game, but not in my project: I have faced several big issues, and I don't know yet if my approach will yield any good runner. The biggest trouble is that real-time keystroke playback is not perfectly stable. The same string of key presses and pauses does not always give the same result in the game. I tried my best to send keystrokes at regular intervals, but even then, our poor athlete sometimes behaves differently. It forces me to run candidates several times and to consider only statistically significant results.

Another problem I found when trying to make the simulator run for several hours is that the game probably has memory leaks. After a few hours of uninterrupted play (something no human would ever do, I have to admit), the memory consumption of the Flash player really slows down the whole machine. I had to code a small workaround that simulates an F5 key press at regular intervals to reload the browser page and start with a fresh Flash player.


Going genetic

After the first few tries, it's obvious that the task is going to be tough. Most of my random athletes fall on their head, or fold their limbs in very unnatural ways. So, the first step is to generate enough healthy candidates to initiate the genetic pool. Let's say that the first generation should contain only runners that can reach 5 meters in less than a minute and not crash at least 50% of the time. That's the condition I will set for them to be able to "reproduce". It might seem easy. I assure you it is not. I've had to run hundreds of random Qwops to get a handful of valid ones.

And that's where I am today, still running batches of random guys, hoping to generate some acceptable ones. There is no genetic programming yet. I still need to figure out what kind of mutation and reproduction I will do to try to produce better sprinters from the early cripples. I accept any good ideas you might have. I will need them, I think.

Ouch! That really hurts!


If you want to try

The code is hosted on GitHub at https://github.com/slowfrog/qwopper. Running 'ant package' in the top-level directory should produce a jar file in the dist/ directory. Running the jar directly will open a small UI where you can launch some runs. The simulation can also be run from the command line with 'java -cp dist/qwopper.jar com.slowfrog.qwop.ui.Main 10 5'. That will run five different random athletes, each one for ten tries. Currently, the simulation automatically stops the runner after one minute, to avoid wasting too much time on the first batch.

Thursday, February 10, 2011

Making Pizza with Python

    I read a post on Reddit recently from a guy who wrote a Python program to play Bejeweled automatically. Since I've always been interested in the concept of programs driving other programs, I decided to try to write a program to play a more sophisticated game. I chose Papa's Pizzeria, by Flipline Studios. You can find the game at Kongregate. In the game, you are Roy, the delivery boy of the pizzeria. You come to work one day, and Papa is missing. He left you a note to tell you he is gone and asks you to run the pizzeria yourself. Customers are already arriving at the door, so you have no choice but to try to measure up.

Roy running the shop

    To play the game, you have to juggle between different tasks: take orders from customers, make the pizzas by putting the right toppings at the right places, bake them for the right amount of time and finally cut them in the correct number of slices. Customers will then rate you on those four activities. The game starts easy with only two customers to serve in one day, using only one topping and very short baking times. Then it becomes more and more difficult with up to ten customers a day, each one with several different toppings at different places, with different cutting and baking times.

    To be successful, you must be able to multi-task. If you do each pizza in turn, customers will wait too long and give you bad ratings on that criterion. But doing several things in parallel is tricky, especially since you run the risk of missing the right baking time. Imagine you have a pizza in the oven that's just about right. If you go take another order, it might take ten seconds, and when you are back at the oven, your lovely pizza will be overbaked and you will receive a mediocre baking rating. So, you have to keep things in mind, organize your order tickets the best you can in the limited space available, and be quick when placing toppings and cutting the pies. After some time, you begin to get the knack of it and you can run a ten-customer day in around ten minutes. And that's when it starts to feel repetitive. Time to write a bot!

Python to the Rescue

    What is needed to write a pizza baking bot? At the most basic level, three things: a way to recognize what happens on the screen, a way to act in the game, and some thinking between those two. I won't dare to call it AI, at least in my implementation. It sure won't pass the Turing test!
    If you want to jump right in and see the bot in action, skip the boring explanations and go to the last section.

Reading the screen

    So, how does my pizzabot know what's on screen? Well, I'm not ashamed to tell you there is no complex image recognition in play. The program takes screenshots of portions of the screen and compares the pixels with some expected values. It probably makes the bot more fragile and less flexible, but a hell of a lot easier to program and faster to run. It also helps that the game is somewhat slow: it's not an action game where you have to react in a tenth of a second.

    I used two main methods to read the screen. The first one is simply to compare one precise pixel with its expected color. For instance, in the image below, you can see a close-up of the bottom left corner of an order ticket, where the baking time appears. I manually took screenshots of all possible values, blended everything into one image, chose a set of very significant pixels (pointed to by the blue arrows) and read the expected RGB values of those pixels. When trying to determine the baking time, the bot takes a small screenshot (the blue rectangle) and compares each significant pixel with its expected value. When there is a match within a given tolerance, it's done.

Reading baking time

    That's how the bot sees most game elements: button presence, beginning and end of an order by a customer, baking time, number of slices, topping positions. The same technique is used to determine the exact position of the game on the screen at startup, but with an array of nine pixels instead of one, which must match the green chequered background at the top left of the game area. Actually, several positions can match, so the bot scans the screen in 5-pixel increments (which makes the scan 25 times quicker than checking every pixel) and once a possible match is found, it tries to gently slide to the top left as long as the new position is still a match.

    The other technique, used for reading toppings and topping counts, is a simple comparison with reference images. Here, comparing only one pixel with one expected value would be too error-prone. So I took screenshots of all possible toppings and all possible counts, cut out small images, and painted the background portions of the images pure blue. The bot loads those images and computes a distance between what it sees on screen and the reference images. Blue pixels are not part of the comparison. I could probably have used an alpha channel, but I had only experimented with loading opaque PNG files with PIL, and I figured that blue was not too common in toppings (when was the last time you ordered a pizza with something that blue on it?).

Comparison of toppings from screenshot with reference images

Acting

    That's probably the least interesting part. Most of the work was to take reference screenshots of the different screens and do some measurements to find the location of actionable items: buttons, toppings, order tickets, cutting lines. All the coordinates are stored in dictionaries or constants, or even used directly in the code. I know, I know, that's dirty... But the bot is not supposed to be a big enterprise project shared among tens of people and maintained over decades. It's supposed to be fun for me to write and for you to run. Nothing more. Fortunately, the functions where those magic coordinates live have meaningful names like goto_topping_station() or click_save_for_later().

Thinking tactically

    During a game, the bot keeps a record of the current status: how many orders have been taken, what state they are in, whether there are customers waiting in line, what's in the oven, how long it's been there and how much time is left before it is perfectly baked... With all that information, it regularly makes a list of possible actions, assigns a priority to each of them and picks the top one to execute. That's all very simple. The priority is just a combination of static preferences and waiting time (I prefer quickly putting an already prepared pizza in the oven when a slot is available, then taking orders or serving customers). That way, the longer an order has been in a given state, the more likely it is to be chosen as the next to be processed.
    There is one subtlety: taking pizzas out of the oven must always have the highest priority. That is because if you miss the ideal baking time, you risk losing a lot of points. So the "out of oven" action is managed completely outside the priority queue. When evaluating possible actions, if a pizza is going to be perfectly baked within a few seconds, the bot goes to the oven and waits for the perfect time. I felt this was needed because most other actions can take 2 to 10 seconds and they cannot be interrupted. What also helps is that the game seems perfectly timed. There is actually no need to read the baking indicators in the oven: one step on the scale (one eighth of a turn) is exactly 22.5 seconds long. So the bot records the time when each order is put into the oven and computes the perfect time at which it will be baked. Pretty easy stuff. Thanks again to Flipline Studios for making it easier: actually reading moving "needles" would have been trickier.

No need to read the baking timers

Putting it all together

    That's almost all there is to say. The complete scenario of the driving function is the following:
  • find the offset of the game area
  • click on "Start game"
  • choose a saved game
  • then, for as many games as asked:
    • look for possible actions (take order, make pizza, put prepared pizza into oven, get pizza out of oven, cut and serve pizza)
    • execute the one that is deemed the most important
    • until the shop is closed and all orders have been processed
    • look at the results for five seconds
    • look at the tips for five seconds
    If anything goes wrong during a game (it sometimes happens), the bot exits.
    The five-second waits at the end of each round are there so you have a chance to stop the program without losing your score. You can cleanly kill Python and close the game.

Wanna try it?

    The whole source is hosted on github. Get it, and follow the directions:

Prerequisites:
  • Python 2.x. Works at least with 2.6 and 2.7. Don't know about Python 3.
  • Windows (sorry about that!) Can probably be easily adapted. Only a few functions do win32 calls (event generation only)
  • PIL for screen capture and image file loading
  • You should also have played the first day of the game: that one is special because of the intro but mainly because the tutorial is a bit intrusive.

Run the bot:
  1. launch your favorite Python shell. I really like IPython, but the standard shell works the same.
  2. import botutil, that's the only source file
  3. open a browser window on the game (here's the address again on Kongregate)
  4. make sure the whole game area is visible, especially the top left corner: that's where the bot looks first. Also make sure you are on the title page (with Papa's Pizzeria title and the three yellow buttons)
  5. launch the bot using a given save game for a number of rounds with botutil.start_game(save, rounds). save should be 0, 1 or 2. rounds could be anything more than zero
  6. watch the bot making perfect pizzas. It always gets 100% on toppings and cutting. It gets 100% most of the time on baking, and between 90% and 100% on waiting time, which usually gives a general score around 99%.

    If it doesn't work for you, just ask me: maybe I forgot something. And if it works for you, you can tell me too.
Also, I'm not a very experienced Pythonista yet, so feel free to tell me where I'm using the wrong idioms or where I have used over-complicated code.

And enjoy your lovely baked Python pizzas!

P.S. Video now available.