Monday, June 08, 2009

Recruitment à la open source

I just read Allan Kelly's blog post criticising 'Foie Gras recruitment'. Allan's point is that adding developers too quickly has the opposite of the intended effect, slowing a project down. In fact, recruiting always slows things down before it speeds them up, because of the cost of training new developers and familiarizing them with the company's processes, product vision and code.

However, many factors influence both the severity of the problem and the team's ability to deal with it. Allan mentions one: the team's processes and practices. Another really important one is the character of the developers involved, especially the new ones.

Imagine the hypothetical case where you magically recruit only developers so capable of self-training that the negative impact on the team is much lower than average (obviously never zero). Imagine also that the answers to their questions are usually available without another team member having to spend time on them. For example, the answers lie in the code itself and in any associated well-written documentation, including feature specifications, project goals, etc.

Obviously this is a hypothetical scenario essentially never achieved in corporate development, but it does exist in the real world, in many open source projects. Often people enter open source projects because they did their own self-training, read the code, tried things out and made working contributions that were good enough to get the attention of the project owners, and as a result were admitted to the team. This scales much better, and faster, than normal recruitment. So why does this not happen in the corporate development world? Usually because it relies on statistical factors not often achieved: it needs a large enough percentage of the available developers to have sufficient and appropriate skills and also to be interested enough in the project to put in the time. This is a low number, especially when you consider that by the term 'interested' I also imply that the developer is able to make a living from this activity.

So how do corporate development projects benefit from this? Or can they? The problem is that corporate projects are, almost by definition, not interesting enough to potential developers.

Personally I believe it is possible to find a middle ground, if you close the gap from both ends:
  • move the project's goals closer to the developers' goals (make the project much more interesting to open source developers, make it open source, make it do things more interesting to a wider audience)
  • move the developers' goals closer to the project's goals (i.e. pay the developers)
Obviously the second option should not be undertaken using normal recruitment. You still need to use open source recruitment (statistical filtering as described above).

Is this really hypothetical? No, I've actually been putting this into practice with my most recent recruitment drive. I recruited three new remote developers without reading a single CV or holding a single interview. Instead I simulated the open source approach by using the following steps:
  • require a code contribution, which was evaluated (testing not only coding skills, but ability to read specs, work remotely, solve problems independently, do internet research, and perform self-training)
  • contract for a trial period, testing their ability to perform with other remote developers and double checking their skills, notably their growing understanding of the project itself
  • contract for longer periods with tighter integration into the team
Now three months down the line, I have actually seen quite decent productivity. I count the approach a success, and I'll be sure to use the same technique for most future recruitment drives.

One final point. This approach does not solve the problems identified by Allan Kelly. It only serves to reduce their impact. And it does introduce another set of problems related to efficient project management of loosely coupled remote teams. That is a subject for a separate blog post :-)

Monday, May 25, 2009

The Secret of Googlenomics

I just read an amazing and insightful article in Wired about the 'Secret of Googlenomics', a riveting introduction to the auction-based principles that have become the core of almost everything at Google. Even more importantly, those principles represent a possible future for many other elements of the economy.

Most of the article references a presentation given by Google's chief economist, Hal Varian, whose career was inspired by Isaac Asimov's Foundation series: "In Isaac Asimov's first Foundation Trilogy, there was a character who basically constructed mathematical models of society, and I thought this was a really exciting idea. When I went to college, I looked around for that subject. It turned out to be economics."

I was also inspired by Asimov's theory of 'psychohistory' when I read those books back in the early 90s, but unlike Hal, I thought the idea was entirely impossible, and so I stuck with reality and studied pure science. Perhaps I was wrong, as Google's mathematicians now take into account everything from the weather to people's fashions and buying habits to predict the best adverts to show on search results.

I strongly recommend reading the entire article at http://www.wired.com/culture/culturereviews/magazine/17-06/nep_googlenomics. For a taster, here is the concluding paragraph:

There's a wild contrast between this sparsely furnished residence and what it has spawned—dozens of millionaire geeks, billions of auctions, and new ground rules for businesses in a data-driven society that is far weirder than the one Asimov envisioned nearly 60 years ago. What could be more baffling than a capitalist corporation that gives away its best services, doesn't set the prices for the ads that support it, and turns away customers because their ads don't measure up to its complex formulas? Varian, of course, knows that his employer's success is not the result of inspired craziness but of an early recognition that the Internet rewards fanatical focus on scale, speed, data analysis, and customer satisfaction. (A bit of auction theory doesn't hurt, either.) Today we have a name for those rules: Googlenomics. Learn them, or pay the price.

Monday, May 11, 2009

Artistic Engineers



I've always believed that artistic or creative talent was indispensable in technical fields like science, engineering and software development. But I never put together a coherent enough description to warrant a blog post, only the occasional soliloquy over a drink. But now I've just read DHH's blog entry "We need both engineers and artists in programming", and he described it so well, I just had to respond. His description focused on a developer's perspective:

People waxing lyrically about beautiful code and its sensibilities. People willing to trade the hard scientific measurements such as memory footprint and runtime speed for something so ephemeral as programmer happiness.

Now I'm originally a pure science researcher. And there is no more extreme case of a non-artistic image than that of a scientist. What do most people think: white lab-coats, thick-rimmed glasses, rigorous systematic approach to everything in life and a total lack of artistic flair.

And often that image is not entirely inaccurate. As 'Robert Martin' indicated, professionalism is a very important quality for software development (and I add - science and engineering in general). But as DHH asserts: 'the wonderful thing about this new age of programming is that we need and prosper from both types of programmers'.

I agree with David. You really do need both types. And if you look back at some of the most impressive discoveries in science in the 20th century, there were artistic people involved, usually with the key discovery. I love the biggest deviation from the boring stereotype - Einstein, with his wild hair and almost chaotic appearance.

It's all about thinking outside the box. David says it's all about 'programmer happiness'. Of course he's right too.

Now what about the irony that DHH's profile shot is so much more professional looking than Einstein's?

Monday, April 20, 2009

What's the point of github?


While driving to Malmö last Friday to attend a tech talk on git by Sébastian Cevey and hosted by PurpleScout, I was trying to explain distributed source code management systems (like git) to a non-developer friend of mine. I very quickly found myself explaining much more about git than I realized I knew. And I found myself asking, and answering, what I think is a very interesting question: what is the point of github?

The situation is that git, and other distributed source code management systems, like bazaar and mercurial, appear to start from the philosophical position of giving complete control to the end user (in this case the developer). They are not centrally controlled systems, there is no central server, no 'little' boss to ask permission from for access to files, branches or projects. When you clone the repository, you get it all, with all history and everything. Power to the people!

This allows for highly flexible distributed teams, each working in their own way, as suits the developers themselves. It completely solves the usual problem found in central systems like CVS, SVN and, heaven forbid, Perforce: getting permission from a non-developer to do development.

So then, why does a site like www.github.com exist? It seems to imply adding back a central server to the decentralized system. With a little thought, I realised what was going on. The problem had never been about central control, it was all about who has the control, and distributed systems actually do not remove the concept of central control at all. They just facilitate a situation where the right people are in control.

To explain this, I should re-describe what the original problem was. Consider CVS and SVN, arguably the industry standard(s). You have a central server with the code (and history and branches, etc.). Each user checks out a working copy of a branch of the code. After doing work, they commit back to that branch (dealing with conflicts and merges as needed). This implies a very particular workflow, and forces connectivity to the server for all major actions that require working with the code history (checkout, update, commit, branch, merge, etc.). And the mere existence of the central server implies the existence of IT and admin in the decision making loop, which can only hurt. Perforce, being more susceptible to the influence of IT on purchasing, took this one step further and required connectivity to the central server for almost any development activity, and, can you believe it, even requires developers to unlock each file they plan to work on! Can there be anything worse for developer productivity? Well, yes, anyone remember Microsoft's 'SourceSafe'?

What was the main problem here? It was not actually the central server, but rather it was a few things implied by this architecture:
  • The involvement of non-development staff in the smaller details of what the developer actually needs to do, which adds overhead to development activities, which means higher cost and less efficiency.
  • The implication of a specific workflow in the way the developers need to work with the code-base.
  • The need for regular or even continuous connectivity, which also has performance, efficiency and cost implications.
Distributed systems completely avoid all of this. Each developer has the complete history, and all branches, right there on his computer. They can do absolutely everything they want without asking anyone, and especially not asking people that don't know about software development. Maximum performance!
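To make the "complete history, right there on the developer's computer" claim concrete, here is a minimal sketch using git. It assumes git is installed, and the directory names, file name, identity and commit messages are all made up for illustration. Throwaway directories stand in for the upstream project and a developer's laptop.

```shell
set -e
work=$(mktemp -d)

# A stand-in "upstream" repository with two commits (hypothetical content).
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Example Dev"
git config user.email "dev@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "first commit"
echo "v2" > file.txt
git commit -q -am "second commit"

# Cloning copies the whole history, not just the latest snapshot.
git clone -q "$work/upstream" "$work/laptop"
cd "$work/laptop"
git config user.name "Example Dev"
git config user.email "dev@example.com"
cloned_commits=$(git rev-list --count HEAD)
echo "commits in the fresh clone: $cloned_commits"

# Branching and committing happen locally: no server, no permission needed.
git checkout -q -b experiment
echo "v3" > file.txt
git commit -q -am "local experiment"
local_commits=$(git rev-list --count HEAD)
echo "commits after working offline: $local_commits"
```

The fresh clone reports two commits and the experiment branch three, and nothing after the clone step touches the upstream directory at all.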

But at the end of the day, those developers need to get their code back to somebody in charge. There is always going to be one person or organization that actually sells the product, or distributes the product, or supports it. So, no matter how much power the developer thinks they have, the real world is still centrally controlled. But at least now the control is not micro-management. Now the control is closer to the real business, which is about getting good code to the right customers. Distributed source code management allows for this to be done most efficiently. The developers have all the power to do their job most efficiently, but with power comes responsibility, and those same developers are now required to do all the merging back into the main code. How is this done without a central server? Easy, each developer simply publishes to their own public copy of the latest code-base. That public copy could even be a shared location on their own computer, accessible to the right people. Or, in the case of open source projects, it could be a world readable resource like github.

And that's the point of github! It is a convenient place for developers to publish their already merged work, for use by the central product distributor.
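The publish-then-pull flow can be sketched with plain git and local directories. This is only an illustration under assumptions: the repository names (main, dev, dev-public.git), the branch name "feature" and the file contents are hypothetical, and a bare repository on disk plays the role that a hosted site like github would play for an open source project.

```shell
set -e
work=$(mktemp -d)

# The integrator's repository: whoever actually ships the product.
git init -q "$work/main"
cd "$work/main"
git config user.name "Integrator"
git config user.email "integrator@example.com"
echo "base" > README
git add README
git commit -q -m "initial release"

# A developer clones it and works locally, already based on the latest code.
git clone -q "$work/main" "$work/dev"
cd "$work/dev"
git config user.name "Developer"
git config user.email "developer@example.com"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "add feature"

# The developer publishes to their OWN public copy, not to the integrator.
git init -q --bare "$work/dev-public.git"
git remote add public "$work/dev-public.git"
git push -q public HEAD:refs/heads/feature

# The integrator pulls the finished work from the developer's public copy.
cd "$work/main"
git pull -q --ff-only "$work/dev-public.git" feature
echo "integrator now has: $(ls)"
```

Because the developer's commit sits directly on top of the integrator's latest code, the pull is a plain fast-forward. If the main code had moved on in the meantime, `--ff-only` would refuse, and the developer would be expected to merge the latest code and publish again, which is exactly where the merging responsibility ends up.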

Not only is this a developer's dream come true, but it is a software development company's dream come true. You don't have to manage the central server any more. You also don't have to do as much support merging other people's code into your own, because you can push that responsibility back out to the developers, where it belongs.

I can't believe this was not done thirty years ago! Why is that? I have two theories:

Cobbler's children - since both the customer and supplier are the same (the developer) for code management systems, perhaps it's a case of the cobbler's children having the worst shoes. The developers simply work around bad code management systems, because they can.

Corporate control - if we look back at what I've said about the key differences between central and distributed systems, there seems to be a repeating theme regarding the involvement of non-developers, or company IT processes, in the way the older systems worked.

Having personally seen a lot of bad decision making by companies to increase their level of 'perception of control', I'm voting for the latter. (see my blog for more on this).

But those days are numbered! I think concepts like distributed SCM and open source itself are increasing the prevalence of businesses run on the principles of collaboration instead of control, with decision making by the people with the actual information.

Friday, March 20, 2009

15 million Africans are ready for work - Got Tasks?

I followed a Twitter comment by Tim O'Reilly that quoted Nat Torkington saying "first 5 minutes redlined my awe-ometer."



So I just had to watch the video he was referring to, and the above screen-shot is how it ends. I know I spoiled the punch-line, but it's still worth watching, so click the link and enjoy!

The presenter, Nathan Eagle, has started a service in Kenya and East Africa, called TextEagle, which allows mobile phone users to complete small tasks by SMS and get rewarded for it, in airtime or in credit. For a workforce living on $5/day and eager for more airtime, this works like a charm. Tasks include simple text translation services, local news reporting, and even listening to advertising!

This really is 'crowdsourcing' in action.

Tuesday, March 17, 2009

'The network is the computer' and 'the client plus the cloud'

I just read a very interesting article at Computerworld, an interview with Craig Mundie of Microsoft, where he talks about the future of computing, and references some presentations he recently made. The article is titled "Microsoft's next big thing", which is a pity because it colours an otherwise interesting read with an overly self-congratulatory attitude.

Craig described the future of computing as being 'the client plus the cloud', which reminded me very strongly of Sun's slogan 'the network is the computer' (originally coined by John Gage). Jonathan Schwartz blogged specifically about 'the network is the computer' back in 2006, and gave a nice realistic picture of cloud computing, grids, and how both the average end user and corporate IT environments view and interact with these systems. Sun was, of course, announcing the imminent launch of their own commodity grid, www.network.com. Later that year Amazon launched the public beta of its EC2, which made a final release in late 2008. While Sun has not yet made the final release of their grid, many others have. It is clear that visionaries from these various companies have been on the right track for quite some time.

But, as Mr Mundie himself admitted, timing and market readiness are very important aspects of the adoption of new computing paradigms. And according to him the future paradigm is all about the balance between the client (desktop OS) and the cloud (grids, the internet, etc.). He is absolutely right. And it is easy to be right when you are not predicting the future but observing the present. Aside from Jonathan's 2006 blog post, we all know just how successful cloud computing, and commodity clouds like Amazon EC2 in particular, have become. Everyday internet services like Google, Yahoo, Facebook and LinkedIn are all products of the success of the cloud. We are not about to undergo a paradigm shift; we have been in the transition for some time, and many, many vendors have jumped onto this particular train, including Microsoft, of course, with their 'Azure' grid.

While Mr Mundie may be a little off track about just how important Microsoft is to this new paradigm, one thing I must give him credit for is making the whole subject much more interesting and enjoyable to read about. In particular his video presentation had the 'cool' factor usually associated with a Steve Jobs presentation.

Tuesday, February 17, 2009

Convert images to greyscale

There are at least a hundred ways of doing this, but I wanted a single-line way to make greyscale versions of a bunch of images on a website I was developing. The last thing I wanted was to load each one in turn into a graphics application to edit the colors.

ImageMagick to the rescue:
for img in *.gif ; do convert "$img" -colorspace Gray -colors 16 "grey_$img" ; done

Ok - so not really one line, but almost :-)
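If the loop needs to be re-run, it helps to wrap it in a small function that skips its own earlier output, otherwise a second run would also produce grey_grey_ copies. The sketch below is one hedged way to do that. The function name greyscale_all and its optional "runner" argument are my own invention for illustration, and it assumes ImageMagick's convert is on the PATH (on ImageMagick 7 the command is magick).

```shell
# Hypothetical helper, not part of ImageMagick.
# $1: command that performs the conversion (defaults to ImageMagick's convert).
greyscale_all() {
    runner=${1:-convert}
    for img in *.gif; do
        [ -e "$img" ] || continue              # no GIFs here: glob stayed literal
        case $img in grey_*) continue ;; esac  # skip output from an earlier run
        "$runner" "$img" -colorspace Gray -colors 16 "grey_$img"
    done
}
```

Run `greyscale_all` inside the image directory to do the conversion, or `greyscale_all echo` for a dry run that only prints the arguments each conversion would receive.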

And this is what it looks like afterwards in my file explorer: