I had an interesting debate with my team about which programming language is the right one for Hadoop, the open source distributed computing engine from Apache. One of my architects asked the rest of the team to recommend the best programming language for building applications on Hadoop.
My first response to anybody having to make a language choice is something like this
Interestingly, instead of a holy war on programming languages, some of the recommendations that came up from the team were fairly mature and reasonable approaches (at least I thought so) to making practical choices. In this post I am sharing these perspectives.
Before we start discussing the solution, let me abstract the problem statement as "For the next hot technology that can be programmed using multiple programming languages, how would you go about choosing the right language?". While the problem statement sounds trivial, I believe that the choices we make upfront, especially in a large corporation, will have an impact on the total cost of ownership (TCO) of the chosen platform. Without much ado, here are some of the points of view for making a decision.
1. Look at the programming language that the higher-level frameworks (like Cascading/TestNG in the case of Hadoop) have selected. Once you pick your framework, the choice of programming language(s) largely follows from it.
2. Select a language by measuring the stability of the libraries that support these frameworks. One way to evaluate them is to find out which large applications exist, who uses them, what the current version number is, etc.
3. A completely different approach would be to pick a language today (JRuby, for example) as long as it runs on the JVM. This way we are not architecting ourselves into a corner, i.e. we can still diversify in the future to another language that runs on the JVM. This is more of a "defensive play" than making a hard decision today.
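To make the three points of view a little more concrete, here is a minimal sketch of how they could be combined into a simple scoring exercise. Everything in it is an assumption made for illustration: the `Candidate` fields, the 0-5 maturity rating, the weights and the placeholder language names are all hypothetical, and the team did not actually use such a formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    language: str             # placeholder name, not a real recommendation
    used_by_framework: bool   # point 1: chosen by the framework you picked
    library_maturity: int     # point 2: subjective 0-5 rating of library stability
    runs_on_jvm: bool         # point 3: keeps the "defensive play" open


def score(candidate, weights=(2, 1, 1)):
    """Weighted sum of the three criteria; the weights are illustrative only."""
    w_framework, w_maturity, w_jvm = weights
    return (w_framework * int(candidate.used_by_framework)
            + w_maturity * candidate.library_maturity
            + w_jvm * int(candidate.runs_on_jvm))


def rank(candidates, weights=(2, 1, 1)):
    """Return the candidates ordered from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


if __name__ == "__main__":
    # Hypothetical ratings, made up purely to show how the ranking works.
    options = [
        Candidate("lang_a", used_by_framework=True, library_maturity=4, runs_on_jvm=True),
        Candidate("lang_b", used_by_framework=False, library_maturity=3, runs_on_jvm=True),
        Candidate("lang_c", used_by_framework=False, library_maturity=5, runs_on_jvm=False),
    ]
    for option in rank(options):
        print(option.language, score(option))
```

The numbers matter less than the conversation they force: the team has to agree on how much each point of view should weigh before arguing about a specific language.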
Now if you think about these points of view, they are actually addressing some of the deeper concerns well. Let me elaborate.
I don't think anybody who has used Java would dispute that Java as a language would not have the kind of acceptance it has today without the Apache Software Foundation's contributions - over 100 projects and some of the best-in-class libraries for various capabilities. Conversely, any large foundation effort towards a new technology usually has broad community participation, which tends to make selecting the right language for its development a reasonably democratic process. Further, for such initiatives to succeed and even survive, the selected language must be capable of using the technology in the best way possible. Thus any long-surviving framework will represent a reasonably evolved programming language that supports the new technology.
The second benefit of a stable and popular framework is that it will have a large user base. This essentially means that all these developers have learnt the programming language in which the framework was developed. Finding the right talent in the job market becomes easier when such frameworks are abundant.
The third benefit is that most software today is developed by people who actively use the internet to get help and forums to ask for support. While there are a lot of getting-started guides, day-to-day issues have to be resolved on forums. Questions on the language itself don't come up so often, but today the boundary between the language and the application development framework/SDK is becoming nebulous, and in the process a lot of programming idioms are shared by experts while responding to SDK questions.
I hope this information has been helpful. How would you approach this problem? Do you have other insights? I would love to hear the options. Please post your comments.