The dark side of Hadoop
At BackType, we are heavy users of Hadoop. We use it to run computations on our 30TB datastore of social data. We've even open-sourced some significant projects that are built on top of Hadoop.
Unfortunately, Hadoop has problems. It's sloppily implemented and requires all sorts of arcane knowledge to operate it. We would be the first to try out a replacement for Hadoop if a viable alternative existed. In this post, we'll look at some of the darker aspects of Hadoop.
Critical configuration poorly documented
There's a configuration property for Hadoop called "dfs.datanode.max.xcievers". By default, it's set to 256. It turns out if you don't raise this value to be significantly higher (e.g., 5000), your cluster blows up with all sorts of weird errors like EOFExceptions.
As you can imagine, tracking this one down is quite an adventure. Even though this property is so critical, it's not documented anywhere. And on top of that, the property isn't even spelled correctly. It's supposed to be spelled "xceiver" (which is how it's spelled in the source code). See this post for more details.
Terrible with memory usage
We used to have problems with Hadoop running out of memory in various contexts. Sometimes our tasks would randomly die with out of memory exceptions, and sometimes reducers would error during the shuffle phase with "Cannot allocate memory" errors. These errors didn't make a lot of sense to us, since the memory we allocated for the TaskTrackers, DataNodes, and tasks was well under the total memory for each machine.
We traced the problem to a sloppy implementation detail of Hadoop. It turns out that Hadoop sometimes shells out to do various things with the local filesystem. When you shell out in Java, the process gets forked. Forking a process causes the child process to reserve the same amount of memory for itself as the parent process is using (to fully understand what's happening, you need to learn about memory overcommit and the copy-on-write semantics of forking in Linux).This means that the Hadoop process which was using 1GB will temporarily "use" 2GB when it shells out.
The solution to these memory problems is to allocate a healthy amount of swap space for each machine to protect you from these memory glitches. We couldn't believe how much more stable everything became when we added swap space to our worker machines.
Zombies
Hadoop is terrible at process management. It sometimes fails to kill processes that it launches, leaving "zombie tasks" throughout the cluster. These zombie tasks soak up memory, causing out of memory errors for real tasks. We've seen clusters get completely overriden by zombies and become useless.
The problem is that Hadoop puts the burden of a task exiting on the task itself, rather than the TaskTracker that launched it. So if a task enters a zombie state for one reason or another, there's nothing that will clean it up. This is a terrible design. The burden should be on the TaskTracker to supervise the processes it launches and kill them when necessary. Hadoop should not trust user code in tasks to behave properly. This ensures that if I mess up my code, my job may have problems but at least I won't damage the cluster.
A task turning into a zombie seems to be related to encountering an out of memory error, as we've noticed much fewer zombies since fixing Hadoop's other memory issues by adding swap space. We just wish that Hadoop was designed correctly so that zombies were not even possible.
Conclusion
Hadoop continuously finds new and creative ways to frustrate us. Making Hadoop easy to deploy, use, and operate should be the #1 priority for the developers of Hadoop. Hadoop could be an amazing project. Right now though, it's just plain sloppy.
Follow the BackType tech team on Twitter here.