Diary of a Dangineer: replaceAll vs. replace

In today’s Diary of a Dangineer[1] I learned the difference between replace and replaceAll on java.lang.String the hard way.

My task involved sanitising a large amount of data by cleaning up some strings. I thought this should be easy, so I reached for Spark. I wrote a small program that would merrily prance through a bazillion lines of JSON stored on HDFS and get rid of those nasty unwanted characters. I thought I should use replaceAll, not replace, because, y’know, I wanted to replace all.

So this is what it looked like (Spark artifacts elided to prevent distraction):
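Stripped of Spark, the heart of such a job is a per-line string transformation. The following is a hypothetical stand-in, not the original code: the class name, the sanitise method and the sample input are made up, and the unwanted token is assumed to be the literal “(+1” that turns up later in this post.

```java
// Hypothetical reconstruction of the mistake, with Spark left out.
public class NaiveSanitiser {
    // Assumed token to strip (hypothetical example value).
    static final String UNWANTED = "(+1";

    // BUG: String.replaceAll compiles its first argument as a regular expression.
    // "(+1" opens a group, follows it with a dangling '+' and never closes it,
    // so every call throws java.util.regex.PatternSyntaxException.
    static String sanitise(String line) {
        return line.replaceAll(UNWANTED, "");
    }

    public static void main(String[] args) {
        // In the real job every JSON line would pass through sanitise(); this one fails.
        System.out.println(sanitise("{\"phone\": \"(+1 555 0100\"}"));
    }
}
```

Running main does not print a cleaned line; it dies with a PatternSyntaxException, and in a distributed job the same exception is raised for every line that reaches sanitise.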

Feeling happy with the code, I unleashed it on a cluster and left it running. Several hours and hundreds of gigabytes later, I checked the logs and saw this stack trace popping up everywhere:

After a couple of WTFs, some angry punching of the air and a trip through the five stages of grief, I finally read the documentation for replaceAll, which states:

Replaces each substring of this string that matches the given regular expression with the given replacement.
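The words “matches the given regular expression” are the whole story. A small demonstration, using an arbitrary sample string and hypothetical helper names, of how the same first argument means different things to the two methods:

```java
public class ReplaceVsReplaceAll {
    // replace(CharSequence, CharSequence): the target is a literal string,
    // and every occurrence of it is replaced.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll(String, String): the target is a regex, and "." matches
    // any character (other than a line terminator), so every character is replaced.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c")); // a-b-c
        System.out.println(regex("a.b.c"));   // -----
    }
}
```

Despite the name, replace is not a “replace first” method. The one that stops after a single match is String.replaceFirst, and it also takes a regex.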

You see, String.replaceAll takes a regex as its first parameter, so it was compiling “(+1” as a regular expression, and “(+1” is not a valid one. What I should have used was String.replace, which also replaces every occurrence but treats its first parameter as a literal string rather than a regex.
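A minimal sketch of the fix, again with Spark left out. The class and method names and the sample line are hypothetical, and “(+1” stands in for whatever needs stripping. It uses String.replace and, for cases where a regex API is unavoidable, Pattern.quote and Matcher.quoteReplacement to make both arguments literal:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Sanitiser {
    // Hypothetical token to strip; any literal string works the same way.
    static final String UNWANTED = "(+1";

    // String.replace(CharSequence, CharSequence) matches the target literally
    // and still replaces every occurrence, so "(+1" is never parsed as a regex.
    static String sanitise(String line) {
        return line.replace(UNWANTED, "");
    }

    // Regex alternative: Pattern.quote makes the metacharacters in "(+1" literal.
    // The pattern is compiled once and reused for every line.
    static final Pattern UNWANTED_PATTERN = Pattern.compile(Pattern.quote(UNWANTED));

    // Matcher.quoteReplacement stops '$' and '\' in the replacement from being
    // read as group references or escapes.
    static String sanitiseWithRegex(String line, String replacement) {
        return UNWANTED_PATTERN.matcher(line).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String line = "call (+1 555 0100 or (+1 555 0199";
        System.out.println(sanitise(line));
        System.out.println(sanitiseWithRegex(line, ""));
    }
}
```

String.replaceAll calls Pattern.compile on every invocation, so precompiling the quoted Pattern once also saves repeated work when the same cleanup runs over a bazillion lines.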

[1] Dangineer (n): an engineer who works with Big Data, which is dangerously addictive.
Hey, Vijay is a hoopy Dangineer

What I’m reading

Here’s what I’m reading in August 2016:

Too Soon Old, Too Late Smart, by Gordon Livingston, M.D.

Out of a lifetime of experience, Gordon Livingston has extracted thirty bedrock truths: We are what we do. Any relationship is under the control of the person who cares the least. The perfect is the enemy of the good. Only bad things happen quickly. Forgiveness is a form of letting go, but they are not the same thing. The statute of limitations has expired on most of our childhood traumas. Livingston illuminates these and twenty-four others in a series of carefully hewn, perfectly calibrated essays, many of which focus on our closest relationships and the things that we do to impede or, less frequently, enhance them.

The Undercover Economist, by Tim Harford

Ever wondered why the gap between rich and poor nations is so great, or why it’s so difficult to get a foot on the property ladder, or where the banks went wrong? This book offers the hidden story behind these and other forces that shape our day-to-day lives, often without our knowing it.



PostCapitalism: A Guide to Our Future, by Paul Mason

Over the past two centuries or so, capitalism has undergone continual change – economic cycles that lurch from boom to bust – and has always emerged transformed and strengthened. Surveying this turbulent history, Paul Mason wonders whether today we are on the brink of a change so big, so profound, that this time capitalism itself, the immensely complex system by which entire societies function, has reached its limits and is changing into something wholly new.