Building Data Pipelines Using Kotlin

Up until recently, we, like many companies, built our data pipelines with Java or Scala in any one of a handful of technologies, including Apache Spark, Storm, and Kafka. But Java is a very verbose language, so writing these pipelines in Java involves a lot of boilerplate code. For example, simple bean classes require writing multiple trivial getters and setters and multiple constructors and/or builders. Oftentimes, hashCode and equals methods have to be overridden in a trivial but verbose manner. Furthermore, all function parameters need to be checked for null, polluting the code with branching operators. It's time-consuming (and not trivial!) to analyze which function parameters can and cannot be null.
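As a minimal sketch of the contrast (the record type here is hypothetical, not taken from our actual pipelines), a Kotlin data class generates the getters, equals, hashCode, and toString implementations a Java bean would need by hand, and nullability is expressed directly in the type system:

```kotlin
// Hypothetical pipeline record: Kotlin generates equals(), hashCode(),
// toString(), copy(), and component functions automatically.
data class ClickEvent(
    val userId: String,     // non-nullable: the compiler rejects nulls here
    val referrer: String?,  // nullable: callers are forced to handle the null case
    val timestampMs: Long
)

fun describe(event: ClickEvent): String {
    // The Elvis operator replaces an explicit null-check branch.
    val ref = event.referrer ?: "direct"
    return "${event.userId} via $ref at ${event.timestampMs}"
}

fun main() {
    val event = ClickEvent(userId = "u42", referrer = null, timestampMs = 1_626_000_000_000)
    println(describe(event))  // prints: u42 via direct at 1626000000000
}
```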

Processing data from pipelines written in Java often involves branching on the types or values of data flowing through the pipeline, but limitations of Java's switch statement lead to extensive use of sprawling if-then-else-if chains. Finally, most data pipelines work with immutable data and collections, but Java has almost no built-in support for separating mutable from immutable constructs, which forces writing additional boilerplate code.
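For illustration (again with hypothetical types, not code from our pipelines), Kotlin's when expression branches on type with smart casts, and the standard library distinguishes read-only List from MutableList:

```kotlin
// Hypothetical message types for a pipeline stage, modeled as a sealed class
// so the compiler can check that the when expression covers every case.
sealed class Message
data class Metric(val name: String, val value: Double) : Message()
data class LogLine(val text: String) : Message()

fun render(msg: Message): String = when (msg) {
    // Smart casts: inside each branch, msg already has the matched type,
    // so there is no explicit instanceof-and-cast boilerplate.
    is Metric -> "${msg.name}=${msg.value}"
    is LogLine -> msg.text
}

fun main() {
    // listOf() returns a read-only List; mutation requires an explicit MutableList.
    val batch: List<Message> = listOf(
        Metric("latency_ms", 12.5),
        LogLine("pipeline started")
    )
    batch.forEach { println(render(it)) }
}
```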

To address these shortcomings of Java for data pipelines, we selected Kotlin as an alternative for our backend development.