Discover how Behavior-Driven Development (BDD) testing can revolutionize data pipelines, enhance collaboration, and improve testing efficiency. Explore the benefits and challenges of BDD in testing scenarios.
BDD testing is an Agile approach that promotes collaboration by writing test cases in plain language. In other approaches, the Product Owner, Developer, and Tester can each end up with a different understanding of the same requirement; BDD testing aligns them through a shared language.
Gherkin is the language that Cucumber uses to define tests. It uses a minimal set of keywords to structure tests so that they can be executed. Cucumber itself is a widely used testing framework in the Java and Scala communities.
The Gherkin language includes several basic keywords, among them Feature, Scenario, Given, When, Then, And, and But.
Next, we'll look at examples of test definitions:
Looking at the example above, it is clear which tests are applied and why. We have an input, an output, and a context. There is no need to dig into the code to understand the scenarios our process covers.
Each step (Given, When, Then) needs a backing implementation in Scala, but once a step has been implemented you can reuse it to describe any other scenario.
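To make the reuse idea concrete, here is a minimal sketch in plain Scala that does not depend on Cucumber. It is not how Cucumber is implemented, and it is not the example project's code: the `StepRegistrySketch` object, the `World` class, the step wording, and the 10% VIP discount rule are all assumptions made up for illustration. Each step is registered once as a pattern plus an action, and two different scenarios are then written using the same steps.

```scala
import scala.collection.mutable
import scala.util.matching.Regex

// Hypothetical, dependency-free sketch of reusable step definitions.
// A real project would rely on Cucumber rather than this registry.
object StepRegistrySketch {
  // State shared by the steps of a single scenario.
  final class World {
    var customerStatus: String = ""
    var basePrice: Int = 0
    var finalPrice: Int = 0
  }

  type Action = (World, List[String]) => Unit

  private val steps = mutable.ListBuffer.empty[(Regex, Action)]

  // Register a step: a pattern and the code that runs when a line matches it.
  def step(pattern: String)(action: Action): Unit =
    steps += ((pattern.r, action))

  // Implemented once, usable by every scenario that contains a matching line.
  step("""a customer with status (\w+)""") { (w, a) => w.customerStatus = a.head }
  step("""a product with base price (\d+)""") { (w, a) => w.basePrice = a.head.toInt }
  step("""the pricing rules are applied""") { (w, _) =>
    // Assumed business rule for illustration: VIP customers get 10% off.
    val discountPercent = if (w.customerStatus == "VIP") 10 else 0
    w.finalPrice = w.basePrice * (100 - discountPercent) / 100
  }
  step("""the final price should be (\d+)""") { (w, a) =>
    assert(w.finalPrice == a.head.toInt, s"expected ${a.head}, got ${w.finalPrice}")
  }

  private val keyword: Regex = """^(Given|When|Then|And|But)\s+""".r

  // Run a scenario line by line, dispatching each line to the first matching step.
  def run(scenario: String): World = {
    val world = new World
    for (line <- scenario.linesIterator.map(_.trim).filter(_.nonEmpty)) {
      val text = keyword.replaceFirstIn(line, "")
      val hit = steps.iterator
        .map { case (regex, action) => regex.unapplySeq(text).map(groups => (action, groups)) }
        .collectFirst { case Some(found) => found }
      hit match {
        case Some((action, groups)) => action(world, groups)
        case None => throw new IllegalArgumentException(s"No step definition for: $line")
      }
    }
    world
  }

  val vipScenario: String =
    """Given a customer with status VIP
      |And a product with base price 100
      |When the pricing rules are applied
      |Then the final price should be 90""".stripMargin

  val regularScenario: String =
    """Given a customer with status REGULAR
      |And a product with base price 100
      |When the pricing rules are applied
      |Then the final price should be 100""".stripMargin

  def main(args: Array[String]): Unit = {
    run(vipScenario)
    run(regularScenario)
    println("both scenarios passed")
  }
}
```

The second scenario required no new Scala code: it only recombines steps that already exist, which is the property the paragraph above describes.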
Nowadays, many libraries are well suited to testing data pipelines, but there are some edge cases where BDD can shine. Let's imagine a use case with a complex reverse ETL flow that has many business rules. This flow uses, for example, a data lake as an Operational Data Store (ODS) and runs a set of transformations within the data lake to feed the resulting data back to an application layer. The transformations executed inside the data lake contain many business rules like:
In this situation, the expected result can vary for each row of our dataset. The outcome of the processing depends heavily on relationships in the data and on conditions specific to each record, such as the current status of a customer or special pricing conditions for a product.
Many current data testing libraries rely on stating our final expectation of the data. Essentially, they apply unit testing concepts: you provide an input, execute your transformation, and verify that the output matches your expectation. While this straightforward and effective method works well in many situations, it doesn't offer much context about our tests.
The 3 W's of data testing is a concept proposed by this author to explain the context of our tests (if you have already come across something similar, please let me know). The 3 W's are:
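The input, transformation, expectation pattern described above can be sketched in a few lines of plain Scala. The `Sale` and `Revenue` types and the `revenuePerProduct` transformation are hypothetical and stand in for whatever a real pipeline does; no specific testing library is assumed.

```scala
// Hypothetical sketch of an expectation-based test, written without any test library.
object ExpectationBasedTest {
  final case class Sale(product: String, quantity: Int, unitPrice: Int)
  final case class Revenue(product: String, total: Int)

  // Transformation under test: total revenue per product, sorted by product name.
  def revenuePerProduct(sales: Seq[Sale]): Seq[Revenue] =
    sales
      .groupBy(_.product)
      .toSeq
      .map { case (product, rows) => Revenue(product, rows.map(r => r.quantity * r.unitPrice).sum) }
      .sortBy(_.product)

  // 1. Provide an input.
  val input: Seq[Sale] = Seq(Sale("apple", 2, 3), Sale("pear", 1, 5), Sale("apple", 1, 3))

  // 2. State the expected output.
  val expected: Seq[Revenue] = Seq(Revenue("apple", 9), Revenue("pear", 5))

  def main(args: Array[String]): Unit = {
    // 3. Execute the transformation and compare.
    val actual = revenuePerProduct(input)
    assert(actual == expected, s"expected $expected but got $actual")
    println("output matches the expectation")
  }
}
```

The test says what the output must be, but nothing in it records which business situation the rows represent or why 9 and 5 are the right totals.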
Traditional testing pipelines based on expected outcomes run into some common challenges. For instance, if someone modifies a function used by the data transformation pipeline, even a small change to a default parameter can alter the order of the resulting data. The test then fails because the output no longer matches the expected outcome, and it can be difficult to identify where the problem comes from. A well-written BDD test scenario can help prevent these issues and save time.
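The failure mode can be reproduced with a small, hypothetical Scala sketch. `sortedByIdV1` and `sortedByIdV2` are assumed names that model the same shared helper before and after someone flips the default of its `descending` parameter; the pipeline relies on that default.

```scala
// Hypothetical sketch: a changed default parameter reorders the output
// and breaks an order-sensitive comparison against the expected data.
object OrderSensitivitySketch {
  final case class Customer(id: Int, name: String)

  // Shared helper before the change: ascending by default.
  def sortedByIdV1(rows: Seq[Customer], descending: Boolean = false): Seq[Customer] = {
    val ascending = rows.sortBy(_.id)
    if (descending) ascending.reverse else ascending
  }

  // Same helper after the change: only the default value differs.
  def sortedByIdV2(rows: Seq[Customer], descending: Boolean = true): Seq[Customer] = {
    val ascending = rows.sortBy(_.id)
    if (descending) ascending.reverse else ascending
  }

  val input: Seq[Customer] = Seq(Customer(2, "Bea"), Customer(1, "Ana"), Customer(3, "Cy"))
  val expected: Seq[Customer] = Seq(Customer(1, "Ana"), Customer(2, "Bea"), Customer(3, "Cy"))

  def main(args: Array[String]): Unit = {
    val before = sortedByIdV1(input) // pipeline call relying on the old default
    val after = sortedByIdV2(input)  // identical call, new default

    println(s"order-sensitive check, before change: ${before == expected}")
    println(s"order-sensitive check, after change:  ${after == expected}")
    // The rows themselves are unchanged; only their order differs.
    println(s"same rows regardless of order:        ${after.toSet == expected.toSet}")
  }
}
```

The comparison against `expected` only reports that the output differs. A scenario that states whether ordering is part of the expected behavior makes it easier to tell a real regression from an incidental reordering.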
An example project is provided here, containing the source code of the example shown earlier in this article.
This project consists of two primary folders: