Discover how Behavior-Driven Development (BDD) testing can revolutionize data pipelines, enhance collaboration, and improve testing efficiency. Explore the benefits and challenges of BDD in testing scenarios.

What is BDD Testing?

BDD Testing is an Agile approach that promotes collaboration by writing test cases in plain language. Other approaches can leave the Product Owner, Developer, and Tester with differing perspectives; BDD Testing unifies these perspectives through a shared language.

What is Gherkin?

Gherkin is the language that Cucumber uses to define tests. It uses a small set of keywords to structure the tests so that they can be executed. Cucumber itself is a widely used testing framework in the Java and Scala communities.

The Gherkin language includes several basic keywords:

  • Scenario: An example of the functionality or behavior under test.
  • Given: A step that sets the initial context of a testing scenario.
  • When: A step that specifies a condition or action.
  • Then: A step that defines the expected outcome.
  • Background: A context applicable to multiple scenarios.
  • Feature: A grouping of related scenarios.

Next, we'll look at examples of test definitions:

[Image: example test definitions written in Gherkin]

Looking at the example above, it is clear which tests are applied and why. We have an input, an output, and a context. There is no need to dive into the code to know which scenarios we are applying in our process.

Each Given, When, or Then step requires an underlying implementation in Scala, but once a step has been developed, you can reuse it to describe any other scenario.
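To illustrate how a step implemented once can serve several scenarios, here is a minimal, dependency-free sketch in Scala. It is not Cucumber's API: `StepRegistry`, the step wording, and the 10% discount rule are hypothetical stand-ins chosen for illustration. A real project would register its steps through Cucumber's Scala bindings instead.

```scala
import scala.collection.mutable
import scala.util.matching.Regex

// Toy stand-in for a BDD step registry (NOT Cucumber's API; all names are hypothetical).
object StepRegistry {
  // Mutable state shared by the steps of a single scenario.
  final class Context {
    var status: String = ""
    var price: Int = 0
    var billed: Int = 0
  }

  type StepBody = (Context, List[String]) => Unit

  // Each entry pairs a step pattern with the code to run when a line matches it.
  private val steps = mutable.ListBuffer.empty[(Regex, StepBody)]

  private def step(pattern: String)(body: StepBody): Unit =
    steps += ((pattern.r, body))

  // Every step is implemented exactly once...
  step("""Given a customer with status (\w+)""") { (ctx, args) =>
    ctx.status = args.head
  }
  step("""When a product priced at (\d+) is billed""") { (ctx, args) =>
    ctx.price = args.head.toInt
    // Hypothetical business rule: premium customers get 10% off.
    ctx.billed = if (ctx.status == "premium") ctx.price * 90 / 100 else ctx.price
  }
  step("""Then the billed amount is (\d+)""") { (ctx, args) =>
    assert(ctx.billed == args.head.toInt, s"expected ${args.head}, got ${ctx.billed}")
  }

  // ...and run() executes any scenario made of those steps, line by line.
  // It throws if a line matches no registered step.
  def run(scenario: String): Context = {
    val ctx = new Context
    val lines = scenario.split("\n").map(_.trim).filter(_.nonEmpty)
    lines.foreach { line =>
      val hit = steps.find { case (regex, _) => regex.pattern.matcher(line).matches() }
      hit match {
        case Some((regex, body)) => body(ctx, regex.unapplySeq(line).getOrElse(Nil))
        case None => throw new IllegalArgumentException(s"No step matches: $line")
      }
    }
    ctx
  }
}

object StepReuseDemo {
  def main(args: Array[String]): Unit = {
    // Two scenarios, one set of step implementations.
    StepRegistry.run(
      """Given a customer with status premium
        |When a product priced at 100 is billed
        |Then the billed amount is 90""".stripMargin)
    StepRegistry.run(
      """Given a customer with status regular
        |When a product priced at 100 is billed
        |Then the billed amount is 100""".stripMargin)
    println("both scenarios passed")
  }
}
```

The second scenario required no new code, only new wording; this is the reuse that makes step implementations pay off over time.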

How does BDD apply to data pipelines?

Nowadays, many libraries are well suited to testing data pipelines, but there are edge cases where BDD can shine. Let's imagine a use case with a complex reverse ETL flow and many business rules. This flow uses, for example, a data lake as an Operational Data Store (ODS) and runs a set of transformations within the data lake to feed the resulting data back to an application layer. The transformations executed inside the data lake contain many business rules, such as:

  • Conditional constraints or flows;
  • Complex math calculations;
  • Calculations relying on event occurrence.

In this situation, the expected result can vary for each row of our dataset. The outcome of the processing depends heavily on the relationships in the data and on specific conditions for each record, such as the current status of a customer or special pricing conditions for a product.

Many current data testing libraries rely on establishing our final expectation of the data. Basically, they utilize unit testing concepts: you provide an input, execute your transformation, and verify if the output aligns with your expectation. While this straightforward and effective method works well in numerous situations, it doesn't offer much context about our tests.
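As a minimal sketch of this expected-output style, the Scala snippet below provides an input, runs a transformation, and compares the result with a hard-coded expectation. The `Order` and `Billed` records, the `applyPricing` function, and its rules are hypothetical and exist only for illustration.

```scala
object ExpectedOutputTest {
  // Hypothetical input record: a customer status and a list price in cents.
  final case class Order(customerId: Int, status: String, priceCents: Long)
  // Hypothetical output record produced by the transformation.
  final case class Billed(customerId: Int, amountCents: Long)

  // Hypothetical transformation: "premium" customers get 10% off,
  // "blocked" customers are dropped, everyone else pays the list price.
  def applyPricing(orders: Seq[Order]): Seq[Billed] =
    orders.collect {
      case Order(id, "premium", price)                     => Billed(id, price * 90 / 100)
      case Order(id, status, price) if status != "blocked" => Billed(id, price)
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      Order(1, "premium", 10000L),
      Order(2, "regular", 10000L),
      Order(3, "blocked", 10000L)
    )
    // The expectation states WHAT the rows should be, but not why.
    val expected = Seq(Billed(1, 9000L), Billed(2, 10000L))
    assert(applyPricing(input) == expected)
    println("output matches expectation")
  }
}
```

The assertion passes, yet nothing in it tells a reader why customer 1 pays less or why customer 3 is missing; that knowledge lives only in the transformation code.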

The 3 W's of data testing

The 3 W's of data testing is a concept of my own (if you've already come across something similar, please let me know) that explains the context of our tests. The 3 W's are:

  • What: Provides the outcome that we are expecting from our data transformation. Generally, it is a set of rows.
  • When: Provides context regarding events that may affect the expected result.
  • Why: Provides the reasoning behind the conditions that may affect our expected result.

[Image: the 3 W's of data testing]

Common challenges often arise in traditional testing pipelines based on expected outcomes. For instance, if someone modifies a function used by the data transformation pipeline, even a small change to a default parameter can alter the order of the resulting data. This can break the pipeline, leading to output that doesn't match the expected outcome. In such cases, it can be difficult to identify the problem. A well-written Behavior-Driven Development (BDD) test scenario can help prevent these issues and save time.
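The ordering problem described above can be reproduced with a small, self-contained Scala sketch. The `Row` type and the `transform` function are hypothetical; the `descending` flag stands in for a default parameter that someone changed in a shared function.

```scala
object OrderSensitivity {
  final case class Row(customerId: Int, amountCents: Long)

  // Hypothetical pipeline function; `descending` plays the role of the changed default.
  def transform(rows: Seq[Row], descending: Boolean = false): Seq[Row] = {
    val sorted = rows.sortBy(_.customerId)
    if (descending) sorted.reverse else sorted
  }

  // Order-insensitive comparison: both sides are sorted the same way before comparing.
  def sameRows(a: Seq[Row], b: Seq[Row]): Boolean = {
    def key(r: Row): (Int, Long) = (r.customerId, r.amountCents)
    a.sortBy(key) == b.sortBy(key)
  }

  def main(args: Array[String]): Unit = {
    val input    = Seq(Row(2, 500L), Row(1, 900L))
    val expected = Seq(Row(1, 900L), Row(2, 500L))

    val before = transform(input)                    // original default
    val after  = transform(input, descending = true) // behaviour after the change

    assert(before == expected)        // the exact comparison passes at first
    assert(after != expected)         // ...and breaks once the order changes
    assert(sameRows(after, expected)) // even though the rows are identical
    println("same rows, different order")
  }
}
```

An exact sequence comparison only reports that the output differs. It does not say whether a business rule was violated or whether the rows merely arrived in a different order, which is why such failures can take time to diagnose.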


Benefits of BDD

  • The Gherkin language is simple and widely used;
  • It clearly defines the 3 W's of what is being tested;
  • Test construction is a collaborative effort;
  • Code can be reused. The step implementations can be applied to different test scenarios;
  • It reduces friction between technical and non-technical teams as the test definitions and implementations are linked;
  • It also serves as a form of documentation.


Challenges of BDD

  • It isn't ideal for all tasks (I intend to delve into these limitations in a future article).
  • Depending on how the tests are declared, execution can be costly, especially if we use a large amount of data in our mocks. At times, you might need to rewrite the Gherkin to optimize test execution.
  • Updating feature files can be challenging, particularly if there are numerous system rules that frequently change.

An implementation example

An example project is provided here, containing the source code of the example shown earlier in this article.

This project consists of two primary folders:

  • src: Includes the implementation of our data transformation functions;
  • tests: Comprises two sub-folders:
    • features: Contains test scenarios written in the Gherkin language;
    • steps: Houses the implementations of our tests in Scala using Cucumber.



Felipe Lehnen

As a Senior Data Engineer specializing in Big Data & Data Science at UFRGS, I began my career in IT as a software engineer. I now have over 10 years of experience in the data field. I possess solid experience in distributed systems and the main programming languages used in data projects: Python, Scala, and R.
