Databricks Data Transformation

Training code: DBX-PFE / ENG DL 1d / EN

The Databricks Transformation training is the final step in the structured training path Fundamental → Explorer → Lakehouse → Transformation. Participants will learn how to design modular pipelines, combine batch and streaming, apply advanced transformations, use Delta Live Tables, and integrate workflows with Git and CI/CD.

For more information, please contact the sales department.
2,500.00 PLN net (3,075.00 PLN incl. tax)

The training is designed for data engineers and DataOps teams responsible for implementing and maintaining production data processing workflows in the Lakehouse architecture.

After completing the training, participants:

– can design modular Silver → Gold pipelines

– understand how to combine batch and streaming in a single workflow

– can apply PySpark window functions for data transformations

– can use Delta Live Tables to automate pipelines

– know best practices for orchestration and CI/CD in Databricks

– can ensure data quality with expectations and monitor lineage

– are prepared to maintain production workflows in Databricks

Training program:

1. Data processing architecture

  • Recap of Bronze–Silver–Gold in the context of transformation pipelines

  • Designing data flows in Silver and Gold layers

  • Modularity and separation of processing logic (load → transform → save) – see the sketch below
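
A minimal PySpark sketch of this load → transform → save separation; the table names (bronze.orders, silver.orders) and the cleaning rules are hypothetical placeholders, not part of the course materials.

    from pyspark.sql import DataFrame, SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    def load(table: str) -> DataFrame:
        # Load step: read the Bronze source table as-is.
        return spark.read.table(table)

    def transform(df: DataFrame) -> DataFrame:
        # Transform step: a pure function with no I/O, easy to unit-test.
        return (df.dropDuplicates(["order_id"])
                  .withColumn("order_date", F.to_date("order_ts")))

    def save(df: DataFrame, table: str) -> None:
        # Save step: write the Silver result as a Delta table.
        df.write.format("delta").mode("overwrite").saveAsTable(table)

    # Wire the three steps into one Silver pipeline run.
    save(transform(load("bronze.orders")), "silver.orders")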

2. Batch and streaming load in practice

  • Differences between batch and streaming processing

  • Batch ingest using COPY INTO and writing to Delta tables

  • Streaming ingest with Auto Loader (cloudFiles)

  • Structured Streaming: readStream, writeStream, checkpointing, and fault tolerance

  • Integrating batch and streaming in a single pipeline – see the sketch after this list
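
A minimal sketch of both ingest paths against a shared, hypothetical source folder: COPY INTO for idempotent batch loads into an existing Delta table, and Auto Loader (cloudFiles) with a checkpoint for fault-tolerant streaming. All paths and table names are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Batch ingest: COPY INTO loads only files it has not seen before
    # into an existing Delta table.
    spark.sql("""
        COPY INTO bronze.events
        FROM '/Volumes/demo/raw/events/'
        FILEFORMAT = JSON
    """)

    # Streaming ingest: Auto Loader with schema tracking; the checkpoint
    # provides fault tolerance and exactly-once progress tracking.
    (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/Volumes/demo/chk/events_schema")
          .load("/Volumes/demo/raw/events/")
          .writeStream
          .option("checkpointLocation", "/Volumes/demo/chk/events")
          .trigger(availableNow=True)   # process available files, then stop
          .toTable("bronze.events_stream"))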

3. Advanced data transformations

  • Creating numerical, text, and binary features

  • Conditional transformations (CASE WHEN in SQL; when/otherwise in PySpark)

  • Window functions (lag, lead, row_number, rolling average) – see the sketch after this list

  • Creating time-based and session features
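
A minimal PySpark sketch of the window functions and simple time-based features listed above; the input table silver.orders and all column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()
    orders = spark.read.table("silver.orders")       # hypothetical input table

    by_user = Window.partitionBy("user_id").orderBy("order_ts")
    last_7 = by_user.rowsBetween(-6, 0)               # current row plus six preceding rows

    features = (orders
        .withColumn("prev_amount", F.lag("amount").over(by_user))        # lag
        .withColumn("next_amount", F.lead("amount").over(by_user))       # lead
        .withColumn("order_rank", F.row_number().over(by_user))          # row_number
        .withColumn("rolling_avg_amount", F.avg("amount").over(last_7))  # rolling average
        # Simple time-based features derived from the event timestamp.
        .withColumn("order_hour", F.hour("order_ts"))
        .withColumn("order_dow", F.dayofweek("order_ts")))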

4. Delta Live Tables – pipeline automation

  • Declarative processing approach: CREATE LIVE TABLE (see the sketch after this list)

  • Creating DAGs and scheduling in DLT

  • Integrating DLT with Auto Loader and Structured Streaming

  • Expectations – real-time data quality control

  • Monitoring and lineage in the DLT interface
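
A minimal Delta Live Tables sketch using the Python API (the SQL form uses CREATE LIVE TABLE); the table names, columns, and expectation are hypothetical, and the code only runs inside a DLT pipeline, not in a plain notebook.

    import dlt
    from pyspark.sql import functions as F

    # Streaming Silver table fed from a Bronze table defined elsewhere in the
    # same (hypothetical) pipeline; rows violating the expectation are dropped
    # and reported in the DLT event log and lineage view.
    @dlt.table(comment="Cleaned orders for the Silver layer")
    @dlt.expect_or_drop("positive_amount", "amount > 0")
    def silver_orders():
        return (dlt.read_stream("bronze_orders")
                   .withColumn("order_date", F.to_date("order_ts")))

    # Gold aggregate built on top of the Silver table; DLT infers the DAG
    # from these read/read_stream dependencies.
    @dlt.table(comment="Daily revenue per customer")
    def gold_daily_revenue():
        return (dlt.read("silver_orders")
                   .groupBy("customer_id", "order_date")
                   .agg(F.sum("amount").alias("revenue")))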

5. Orchestration and automation

  • Databricks Workflows – multi-task jobs, dependencies, retries

  • Pipeline parameterization (dbutils.widgets, dbutils.notebook.run) – see the sketch after this list

  • Best practices for CI/CD and code maintenance (Repos, versioning notebooks)
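
A minimal sketch of notebook parameterization; the widget name, default value, table, and child notebook path are hypothetical, and dbutils/spark are only available inside a Databricks notebook.

    # In the child notebook: declare a widget and read its value.
    dbutils.widgets.text("run_date", "2024-01-01")
    run_date = dbutils.widgets.get("run_date")

    daily = spark.read.table("silver.orders").where(f"order_date = '{run_date}'")

    # In the parent notebook (or a Workflows task): run the child notebook
    # with a parameter and a 3600-second timeout.
    result = dbutils.notebook.run("./transform_orders", 3600, {"run_date": "2024-05-01"})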

6. CI/CD – practical Git (Repos) demo

  • Cloning a repository in Databricks Repos

  • Committing and pushing notebooks to Git

  • Running a pipeline from Workflows based on a repo

  • DevOps best practices for Databricks

7. Final project

  • Design and run a Silver → Gold pipeline using batch and streaming loads, Delta Live Tables, data quality rules, and Git integration

Prerequisites:

– Completion of the Databricks Lakehouse training or equivalent knowledge

– Knowledge of SQL and PySpark

– Basic experience implementing data pipelines

  • Access to the Altkom Akademia student portal

Training method:

The training is conducted in the Databricks cloud environment. Each participant receives their own workspace with access to Unity Catalog, SQL Editor, Notebooks, and a catalog with test data.

  • Training: English

  • Materials: English