Azure Databricks Key Concepts

๐Ÿข

Workspaces

collaborative environment

A secure, collaborative environment for organizing notebooks, clusters, jobs, dashboards, libraries, and experiments.

📓

Notebooks

code + markdown cells

Interactive documents that combine runnable code, visualizations, and Markdown text for analysis, engineering, and collaboration.

🖥️

Clusters

driver + worker nodes

Compute engines that run workloads using a driver node to coordinate work and worker nodes to process data in parallel.
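
The driver/worker split can be sketched in plain Python. This is only an analogy, not Spark or Databricks code: a thread pool stands in for the worker nodes, and the names `split_into_partitions`, `worker_task`, and `run_job` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Driver-side step: divide the dataset into roughly equal partitions."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def worker_task(partition):
    """Worker-side step: process one partition independently of the others."""
    return sum(x * x for x in partition)


def run_job(records, num_workers=4):
    """Driver: plan partitions, hand them to workers, combine partial results."""
    partitions = split_into_partitions(records, num_workers)
    # The thread pool is a stand-in for worker nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(worker_task, partitions))
    return sum(partial_results)


total = run_job(list(range(1, 101)))  # sum of squares of 1..100
```

On a real cluster the partitions live on separate machines, but the pattern is the same: the driver coordinates and the workers compute in parallel.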

⚡

Runtime

Apache Spark builds

A Databricks-optimized Apache Spark environment with performance improvements, libraries, and versioned runtime support.

📅

Lakeflow Jobs

workflow automation

Managed workflows that schedule, coordinate, and automate repeatable tasks such as ETL pipelines, ML training, and refresh jobs.
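
Coordinating tasks mostly means running them in an order that respects their dependencies. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+). It is not the Lakeflow Jobs API, and the task names and the `plan_run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
task_dependencies = {
    "ingest_raw": set(),
    "clean_data": {"ingest_raw"},
    "train_model": {"clean_data"},
    "refresh_dashboard": {"clean_data"},
}


def plan_run_order(dependencies):
    """Return one valid execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


run_order = plan_run_order(task_dependencies)
```

Tasks with no dependency between them, such as `train_model` and `refresh_dashboard` here, may come out in either order and could also run in parallel.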

🔺

Delta Lake

ACID transactions

An open-source storage framework that adds ACID transactions, versioning, scalable metadata, and batch/streaming support to data lakes.
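
A toy model can make "ACID transactions" and "versioning" more concrete. The class below is not Delta Lake and does not use its API. It assumes an in-memory, append-only commit log, and the `ToyVersionedTable` name and the rule that every row needs an `id` are invented for illustration.

```python
class ToyVersionedTable:
    """In-memory table where each successful commit creates a new version."""

    def __init__(self):
        self._log = []  # one entry per committed transaction (a list of rows)

    def commit(self, rows):
        """Validate the whole batch first, then record it as one log entry."""
        batch = [dict(row) for row in rows]
        if any("id" not in row for row in batch):
            # All-or-nothing: a bad row means no part of the batch is written.
            raise ValueError("every row needs an 'id'; nothing was committed")
        self._log.append(batch)
        return len(self._log) - 1  # version number of this commit

    def read(self, version=None):
        """Return the rows visible at a given version (latest by default)."""
        if not self._log:
            return []
        latest = len(self._log) - 1
        if version is None:
            version = latest
        if not 0 <= version <= latest:
            raise ValueError(f"unknown version {version}")
        return [row for batch in self._log[: version + 1] for row in batch]


table = ToyVersionedTable()
v0 = table.commit([{"id": 1, "city": "Oslo"}])
v1 = table.commit([{"id": 2, "city": "Lima"}, {"id": 3, "city": "Pune"}])
rows_at_v0 = table.read(version=v0)  # earlier snapshot is still readable
rows_latest = table.read()
```

Reading an older version after later commits is the idea behind versioned reads. Rejecting a whole batch on one bad row is a simplified stand-in for atomicity.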

🔍

Databricks SQL

Premium tier analytics

A SQL-based analytics experience for querying lakehouse data, building dashboards, and connecting BI tools.

๐Ÿฌ

SQL Warehouses

serverless · pro · classic

Scalable compute resources optimized for SQL queries and BI workloads, available as serverless, pro, or classic options.

🧪

MLflow

ML lifecycle

An open-source platform for managing the machine learning lifecycle, including experiments, reproducibility, models, and deployment.