Understanding KSML
Now that you've seen KSML in action with the Quick Start, let's understand what just happened and why KSML makes stream processing so much simpler.
What You Just Saw
In the Quick Start, you set up a complete stream processing pipeline with just:
- A simple Docker Compose YAML configuration file
- A YAML KSML definition file that describes the Kafka Streams topology
- A YAML KSML configuration file that configures the KSML container
- No Java code, no compilation, no complex setup
That pipeline was:
- Reading temperature messages from a Kafka topic
- Processing each message through your KSML pipeline
- Writing the results to another topic
Behind the scenes, KSML converted your YAML definition into a full Kafka Streams application.
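For reference, a minimal KSML definition has the same overall shape as the one from the Quick Start: a streams section that maps stream names to Kafka topics and data types, and a pipelines section that describes the processing. The sketch below is illustrative only; the topic names, types, and log message are placeholders, not the exact Quick Start content.

streams:
  temperature_input:
    topic: sensor_temperatures              # placeholder topic name
    keyType: string
    valueType: json
  temperature_output:
    topic: sensor_temperatures_processed    # placeholder topic name
    keyType: string
    valueType: json

pipelines:
  process_temperatures:
    from: temperature_input
    via:
      - type: peek
        forEach:
          code: |
            log.info("Processing message: {}", value)
    to: temperature_output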
Why KSML Exists
Kafka Streams is at the heart of many organizations' event-sourcing, analytical, real-time, batch, and ML workloads. It has a beautiful Java DSL that lets developers focus on what their applications need to do, not how to do it. But the Java requirement also holds people back: if you don't know Java, does that mean you cannot use Kafka Streams?
Enter KSML.
KSML is a wrapper language and interpreter around Kafka Streams that lets you express any topology in straightforward YAML syntax. You define your topology as a processing pipeline: a series of steps that your data passes through, with custom functions expressed inline in Python. KSML reads your definition and constructs the topology dynamically via the Kafka Streams DSL.
What is KSML?
KSML is a declarative language that allows you to build powerful Kafka Streams applications using YAML and Python, without writing Java code. KSML makes stream processing accessible to a wider audience by removing the need for Java development skills and complex build pipelines.
With KSML, you can:
- Define streaming data pipelines using simple YAML syntax
- Process data using Python functions embedded directly in your KSML definitions
- Deploy applications without compiling Java code
- Rapidly prototype and iterate on streaming applications
Why Use KSML?
Simplified Development
KSML dramatically reduces the complexity of building Kafka Streams applications. Instead of writing hundreds of lines of Java code, you can define your streaming logic in a concise YAML file with embedded Python functions.
No Java Required
While Kafka Streams is a powerful Java library, KSML eliminates the need to write Java code. This opens up Kafka Streams to data engineers, analysts, and other professionals who may not have Java expertise.
Rapid Prototyping
KSML allows you to quickly prototype and test streaming applications. Changes can be made to your KSML definition and deployed immediately, without a compile-package-deploy cycle.
Full Access to Kafka Streams Capabilities
KSML provides access to the full power of Kafka Streams, including:
- Stateless operations (map, filter, etc.)
- Stateful operations (aggregate, count, etc.)
- Windowing operations
- Stream and table joins
- And more
Key Concepts and Terminology
Streams
In KSML, a stream represents a flow of data from or to a Kafka topic. Streams are defined with a name, a topic, and data types for keys and values.
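For example, the input stream used by the pipeline example further down could be declared like this (the topic name and data types are placeholders):

streams:
  my_input_stream:
    topic: sensor_readings    # placeholder topic name
    keyType: string
    valueType: json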
Functions
Functions in KSML are reusable pieces of Python code that can be called from your pipelines. They can transform data, filter messages, or perform side effects like logging.
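As a small illustration (the function names and the temperature field are placeholders), here is a predicate function that can be used for filtering and a forEach function that only logs:

functions:
  is_warm:
    type: predicate
    expression: value.get('temperature') > 70
  log_temperature:
    type: forEach
    code: |
      log.info("Temperature reading: {}", value.get('temperature'))

Pipeline steps can then reference these functions by name instead of repeating the Python inline.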
Pipelines
Pipelines define the flow and processing of data. A pipeline starts with a source stream, applies a series of operations, and typically ends with a sink operation that writes to another stream or performs a terminal action.
pipelines:
  my_pipeline:
    from: my_input_stream
    via:
      - type: filter
        if:
          expression: value.get('temperature') > 70
      - type: mapValues
        mapper:
          expression: |
            {"temp_celsius": (value.get('temperature') - 32) * 5/9}
    to: my_output_stream
Operations
Operations are the building blocks of pipelines. They define how data is processed as it flows through a pipeline. KSML supports a wide range of operations, including:
- Stateless operations: filter, map, flatMap, peek
- Stateful operations: aggregate, count, reduce
- Joining operations: join, leftJoin, outerJoin
- Windowing operations: windowedBy
- Sink operations: to, forEach, branch
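Stateful operations follow the same YAML pattern as the stateless ones shown above. The sketch below is a rough illustration rather than a tested example: the stream names are placeholders, the output stream would need a long value type, and depending on the KSML version a stateful step such as count may also require an explicit state store configuration.

pipelines:
  count_warm_readings:
    from: my_input_stream
    via:
      - type: filter
        if:
          expression: value.get('temperature') > 70
      - type: groupByKey
      - type: count
      - type: toStream    # counts form a table; convert back to a stream before writing
    to: warm_counts_stream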
How KSML Relates to Kafka Streams
KSML is built on top of Kafka Streams and translates your YAML definitions into Kafka Streams topologies. This means:
- You get all the benefits of Kafka Streams (exactly-once processing, fault tolerance, scalability)
- Your KSML applications can integrate with existing Kafka Streams applications
- Performance characteristics are similar to native Kafka Streams applications
When you run a KSML definition, the KSML runner:
- Parses your YAML definition
- Translates it into a Kafka Streams topology
- Executes the topology using the Kafka Streams runtime
This translation happens automatically, allowing you to focus on your business logic rather than the low-level details of a hand-written Kafka Streams implementation.
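The runner itself is driven by the KSML configuration file you saw in the Quick Start. As a rough sketch (property names can differ between KSML versions, so check the Reference Documentation), it points the runner at one or more definition files and supplies the Kafka connection settings:

ksml:
  definitions:
    my_app: my-definition.yaml      # placeholder name and definition file
kafka:
  bootstrap.servers: broker:9092    # placeholder broker address
  application.id: my_ksml_app       # placeholder application id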
Next Steps
Now that you understand what KSML is and its key concepts, you can:
- Set up Schema Validation in your IDE for a better development experience
- Explore the Reference Documentation in more detail
- Follow the KSML Basics Tutorial to build your first application