Check

Check hooks allow you to validate conditions and control the flow of your replication process. They are useful for implementing data quality checks, validating prerequisites, and ensuring business rules are met.

Configuration

- type: check
  check: "run.total_rows > threshold"  # Required: The condition to evaluate
  failure_message: '{run.total_rows} is below threshold'  # Optional: the message to use as an error
  vars:                       # Optional: Local variables for the check
    threshold: 1000
    min_date: "2023-01-01"
  on_failure: abort          # Optional: abort/warn/quiet/skip
  id: my_id                  # Optional: generated if omitted. Use a `log` hook with {runtime_state} to view state.

Properties

| Property | Required | Description |
|---|---|---|
| check | Yes | The condition to evaluate |
| failure_message | No | A message to use as the error if the check fails |
| vars | No | Map of scoped variables that can be used in the check |
| on_failure | No | What to do if the check fails (abort/warn/quiet/skip) |
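
As a rough illustration of how the optional properties combine, the sketch below downgrades a failed check to a warning and supplies a custom message. The threshold value is arbitrary, and the comment on `warn` reflects an assumption about its behavior rather than a documented guarantee.

```yaml
hooks:
  post:
    - type: check
      check: "run.total_rows > threshold"
      failure_message: "only {run.total_rows} rows were loaded"
      vars:
        threshold: 1000      # arbitrary value for illustration
      on_failure: warn       # assumption: reports the message and lets the run continue
```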

Output

When the check hook executes successfully, it returns the following output that can be accessed in subsequent hooks:

status: success  # Status of the hook execution
failure_message: message  # The rendered message
result: true     # The result of the check evaluation (true/false)

You can access these values in subsequent hooks using the following syntax (JMESPath):

  • {state.hook_id.check} - The compiled expression that was checked

  • {state.hook_id.status} - Status of the hook execution

  • {state.hook_id.result} - Boolean result of the check
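
A minimal sketch of reading these values from a later hook follows. The id `row_check` is a hypothetical name chosen for this example, and the `message` property on the `log` hook is assumed from the Log hook page rather than shown here.

```yaml
hooks:
  post:
    # `row_check` is a hypothetical id; it replaces `hook_id` in the state path
    - type: check
      id: row_check
      check: "run.total_rows >= min_rows"
      vars:
        min_rows: 100
      on_failure: warn   # assumption: later hooks still run after a failed check

    # Assumes the log hook accepts a `message` property
    - type: log
      message: "row_check status={state.row_check.status} result={state.row_check.result}"
```

Setting an explicit `id` keeps the state path stable; with a generated id, the path would have to be looked up first.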

Examples

Basic Row Count Validation

Ensure that the replication processed a minimum number of rows:

hooks:
  post:
    - type: check
      check: "run.total_rows >= min_rows"
      vars:
        min_rows: 100
      on_failure: abort

Multiple Condition Check

Validate multiple conditions before starting replication:

hooks:
  pre:
    - type: check
      check: |
        run.stream.schema != '' && 
        run.object.schema != '' && 
        timestamp.hour >= 1 && 
        timestamp.hour <= 23
      on_failure: abort

Data Quality Threshold Check

Verify that the error rate in processed data is below a threshold:

hooks:
  post:
    - type: check
      check: |
        state.quality_check.result.error_rate <= max_error_rate
      vars:
        max_error_rate: 0.01  # 1% error rate threshold
      on_failure: warn

Time Window Validation

Ensure replication runs within specific time windows:

hooks:
  pre:
    - type: check
      # The window wraps past midnight (20:00 to 06:00), so the hour test uses ||
      check: |
        (timestamp.hour >= start_hour ||
         timestamp.hour <= end_hour) ||
        (timestamp.day_name in allowed_days)
      vars:
        start_hour: 20  # 8 PM
        end_hour: 6    # 6 AM
        allowed_days: ["Saturday", "Sunday"]
      on_failure: skip

Complex Business Rule Validation

Implement complex business rules with multiple conditions:

hooks:
  post:
    - type: check
      check: |
        (run.total_rows >= min_rows && 
         run.total_rows <= max_rows) &&
        (run.duration <= max_duration) &&
        (state.data_check.result.null_percentage <= max_null_percent)
      vars:
        min_rows: 1000
        max_rows: 1000000
        max_duration: 3600  # 1 hour
        max_null_percent: 5
      on_failure: abort

Environment-Based Validation

Apply different validation rules based on the environment:

hooks:
  pre:
    - type: check
      check: |
        (env.ENVIRONMENT == 'production' &&
         run.stream.name in prod_allowed_streams) ||
        (env.ENVIRONMENT != 'production')
      vars:
        prod_allowed_streams: ["customers", "orders", "products"]
      on_failure: abort

Resource Usage Check

Validate system resource availability before proceeding:

hooks:
  pre:
    - type: check
      check: |
        state.resource_check.result.available_disk_space >= min_disk_space &&
        state.resource_check.result.available_memory >= min_memory
      vars:
        min_disk_space: 10737418240  # 10GB in bytes
        min_memory: 4294967296      # 4GB in bytes
      on_failure: warn

Data Freshness Check

Ensure source data is fresh enough before replication:

hooks:
  pre:
    - type: check
      check: |
        state.freshness_check.result.last_update_time >= 
        timestamp.unix - max_age_seconds
      vars:
        max_age_seconds: 3600  # 1 hour
      on_failure: skip
