Columns

Last updated 1 month ago

Columns

Casting Columns

When running a replication, you can specify the column types to tell Sling to cast the data to the correct type. It is not necessary to include all columns. Sling will automatically detect the types for any unspecified columns. See and for various type mappings between native and generic types for all types of databases.

Acceptable data types are:

bigint
bool
datetime
decimal or decimal(precision, scale) (such as decimal(10, 2))
integer
json
string or string(length) (such as string(100))
text or text(length) (such as text(3000))

Sling allows you the ability to apply constraints, such as value > 0. See for details.

Using CLI Flags

# template
sling run --columns '{ "<column_name>": "<data_type>" }'

# template with constraint
sling run --columns '{ "<column_name>": "<data_type> | <constraint>" }'

# cast column to bigint
sling run --columns '{ "my_column": "bigint" }'

# cast column to decimal(10, 2)
sling run --columns '{ "my_column": "decimal(10, 2)" }'

# cast multiple columns
sling run --columns '{ "my_column": "string(150)", "my_other_column": "decimal(10, 2)" }'

# cast all columns to string
sling run --columns '{ "*": "string" }'

# apply constraint, value should be greater than 0
sling run --columns '{ "my_column": "integert | value > 0" }'

Using YAML

Using the defaults and streams keys, you can specify different columns for each stream.

source: source_name
target: target_name

defaults:
  # apply to all streams by default
  columns:
    id: bigint

streams:
  # inherit defaults
  my_stream:

  my_other_stream:
    # apply to this stream only (overrides defaults)
    columns:
      id: bigint
      my_column: string(150)
      my_other_column: decimal(10, 2)
      my_int: int | value > 0   # apply constraint, value should be greater than 0

  # cast all columns to string
  another_stream:
    columns:
      "*": string

Column Casing

Available Casing Options

normalize - Normalize column names to target database's default casing (upper or lower case), but preserve mixed-case column names. This helps with querying tables without needing quotes for standard column names.
source - Keep the original casing from the source data.
target - Convert all column names according to the target database's default casing (upper case for Oracle, lower case for PostgreSQL, etc.).
snake - Convert camelCase and other formats to snake_case, then apply the target database's default casing.
upper - Convert all column names to UPPER CASE.
lower - Convert all column names to lower case.

Examples

Assuming source column names: customerId, first_name, LAST_NAME, email-address

Using CLI Flag

# Keep source casing
sling run --tgt-options '{"column_casing": "source"}'

# Use target database casing
sling run --tgt-options '{"column_casing": "target"}'

# Convert to snake case with target database casing
sling run --tgt-options '{"column_casing": "snake"}'

# Convert to upper case
sling run --tgt-options '{"column_casing": "upper"}'

# Convert to lower case
sling run --tgt-options '{"column_casing": "lower"}'

Using YAML

source: mysql.customer_data
target: postgres.public.customers

defaults:
  mode: full-refresh
  target_options:
    column_casing: snake

Example Results

Let's see how each option transforms our sample column names for a PostgreSQL, DuckDB or MySQL target (which defaults to lowercase):

Original Column

source

normalize

target

snake

upper

lower

customerId

customerid

customer_id

CUSTOMERID

customerid

first_name

FIRST_NAME

first_name

LAST_NAME

last_name

LAST_NAME

last_name

email-address

email_address

EMAIL_ADDRESS

email_address

For an Oracle or Snowflake target (which defaults to uppercase):

Original Column

source

normalize

target

snake

upper

lower

customerId

CUSTOMERID

CUSTOMER_ID

CUSTOMERID

customerid

first_name

FIRST_NAME

first_name

LAST_NAME

last_name

email-address

EMAIL_ADDRESS

email_address

This functionality makes it easier to work with column names when moving data between systems with different naming conventions or case sensitivity requirements.

Column Typing

Structure

The column_typing configuration has the following structure:

target_options:
  column_typing:
    string:
      length_factor: <int>
      min_length: <int>
      max_length: <int>
      use_max: <bool>

    decimal:
      min_precision: <int>
      max_precision: <int>
      min_scale: <int>
      max_scale: <int>

Where:

string: Settings for string type columns
- length_factor: A multiplier applied to the detected length of string columns (default: 1)
- min_length: The minimum length to use for string columns (if specified)
- max_length: The maximum length to use for string columns (if specified)
- use_max: Whether to always use the max_length value instead of calculated lengths (default: false)
decimal: Settings for decimal type columns
- min_precision: The minimum total number of digits (precision) for decimal columns.
- max_precision: The maximum total number of digits (precision) for decimal columns.
- min_scale: The minimum number of digits after the decimal point (scale).
- max_scale: The maximum number of digits after the decimal point (scale).

How It Works

When Sling creates tables in the target database, it analyzes the source data to determine appropriate column types.

For string columns:

Sling determines the maximum string length from the source data
If length_factor is specified, this value is multiplied by the factor
If max_length is specified and use_max is false, the length is capped at this value
If use_max is true, max_length is used regardless of the calculated length
If min_length is specified, the length will be at least that number

For decimal columns:

Sling determines the required precision and scale based on the source data.
If column_typing.decimal settings are provided, Sling adjusts the calculated precision and scale based on the min_precision, max_precision, min_scale, and max_scale values.
The final precision and scale are used to generate the decimal(precision, scale) type in the target database DDL.

This helps prevent truncation issues when moving data between systems with different character encoding requirements or different decimal precision/scale needs.

Examples

Double String Column Lengths

source: mssql
target: redshift

defaults:
  mode: truncate
  object: public.{stream_table}
  
streams:
  dbo.test_sling_unicode:
    target_options:
      column_typing:
        string:
          length_factor: 2

In this example, all string columns in the dbo.test_sling_unicode table will have their length doubled when created in the PostgreSQL target. This is useful when moving from a database that uses single-byte encoding to one that uses multi-byte encoding (like UTF-8).

Set Maximum String Length

source: mysql
target: oracle

defaults:
  target_options:
    column_typing:
      string:
        max_length: 8000

This example sets a maximum length of 8000 characters for all string columns across all streams, which is useful for databases with column size limitations.

Use Fixed String Length

streams:
  sales.customers:
    target_options:
      column_typing:
        string:
          max_length: 1000
          use_max: true

This configuration forces all string columns in the sales.customers table to use a fixed length of 1000, regardless of the actual data length.

PreviousTarget Options NextTransforms

Last updated 1 month ago

Casting Columns

Acceptable data types are:

bigint
bool
datetime
decimal or decimal(precision, scale) (such as decimal(10, 2))
integer
json
string or string(length) (such as string(100))
text or text(length) (such as text(3000))

Sling allows you the ability to apply constraints, such as value > 0. See for details.

Using CLI Flags

# template
sling run --columns '{ "<column_name>": "<data_type>" }'

# template with constraint
sling run --columns '{ "<column_name>": "<data_type> | <constraint>" }'

# cast column to bigint
sling run --columns '{ "my_column": "bigint" }'

# cast column to decimal(10, 2)
sling run --columns '{ "my_column": "decimal(10, 2)" }'

# cast multiple columns
sling run --columns '{ "my_column": "string(150)", "my_other_column": "decimal(10, 2)" }'

# cast all columns to string
sling run --columns '{ "*": "string" }'

# apply constraint, value should be greater than 0
sling run --columns '{ "my_column": "integert | value > 0" }'

Using YAML

Using the defaults and streams keys, you can specify different columns for each stream.

source: source_name
target: target_name

defaults:
  # apply to all streams by default
  columns:
    id: bigint

streams:
  # inherit defaults
  my_stream:

  my_other_stream:
    # apply to this stream only (overrides defaults)
    columns:
      id: bigint
      my_column: string(150)
      my_other_column: decimal(10, 2)
      my_int: int | value > 0   # apply constraint, value should be greater than 0

  # cast all columns to string
  another_stream:
    columns:
      "*": string

Column Casing

The column_casing allows you to control how column names are formatted when creating tables in the target database. This is useful for ensuring consistent naming conventions and avoiding the need to use quotes when querying tables in databases with case-sensitive identifiers.

Starting in v1.4.5, the default is normalize. Before this version, the default was source. See note for details.

Available Casing Options

normalize - Normalize column names to target database's default casing (upper or lower case), but preserve mixed-case column names. This helps with querying tables without needing quotes for standard column names.
source - Keep the original casing from the source data.
target - Convert all column names according to the target database's default casing (upper case for Oracle, lower case for PostgreSQL, etc.).
snake - Convert camelCase and other formats to snake_case, then apply the target database's default casing.
upper - Convert all column names to UPPER CASE.
lower - Convert all column names to lower case.

Examples

Assuming source column names: customerId, first_name, LAST_NAME, email-address

Using CLI Flag

# Keep source casing
sling run --tgt-options '{"column_casing": "source"}'

# Use target database casing
sling run --tgt-options '{"column_casing": "target"}'

# Convert to snake case with target database casing
sling run --tgt-options '{"column_casing": "snake"}'

# Convert to upper case
sling run --tgt-options '{"column_casing": "upper"}'

# Convert to lower case
sling run --tgt-options '{"column_casing": "lower"}'

Using YAML

source: mysql.customer_data
target: postgres.public.customers

defaults:
  mode: full-refresh
  target_options:
    column_casing: snake

Example Results

Let's see how each option transforms our sample column names for a PostgreSQL, DuckDB or MySQL target (which defaults to lowercase):

Original Column

source

normalize

target

snake

upper

lower

customerId

customerid

customer_id

CUSTOMERID

customerid

first_name

FIRST_NAME

first_name

LAST_NAME

last_name

LAST_NAME

last_name

email-address

email_address

EMAIL_ADDRESS

email_address

For an Oracle or Snowflake target (which defaults to uppercase):

Original Column

source

normalize

target

snake

upper

lower

customerId

CUSTOMERID

CUSTOMER_ID

CUSTOMERID

customerid

first_name

FIRST_NAME

first_name

LAST_NAME

last_name

email-address

EMAIL_ADDRESS

email_address

This functionality makes it easier to work with column names when moving data between systems with different naming conventions or case sensitivity requirements.

Column Typing

Starting in v1.4.5, the column_typing allows you to configure how Sling generates column types when creating tables in the target database. This is particularly useful when you need to ensure string columns have sufficient length to accommodate all possible values, especially when dealing with different database systems or character encodings.

Structure

The column_typing configuration has the following structure:

target_options:
  column_typing:
    string:
      length_factor: <int>
      min_length: <int>
      max_length: <int>
      use_max: <bool>

    decimal:
      min_precision: <int>
      max_precision: <int>
      min_scale: <int>
      max_scale: <int>

Where:

string: Settings for string type columns
- length_factor: A multiplier applied to the detected length of string columns (default: 1)
- min_length: The minimum length to use for string columns (if specified)
- max_length: The maximum length to use for string columns (if specified)
- use_max: Whether to always use the max_length value instead of calculated lengths (default: false)
decimal: Settings for decimal type columns
- min_precision: The minimum total number of digits (precision) for decimal columns.
- max_precision: The maximum total number of digits (precision) for decimal columns.
- min_scale: The minimum number of digits after the decimal point (scale).
- max_scale: The maximum number of digits after the decimal point (scale).

How It Works

When Sling creates tables in the target database, it analyzes the source data to determine appropriate column types.

For string columns:

Sling determines the maximum string length from the source data
If length_factor is specified, this value is multiplied by the factor
If max_length is specified and use_max is false, the length is capped at this value
If use_max is true, max_length is used regardless of the calculated length
If min_length is specified, the length will be at least that number

For decimal columns:

Sling determines the required precision and scale based on the source data.
If column_typing.decimal settings are provided, Sling adjusts the calculated precision and scale based on the min_precision, max_precision, min_scale, and max_scale values.
The final precision and scale are used to generate the decimal(precision, scale) type in the target database DDL.

This helps prevent truncation issues when moving data between systems with different character encoding requirements or different decimal precision/scale needs.

Examples

Double String Column Lengths

source: mssql
target: redshift

defaults:
  mode: truncate
  object: public.{stream_table}
  
streams:
  dbo.test_sling_unicode:
    target_options:
      column_typing:
        string:
          length_factor: 2

Set Maximum String Length

source: mysql
target: oracle

defaults:
  target_options:
    column_typing:
      string:
        max_length: 8000

This example sets a maximum length of 8000 characters for all string columns across all streams, which is useful for databases with column size limitations.

Use Fixed String Length

streams:
  sales.customers:
    target_options:
      column_typing:
        string:
          max_length: 1000
          use_max: true

This configuration forces all string columns in the sales.customers table to use a fixed length of 1000, regardless of the actual data length.