Models

A model consists of code that populates a single table or view, along with metadata properties that define how your users can interact with the model.

Models define a data processing step and frequently represent some bit of business logic. They can be written in SQL or Python. Models can reference each other, and SBT will automatically build your complete data processing pipeline by parsing the model code for dependencies. A bundle of models powers an app on the Insight Cloud platform.

Models all live in the models/ directory of your SBT project. A model consists of two parts:

  • A definition - a python or sql file that contains the code for that model. The name of the file will become the unique identifier for the model.

  • A configuration - a yaml file that contains metadata for the model. Must have a named config matching the model name from the definition file.

Example

Let's look at a simple example to illustrate how models work.

First we have the directory structure of the project:

# Directory Structure

sbt-project/ 
├── sources/ 
│   └── sources.yml 
└── models/ 
    ├── my_first_model.sql 
    └── configs.yml 

The model definition is a SQL file. It uses standard SQL plus some special syntax to access the Seek Run Context object. We can access sources, variables, and other model outputs using the seek object. All of these variables are compiled at runtime based on the user's request. In this example, we are reading from an expected source table and adding a user-defined parameter to the output.

my_first_model.sql

select  
  src.*,  
  '{seek.params[TEST_VAR]}' as user_param_value 
from {seek.sources[vendor.table]} src 

Next let's look at the model config file. It contains a yaml block like the below, which gives us the config for the model my_first_model. The variables key is a list of variables that users will be able to modify when running your app. The values_query key is a query that returns the possible values for the variable - this is useful for powering dropdowns in Insight Cloud.

configs.yml

models:
  - name: my_first_model
    config:
      language: sql
      materialized: table
      type: output
      variables:
        - name: TEST_VAR
          display_name: Test Variable
          type: VARCHAR
          default: No test var set
          values_query: values_test 

Model Configs

The model config is contained in a yaml file. The filename doesn't matter, you can create multiple yaml files in nested directories if that suits your needs, or you can have one centralized yaml that holds all your model configs. SBT will search all yaml files for models: keys to find a config that matches the model definition.:

Model Object Fields

Field

Description

Available Values

name

The name of the model, must be unique. Used as the name of tables created by this model.

string, must be unique

config.packages

A list of python packages that need to be installed in the app environment to run this model.

list of strings, typical pip version pinning is supported

config.materialized

The type of db object to create in snowflake

table or view

config.language

The language of the model.

python or sql

config.type

The type of model.

output or mapping

config.variables

A list of model variables that can be set when running the model.

list of objects

config.py_modules

A list of python wheel files that need to be installed in the snowflake environment to run this model. These should be located in the modules directory to be included with model deployments

list of strings

Model Variable Fields

Field

Description

Available Values

name

The name of the variable

string

display_name

The name of the variable as it will be displayed to users

string

type

The snowflake data type of the variable

string

default

The default value of the variable

string

values_query

A query that returns the possible values for the variable

string

models:
  - name: python_test
    config:
    packages:
      - "snowflake-snowpark-python"
      - "numpy"
      - "scikit-learn"
      - "pandas"
    materialized: table
    language: python
    type: output
    variables:
      - name: TEST_VAR
        display_name: Test Variable
        type: VARCHAR
        default: No test var set
        values_query: values_test
     py_modules:
       - seek_cpg.whl

Seek Run Context

All of the magic of SBT happens via the Seek Run Context object. Models have access to an object called seek which should be used to interact with the Seek platform. The seek object has properties for accessing sources, params, other models, and other current run context.

class SeekRunCtx(BaseModel): 
    model_name: str 
    target_name: str 
    current_db: str 
    params: dict 
    sources: dict 
    models: dict 

Keep reading to see how to use the seek object in your models.

Model Defintion

SQL Models

SQL models are select statements stored in a .sql file. The SQL query defined in the file will eventually be wrapped in CREATE TABLE/VIEW AS SELECT statements. Wrap any references to the SeekRunContext in single curly braces - remember this is going to end up getting formatted in python.

Note

For SQL models only - don't wrap the name of the source/model/param you are accessing in quotes. The below example is correctly not quoting atlas_luminate.atlas_itemlist_custom

SELECT * EXCLUDE EPOCH
       RENAME "ITEM NBR" AS item_nbr
FROM {seek.sources[atlas_luminate.atlas_itemlist_custom]} 

Python Models

Python models are regular python .py files. They must contain at least one function called model which takes two arguments, session and seek. The session arg is a snowpark session used for interacting with snowflake. The seek arg is the SeekRunContext object. The model function should return a data frame - could be snowpark or pandas. There can be as many helper functions as you like in the file, but the model function is required.

Below is an example python model definition. This model reads from a source, does some transformations, and writes to an output table.

from snowflake.snowpark.functions import lit 

def model(session, seek): 
    # Accessing seek sources works 
    base_table = session.table(seek.sources["atlas.view_items"]) 
    pivot_values = base_table.select("Brand Desc").distinct() 
    # Accessing defined params works 
    var_df = pivot_values.with_column("vars", lit(seek.params["TEST_VAR"])) 
    # Accessing seek object properties works 
    other_df = var_df.with_column("database", lit(seek.model_name)) 
    # Casting to pandas works 
    pandas_df = other_df.to_pandas() 
    # Return table gets written to output table 
    snowpark_df = session.createDataFrame(pandas_df) 
    return snowpark_df