Models
A model consists of code that populates a single table or view, along with metadata properties that define how your users can interact with the model.
Models define a data processing step and frequently represent some bit of business logic. They can be written in SQL or Python. Models can reference each other, and SBT will automatically build your complete data processing pipeline by parsing the model code for dependencies. A bundle of models powers an app on the Insight Cloud platform.
Models all live in the models/ directory of your SBT project. A model consists of two parts:
A definition - a python or sql file that contains the code for that model. The name of the file will become the unique identifier for the model.
A configuration - a yaml file that contains metadata for the model. Must have a named config matching the model name from the definition file.
Example
Let's look at a simple example to illustrate how models work.
First we have the directory structure of the project:
# Directory Structure
sbt-project/
├── sources/
│ └── sources.yml
└── models/
├── my_first_model.sql
└── configs.yml The model definition is a SQL file. It uses standard SQL plus some special syntax to access the Seek Run Context object. We can access sources, variables, and other model outputs using the seek object. All of these variables are compiled at runtime based on the user's request. In this example, we are reading from an expected source table and adding a user-defined parameter to the output.
my_first_model.sql
select
src.*,
'{seek.params[TEST_VAR]}' as user_param_value
from {seek.sources[vendor.table]} src Next let's look at the model config file. It contains a yaml block like the below, which gives us the config for the model my_first_model. The variables key is a list of variables that users will be able to modify when running your app. The values_query key is a query that returns the possible values for the variable - this is useful for powering dropdowns in Insight Cloud.
configs.yml
models:
- name: my_first_model
config:
language: sql
materialized: table
type: output
variables:
- name: TEST_VAR
display_name: Test Variable
type: VARCHAR
default: No test var set
values_query: values_test Model Configs
The model config is contained in a yaml file. The filename doesn't matter, you can create multiple yaml files in nested directories if that suits your needs, or you can have one centralized yaml that holds all your model configs. SBT will search all yaml files for models: keys to find a config that matches the model definition.:
Model Object Fields
Model Variable Fields
models:
- name: python_test
config:
packages:
- "snowflake-snowpark-python"
- "numpy"
- "scikit-learn"
- "pandas"
materialized: table
language: python
type: output
variables:
- name: TEST_VAR
display_name: Test Variable
type: VARCHAR
default: No test var set
values_query: values_test
py_modules:
- seek_cpg.whlSeek Run Context
All of the magic of SBT happens via the Seek Run Context object. Models have access to an object called seek which should be used to interact with the Seek platform. The seek object has properties for accessing sources, params, other models, and other current run context.
class SeekRunCtx(BaseModel):
model_name: str
target_name: str
current_db: str
params: dict
sources: dict
models: dict Keep reading to see how to use the seek object in your models.
Model Defintion
SQL Models
SQL models are select statements stored in a .sql file. The SQL query defined in the file will eventually be wrapped in CREATE TABLE/VIEW AS SELECT statements. Wrap any references to the SeekRunContext in single curly braces - remember this is going to end up getting formatted in python.
Note
For SQL models only - don't wrap the name of the source/model/param you are accessing in quotes. The below example is correctly not quoting atlas_luminate.atlas_itemlist_custom
SELECT * EXCLUDE EPOCH
RENAME "ITEM NBR" AS item_nbr
FROM {seek.sources[atlas_luminate.atlas_itemlist_custom]} Python Models
Python models are regular python .py files. They must contain at least one function called model which takes two arguments, session and seek. The session arg is a snowpark session used for interacting with snowflake. The seek arg is the SeekRunContext object. The model function should return a data frame - could be snowpark or pandas. There can be as many helper functions as you like in the file, but the model function is required.
Below is an example python model definition. This model reads from a source, does some transformations, and writes to an output table.
from snowflake.snowpark.functions import lit
def model(session, seek):
# Accessing seek sources works
base_table = session.table(seek.sources["atlas.view_items"])
pivot_values = base_table.select("Brand Desc").distinct()
# Accessing defined params works
var_df = pivot_values.with_column("vars", lit(seek.params["TEST_VAR"]))
# Accessing seek object properties works
other_df = var_df.with_column("database", lit(seek.model_name))
# Casting to pandas works
pandas_df = other_df.to_pandas()
# Return table gets written to output table
snowpark_df = session.createDataFrame(pandas_df)
return snowpark_df