94 Matching Annotations
  1. Dec 2023
    1. erve_traits Entity Traits 360 consolidates features and traits of an entity from various entity variables and ML models into a unified view. type: list properties id_served List of all the ID types that will be served. type list

      Either complete it or remove it.

    1. ars Dependency List Note This is applicable to schema 42. With introduction of entity vars as models in schema 44, dependency is done away with. While defining vars in feature_table (entity_var and input_var), it is advisable to list dependencies of each var in the yaml. This helps the system figure out the dependency chain of vars which helps us only create the features that are

      Can remove this

    1. For example, from your ID stitching model if main_id is the stitched identifier of a user entity, then each row in feature table will have this as first column and will look like something this:

      Add a section on the entity var table. This table, as shown below, is a var_table that brings together all features defined across all vars in var_groups.

    2. D-Stitching table name: This is supposed to be the name of the id stitcher model from the same YAML. The stitched unique ID from this table is used as identifier for the traits of entities of same type in feature t

      Users don't need to know this. They only need to know the entity for which they are creating features and the input tables that would be used in creating them.

    1. Create a table or view of features from different PB models, using specified ID as a key (such as email, userID, etc). So, a single unified view can have all entity vars, Python models, etc.

      Consider that features/traits of an entity are spread across multiple entity vars and ML models. Entity Traits 360 is a means to bring them together into a single view. These models are usually defined in pb_project.yaml by creating entries under the serve_traits key for the corresponding entity. SQL template models can be used as an input to an entity-var/input-var or as an edge-source in id-stitcher.

    2. package lib_ft # app_project imports lib_ft - inputs - tbl_a # Input tables assumed in the imported library package - tbl_b - tbl_c - models - id_stitcher # Id stitcher table from the imported package - feature_table # Feature table defined in the imported package - inputs - tbl_a' # Input tables a' has to be defined in app project with table names as in the warehouse and other details as in package. Then input a has to be remapped to a'. - tbl_b'

      Remove this

    3. Feature Table - This allows you to create output tables with computations such as aggregation, average, etc. on selected field

      Mark it as legacy. Warn that it is now a view and not a table. Move to the end of the list.

    1. The file models/profiles.yaml does not includ

      Cover how we specify id stitcher edges. Not how we don't.

      Need to add a section on how any column mentioned in ids is automatically sent to the Id Stitcher, unless overridden using to_default_stitcher: false.

      Mention that we do it because Id Stitcher edges can be defined across multiple projects this way.

      Also, that this can be done for inputs, SQL models, and pynative models as well.
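
      A rough sketch of what such a section could illustrate, assuming the opt-out is a per-id flag. The input name and the `type` values come from the older highlighted example further down this page. The `select` key, the `lower(email)` expression, and the placement of `to_default_stitcher` are assumptions to be checked against the current schema.

      ```yaml
      inputs:
        - name: websitePageVisits          # input name reused from the older example
          ids:
            - select: "anonymous_id"       # assumed key name (the older syntax used `sql`)
              type: rudder_anon_id
              # no flag set: assumed to be sent to the Id Stitcher automatically
            - select: "lower(email)"       # hypothetical column expression
              type: email
              to_default_stitcher: false   # opt-out mentioned in the comment above
      ```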

  2. Oct 2022
    1. model_folders: - models # Name of subfolder inside your project folder, which contains YAML model files # Model folders to use. model_folders: - models

      model_folders is duplicated.
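
      A deduplicated version of the highlighted snippet, as a sketch of the intended fix. The key, value, and comment all come from the highlight; only the indentation is assumed.

      ```yaml
      # Name of subfolder inside your project folder, which contains YAML model files
      model_folders:
        - models
      ```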

    1. want to create id stitcher and feature table.

      Entity has an existence irrespective of whether we want to create it for id stitcher and feature table or not.

      Rephrase to "Name of the entity"

    1. ut: It is the name of input table defines in

      1) Grammar is incorrect. 2) Edge sources are refs. Usually, they refer to inputs listed in inputs.yaml. 3) We need a common place to put the explanation of refs.

    1. : blacklistDomains filter: (context_sources_job_run_id = (select top 1 context_sources_job_run_id from blacklistDomains order by timestamp desc)) value: max(case when exclude_reason is not null then 1 else 0 end) - tablevar: name: domainSummary_account_type ref: input: domainSummary value: domainSummary.account_type - feature: name: ignore_domain value: max(case when blacklistFlag = 1 or domainSummary_account_type like '%free%' then 1 else 0 end) description: Whether a domain should be ignored for the analysis - macro: name: call_conversion value: type = 'Call' or tasksubtype='Call' or type like '%demo%') and \ (rlike(lower(subject), '.*\\bcall\\b.*') or rlike(lower(subject), '.*\\bdemo\\b.*' - tablevar:

      This is too long. Keep only a representative subset of features.
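
      One possible trimmed-down subset, reusing two entries from the highlighted snippet verbatim. The nesting is reconstructed from the flattened highlight and is an assumption. The truncated entries at either end and the `macro` entry are left out rather than guessed.

      ```yaml
      - tablevar:
          name: domainSummary_account_type
          ref:
            input: domainSummary
          value: domainSummary.account_type
      - feature:
          name: ignore_domain
          value: max(case when blacklistFlag = 1 or domainSummary_account_type like '%free%' then 1 else 0 end)
          description: Whether a domain should be ignored for the analysis
      ```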

    2. In addition to the ones in Identity Stitching, a Feature Table YAML contains the following keys: features

      Document the schema in the same way as for the id stitcher. Additional sections for tablevar, tablefeature, etc. can also be added.

    1. - type: exclude value: "na" rudder_anon_id: email: filters: - type: include regex: "[A-Za-z0-9+_.-]+@(.+)" inputs: # For every input model, we need to know the column expression which gives identity of each row in the column. - name: websitePageVisits ref: { table: autotrack.pages, timestamp_col: timestamp } ids: - sql: "anonymous_id" type: rudder_anon_id - sql: "case when lower(coalesce(context_traits_email, user_id)) like any ('%gmail%', '%yahoo%') then lower(coalesce(context_traits_email, user_id)) \ else split_part(lower(coalesce(context_traits_email, user_id)),'@',2) end" type: emai

      This syntax is also old.

    2. models/id_stitcher.yaml - Download this file from here and customise these fields: validity_time - Fetch data which was loaded upto this timestamp. types - The type of data that can come via ID Stitching. filters - Filter(s) that the type should go through before being included. Filters are processed in order. Current filters enable one to include / exclude specific values or regexes. inputs - List of all input sources along with SQL for fetching them. name - Give a name in which you will store the fetched data. ref - Table in DW you are fetching from, along with time

      This is old. Needs to be fixed. We are not downloading files from anywhere now.

  3. Sep 2022
    1. After

      Is this accurate? After you create a WHT project, you should be able to see it in the list of all WHT projects. From there, if you click on one, you should be able to see all its runs. From the runs, you should be able to click on any one run and see artefacts. Please correct me if I am wrong.

    1. wht run does a check of whether the used role have access to read from all the input tables, write to schema and common in order to store the output of material and other information. If the privileges assigned to the role are suffecient th

      1) This section is not specific to any warehouse. It should be out of the tab. 2) Give a link to the wht run documentation in the CLI Reference page. 3) Also add a line on the wht validate access command (with a link to the CLI Reference page).

    1. ny user that wants to run the project shoul

      "You have to create a new role which is a union of both read and write privileged roles. Any user that wants to run the project shoul...." --> "TL;DR: You need to create an additional role specified as a union of the two roles. WHT runs need to be able to read the input tables and write results back to a schema in the warehouse. Furthermore, each run is executed using a single role, specified in the matching profile's section of the site config. In terms of security, it is best to create a new role which has read access to all relevant inputs and write access to the output schema. The alternative is to reuse an existing role which has at least those permissions."

    2. llowing section describes how you can debug in case you want to troubleshoot the execution of data models.

      Can you divide the page into a few sections for better readability? Keep your query under Access Control.

  4. Aug 2022
    1. have their own access control mechanisms, the model(DAC/RBAC) used for access control depends on the warehouse also. Snowflake uses a combination of DAC and R

      Can be moved below WHT Projects/Models and Features