UDA (Unified Data Architecture) at Netflix - Summary
Core Problem & Motivation
- Netflix's systems model core business concepts (actor, movie) in isolation across multiple platforms, creating fragmentation
"Core business concepts like 'actor' or 'movie' are modeled in many places: in our Enterprise GraphQL Gateway powering internal apps, in our asset management platform storing media assets, in our media computing platform that powers encoding pipelines"
- Key challenges include duplicated/inconsistent models, inconsistent terminology, data quality issues, and limited connectivity
"While identifiers and foreign keys exist, they are inconsistently modeled and poorly documented, requiring manual work from domain experts to find and fix any data issues"
UDA Overview
- UDA enables teams to model domains once and represent them consistently across systems
"UDA (Unified Data Architecture) is the foundation for connected data in Content Engineering. It enables teams to model domains once and represent them consistently across systems — powering automation, discoverability, and semantic interoperability"
- Core capabilities: register and connect domain models; catalog and map them to data containers; transpile them into schema definition languages (GraphQL, Avro, SQL, RDF, Java); move data between containers; discover concepts via search and graph traversal; introspect the knowledge graph using Java, GraphQL, or SPARQL
"Transpile domain models into schema definition languages like GraphQL, Avro, SQL, RDF, and Java, while preserving semantics"
Knowledge Graph Foundation
- UDA's knowledge graph is built on RDF and SHACL, adapted to address enterprise-scale challenges
"We chose RDF and SHACL as the foundation for UDA's knowledge graph"
- Challenges addressed: RDF lacked a usable information model; SHACL is inadequate for enterprise data built on local schemas and typed keys; teams lacked shared authoring practices; ontology tooling was insufficient for collaborative modeling
"SHACL is not a modeling language for enterprise data. Designed to validate native RDF, SHACL assumes globally unique URIs and a single data graph. But enterprise data is structured around local schemas and typed keys, as in GraphQL, Avro, or SQL"
- The solution is a named-graph-first information model in which each named graph conforms to a governing model
"UDA adopts a named-graph-first information model. Each named graph conforms to a governing model, itself a named graph in the knowledge graph"
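The named-graph-first idea can be sketched as follows. This is a minimal illustration, assuming a toy in-memory store; the `KnowledgeGraph` class, the graph names, and the triples are hypothetical and are not UDA's actual vocabulary or API. It shows only the structural rule from the quote: every named graph points at a governing model, and that model is itself a named graph in the same store.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); all names here are illustrative.
Triple = tuple[str, str, str]


@dataclass
class KnowledgeGraph:
    # Maps a named-graph identifier to the triples it holds.
    graphs: dict[str, set[Triple]] = field(default_factory=dict)
    # Maps a named-graph identifier to the identifier of its governing model.
    governed_by: dict[str, str] = field(default_factory=dict)

    def add_graph(self, name, triples, governing_model):
        self.graphs[name] = set(triples)
        self.governed_by[name] = governing_model

    def ungoverned(self):
        """Return graphs whose governing model is not a named graph in the store."""
        return sorted(
            name for name, model in self.governed_by.items()
            if model not in self.graphs
        )


kg = KnowledgeGraph()
# The metamodel graph governs itself (hypothetical identifiers throughout).
kg.add_graph("urn:example:upper",
             [("upper:DomainModel", "rdf:type", "upper:Class")],
             "urn:example:upper")
kg.add_graph("urn:example:movies",
             [("ex:Movie", "rdf:type", "upper:Class")],
             "urn:example:upper")
# This graph names a governing model that was never registered.
kg.add_graph("urn:example:orphan", [], "urn:example:missing")

print(kg.ungoverned())  # ['urn:example:orphan']
```

A real implementation would also validate the contents of each graph against its governing model; the sketch checks only that the governing model exists.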
Upper Metamodel
- Upper is a domain modeling language that defines classes of keyed entities, their attributes, and their relationships
"Upper is a language for formally describing domains — business or system — and their concepts"
- Domain models are expressed as conceptual RDF in named graphs, making them introspectable, queryable, and versionable
"Upper domain models are data. They are expressed as conceptual RDF and organized into named graphs, making them introspectable, queryable, and versionable within the UDA knowledge graph"
- Upper is self-referencing (it models itself), self-describing (it defines the concept of a domain model), and self-validating (it conforms to its own model)
"Upper is the metamodel for Connected Data in UDA — the model for all models. It is designed as a bootstrapping upper ontology, which means that Upper is self-referencing, because it models itself as a domain model; self-describing, because it defines the very concept of a domain model; and self-validating, because it conforms to its own model"
- Upper is projected into a generated Jena-based Java API and a GraphQL schema, which is used in a GraphQL service federated into the Enterprise GraphQL gateway
"Upper itself is projected into a generated Jena-based Java API and GraphQL schema used in GraphQL service federated into Netflix's Enterprise GraphQL gateway"
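The self-validating property described in this section can be illustrated with a toy metamodel. This is a sketch under strong assumptions: the dictionary layout, the `conforms` helper, and the example models are invented for illustration, and the real Upper is expressed in RDF rather than Python dictionaries. The point is only that a metamodel written in the same shape it prescribes can be checked against itself.

```python
# Toy metamodel: it states what every domain model must look like,
# and it is itself written in that same shape (hypothetical layout).
UPPER = {
    "name": "Upper",
    "classes": {
        "DomainModel": {"attributes": ["name", "classes"]},
        "Class": {"attributes": ["attributes"]},
    },
}


def conforms(model, metamodel):
    """Check a model against the structural rules declared by the metamodel."""
    rules = metamodel["classes"]
    # The model as a whole must carry every attribute required of a DomainModel.
    if any(attr not in model for attr in rules["DomainModel"]["attributes"]):
        return False
    # Each class in the model must carry every attribute required of a Class.
    return all(
        attr in cls
        for cls in model["classes"].values()
        for attr in rules["Class"]["attributes"]
    )


MOVIES = {
    "name": "Movies",
    "classes": {
        "Movie": {"attributes": ["title"]},
        "Actor": {"attributes": ["name"]},
    },
}
BROKEN = {"name": "Broken", "classes": {"Movie": {}}}

print(conforms(MOVIES, UPPER))  # True
print(conforms(UPPER, UPPER))   # True: the metamodel validates itself
print(conforms(BROKEN, UPPER))  # False: Movie lacks "attributes"
```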
Data Container Representations
- Data containers are repositories of instance data that conform to their own schema languages or type systems: GraphQL entities, Avro records, Iceberg table rows, Java objects
"They contain instance data that conform to their own schema languages or type systems: federated entities from GraphQL services, Avro records from Data Mesh sources, rows from Iceberg tables, or objects from Java APIs"
- Representations are faithful graph interpretations of the members of data systems
"Data container representations are data. They are faithful interpretations of the members of data systems as graph data"
- Unlike a traditional catalog, UDA tracks only assets that are semantically connected to domain models
"unlike a traditional catalog, it only tracks assets that are semantically connected to domain models"
Mappings
- Mappings connect domain model elements to nodes in data container representations
"Mappings are data that connect domain models to data containers"
- Mappings enable bidirectional discovery: from a domain concept to where it is materialized, or from a container to the domain concepts it holds
"Starting from a domain concept, users and systems can walk the knowledge graph to find where that concept is materialized — in which data system, in which container, and even how a specific attribute or relationship is physically accessed. The inverse is also supported"
- Mappings address semantic integration gaps in existing schema languages (e.g., Avro has no built-in way to represent foreign keys)
"A trivial example of this could be seen in the lack of built-in facilities in Avro to represent foreign keys, making it very hard to express how entities relate across Data Mesh sources"
- Mappings enable intent-based automation of data movement while preserving semantics
"Because Mappings encode both meaning and location, UDA can reason about how data should move, preserving semantics, without requiring the consumer to specify how it should be done"
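The bidirectional discovery described in this section can be sketched with two lookup indexes. This is a minimal illustration, assuming mappings can be flattened to (domain element, container location) pairs; the identifiers and the `build_indexes` helper are hypothetical, and in UDA the mappings are graph data that would be walked with graph queries rather than held in Python dictionaries.

```python
from collections import defaultdict

# Each mapping links a domain model element to a node in a data container
# representation. All identifiers below are made up for illustration.
MAPPINGS = [
    ("Movie.title", "graphql:Movie.title"),
    ("Movie.title", "iceberg:movies.title"),
    ("Actor.name", "avro:ActorRecord.name"),
]


def build_indexes(mappings):
    """Index the mappings in both directions so either side can be the start."""
    by_concept, by_container = defaultdict(set), defaultdict(set)
    for concept, location in mappings:
        by_concept[concept].add(location)
        by_container[location].add(concept)
    return by_concept, by_container


by_concept, by_container = build_indexes(MAPPINGS)

# From a domain concept to every place it is materialized.
print(sorted(by_concept["Movie.title"]))
# ['graphql:Movie.title', 'iceberg:movies.title']

# The inverse: from a container field back to the domain concept.
print(sorted(by_container["avro:ActorRecord.name"]))
# ['Actor.name']
```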
Projections
- Projections produce concrete data containers that implement the characteristics derived from a registered domain model
"A projection produces a concrete data container. These containers, such as a GraphQL schema or a Data Mesh source, implement the characteristics derived from a registered domain model"
- Projections ensure semantic interoperability because each one is a concrete realization of Upper's denotational semantics
"Each projection is a concrete realization of Upper's denotational semantics, ensuring semantic interoperability across all containers projected from the same domain model"
- UDA currently supports transpilation to GraphQL schemas (with federation support) and Avro schemas (Data Mesh flavor)
"UDA currently supports transpilation to GraphQL and Avro schemas"
- Some projected containers are created and populated automatically by UDA (Iceberg tables); others must be populated manually (GraphQL APIs, Data Mesh sources)
"Conversely, other containers, like Iceberg Tables, are automatically created and populated by UDA"
- UDA automatically generates and manages mappings for projected containers
"UDA automatically generates and manages mappings between the newly created data containers and the projected domain model"
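To make the idea of projecting one domain model into several schema languages concrete, here is a small sketch. It is not UDA's transpiler: the `MOVIE` model, the type table, and both functions are assumptions made for illustration, and marking every field non-null is a simplification. The `@key` directive is the standard federation way to declare an entity key, and `record`, `string`, and `int` are standard Avro type names.

```python
import json

# Hypothetical domain model: one keyed entity with typed attributes.
MOVIE = {
    "name": "Movie",
    "key": "movieId",
    "attributes": {"movieId": "string", "title": "string", "runtimeMinutes": "int"},
}

GRAPHQL_TYPES = {"string": "String", "int": "Int"}


def to_graphql(entity):
    """Emit a GraphQL SDL type; the entity key becomes a federation @key."""
    lines = [f'type {entity["name"]} @key(fields: "{entity["key"]}") {{']
    for attr, kind in entity["attributes"].items():
        gql = "ID" if attr == entity["key"] else GRAPHQL_TYPES[kind]
        lines.append(f"  {attr}: {gql}!")
    lines.append("}")
    return "\n".join(lines)


def to_avro(entity):
    """Emit an Avro record schema as a dict; primitive names are used as-is."""
    return {
        "type": "record",
        "name": entity["name"],
        "fields": [
            {"name": attr, "type": kind}
            for attr, kind in entity["attributes"].items()
        ],
    }


print(to_graphql(MOVIE))
print(json.dumps(to_avro(MOVIE), indent=2))
```

Because both outputs derive from the same model, a field such as `title` carries the same meaning in each container, which is the interoperability property the section describes.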
Production System: Primary Data Management (PDM)
- PDM provides a single platform where business users manage controlled vocabularies and reference data
"Primary Data Management (PDM) is a single place where business users can manage controlled vocabularies"
- PDM is built on SKOS (Simple Knowledge Organization System), a W3C standard for modeling knowledge
"PDM uses the Simple Knowledge Organization System (SKOS) model. It is a W3C data standard designed for modeling knowledge"
- PDM takes a domain model as input, derives a user interface from it, and provisions a Domain Graph Service via the UDA-projected GraphQL schema
"PDM builds a user interface based upon the model definition and leverages UDA to project this model into type-safe interfaces for other systems to use"
- UDA provisions data movement pipelines, using Avro projections, that feed GraphSearch and the warehouse
"UDA is also used to provision data movement pipelines which are able to feed our GraphSearch infrastructure as well as move data into the warehouse"
- Consumers work with domain-specific terms while PDM uses the generic SKOS model internally
"Consumers of controlled vocabularies never know they're using SKOS. Domain models use terms that fit in with the domain"
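The separation between generic SKOS storage and domain-specific terms can be sketched as a simple renaming layer. This is an illustration only: the store, the genre vocabulary, and `as_domain_record` are hypothetical, and PDM's real mechanism is the UDA projection described above. The property names `skos:prefLabel` and `skos:broader` are standard SKOS.

```python
# Generic SKOS-style storage: every vocabulary entry is a concept with a
# preferred label and an optional broader concept (sample data is invented).
SKOS_STORE = {
    "c1": {"skos:prefLabel": "Drama", "skos:broader": None},
    "c2": {"skos:prefLabel": "Crime Drama", "skos:broader": "c1"},
}

# Hypothetical domain vocabulary: consumers see "name" and "parentGenre".
GENRE_TERMS = {"name": "skos:prefLabel", "parentGenre": "skos:broader"}


def as_domain_record(concept_id, terms, store):
    """Rename SKOS properties to domain terms so consumers never see SKOS."""
    concept = store[concept_id]
    return {
        domain_term: concept[skos_prop]
        for domain_term, skos_prop in terms.items()
    }


print(as_domain_record("c2", GENRE_TERMS, SKOS_STORE))
# {'name': 'Crime Drama', 'parentGenre': 'c1'}
```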
Production System: Sphere
- Sphere is a self-service operational reporting system that lets business users generate reports without technical intermediaries
"Sphere is a UDA-powered self-service operational reporting system"
- Sphere addresses data discovery and query generation through UDA domain models
"Data discovery and query generation are two relevant aspects of data integration"
- Sphere populates UDA domain models from Netflix's Enterprise GraphQL federated schema to preserve semantics
"we formulate a mechanism to use the syntax and semantics captured in the federated schema from Netflix's Enterprise GraphQL and populate representational domain models in UDA to preserve those details and add more"
- Users search for familiar business concepts (actors, movies) instead of specifying tables and join keys
"instead of specifying exact tables and join keys, users simply can search for familiar business concepts such as 'actors' or 'movies'"
- Graph traversal establishes join strategies and ensures that only feasible, joinable combinations are selected
"Through graph traversal, we identify boundaries and islands within the data landscape. This ensures only feasible, joinable combinations are selected while weeding out semantically incorrect and non-executable query candidates"
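One way to picture the traversal step is a breadth-first search over a graph of joinable concepts. This sketch is not Sphere's algorithm: the `JOINABLE` graph, the concept names, and `join_path` are assumptions for illustration. It shows how a traversal yields a chain of joins when two concepts are connected and rejects the combination when they sit on separate islands.

```python
from collections import deque

# Hypothetical join graph: an edge means two concepts share a usable join key.
JOINABLE = {
    "Actor": {"Role"},
    "Role": {"Actor", "Movie"},
    "Movie": {"Role"},
    "Invoice": {"Vendor"},
    "Vendor": {"Invoice"},
}


def join_path(start, goal, graph):
    """Breadth-first search for the shortest chain of joins.

    Returns None when the two concepts sit on different islands.
    """
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # Sort neighbors so the result is deterministic.
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(join_path("Actor", "Movie", JOINABLE))    # ['Actor', 'Role', 'Movie']
print(join_path("Actor", "Invoice", JOINABLE))  # None
```

A query generator built on this idea would offer only concept combinations for which a path exists, which matches the goal of weeding out non-executable query candidates.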
Key Technologies & Standards
- RDF (Resource Description Framework): Foundation for knowledge graph structure
- SHACL (Shapes Constraint Language): Validation framework (adapted for enterprise use)
- SKOS (Simple Knowledge Organization System): W3C standard used in PDM
- GraphQL: Schema projection target, Enterprise GraphQL Gateway integration
- Avro: Schema projection for Data Mesh platform
- Iceberg: Table format for auto-populated projections
- SPARQL: Query language for knowledge graph introspection
- Jena: Java framework for RDF/semantic web, basis for generated APIs
- Data Mesh: General purpose data movement platform at Netflix scale
Future Directions
- Support additional projections such as Protobuf/gRPC
"Supporting additional projections like Protobuf/gRPC"
- Materialize the knowledge graph of instance data for querying, profiling, and management
"Materializing the knowledge graph of instance data for querying, profiling, and management"
- Solve the Graph Search challenges that inspired the work
"Finally solving some of the initial challenges posed by Graph Search (that actually inspired some of this work)"
