utilizing a mix of third-party providers (OpenAI, Anthropic, etc.) and in-house trained LLMs
Some of the LLMs are trained in-house. This is a HUGE moat
Delta Lake for transactional data management, and ClickHouse for high-performance, real-time analytics and log storage.
Delta Lake is used to train the AI on transactional data
The platform is built on AWS and uses Kubernetes and Ray for container orchestration and distributed computing.
Basically uses Ray (Anyscale is the managed version, I believe) to manage the GPUs, which are hosted on AWS
prioritizes real-time clickstream behavioral data over traditional vector-only or keyword-based search engines.
Using real-time clickstream data is the core component that differentiates it from other systems
Matillion provides a Low-Code/No-Code interface that allows data engineers to build these pipelines
Think we do want data engineers for the actual data piece of AI enablement
Matillion ensures the AI doesn't ingest irrelevant or sensitive information that could lead to "hallucinations" (confident but false AI responses).
This is important, but feels more like an "add-on" than an actual core workflow
triggers an API call (to OpenAI or Bedrock) to convert text into Vectors (numerical representations of meaning).
Not super proprietary
breaks long documents into smaller segments to fit the "context window" (the data limit an AI can process at once).
Feel like you need actual access to historical AI data in order to determine what is most important - don't think they have the data moat to do this
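To make the chunk-then-embed step concrete, here is a minimal sketch assuming the OpenAI Python SDK; the chunk size, overlap, and model name are illustrative defaults rather than Matillion's actual settings, and the Bedrock path is omitted.

```python
# Minimal sketch of the chunk-then-embed step described above.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# chunk size and overlap are illustrative, not Matillion's actual defaults.
from openai import OpenAI

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split a long document into overlapping character windows."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Call an embedding model to turn each chunk into a vector."""
    client = OpenAI()
    response = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    return [item.embedding for item in response.data]

document = "..."  # a long source document
vectors = embed_chunks(chunk_text(document))
```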
Redacts Personally Identifiable Information (PII) and standardizes formats.
These are somewhat standard transformations that a lot of data warehouses are beginning to do themselves, and you also have to compete with more enterprise governance tools like Informatica, etc.
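A minimal illustration of what this kind of redact-and-standardize transformation looks like; the regex patterns and date formats are simplistic stand-ins, not a production PII detector or Matillion's implementation.

```python
# Regex-based PII masking plus date normalization, as a toy example of the
# redact-and-standardize step. Patterns are deliberately simplistic.
import re
from datetime import datetime

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

def standardize_date(raw: str) -> str:
    """Normalize a handful of common date formats to YYYY-MM-DD."""
    for fmt in ("%m/%d/%Y", "%d-%m-%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return raw  # leave unparseable values untouched

print(redact_pii("Contact jane.doe@example.com, SSN 123-45-6789"))
print(standardize_date("03/07/2024"))
```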
Matillion's model pushes processing to the data warehouse. Performing LLM (Large Language Model) inference or complex data cleaning inside a warehouse can be prohibitively expensive.
This is similar to dbt - Matillion just does the compute in-warehouse but that doesn't seem like something teams would want to do
Data prep for AI is increasingly handled by Data Scientists using Python-based tools (like LangChain or LlamaIndex)
Data prep for AI itself is already being won by key players
If Matillion’s integrations are slower or less flexible than niche, AI-native competitors, they become a bottleneck rather than an enabler.
Vector work requires highly complex algorithms that benefit from data flywheels. Matillion doesn't have this flywheel, and it will be very hard to build one
Matillion’s core strength is ELT (Extract, Load, Transform), which traditionally relies on batch processing;
In the AI era, you need up-to-date, real-time data rather than historical batch loads. Without a streaming broker, you aren't really in the game
This counteracts the commoditization of pre-built connectors by allowing users to rapidly create their own high-quality integrations for niche or proprietary APIs without waiting for a vendor's roadmap.
Matillion isn't built for AI Agents to actually use APIs - it is built for data engineers to do so. Fundamentally different things
It differentiates Matillion from pure-SaaS competitors who require data to pass through their own servers.
This is somewhat of a moat, but I think all competitors will catch up fairly quickly
It provides "no-code" components for vector store integration (e.g., Pinecone), prompt engineering, and RAG (Retrieval-Augmented Generation) workflows.
Matillion is now focused on chunking, OCR, embeddings, etc., for vector databases
Maia uses natural language to build, test, and document entire end-to-end pipelines.
They have essentially turned into an "AI data engineer" application, which is much riskier
fails to dominate the AI/Agentic data engineering space, it risks becoming just one of dozens of identical "IPaaS"
Dominating this new AI/agentic space is key to helping "prepare" AI-ready data
dbt (Data Build Tool) has become the industry standard for SQL transformations, offering a code-based, version-controlled alternative to Matillion’s visual UI.
dbt is already starting to dominate the AI data transformation market
By using an open standard (Iceberg, Delta Lake, Hudi), a company can switch their compute engine (e.g., moving from Databricks to Snowflake) without re-engineering their entire ETL pipeline or re-extracting data.
This takes a huge piece away from the ETL market
security, data lineage, and deployment pipelines around LangChain often exceeds the licensing cost of Dataiku, which provides these "out of the box."
Dataiku's secret sauce is the LLM Mesh connectors and the integration effect that comes with them. Since everyone uses the same platform, Dataiku can enforce the same data security levels for everyone
provides IT-centralized controls (audit logs, versioning, and access permissions) that are native to its platform
These controls are what could make an enterprise team choose Dataiku, because they need more granular tracking
abstracts model APIs to provide cost tracking, PII (Personally Identifiable Information) filtering, and model-agnosticism
Monitoring SOTA model APIs is key. I think you can use open-source models with Dataiku, but it is a little harder
Dataiku manages the infrastructure and data layer (ETL, storage, model monitoring, and security).
Dataiku is more focused on the RAG pipeline, choosing data, etc., as well as connecting to the tools themselves
converting live, streaming data into vectors in real-time—and immediately performing a join against its current working memory.
Oftentimes data will just be streamed directly into Estuary, but you need real-time data to aggregate somewhat historical events into the agentic database.
The GPU handles real-time, instant decisions - TigerGraph looks historically at the different lookups, etc., to deliver data the agent might need.
it supports ~85% of Postgres features compared to CockroachDB's ~54%.
Yuga supports Postgres extensions, which makes the ecosystem extremely robust
highly compatible with existing Postgres code, engineering teams can migrate with minimal rewriting of the application layer.
This is probably a big decision factor when actually going to market
offers more flexibility in how data is replicated (synchronously or asynchronously).
Yuga is a little more useful here - Cockroach is just super stable
though it makes range-based queries more resource-intensive.
Yugabyte just scatters everything, which makes range queries a little harder but keeps writes fast
CockroachDB focuses on ease of geo-distribution and strict data consistency, whereas YugabyteDB prioritizes deep PostgreSQL feature compatibility and high-performance throughput by reusing the original Postgres query engine.
Cockroach is more about being insanely resilient - Yuga is about having more functionality while still being distributed
an update in Postgres and its subsequent replication to ClickHouse are not atomic, leading to potential "partial" data states during failures.
This is the main piece
often require manual intervention or complex mapping logic in Clickpipes to prevent the pipeline from breaking.
It's not as simple as just using CDC - there are a ton of downstream issues, like these schema issues, that make it harder
Latency between a write in Postgres and its visibility in ClickHouse means analytical queries may reflect stale data compared to the live transactional state.
This delay is a big reason why having operational and OLAP data together is useful
Users can write simple scripts or "rules" to label data in bulk based on metadata or specific visual triggers
Leads to a flywheel effect of organizations really "living" in this type of software
small, specialized models trained on a client's specific subset of data to automate repetitive labeling tasks (e.g., identifying a specific type of medical lesion or a specific car part).
This leads to a super specific data moat
copy-on-write forks
This fork just points to the original data, which an AI agent can use to test features, etc.
Policy-driven automation for compression, retention, and re-ordering
This organizing for speed is again a highly optimized piece of the data layer
compressed tuple filtering
Standard compression which is helpful
keeping dashboards current without re-scanning historical columnar data.
This helps speed things up a ton
vectorized query engine
This is the actual motor that processes multiple batches of data at once. This is what allows for much faster queries vs. a slower compute engine
automatic tiering of cold data to Amazon S3 (object storage) while maintaining full SQL access.
This tiering enables much more efficient storage, which helps teams run faster because there is less hot data to query
TigerData’s pgai extension automates this within the database, allowing the database to "vectorize" its own data as it is inserted.
This means you don't have to send data out, reducing ETL. This is massive.
TigerData combines pgvectorscale (vector search) with pg_textsearch (BM25 algorithm).
Combining these is HUGE. Allows you to have dual-search and reduce hallucinations + have one source of truth.
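To show why combining the two retrieval paths works, here is a hedged sketch using reciprocal rank fusion, a common way to merge a vector ranking with a BM25 ranking; it is not necessarily how pgvectorscale and pg_textsearch combine results internally.

```python
# Generic reciprocal-rank-fusion merge of a vector ranking and a BM25 ranking.
# Illustrates the value of dual search; not TigerData's internal method.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_7", "doc_2", "doc_9"]   # nearest-neighbor results
bm25_hits = ["doc_2", "doc_4", "doc_7"]     # keyword/BM25 results
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```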
SBQ improves upon standard binary quantization by using statistical methods to retain higher accuracy (99% recall) while reducing the data footprint by up to 30x.
Another huge piece: compression algorithms that minimize the data stored
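To make the footprint math concrete, here is standard binary quantization in numpy (one bit per float32 dimension, roughly 32x smaller); SBQ layers statistical corrections on top of this, which are not reproduced here.

```python
# Standard binary quantization: each float32 dimension (4 bytes) collapses to
# one bit, so a 1536-dim vector shrinks from ~6 KB to ~192 bytes (~32x).
import numpy as np

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    """Quantize float vectors to packed bits: 1 where value > 0, else 0."""
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=1)

vecs = np.random.randn(1000, 1536).astype(np.float32)
packed = binary_quantize(vecs)
print(vecs.nbytes / packed.nbytes)  # ~32x reduction in stored bytes
```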
StreamingDiskANN allows the index to reside on SSDs (Disk) with minimal performance loss, significantly reducing hardware costs for large datasets.
Helps scale pgvector a ton compared to the standard setup
compaction (merging small files), clustering (optimizing data layout), and schema evolution.
These are real managed services that Onehouse provides - it can optimize the OTFs in the background
automates the complex "last mile" of Change Data Capture (CDC
It's more like a transformation tool for CDC
the Binary protocol (for speed) or HTTP (for compatibility/cloud).
Doesn't mirror data - just converts the data into text describing what it is. It's much faster than ClickHouse CDC
It automatically converts Postgres SQL syntax and functions (like date_part or percentile_cont) into their equivalent ClickHouse function
It's basically acting like a translator for ClickHouse data
This utilizes ClickHouse’s columnar speed rather than pulling raw data into Postgres for processing.
Could allow for faster Postgres processing, although there are bottlenecks in the data federation and it isn't as good as
PostgreSQL SQL/MED (SQL Management of External Data) standard
Essentially Postgres is federating data from ClickHouse
materialized views (pre-computed summaries) that track changes in the underlying data
Materialized views are how real-time results get served so quickly
Dual-Engine Storage (Hypercore):
Hypercore is the dual storage engine that allows both OLTP and OLAP data to be stored
which automatically partition data into time-based "chunks."
Hypertables are the chunking / partitioning of the data
is 24x faster than standard SQL for these operations.
Another argument is just the speed of this infrastructure in general
distributed locks in Redis
This is just saying which worker is working on which trace at a time so that they don't overlap on the same trace.
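A minimal sketch of that per-trace lock idea using Redis SET NX with a TTL; key names, TTLs, and the ownership check are illustrative, not Braintrust's actual scheme.

```python
# Per-trace lock with Redis SET NX + TTL so only one worker claims a trace.
import redis

r = redis.Redis(host="localhost", port=6379)

def try_claim_trace(trace_id: str, worker_id: str, ttl_seconds: int = 30) -> bool:
    """Atomically claim a trace; returns False if another worker holds it."""
    return bool(r.set(f"lock:trace:{trace_id}", worker_id, nx=True, ex=ttl_seconds))

def release_trace(trace_id: str, worker_id: str) -> None:
    """Release the lock only if this worker still appears to own it."""
    key = f"lock:trace:{trace_id}"
    if r.get(key) == worker_id.encode():
        r.delete(key)

if try_claim_trace("trace-123", "worker-a"):
    try:
        pass  # process the trace
    finally:
        release_trace("trace-123", "worker-a")
```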
Trace data in cheap Object Storage (like S3) while keeping metadata in Postgres
Metadata is what is used for organizational info, security, scores, context, user IDs, etc. The trace itself is the exact prompt sent to the AI and the word-for-word response, the intermediate thoughts, tool calls, how long RAG retrieval took, etc.
By eliminating the "waiting for logs to load" phase
Braintrust combines hot data in the WAL and cold storage relatively seamlessly
It merges real-time data from a custom WAL with historical data from S3 in a single pass
This is very similar to what Estuary is doing, but I guess custom-built for very large AI traces
a "failed" trace from production, convert it into a test case with one click, and run it against new code to ensure the bug doesn't reappear.
This is that flywheel moat: integrating errors, converting the better answer into a golden dataset, and helping non-technical users interact with technical ones
This allows teams to filter by specific variables, such as "show me every time the agent used the wrong API key during a refund process."
From a use-case perspective, being able to see this level of detail is extremely helpful
Brainstore captures every intermediate "thought," tool call, and sub-task within an agent's execution path.
Captures the behind-the-scenes tooling - these integrations / APIs could be a moat as well
Attention scales quadratically (O(n2)) with the number of tokens.
In a typical architecture, the pixel goes through Self-Attention (pixels talking to pixels) and then immediately through Cross-Attention (pixels talking to text). It’s a "chain" where:
Pixel-to-Pixel: Ensures Coherence (it looks like a real object).
Pixel-to-Text: Ensures Adherence (it looks like what you asked for).
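The quadratic claim in equation form: the score matrix Q K^T has one row and one column per token, so compute and memory grow with n^2.

```latex
% Scaled dot-product attention; the n x n score matrix is the O(n^2) term.
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad Q, K, V \in \mathbb{R}^{n \times d_k},
\qquad Q K^{\top} \in \mathbb{R}^{n \times n}.
\]
```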
Latent Diffusion (processing compressed images) to save on hardware costs.
Latent diffusion is different from reusing somewhat stale pieces of images. It is about compressing the image down so it can later be decoded back to a full-sized image.
Pixel models generally cannot use this trick because the entire image state changes at every denoising step
You can't just look at historical KV caches because they won't be relevant anymore once the image has changed
Rely heavily on Cross-Attention, which maps features from one modality (text prompt) to another (image latent space)
The idea is that the pixels being generated by the model act as the query. The text prompt stays the same, and the pixels ask the text which parts should be red - then the model decides and asks again.
The tile is loaded directly into SRAM, along with the text prompt
Attention must account for spatial proximity (up, down, left, right) rather than just "before" or "after."
The attention output is still vectors. The model calculates a score for how relevant each piece of the prompt is to a given tile, then multiplies the actual values by those scores and sums them up.
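A hedged numpy sketch of that score-then-weighted-sum mechanic: pixel/tile features act as the queries and the prompt-token embeddings supply keys and values. Shapes and dimensions are made up for illustration.

```python
# Cross-attention: pixel queries score against text keys, then take a
# softmax-weighted sum of the text values.
import numpy as np

def cross_attention(pixel_q: np.ndarray, text_k: np.ndarray, text_v: np.ndarray) -> np.ndarray:
    """pixel_q: (n_pixels, d), text_k/text_v: (n_tokens, d) -> (n_pixels, d)."""
    d = pixel_q.shape[-1]
    scores = pixel_q @ text_k.T / np.sqrt(d)          # relevance of each token to each pixel
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over text tokens
    return weights @ text_v                           # score-weighted sum of token values

pixels = np.random.randn(64, 128)   # 64 pixel/tile features
tokens = np.random.randn(12, 128)   # 12 prompt-token embeddings
print(cross_attention(pixels, tokens, tokens).shape)  # (64, 128)
```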
maintain global structural coherence.
In each step of asking the text prompt what to do with its pixels, every tile asks: "What do the other tiles look like, and how do we all fit the text prompt?"
2D grids of pixel patches
These patches are what get broken down and sent to GPUs - they also need bi-directional attention to align with both the text prompt and what the other image patches are doing.
simultaneously.
"Simultaneously" here is key - you aren't determining one token at a time like with text. Every attention mechanism is happening at the same time.
As an open-source tool with a massive community, it provides a "common language" for data transformations
This is the main thing - it's open source and got a ton of adoption, plus network effects of people knowing how to use it, similar to Kafka
rather than using external processing engines.
Basically Spark / Flink / Estuary, etc. are the actual processing engines. But dbt just uses the one already in the data warehouse, so it doesn't go through a double cycle
combines SQL with Jinja (a Python-based templating engine) to enable logic
Basically just a templating language that compiles down to SQL, making it easy for devs to use
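To make the "SQL + Jinja" idea concrete, here is plain jinja2 rendering a parameterized SQL template in Python; this is not dbt itself, which adds macros like ref() and dependency ordering on top.

```python
# Rendering a parameterized SQL template with jinja2 (the templating engine
# dbt builds on). The table and column names are illustrative.
from jinja2 import Template

sql_template = Template("""
select
    order_id,
    {% for col in amount_columns %}
    sum({{ col }}) as total_{{ col }}{{ "," if not loop.last else "" }}
    {% endfor %}
from raw_orders
where order_date >= '{{ start_date }}'
group by order_id
""")

print(sql_template.render(amount_columns=["net_amount", "tax_amount"],
                          start_date="2024-01-01"))
```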
dbt ensures that parent tables are processed before child tables, preventing "missing data" errors.
This is an important part of the "transformation" of tables in terms of joining them together. The keys are what matter most to keep things sane throughout the dataset
build the physical objects in the warehouse.
It appears to be all warehouse-based - it uses that compute. This is probably the issue with using dbt to perform in-flight transformations
without hitting "memory walls."
With tiling, you calculate the relationship for one small tile and write down that answer; while you're doing this, the next tile is being pulled from HBM, so by the time the current tile is done the data you need for the next one is already loaded (sketched in the code below).
both the attention vectors and the weights needed are "tiled" in,
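A hedged numpy sketch of the tiling idea: attention is computed one K/V block at a time with a running softmax, so the full n x n score matrix never has to sit in fast memory. Real kernels also overlap the block loads with compute, which is not modeled here.

```python
# Blocked attention with an online softmax (running max and denominator),
# processing K/V one tile at a time. Simplified; no prefetch overlap shown.
import numpy as np

def tiled_attention(q, k, v, block: int = 128):
    """q: (n, d), k/v: (m, d). Processes k/v in blocks of `block` rows."""
    n, d = q.shape
    out = np.zeros((n, d))
    running_max = np.full((n, 1), -np.inf)
    denom = np.zeros((n, 1))
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        scores = q @ kb.T / np.sqrt(d)                        # (n, block)
        new_max = np.maximum(running_max, scores.max(axis=1, keepdims=True))
        scale = np.exp(running_max - new_max)                 # rescale previous partial sums
        p = np.exp(scores - new_max)
        denom = denom * scale + p.sum(axis=1, keepdims=True)
        out = out * scale + p @ vb
        running_max = new_max
    return out / denom

q, k, v = (np.random.randn(256, 64) for _ in range(3))
scores = q @ k.T / np.sqrt(64)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
reference = (weights / weights.sum(axis=1, keepdims=True)) @ v
print(np.allclose(tiled_attention(q, k, v), reference))  # True
```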
uses proprietary networking software to manage how data moves between GPUs
Together is still using NVLink - it somehow re-architects the execution and scheduling to be faster
system runs a smaller, faster "draft" model alongside the main model to predict upcoming text
This is the same thing Fireworks is doing
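A hedged sketch of the greedy version of this draft-and-verify loop; the two "models" are stand-in callables, and a real system would verify all draft tokens with one batched forward pass of the target model rather than calling it per token.

```python
# Greedy speculative decoding: the draft model proposes a few tokens, the
# target model checks them, and the longest agreeing prefix is kept.
from typing import Callable, List

def speculative_step(prompt: List[int],
                     draft_next: Callable[[List[int]], int],
                     target_next: Callable[[List[int]], int],
                     lookahead: int = 4) -> List[int]:
    # 1. Draft model guesses `lookahead` tokens cheaply.
    guesses, context = [], list(prompt)
    for _ in range(lookahead):
        token = draft_next(context)
        guesses.append(token)
        context.append(token)

    # 2. Target model verifies each guess; keep the agreeing prefix and append
    #    the target's own token at the first disagreement (or at the end).
    accepted = list(prompt)
    for guess in guesses:
        correct = target_next(accepted)
        accepted.append(correct)
        if correct != guess:
            break
    return accepted

# Toy usage: both models just emit last token + 1, so every guess is accepted.
grow = lambda ctx: ctx[-1] + 1
print(speculative_step([1, 2, 3], draft_next=grow, target_next=grow))
```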
fine-tuned for specific chips like the NVIDIA H100 to execute math operations with fewer wasted cycles.
Together is getting down to the low-level kernel math, similar to what Fireworks is doing
implement specialized algorithms that reduce the memory read/write requirements of the Attention Mechanism
While the core algorithm is open, Together AI maintains a proprietary layer called the Together Kernel Collection (TKC) for their own cloud customers
Stale Activation Reuse: It leverages temporal redundancy (the fact that data doesn't change drastically between steps) by using "stale" results from the previous step to start the current step immediately.
This is huge - similar to how LLMs use draft models to predict the next word, PipeFusion uses stale results from the previous step to start the next load even if the tile isn't fully finished. I imagine this would get better with more data?
The system partitions an image into non-overlapping patches
Basically the tiling of the images themselves isn't the hard part. What is hard is determining how to handle a number of patches that is not equal to the number of GPUs. The most important part is how these tiles talk to each other
Executive Summary: Fal PipeFusion Optimization
PipeFusion's job is to decide how to chop an image into "patches" so they can be sent to different GPUs.
This ensures that when a request arrives, the engine does not have to pull multi-gigabyte files over the public internet.
This inference engine isn't the multi-modal specific thing
compiled caches to load pre-compiled CUDA kernels directly into memory.
Don't really understand this
to control exactly how data moves through the processor.
By controlling how data moves through the processor, you are managing both which KV cache entries and weights are kept in SRAM, as well as the actual GPU threads doing the calculations and how much work each thread should take on.
"incremental loading" (only moving new/changed data)
CDC is often associated with real-time streaming, but batch processes also use it to determine which pieces of data have changed and should be moved.
Providers manage "cursors" or "bookmarks" to ensure idempotency (the ability to run a process multiple times with the same result).
Basically they just provide a guaranteed way to load in data
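A minimal sketch of the cursor/bookmark idea: persist a high-water mark so re-running the job only fetches new or changed rows and stays idempotent. The column names and the fetch function are hypothetical.

```python
# Cursor-based incremental loading: store the last-seen updated_at and only
# fetch rows newer than it on the next run. fetch_rows_since is a stand-in.
import json
from pathlib import Path

STATE_FILE = Path("sync_state.json")

def load_cursor(default: str = "1970-01-01T00:00:00") -> str:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())["last_updated_at"]
    return default

def save_cursor(cursor: str) -> None:
    STATE_FILE.write_text(json.dumps({"last_updated_at": cursor}))

def upsert(row: dict) -> None:
    print("upserting", row["id"])             # stand-in for a MERGE/UPSERT keyed on primary key

def incremental_sync(fetch_rows_since) -> None:
    """fetch_rows_since(cursor) -> list of dicts with an 'updated_at' field."""
    cursor = load_cursor()
    rows = fetch_rows_since(cursor)            # e.g. SELECT ... WHERE updated_at > :cursor
    for row in rows:
        upsert(row)                            # idempotent: re-running re-applies the same rows
    if rows:
        save_cursor(max(r["updated_at"] for r in rows))
```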
Providers automate this "Schema Evolution" (modifying the target table to match the source).
This schema evolution is probably pretty tough as well
not because of file formats
Loading data isn't hard because of different file formats - oh lol
Specialized providers manage "throttling" (slowing down requests) and "retries" (re-attempting failed calls) so the connection doesn't break when limits are hit.
Managing these connectors is key
Streaming tools maintain an internal State Store
This is HUGE. It allows you to not re-read the source data, which is way faster
it stays active in memory, processing individual Kafka messages the millisecond they arrive.
For streaming, the architecture has to keep the query alive at all times - not just run it once
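A toy sketch of that always-on, stateful pattern: a consumer loop keeps running per-key aggregates in memory and updates them as each message arrives, instead of re-scanning the source per run. A plain iterable stands in for a Kafka consumer.

```python
# Stateful streaming aggregation: a state store of running totals updated
# one event at a time by an always-on consumer loop.
from collections import defaultdict

def run_streaming_aggregation(messages):
    """messages: iterable of {'user_id': ..., 'amount': ...} events."""
    state = defaultdict(float)                 # the state store: running totals per user
    for event in messages:                     # stays "active", handling one event at a time
        state[event["user_id"]] += event["amount"]
        yield event["user_id"], state[event["user_id"]]

events = [
    {"user_id": "u1", "amount": 20.0},
    {"user_id": "u2", "amount": 5.0},
    {"user_id": "u1", "amount": 7.5},
]
for user, running_total in run_streaming_aggregation(events):
    print(user, running_total)   # u1 20.0, u2 5.0, u1 27.5
```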
compiles SQL into a command, sends it to a data warehouse (e.g., Snowflake), and waits for the warehouse to process the entire dataset (or a defined partition) as a fixed block.
The way dbt is built (warehouse-centric) seems like it would take much longer compared to other stream processing frameworks
real-time processing turns thousands of GPS pings into a single "Estimated Time of Arrival" (ETA)
Good real-time example of needing to do a ton of operations on the streamed data to produce this
machine-readable formats like Protobuf or Avro into human-readable or database-friendly formats like JSON or SQL.
These machine-readable formats include metadata within the data itself to make it easy for a person or BI tool to understand
filtered to only emit a record if the temperature exceeds a specific threshold.
You can have certain filters - the processing layer is what actually sees the individual records
Transformation involves "joining" a fast-moving stream (e.g., a credit card swipe) with a static database (e.g., the user’s credit limit) to create a complete record.
This is key, and similar to doing joins in a normal ELT process into a database
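A minimal sketch of that stream-to-static enrichment join; the in-memory dict stands in for the reference database, and the field names are made up.

```python
# Enrichment join: each fast-moving event is joined against static reference
# data keyed on the same id before being emitted downstream.
credit_limits = {"card_123": 5000, "card_456": 1200}   # static reference data

def enrich(swipe_events):
    for swipe in swipe_events:
        limit = credit_limits.get(swipe["card_id"])
        yield {**swipe, "credit_limit": limit, "over_limit": swipe["amount"] > (limit or 0)}

swipes = [{"card_id": "card_123", "amount": 250.0},
          {"card_id": "card_456", "amount": 1500.0}]
for record in enrich(swipes):
    print(record)
```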
Modern Machine Learning models require highly structured, "clean" data to function;
This is key
Unifies data silos so that Marketing, Finance, and Sales
This unification is super important in the age of AI
connecting a CRM (Customer Relationship Management) record to a Web Analytics ID to see a full customer journey.
This linking / joining is probably the most complex part of data transformation
converting thousands of individual daily transactions into a single "Monthly Total Revenue" figure.
Perform functions on the data, like in Excel, to compute aggregate values
converting all dates to YYYY-MM-DD or all currencies to USD).
Makes sense in terms of tools that align the structure of data - kind of like Excel, just at insane scale
removing duplicate customer records or filling in missing values (e.g., converting "NULL" to "0").
This makes sense
It manages KV cache reuse (storing previous conversation context to speed up new responses)
Baseten isn't managing the specific contents of the KV cache - just its persistence and routing, so it stays warm. It routes user requests to a GPU that already holds the relevant caches. It's KV cache orchestration, not deciding exactly what sits in the cache on any one GPU.
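A hedged sketch of what cache-affinity routing can look like (not Baseten's actual router): requests for the same conversation hash to the same replica, so the replica that already holds that conversation's KV cache keeps serving it.

```python
# Cache-affinity routing: hash the conversation id to pick a GPU replica, so
# repeat requests land where the warm KV cache already lives.
import hashlib

REPLICAS = ["gpu-replica-0", "gpu-replica-1", "gpu-replica-2"]

def route(conversation_id: str) -> str:
    digest = hashlib.sha256(conversation_id.encode()).hexdigest()
    return REPLICAS[int(digest, 16) % len(REPLICAS)]

print(route("conv-abc"))   # always the same replica for this conversation
print(route("conv-xyz"))
```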
parallelized byte-range downloads and image streaming
This is a different technical approach to loading weights vs. Modal - it breaks the full model weights into small pieces and downloads them in parallel before loading them
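A hedged sketch of parallelized byte-range downloads: split a large weight file into ranges and fetch them concurrently with HTTP Range headers. The URL and sizes are placeholders; a real system streams shards toward GPU memory rather than reassembling bytes in RAM.

```python
# Parallel byte-range download of a large file using HTTP Range headers.
from concurrent.futures import ThreadPoolExecutor
import requests

def download_ranges(url: str, total_size: int, chunk_size: int = 64 * 1024 * 1024) -> bytes:
    ranges = [(start, min(start + chunk_size, total_size) - 1)
              for start in range(0, total_size, chunk_size)]

    def fetch(byte_range):
        start, end = byte_range
        resp = requests.get(url, headers={"Range": f"bytes={start}-{end}"}, timeout=60)
        resp.raise_for_status()
        return resp.content

    with ThreadPoolExecutor(max_workers=8) as pool:
        parts = list(pool.map(fetch, ranges))   # fetched in parallel, reassembled in order
    return b"".join(parts)

# Example (placeholder URL):
# weights = download_ranges("https://example.com/model.safetensors", total_size=10 * 2**30)
```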
number of active GPU replicas (duplicate model instances)
Essentially it just spins up more GPU replicas when traffic requires it, using an autoscaler - seems similar to Modal.
It also makes the system fail-proof since there are multiple GPUs.
using kernel fusion
Kernel fusion itself isn't proprietary - Baseten manages open-source kernels