55 Matching Annotations
  1. Dec 2025
    1. I'm not advocating that everyone should self-host everything. But the pendulum has swung too far toward managed services. There's a large sweet spot where self-hosting makes perfect sense, and more teams should seriously consider it. Start small. If you're paying more than $200/month for RDS, spin up a test server and migrate a non-critical database. You might be surprised by how straightforward it is. The future of infrastructure is almost certainly more hybrid than it's been recently: managed services where they add genuine value, self-hosted where they're just expensive abstractions. Postgres often falls into the latter category.

      Footnotes:

      1. They're either just hosting a vanilla Postgres instance that's tied to the deployed hardware config, or doing something opaque with edge deploys and sharding. In the latter case they near-guarantee your DB will stay highly available, but costs can quickly spiral out of control.
      2. Maybe up to billions at this point.
      3. Even on otherwise absolutely snail-speed hardware.
      4. This was Jeff Bezos's favorite phrase during the early AWS days, and it stuck.
      5. Similar options include OVH, Hetzner dedicated instances, or even bare metal from providers like Equinix.
      6. AWS RDS & S3 have had several major outages over the years. The most memorable was the 2017 US-East-1 outage that took down half the internet.

      Cloud hosting can become an expensive abstraction layer quickly. I also think there's an entire generation of coders / engineers who treat siloed cloud hosting as a given, without considering other options and their benefits. There's a large window where self-hosting makes sense, and Postgres almost always falls within it.

    2. Storage Tuning: NVMe SSDs make keeping content on disk much less harmful than conventional spinning hard drives, so you'll want to pay attention to the disk type you're hosted on.

      Storage tuning is a self-hosting Postgres concern too.
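
      A minimal sketch of the planner settings usually adjusted for NVMe disks; the values are common starting points for SSD-backed hosts, not tuned recommendations:

      ```sql
      -- Planner's estimate of random-read cost; the default of 4.0 assumes
      -- spinning disks, while NVMe random reads are nearly as cheap as sequential.
      ALTER SYSTEM SET random_page_cost = 1.1;

      -- Number of concurrent I/O requests Postgres may issue; NVMe queues run deep.
      ALTER SYSTEM SET effective_io_concurrency = 200;

      -- Apply both settings without a restart.
      SELECT pg_reload_conf();
      ```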

    3. Making fresh connections in Postgres has pretty expensive overhead, so you almost always want to put a connection pooler in front of it. I'm using PgBouncer on all my projects by default, even when load might not call for it. Python asyncio applications just work better with a centralized connection pooler.

      Postgres parallel connections are something you want to stay on top of; connection pooling is needed.
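
      PgBouncer itself is configured outside SQL (in its pgbouncer.ini), but a quick way to watch connection pressure on the database side is:

      ```sql
      -- How many backends exist, and what they're doing.
      SELECT state, count(*)
      FROM pg_stat_activity
      GROUP BY state
      ORDER BY count(*) DESC;

      -- The hard cap; every slot costs server memory even while idle.
      SHOW max_connections;
      ```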

    4. Memory Configuration: This is where most people mess up. Pulling the standard Postgres Docker image won't cut it. You have to configure memory bounds with static limits that correspond to your hardware. I've automated some of these configurations, but whether you do it manually or use some auto-config, tweaking these params is a must.

      Self-hosting Postgres requires setting static memory limits.
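
      A sketch of the usual static memory bounds, assuming a hypothetical dedicated 16 GB server; the numbers follow the common ~25%-of-RAM rule of thumb for shared_buffers and are illustrative only:

      ```sql
      -- Shared buffer cache, ~25% of RAM on a dedicated box (takes effect on restart).
      ALTER SYSTEM SET shared_buffers = '4GB';

      -- Planner hint for total memory available for caching (Postgres + OS cache).
      ALTER SYSTEM SET effective_cache_size = '12GB';

      -- Per-sort/per-hash allocation; multiplied across concurrent queries, so keep it modest.
      ALTER SYSTEM SET work_mem = '32MB';

      -- Memory for VACUUM, CREATE INDEX, and other maintenance work.
      ALTER SYSTEM SET maintenance_work_mem = '512MB';
      ```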

  2. Dec 2024
    1. Unlike SQL designs that build on K/V stores, this allows DSQL to do much of the heavy lifting of filtering and finding data right next to the data itself, on the storage replicas, without sacrificing scalability of storage or compute.

      This is different from CockroachDB, then: its KV layer doesn't really know how to do the filtering, aggregation, and projection work, because it takes care of reliability instead, but presumably there is a performance cost.

  3. Jan 2024
  4. Oct 2023
  5. Sep 2023
  6. Jul 2023
  7. Jun 2023
  8. May 2023
  9. Jan 2023
  10. Dec 2022
  11. Nov 2022
  12. Oct 2022
  13. Jul 2022
  14. May 2021
  15. Mar 2021
  16. Feb 2021
  17. Jan 2021
  18. Oct 2020
  19. Dec 2019
  20. Jul 2019
  21. Jun 2019
  22. May 2019
  23. Apr 2019
    1. It is important to understand the interaction between aggregates and SQL's WHERE and HAVING clauses. The fundamental difference between WHERE and HAVING is this: WHERE selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas HAVING selects group rows after groups and aggregates are computed. Thus, the WHERE clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the HAVING clause always contains aggregate functions. (Strictly speaking, you are allowed to write a HAVING clause that doesn't use aggregates, but it's seldom useful. The same condition could be used more efficiently at the WHERE stage.)

      WHERE >> AGGREGATE >> HAVING (use aggregate functions)

      This is the SQL evaluation order when querying data.

      And yes, HAVING does almost the same thing as WHERE, but it accepts the grouping and aggregate functions that aren't allowed in WHERE (see the sketch below).
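
      A minimal illustration of that ordering, using a hypothetical orders table:

      ```sql
      -- WHERE filters rows before grouping; HAVING filters groups after aggregation.
      SELECT customer_id, sum(amount) AS total_spent
      FROM orders
      WHERE status = 'paid'          -- row filter: only paid orders enter the aggregate
      GROUP BY customer_id
      HAVING sum(amount) > 1000;     -- group filter: needs the computed aggregate
      ```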

  24. Jan 2018
  25. Dec 2017
  26. Oct 2017
    1. MySQL’s replication architecture means that if bugs do cause table corruption, the problem is unlikely to cause a catastrophic failure.

      I can't follow the reasoning here. I guess it's not guaranteed to replicate the corruption like Postgres would, but it seems totally possible to trigger similar or identical corruption because the implementation of the logical statement would be similar on the replica.

  27. Sep 2017
  28. Aug 2017
  29. Jul 2017
  30. Dec 2016
    1. The real benefit of JSONB: Indexes. We want our application to be fast. Without indexes, the database is forced to go from record to record (a table scan), checking to see if a condition is true. It's no different with JSON data. In fact, it's most likely worse, since Postgres has to step into each JSON document as well.

      This solves the problem in the last implementation I handled, where json (not jsonb) data was stored in Postgres.
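
      A minimal sketch of the difference in practice, using a hypothetical events table; only jsonb columns can be GIN-indexed like this:

      ```sql
      CREATE TABLE events (
          id      bigserial PRIMARY KEY,
          payload jsonb NOT NULL
      );

      -- A GIN index over the whole document supports the @> containment operator.
      CREATE INDEX events_payload_idx ON events USING GIN (payload);

      -- Can use the index instead of walking into every JSON document.
      SELECT id FROM events WHERE payload @> '{"type": "signup"}';
      ```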

    2. When you’re picking a data store, the most important thing to understand is where in your data — and where in its connections — the business value lies. If you don’t know yet, which is perfectly reasonable, then choose something that won’t paint you into a corner. Pushing arbitrary JSON into your database sounds flexible, but true flexibility is easily adding the features your business needs.

      This is an old article but valuable thinking for system design.

    3. The BSON format used by MongoDB is limited to a maximum of 64 bits for representing an integer or floating point number, whereas the JSONB format used by Postgres does not have this limit. Postgres provides data constraint and validation functions to help ensure that JSON documents are more meaningful: for example, preventing attempts to store alphabetical characters where numerical values are expected. MongoDB offers automatic database sharding for easy horizontal scaling of JSON data storage. Scaling of Postgres installations has often been vertical. Horizontal scaling of Postgres is also possible, but tends to be more involved or use an additional third party solution. MongoDB also offers the possibility of increasing write throughput by deferring writing to disk. The tradeoff is potential loss of data, but this may suit users who have less need to persist their data.

      Good pros and cons of Mongo vs Postgres for JSONB.
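
      A sketch of the constraint-and-validation point, using a hypothetical products table; the regex check is illustrative, not a canonical recipe:

      ```sql
      -- Reject documents where "price" is missing or not numeric.
      CREATE TABLE products (
          id  bigserial PRIMARY KEY,
          doc jsonb NOT NULL,
          CONSTRAINT price_is_numeric CHECK (
              (doc ? 'price') AND (doc->>'price') ~ '^[0-9]+(\.[0-9]+)?$'
          )
      );

      -- Violates the CHECK: alphabetical characters where a number is expected.
      -- INSERT INTO products (doc) VALUES ('{"price": "abc"}');
      ```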

  31. Mar 2016
    1. The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated. (These statements do not apply when dropping the system oid column; that is done with an immediate rewrite.)
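
      In other words, the DROP itself is instant, and reclaiming the disk space needs an explicit rewrite; a short sketch with a hypothetical wide_table:

      ```sql
      -- Fast: only marks the column dead in the catalog; on-disk size is unchanged.
      ALTER TABLE wide_table DROP COLUMN old_payload;

      -- Optional: rewrite the table to reclaim the space immediately.
      -- Takes an ACCESS EXCLUSIVE lock, so schedule with care.
      VACUUM FULL wide_table;
      ```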