145 Matching Annotations
  1. Jun 2024
  2. Apr 2024
    1. The standard difference operator (-) returns day-based results for both date, timestamp and timestamp with time zone (the former returns days as int, the latter two return day-based intervals): From the day-based intervals you can extract days with the extract() function: select current_date - '2017-01-01', extract(day from now()::timestamp - '2017-01-01 00:00:00'), extract(day from now() - '2017-01-01 00:00:00Z');
    1. select count(*) filter (where year <= 1945) as pre1945, count(*) filter (where year between 1946 and 1964) as period2, count(*) filter (where year between 1965 and 1974) as period3, ... from ... where ...;
  3. Oct 2023
  4. Sep 2023
  5. Jul 2023
    1. Writing to the database may fail (e.g. it will not respond). When that happens, the process handling outbox pattern will try to resend the event after some time and try to do it until the message is correctly marked as sent in the database.

      Outbox pattern should be especially implemented when using operations such as PostgreSQL LISTEN/NOTIFY

    1. But there is no absolute reason to use JOIN LATERAL (...) ON TRUE because you could just write it as a CROSS JOIN LATERAL instead
    2. If you want to return a NULL-extended row in the case where the lateral join returns no rows, then you would use LEFT JOIN LATERAL (...) ON TRUE. Any condition other than TRUE is not necessary, as you could have just wrote it into the subquery itself
  6. Jun 2023
    1. typical use would be to reference a json or jsonb column laterally from another table in the query's FROM clause.
  7. May 2023
  8. Feb 2023
    1. Most applications do not need to process massive amounts of data. This has led to a resurgence in data management systems with traditional architectures; SQLite, Postgres, MySQL are all growing strongly, while “NoSQL” and even “NewSQL” systems are stagnating.

      SQL still shines over NoSQL

  9. Jan 2023
    1. The array subscript numbers are written within square brackets. By default PostgreSQL uses a one-based numbering convention for arrays, that is, an array of n elements starts with array[1] and ends with array[n].
  10. Dec 2022
    1. interval values can be written using the following verbose syntax: [@] quantity unit [quantity unit...] [direction] where quantity is a number (possibly signed); unit is microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium, or abbreviations or plurals of these units
  11. Nov 2022
    1. Post.in_order_of(:type, %w[Draft Published Archived]).order(:created_at).pluck(:name) which generates SELECT posts.name FROM posts ORDER BY CASE posts.type WHEN 'Draft' THEN 1 WHEN 'Published' THEN 2 WHEN 'Archived' THEN 3 ELSE 4 END ASC, posts.created_at ASC
    1. scale: 3 means 3 of those digits come after the decimal point (surely it should be the other way round, scale for how big the number is and precision for its decimal places?):
  12. Oct 2022
    1. Python is known for using more memory than more optimized languages and, in this case, it uses 7 times more than PostgresML.
    2. PostgresML outperforms traditional Python microservices by a factor of 8 in local tests and by a factor of 40 on AWS EC2.
    1. You can unknowingly be sending your critical database traffic in the clear because your client uses a default of allow or disable while the server you’re connecting to does, in fact, support SSL.
    2. You can unknowingly be sending your critical database traffic in the clear because your client uses a default of prefer, allow or disable and the server you’re connecting to does not support SSL.
    3. What Should I Do?

      Advices to set verify-full encryption for: - developers - PostgreSQL server maintainers - users - PostgreSQL tool makers - PostgreSQL creators

    4. Many popular SQL clients do not use SSL by default. If you aren’t deliberate about choosing encryption, the connection will be unencrypted.

      Table with SQL clients and their default SSL mode:

    5. SSL is disabled by default in jdbc, npgsql, node-postgres, and pgx.

      Table with programming libraires and their default SSL mode:

    6. There are a lot of PostgreSQL servers connected to the Internet: we searched shodan.io and obtained a sample of more than 820,000 PostgreSQL servers connected to the Internet between September 1 and September 29. Only 36% of the servers examined had SSL certificates. More than 523,000 PostgreSQL servers listening on the Internet did not use SSL (64%)
    7. At most 15% of the approximately 820,000 PostgreSQL servers listening on the Internet require encryption. In fact, only 36% even support encryption. This puts PostgreSQL servers well behind the rest of the Internet in terms of security. In comparison, according to Google, over 96% of page loads in Chrome on a Mac are encrypted. The top 100 websites support encryption, and 97 of those default to encryption.
  13. Sep 2022
    1. Otherwise behaves according to the value of null_value_treatment which must be one of 'raise_exception', 'use_json_null', 'delete_key', or 'return_target'. The default is 'use_json_null'.
  14. Jul 2022
    1. How does PostgreSQL access a column? It will fetch the row and then dissect this tuple to calculate the position of the desired column inside the row. So if we want to access column #1000 it means that we have to figure out how long those first 999 columns before our chosen one really are. This can be quite complex. For integer we simply have to add 4, but in case of varchar, the operation turns into something really expensive.

      Column order matters in PostgreSQL. The time to read 1000th column can be pretty large, especially if there are varchar columns before that.

  15. May 2022
  16. Apr 2022
  17. Mar 2022
    1. When we insert data into the comments table, we need it to route to be inserted into a partition instead of the actual table. For that, we can create a trigger: add_partition_trigger :comments, :comments_by_year, [ { if: 'NEW.year = 2016', insert: :comments_2016 }, { elsif: 'NEW.year = 2017', insert: :comments_2017 }, { else: "RAISE EXECEPTION 'comments_by_year recieived an unexpected value: %', NEW.year;" } ]
    1. ARRAY( SELECT DISTINCT ... FROM unnest(arr) )
    2. And that can all can be written with CROSS JOIN LATERAL which is much cleaner, SELECT ARRAY( SELECT DISTINCT e FROM ( VALUES ('foo', 'bar', 'foo', 'baz' ) ) AS t(a,b,c,d) CROSS JOIN LATERAL unnest(ARRAY[a,b,c,d]) AS a(e) -- ORDER BY e; -- if you want it sorted );
  18. Feb 2022
    1. You can use to_char to convert the timestamp using appropriate format mask: COALESCE(to_char(timestamp_type, 'YYYY-MM-DD'), varchar_col)
  19. Dec 2021
  20. Jul 2021
    1. The most important part of this query is the with block. It's a powerful resource, but for this example, you can think of it as a "way to store a variable" that is the path of the contact you need to update, which will be dynamic depending on the record.
    2. It just builds the path as '{1, value}', but we need to convert to text[] because that’s the type expected on the jsonb_path function.
    1. For user-contributed data that's freeform and unstructured, use jsonb. It should perform as well as hstore, but it's more flexible and easier to work with.
  21. Jun 2021
  22. www.postgresql.org www.postgresql.org
    1. A LATERAL item can appear at top level in the FROM list, or within a JOIN tree. In the latter case it can also refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.
    1. SELECT base.nr, multiples.multiple FROM (SELECT generate_series(1,10) AS nr) base, LATERAL ( SELECT multiples.multiple FROM ( SELECT generate_series(1,10) AS b_nr, base.nr * 2 AS multiple ) multiples WHERE multiples.b_nr = base.nr ) multiples;
    1. SELECT s.id, s1.percent_water , s1.percent_water * 100 AS percent_water_100 FROM samples s , LATERAL (SELECT s.wet_weight / NULLIF(s.dry_weight - 1, 0) AS percent_water) s1;
    1. You can use jsonb_agg and jsonb_array_elements_text to flatten or unnest an array of arrays in PostgreSQL: SELECT jsonb_array_elements_text(jsonb_agg(array_of_arrays)) FROM x;
    1. And this has some immediate benefits: more efficiency, significantly faster to process, supports indexing (which can be a significant advantage, as we'll see later), simpler schema designs (replacing entity-attribute-value (EAV) tables with jsonb columns, which can be queried, indexed and joined, allowing for performance improvements up until 1000X!)
    2. Besides efficiency, there are extra ways in which you can benefit from storing JSON in binary form. One such enhancement is the GIN (Generalized Inverted Index) indexes and a new brand of operators that come with them.
    3. SELECT jsonb_array_elements_text(data->'genres') AS genre FROM books WHERE book_id = 1; That will expand the JSON array into a column:
    4. This is an important one, as it will enable us to use the aggregate functions that we are familiar when dealing with relational databases, but in the otherwise counter-intuitive environment of JSON data.
    1. As stated in the title, I am in a situation where I need to return a count of occurrences within an array, that is within a jsonb column.
    1. select t.* from my_table t, jsonb_each(my_col) as value1(key1, value1), jsonb_each(value1) as value2(key2, value2) where jsonb_typeof(my_col) = 'object' and jsonb_typeof(value1) = 'object' and value2->>'Param3' = '6';
    2. This means that you have a json structure different from that described in your question. You cannot use generic jsonb functions when the json structures differ in each row.
    1. json_array_elements_text ( json ) → setof text jsonb_array_elements_text ( jsonb ) → setof text Expands the top-level JSON array into a set of text values. select * from json_array_elements_text('["foo", "bar"]') → value ----------- foo bar
    1. The age_in_years is calculated for every record of the blog table. So, it works like a correlated subquery, but the subquery records are joined with the primary table and, for this reason, we can reference the columns produced by the subquery.
    1. You can use the pg_typeof() function, which also works well for arbitrary values. SELECT pg_typeof("stu_id"), pg_typeof(100)

      pg_typeof

    1. SELECT * FROM departments AS d, LATERAL (SELECT * FROM employees AS e WHERE e.department_ID = d.department_ID)
    2. In a FROM clause, the LATERAL keyword allows an inline view to reference columns from a table expression that precedes that inline view.
    1. Subqueries appearing in FROM can be preceded by the key word LATERAL. This allows them to reference columns provided by preceding FROM items. (Without LATERAL, each subquery is evaluated independently and so cannot cross-reference any other FROM item.) TL;DR - LATERAL allows subqueries to reference earlier tables.
    2. SELECT * FROM ( -- build virtual table of all hours between -- a date range SELECT start_ts, start_ts + interval '1 hour' AS end_ts FROM generate_series( '2017-03-01'::date, '2017-03-03'::timestamp - interval '1 hour', interval '1 hour' ) AS t(start_ts) ) AS cal LEFT JOIN ( -- build virtual table of uptimes SELECT * FROM ( VALUES ('2017-03-01 01:15:00-06'::timestamp, '2017-03-01 02:15:00-06'::timestamp), ('2017-03-01 08:00:00-06', '2017-03-01 20:00:00-06'), ('2017-03-02 19:00:00-06', null) ) AS t(start_ts, end_ts) ) AS uptime ON cal.end_ts > uptime.start_ts AND cal.start_ts <= coalesce(uptime.end_ts, current_timestamp)
    3. FROM generate_series('2017-03-01'::date, '2017-03-03'::timestamp - interval '1 hour', interval '1 hour' ) AS t(start_ts)
    1. In this case it's probably better to have the function return a tabletype, RETURNS table, or define the output with OUT variables. Therecord defining syntax at time of query: SELECT foo() AS (a int, b text);
    2. dlbink is a good example, since thestructure of the input query directly controls the stucture of thereturn type. The sever has no way to deal with that when the query isparsed and planned, so you have to help it out.
    1. >> We have that already, it's named 'json_each_text' > Apparently you haven't looked at json parse/deparse costs ;P Well, a PL function is gonna be none too cheap either. Using something like JSON definitely has lots to recommend it --- eg, it probably won't break when you find out your initial spec for the transport format was too simplistic.
    1. scope :with_spec_options, ->(spec_options) { where("specs->'spec_option' @> ?", spec_options.to_json) }
    2. where("specs->'spec_option' ?| array[:options]", options: spec_options)
    1. -- The array on the right side is not considered contained within the -- array on the left, even though a similar array is nested within it: SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb; -- yields false -- But with a layer of nesting, it is contained: SELECT '[1, 2, [1, 3]]'::jsonb @> '[[1, 3]]'::jsonb;
  23. May 2021
  24. Jan 2021
  25. Nov 2020
  26. Jul 2020
  27. Jun 2020
  28. May 2020
    1. In PostrgreSQL 8.4 and up you can use: select array_agg(x) from (select unnest(ARRAY[1,5,3,7,2]) AS x order by x) as _; But it will not be very fast.
    2. The best way to sort an array of integers is without a doubt to use the intarray extension, which will do it much, much, much faster than any SQL formulation: CREATE EXTENSION intarray; SELECT sort( ARRAY[4,3,2,1] );
    1. It is also possible to construct an array from the results of a subquery. In this form, the array constructor is written with the key word ARRAY followed by a parenthesized (not bracketed) subquery.
    2. An array constructor is an expression that builds an array value using values for its member elements. A simple array constructor consists of the key word ARRAY, a left square bracket [, a list of expressions (separated by commas) for the array element values, and finally a right square bracket ].
    1. Sure, with json_object_keys(). This returns a set - unlike the JavaScript function Object.keys(obj) you are referring to, which returns an array. Feed the set to an ARRAY constructor to transform it: SELECT id, ARRAY(SELECT json_object_keys(obj)) AS keys FROM tbl_items;
    2. A more verbose form would be to spell out a LATERAL join instead of the correlated subquery: SELECT t.id, k.keys FROM tbl_items t LEFT JOIN LATERAL (SELECT ARRAY(SELECT * FROM json_object_keys(t.obj)) AS keys) k ON true;
    1. If you have variable number of ids (or you have a lot of them), you can use json[b]_array_elements() to extract each element of the array, build up an id list and then query it with the any-containment operator ?|: select * from jsonbtest where to_json(array(select jsonb_array_elements(data) ->> 'id'))::jsonb ?| array['1884595530', '791712670'];
    1. Which database technology to choose

      Which database to choose (advice from an Amazon employee):

      • SQL - ad hoc queries and/or support of ACID and transactions
      • NoSQL - otherwise. NoSQL is getting better with transactions and PostgreSQL is getting better with availability, scalability, durability
  29. Apr 2020
    1. At this point it’s probably worth replacing hstore use with jsonb in all new applications.
    2. the request to find people with matching phone numbers could be turned into a query like
    1. If you're choosing between json/jsonb/hstore, generally choose jsonb unless you have a reason not to
    2. Relational databases are designed around joins, and optimized to do them well. Unless you have a good reason not to use a normalized design, use a normalised design. jsonb and things like hstore are good for when you can't use a normalized data model, such as when the data model changes rapidly and is user defined. If you can model it relationally, model it relationally. If you can't, consider json etc.
    3. Joins are not expensive. Who said it to you? As basically the whole concept of relational databases revolve around joins (from a practical point of view), these product are very good at joining. The normal way of thinking is starting with properly normalized structures and going into fancy denormalizations and similar stuff when the performance really needs it on the reading side. JSON(B) and hstore (and EAV) are good for data with unknown structure.
  30. Jan 2020
    1. Which to use? ANY is a later, more versatile addition, it can be combined with any binary operator returning a boolean value. IN burns down to a special case of ANY. In fact, its second form is rewritten internally: IN is rewritten with = ANY NOT IN is rewritten with <> ALL
  31. Dec 2019
    1. Practical highlights in my opinion:

      • It's important to know about data padding in PG.
      • Be conscious when modelling data tables about columns ordering, but don't be pure-school and do it in a best-effort basis.
      • Gains up to 25% in wasted storage are impressive but always keep in mind the scope of the system. For me, gains are not worth it in the short-term. Whenever a system grows, it is possible to migrate data to more storage-efficient tables but mind the operative burder.

      Here follows my own commands on trying the article points. I added - pg_column_size(row()) on each projection to have clear absolute sizes.

      -- How does row function work?
      
      SELECT pg_column_size(row()) AS empty,
             pg_column_size(row(0::SMALLINT)) AS byte2,
             pg_column_size(row(0::BIGINT)) AS byte8,
             pg_column_size(row(0::SMALLINT, 0::BIGINT)) AS byte16,
             pg_column_size(row(''::TEXT)) AS text0,
             pg_column_size(row('hola'::TEXT)) AS text4,
             0 AS term
      ;
      
      -- My own take on that
      
      SELECT pg_column_size(row()) AS empty,
             pg_column_size(row(uuid_generate_v4())) AS uuid_type,
             pg_column_size(row('hola mundo'::TEXT)) AS text_type,
             pg_column_size(row(uuid_generate_v4(), 'hola mundo'::TEXT)) AS uuid_text_type,
             pg_column_size(row('hola mundo'::TEXT, uuid_generate_v4())) AS text_uuid_type,
             0 AS term
      ;
      
      CREATE TABLE user_order (
        is_shipped    BOOLEAN NOT NULL DEFAULT false,
        user_id       BIGINT NOT NULL,
        order_total   NUMERIC NOT NULL,
        order_dt      TIMESTAMPTZ NOT NULL,
        order_type    SMALLINT NOT NULL,
        ship_dt       TIMESTAMPTZ,
        item_ct       INT NOT NULL,
        ship_cost     NUMERIC,
        receive_dt    TIMESTAMPTZ,
        tracking_cd   TEXT,
        id            BIGSERIAL PRIMARY KEY NOT NULL
      );
      
      SELECT a.attname, t.typname, t.typalign, t.typlen
        FROM pg_class c
        JOIN pg_attribute a ON (a.attrelid = c.oid)
        JOIN pg_type t ON (t.oid = a.atttypid)
       WHERE c.relname = 'user_order'
         AND a.attnum >= 0
       ORDER BY a.attnum;
      
      -- What is it about pg_class, pg_attribute and pg_type tables? For future investigation.
      
      -- SELECT sum(t.typlen)
      -- SELECT t.typlen
      SELECT a.attname, t.typname, t.typalign, t.typlen
        FROM pg_class c
        JOIN pg_attribute a ON (a.attrelid = c.oid)
        JOIN pg_type t ON (t.oid = a.atttypid)
       WHERE c.relname = 'user_order'
         AND a.attnum >= 0
       ORDER BY a.attnum
      ;
      
      -- Whoa! I need to master mocking data directly into db.
      
      INSERT INTO user_order (
          is_shipped, user_id, order_total, order_dt, order_type,
          ship_dt, item_ct, ship_cost, receive_dt, tracking_cd
      )
      SELECT true, 1000, 500.00, now() - INTERVAL '7 days',
             3, now() - INTERVAL '5 days', 10, 4.99,
             now() - INTERVAL '3 days', 'X5901324123479RROIENSTBKCV4'
        FROM generate_series(1, 1000000);
      
      -- New item to learn, pg_relation_size. 
      
      SELECT pg_relation_size('user_order') AS size_bytes,
             pg_size_pretty(pg_relation_size('user_order')) AS size_pretty;
      
      SELECT * FROM user_order LIMIT 1;
      
      SELECT pg_column_size(row(0::NUMERIC)) - pg_column_size(row()) AS zero_num,
             pg_column_size(row(1::NUMERIC)) - pg_column_size(row()) AS one_num,
             pg_column_size(row(9.9::NUMERIC)) - pg_column_size(row()) AS nine_point_nine_num,
             pg_column_size(row(1::INT2)) - pg_column_size(row()) AS int2,
             pg_column_size(row(1::INT4)) - pg_column_size(row()) AS int4,
             pg_column_size(row(1::INT2, 1::NUMERIC)) - pg_column_size(row()) AS int2_one_num,
             pg_column_size(row(1::INT4, 1::NUMERIC)) - pg_column_size(row()) AS int4_one_num,
             pg_column_size(row(1::NUMERIC, 1::INT4)) - pg_column_size(row()) AS one_num_int4,
             0 AS term
      ;
      
      SELECT pg_column_size(row(''::TEXT)) - pg_column_size(row()) AS empty_text,
             pg_column_size(row('a'::TEXT)) - pg_column_size(row()) AS len1_text,
             pg_column_size(row('abcd'::TEXT)) - pg_column_size(row()) AS len4_text,
             pg_column_size(row('abcde'::TEXT)) - pg_column_size(row()) AS len5_text,
             pg_column_size(row('abcdefgh'::TEXT)) - pg_column_size(row()) AS len8_text,
             pg_column_size(row('abcdefghi'::TEXT)) - pg_column_size(row()) AS len9_text,
             0 AS term
      ;
      
      SELECT pg_column_size(row(''::TEXT, 1::INT4)) - pg_column_size(row()) AS empty_text_int4,
             pg_column_size(row('a'::TEXT, 1::INT4)) - pg_column_size(row()) AS len1_text_int4,
             pg_column_size(row('abcd'::TEXT, 1::INT4)) - pg_column_size(row()) AS len4_text_int4,
             pg_column_size(row('abcde'::TEXT, 1::INT4)) - pg_column_size(row()) AS len5_text_int4,
             pg_column_size(row('abcdefgh'::TEXT, 1::INT4)) - pg_column_size(row()) AS len8_text_int4,
             pg_column_size(row('abcdefghi'::TEXT, 1::INT4)) - pg_column_size(row()) AS len9_text_int4,
             0 AS term
      ;
      
      SELECT pg_column_size(row(1::INT4, ''::TEXT)) - pg_column_size(row()) AS int4_empty_text,
             pg_column_size(row(1::INT4, 'a'::TEXT)) - pg_column_size(row()) AS int4_len1_text,
             pg_column_size(row(1::INT4, 'abcd'::TEXT)) - pg_column_size(row()) AS int4_len4_text,
             pg_column_size(row(1::INT4, 'abcde'::TEXT)) - pg_column_size(row()) AS int4_len5_text,
             pg_column_size(row(1::INT4, 'abcdefgh'::TEXT)) - pg_column_size(row()) AS int4_len8_text,
             pg_column_size(row(1::INT4, 'abcdefghi'::TEXT)) - pg_column_size(row()) AS int4_len9_text,
             0 AS term
      ;
      
      SELECT pg_column_size(row()) - pg_column_size(row()) AS empty_row,
             pg_column_size(row(''::TEXT)) - pg_column_size(row()) AS no_text,
             pg_column_size(row('a'::TEXT)) - pg_column_size(row()) AS min_text,
             pg_column_size(row(1::INT4, 'a'::TEXT)) - pg_column_size(row()) AS two_col,
             pg_column_size(row('a'::TEXT, 1::INT4)) - pg_column_size(row()) AS round4;
      
      SELECT pg_column_size(row()) - pg_column_size(row()) AS empty_row,
             pg_column_size(row(1::SMALLINT)) - pg_column_size(row()) AS int2,
             pg_column_size(row(1::INT)) - pg_column_size(row()) AS int4,
             pg_column_size(row(1::BIGINT)) - pg_column_size(row()) AS int8,
             pg_column_size(row(1::SMALLINT, 1::BIGINT)) - pg_column_size(row()) AS padded,
             pg_column_size(row(1::INT, 1::INT, 1::BIGINT)) - pg_column_size(row()) AS not_padded;
      
      SELECT a.attname, t.typname, t.typalign, t.typlen
        FROM pg_class c
        JOIN pg_attribute a ON (a.attrelid = c.oid)
        JOIN pg_type t ON (t.oid = a.atttypid)
       WHERE c.relname = 'user_order'
         AND a.attnum >= 0
       ORDER BY t.typlen DESC;
      
      DROP TABLE user_order;
      
      CREATE TABLE user_order (
        id            BIGSERIAL PRIMARY KEY NOT NULL,
        user_id       BIGINT NOT NULL,
        order_dt      TIMESTAMPTZ NOT NULL,
        ship_dt       TIMESTAMPTZ,
        receive_dt    TIMESTAMPTZ,
        item_ct       INT NOT NULL,
        order_type    SMALLINT NOT NULL,
        is_shipped    BOOLEAN NOT NULL DEFAULT false,
        order_total   NUMERIC NOT NULL,
        ship_cost     NUMERIC,
        tracking_cd   TEXT
      );
      
      -- And, what about other varying size types as JSONB?
      
      SELECT pg_column_size(row('{}'::JSONB)) - pg_column_size(row()) AS empty_jsonb,
             pg_column_size(row('{}'::JSONB, 0::INT4)) - pg_column_size(row()) AS empty_jsonb_int4,
             pg_column_size(row(0::INT4, '{}'::JSONB)) - pg_column_size(row()) AS int4_empty_jsonb,
             pg_column_size(row('{"a": 1}'::JSONB)) - pg_column_size(row()) AS basic_jsonb,
             pg_column_size(row('{"a": 1}'::JSONB, 0::INT4)) - pg_column_size(row()) AS basic_jsonb_int4,
             pg_column_size(row(0::INT4, '{"a": 1}'::JSONB)) - pg_column_size(row()) AS int4_basic_jsonb,
             0 AS term;
      
  32. Nov 2019