PostgreSQL

Looks like @[email protected] is looking for reviews on their latest optimizations to the Lemmy backend. Figured folks here might be interested in taking a look.

Zero-Downtime PostgreSQL Cutovers (tech.instacart.com)

submitted 1 year ago by jnovinger to c/postgresql

0 comments fedilink

When Did Postgres Become Cool? (www.crunchydata.com)

submitted 1 year ago by jnovinger to c/postgresql

2 comments fedilink

PostgreSQL Optimizations (lemmy.daqfx.com)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/postgresql

11 comments fedilink

cross-posted from: https://lemmy.daqfx.com/post/24701

I'm hosting my own Lemmy instance and trying to figure out how to optimize PSQL to reduce disk IO at the expense of memory.

I accept increased risk this introduces, but need to figure out parameters that will allow a server with a ton of RAM and reliable power to operate without constantly sitting with 20% iowait.

Current settings:
# DB Version: 15
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 32 GB
# CPUs num: 8
# Data Storage: hdd

max_connections = 200
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4
fsync = off
synchronous_commit = off
wal_writer_delay = 800
wal_buffers = 64MB
Most load comes from LCS script seeding content and not actual users.

Solution: My issue turned out to be really banal - Lemmy's PostgreSQL container was pointing at default location for config file (/var/lib/postgresql/data/postgresql.conf) and not at the location where I actually mounted custom config file for the server (/etc/postgresql.conf). Everything is working as expected after I updated docker-compose.yaml file to point PostgreSQL to correct config file. Thanks @[email protected] for pointing me in the right direction!

Can I accomplish this in a single SQL statement? (lemmy.ml)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/postgresql

6 comments fedilink

SELECT id
    FROM my_table
    WHERE id IN (
     SELECT id
     FROM my_table
     WHERE criteria_a = 19
     ORDER BY create_when DESC
     LIMIT 1000
  );

This is the pattern I am looking for, but I need the criteria_a to be repeated for every value of criteria_a with the important focus being the LIMIT 1000 for any single value of criteria_a. There is no need to put a total LIMIT on the query, just to limit to the 1000 per criteria_a with the specific ORDER BY at that point. Put another way...

SELECT id
    FROM my_table
    WHERE id IN (
          SELECT id
		 FROM my_table
		 WHERE criteria_a = 19
		 ORDER BY create_when DESC
		 LIMIT 1000
	)
       OR id IN (
	  SELECT id
		 FROM my_table
		 WHERE criteria_a = 20
		 ORDER BY create_when DESC
		 LIMIT 1000
     );

Where I desire 2000 total rows. I could turn this into programming code (even a PostgreSQL FUNCTION) that loops over every value of criteria_a and replaces 19 in the example.

I don't care of it is a JOIN or an IN, I'm more stuck on how to repeat the inner SELECT with the LIMIT 1000 based on sort and criteria_a. Can I do it without looping and/or UNION? Thank you.

Why is my WAL directory so large? (www.depesz.com)

submitted 1 year ago by [email protected] to c/postgresql

0 comments fedilink

This problem happened recently to couple of people on various Pg support channels, so I figured I can write a bit more about it, so that in future I have a place where I can refer people to.

Lemmy server mass update of comment reply (child) count with PostgreSQL ltree structure (lemmy.ml)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/postgresql

11 comments fedilink

lemmy_server PostgreSQL table for comment does not keep parent comment id directly, it uses a path field of ltree type.

by default, every comment has a path of it's own primary key id.

comment id 101, path = "0.101"
comment id 102, path = "0.102"
comment id 103, path = "0.101.103"
comment id 104, path = "0.101.103.104"

comment 103 is a reply to comment 101, 104 is a reply to 103.

A second table named comment_aggregates has a count field with comment_id column linking to comment table id key. On each new comment reply, lemmy_server issues an update statement to update the counts on every parent in the tree. Rust code issues this to PostgreSQL:

        if let Some(parent_id) = parent_id {
          let top_parent = format!("0.{}", parent_id);
          let update_child_count_stmt = format!(
            "
update comment_aggregates ca set child_count = c.child_count
from (
  select c.id, c.path, count(c2.id) as child_count from comment c
  join comment c2 on c2.path &lt;@ c.path and c2.path != c.path
  and c.path &lt;@ '{top_parent}'
  group by c.id
) as c
where ca.comment_id = c.id"
          );

      sql_query(update_child_count_stmt).execute(conn).await?;
    }

I've been playing with doing bulk INSERT of thousands of comments at once to test SELECT query performance.

So far, this is the only SQL statement I have found that does a mass UPDATE of child_count from path for the entire comment table:

UPDATE
    comment_aggregates ca
SET
    child_count = c2.child_count
FROM (
    SELECT
        c.id,
        c.path,
        count(c2.id) AS child_count
    FROM
        comment c
    LEFT JOIN comment c2 ON c2.path &lt;@ c.path
        AND c2.path != c.path
GROUP BY
    c.id) AS c2
WHERE
    ca.comment_id = c2.id;

There are 1 to 2 millions comments stored on lemmy.ml and lemmy.world - ~~this rebuild of child_count can take hours, and may not complete at all. Even on 100,000 rows in a test system, it's a harsh UPDATE statement to execute.~~ EDIT: I found my API connection to production server was timing out and the run-time on the total rebuild isn't as bad as I thought. With my testing system I'm also finding it is taking under 19 seconds with 312684 comments. The query does seem to execute and run normal, not stuck.

Anyone have suggestions on how to improve this and help make Lemmy PostgreSQL servers more efficient?

EDIT: lemmy 0.18.3 and 0.18.4 are munging the less-than and greater-than signs in these code blocks.

[Podcast] Postgres FM | Sharding (postgres.fm)

submitted 1 year ago by jnovinger to c/postgresql

0 comments fedilink

Via: https://fosstodon.org/@postgresfm/110871203865830972

New episode: "Sharding"

Nikolay and Michael discuss sharding #Postgres — what it means, why and when it's needed, and the available options right now.

🎙️ https://postgres.fm/episodes/sharding

📺 https://youtu.be/72vCPZCHbHI

#postgresql

PostgreSQL 15.4, 14.9, 13.12, 12.16, 11.21, and PostgreSQL 16 Beta 3 Released (www.postgresql.org)

submitted 1 year ago by jnovinger to c/postgresql

0 comments fedilink

GitHub - zknill/sqledge: Replicate postgres to SQLite on the edge (github.com)

submitted 1 year ago by jnovinger to c/postgresql

1 comments fedilink

The Cache that Wasn’t: Optimizing Postgres for SELECT Performance (www.enterprisedb.com)

submitted 1 year ago by jnovinger to c/postgresql

1 comments fedilink

100

Path To Citus Con [Podcast] (pathtocituscon.transistor.fm)

submitted 1 year ago by jnovinger to c/postgresql

0 comments fedilink

From the site:

Path To Citus Con is for developers who love Postgres. Guests join co-hosts Claire Giordano and Pino de Candia to discuss the human side of open source, databases, PostgreSQL, and the many PG extensions (including Citus.)

Via: https://hachyderm.io/@clairegiordano/110851910835943655

Do you like podcasts? Are you a developer? Do you love #PostgreSQL?

I'm co-hosting a new #podcast for developers who love Postgres

This Wed Aug 9 @ 10:00am PDT you can join the live recording of #PathToCitusCon Ep06 w/guests Chelsea Dole & @floord

Topic = You're probably already using Postgres: What you need to know