PostgreSQL

643 readers
1 users here now

The world's most advanced open source relational database

Project
Events
Podcasts
Related Fediverse communities

founded 1 year ago
MODERATORS
76
77
78
79
80
81
25
PostgreSQL 16 (www.postgresql.org)
submitted 1 year ago by starman to c/postgresql
82
83
10
submitted 1 year ago* (last edited 1 year ago) by agilob to c/postgresql
84
85
86
 
 

A good introduction to memory management in PG. The material on pg_backend_memory_contexts eas totally new to me.

87
12
submitted 1 year ago by jnovinger to c/postgresql
88
89
 
 

cross-posted from: https://programming.dev/post/1894165

Looks like @[email protected] is looking for reviews on their latest optimizations to the Lemmy backend. Figured folks here might be interested in taking a look.

90
12
submitted 1 year ago by jnovinger to c/postgresql
91
20
When Did Postgres Become Cool? (www.crunchydata.com)
submitted 1 year ago by jnovinger to c/postgresql
92
6
PostgreSQL Optimizations (lemmy.daqfx.com)
submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/postgresql
 
 

cross-posted from: https://lemmy.daqfx.com/post/24701

I'm hosting my own Lemmy instance and trying to figure out how to optimize PSQL to reduce disk IO at the expense of memory.

I accept increased risk this introduces, but need to figure out parameters that will allow a server with a ton of RAM and reliable power to operate without constantly sitting with 20% iowait.

Current settings:

# DB Version: 15
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 32 GB
# CPUs num: 8
# Data Storage: hdd

max_connections = 200
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4
fsync = off
synchronous_commit = off
wal_writer_delay = 800
wal_buffers = 64MB

Most load comes from LCS script seeding content and not actual users.

Solution: My issue turned out to be really banal - Lemmy's PostgreSQL container was pointing at default location for config file (/var/lib/postgresql/data/postgresql.conf) and not at the location where I actually mounted custom config file for the server (/etc/postgresql.conf). Everything is working as expected after I updated docker-compose.yaml file to point PostgreSQL to correct config file. Thanks @[email protected] for pointing me in the right direction!

93
 
 
SELECT id
    FROM my_table
    WHERE id IN (
     SELECT id
     FROM my_table
     WHERE criteria_a = 19
     ORDER BY create_when DESC
     LIMIT 1000
  );

This is the pattern I am looking for, but I need the criteria_a to be repeated for every value of criteria_a with the important focus being the LIMIT 1000 for any single value of criteria_a. There is no need to put a total LIMIT on the query, just to limit to the 1000 per criteria_a with the specific ORDER BY at that point. Put another way...

SELECT id
    FROM my_table
    WHERE id IN (
          SELECT id
		 FROM my_table
		 WHERE criteria_a = 19
		 ORDER BY create_when DESC
		 LIMIT 1000
	)
       OR id IN (
	  SELECT id
		 FROM my_table
		 WHERE criteria_a = 20
		 ORDER BY create_when DESC
		 LIMIT 1000
     );

Where I desire 2000 total rows. I could turn this into programming code (even a PostgreSQL FUNCTION) that loops over every value of criteria_a and replaces 19 in the example.

I don't care of it is a JOIN or an IN, I'm more stuck on how to repeat the inner SELECT with the LIMIT 1000 based on sort and criteria_a. Can I do it without looping and/or UNION? Thank you.

94
 
 

This problem happened recently to couple of people on various Pg support channels, so I figured I can write a bit more about it, so that in future I have a place where I can refer people to.

95
 
 

lemmy_server PostgreSQL table for comment does not keep parent comment id directly, it uses a path field of ltree type.

by default, every comment has a path of it's own primary key id.

comment id 101, path = "0.101"
comment id 102, path = "0.102"
comment id 103, path = "0.101.103"
comment id 104, path = "0.101.103.104"

comment 103 is a reply to comment 101, 104 is a reply to 103.

A second table named comment_aggregates has a count field with comment_id column linking to comment table id key. On each new comment reply, lemmy_server issues an update statement to update the counts on every parent in the tree. Rust code issues this to PostgreSQL:

        if let Some(parent_id) = parent_id {
          let top_parent = format!("0.{}", parent_id);
          let update_child_count_stmt = format!(
            "
update comment_aggregates ca set child_count = c.child_count
from (
  select c.id, c.path, count(c2.id) as child_count from comment c
  join comment c2 on c2.path <@ c.path and c2.path != c.path
  and c.path <@ '{top_parent}'
  group by c.id
) as c
where ca.comment_id = c.id"
          );
      sql_query(update_child_count_stmt).execute(conn).await?;
    }

I've been playing with doing bulk INSERT of thousands of comments at once to test SELECT query performance.

So far, this is the only SQL statement I have found that does a mass UPDATE of child_count from path for the entire comment table:

UPDATE
    comment_aggregates ca
SET
    child_count = c2.child_count
FROM (
    SELECT
        c.id,
        c.path,
        count(c2.id) AS child_count
    FROM
        comment c
    LEFT JOIN comment c2 ON c2.path <@ c.path
        AND c2.path != c.path
GROUP BY
    c.id) AS c2
WHERE
    ca.comment_id = c2.id;

There are 1 to 2 millions comments stored on lemmy.ml and lemmy.world - ~~this rebuild of child_count can take hours, and may not complete at all. Even on 100,000 rows in a test system, it's a harsh UPDATE statement to execute.~~ EDIT: I found my API connection to production server was timing out and the run-time on the total rebuild isn't as bad as I thought. With my testing system I'm also finding it is taking under 19 seconds with 312684 comments. The query does seem to execute and run normal, not stuck.

Anyone have suggestions on how to improve this and help make Lemmy PostgreSQL servers more efficient?

EDIT: lemmy 0.18.3 and 0.18.4 are munging the less-than and greater-than signs in these code blocks.

96
 
 

Via: https://fosstodon.org/@postgresfm/110871203865830972

New episode: "Sharding"

Nikolay and Michael discuss sharding #Postgres — what it means, why and when it's needed, and the available options right now.

🎙️ https://postgres.fm/episodes/sharding

📺 https://youtu.be/72vCPZCHbHI

#postgresql

97
98
99
100
2
Path To Citus Con [Podcast] (pathtocituscon.transistor.fm)
submitted 1 year ago by jnovinger to c/postgresql
 
 

From the site:

Path To Citus Con is for developers who love Postgres. Guests join co-hosts Claire Giordano and Pino de Candia to discuss the human side of open source, databases, PostgreSQL, and the many PG extensions (including Citus.)

Via: https://hachyderm.io/@clairegiordano/110851910835943655

Do you like podcasts? Are you a developer? Do you love #PostgreSQL?

I'm co-hosting a new #podcast for developers who love Postgres

This Wed Aug 9 @ 10:00am PDT you can join the live recording of #PathToCitusCon Ep06 w/guests Chelsea Dole & @floord

Topic = You're probably already using Postgres: What you need to know

view more: ‹ prev next ›