this post was submitted on 18 Aug 2023
6 points (100.0% liked)
PostgreSQL
667 readers
1 users here now
The world's most advanced open source relational database
Project
- About (history)
- Docs
- Donate to PostgreSQL
- Wiki
- Planet PostgreSQL
- IRC
- Mailing lists:
- pgsql-announce
- pgsql-hackers (developers)
- pgsql-general
- pgsql-jobs
- User Groups
Events
- SEAPUG Summer BBQ, 6 July in Seattle
- SFBA PostgreSQL Meetup, 12 July
- Chicago PostgreSQL Meetup, 19 July
- PGDay UK 2023, 12 September in London
- PGConf 2023, 3-5 October in New York City
- PGDay Israel 2023, 19 October
- PGConf.EU 2023, 12-15 December in Prague
Podcasts
Related Fediverse communities
- c/SQL on programming.dev
- #sql on Mastodon
- #postgresql on Mastodon
founded 1 year ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
A few things off the top of my head in order of importance:
How frequently do you
VACCUM
the database? Have you triedVACCUM
ing a few of times over a 5 min span & see if there are changes to the disk I/O aftewards?I've got no idea how Lemmy works but "seeding content", to my mind, possibly means a lot of
INSERT
/UPDATE
s. Is that correct? If yes, there's a chance you may be thrashing your indices & invalidating them too frequently which triggers a lot of rebuilding which could swallow a very large portion of theshared_buffers
. To rule that out, you can simply bumpshared_buffers
(eg 16GB) &effective_cache_size
and see if it makes any difference.Please include a bit more information about PG activity, namely from
pg_stat_activity
,pg_stat_bgwriter
&pg_stat_wal
.You've got quite a high value for
max_connections
- I don't believe that' s the culprit here.And finally, if possible, I'd highly recommend that you take a few minutes & install Prometheus, Prometheus node exporter, Proemetheus PG exporter and Grafana to monitor the state of your deployment. It's way easier to find correlations between data points using the said toolset.
Your Prom PG Exporter is a 404 (I think there is a trailing t in the URL).
Do you have any recommendations for dashboard for grafana/Pg?
As well as statistics that are important?
This is something I'm going to be putting into my deployment, and it's really easy to get overwhelmed with data!
Oh, updated the link 🤦♂️
The stock Grafana dashboard for PG is a good starting point. At least, that's how I started. You really should add new metrics to your dashboard if you really need them as you said.
Don't forget to install node-exporter too. It gives some important bits of info about the PG host. Again the stock dashboard is a decent one to start w/.
VACUUM
ed the DB. I just assumed it does it automatically at regular intervals. VACUUMing manually didn't seem to make any difference and gave me the following error after a few minutes of running on various tables:ERROR: could not resize shared memory segment "/PostgreSQL.1987530338" to 67128672 bytes: No space left on device
I'm not 100% sure where it out of space, but I'm assuming one of the configured buffers since there was still plenty of space left on disk and RAM. I didn't notice any difference in iowait while it was running or after.insert
s, but I see a roughly equal number ofselect
s. I did increaseshared_buffers
andeffective_cache_size
with no effect.I did install Prometheus with PG exporter and Grafana. I'm not a DB expert and certainly not a PostgreSQL expert, but I don't see anything that would indicate an issue. Anything specific you can suggest that I should focus on?
Thanks for all the suggestions!
That means too many chunky parallel maintenance workers are using the memory at the same time (
max_parallel_maintenance_workers
andmaintenance_work_mem
.)VACCUM
ing is a very important part of how PG works; can you try settingmax_parallel_maintenance_workers
to 1 or even 0 (disable parallel altogether) and retry the experiment?That probably rules out the theory of thrashed indices.
Since those stats are cumulative, it's hard to tell anything w/o knowing when was the
SELECT
run. It'd be very helpful if you could run those queries a few times w/ 1min interval and share the output.I'd start w/ the 3 tables I mentioned in the previous point and try to find anomalies esp under different workloads. The rest, I'm afraid, is going to be a bit of an investigation and detective work.
If you like, you can give me access to the Grafana dashboard so I can take a look and we can take it from there. It's going to be totally free of charge of course as I am quite interested in your problem: it's both a challenge for me and helping a fellow Lemmy user. The only thing I ask is that we report back the results and solution here so that others can benefit from the work.
No problem. PM me an IP (v4 or v6) or an email address (disposable is fine) and I'll reply with a link to access Grafana with above in allow list.