Observability


📢 All about Observability (o11y), Prometheus, Grafana, et al:


🗨 Join the chatter in


⛔ Hate speech, bigotry and NSFW content will not be tolerated.


founded 1 year ago
1
 
 

cross-posted from: https://lemmy.ml/post/7353834

lemmy-synapse is a light-weight observability and monitoring stack for Lemmy servers.


Using Prometheus and Grafana, it allows the admins to visualise and query the stats of their instance. v1.0.0 comes out of the box with 3 detailed dashboards:

  • Host stats (CPU, RAM, disk, network, ...)
  • PostgreSQL stats (connections, locks, transactions, queries, ...)
  • Docker stats (container CPU, RAM, disk, network, OOM signals, ...)

It runs as a Docker Compose cluster alongside the Lemmy cluster and, in most cases, does not require any changes to it. Uninstalling lemmy-synapse is as easy as tearing down its cluster and deleting its installation directory.


Got questions/feedback? Pray drop a line:

2
 
 

I'm using Grafana for one of my hobby projects which is also deployed to a public-facing server.

I am the only user of Grafana as it is supposed to be read-only for anonymous access.

My current workflow is:

  1. Run Grafana locally.
  2. Make changes to local dashboards, data-sources, ...
  3. Stop local Grafana.
  4. Stop remote Grafana.
  5. Copy local grafana.db to the remote machine.
  6. Start remote Grafana.
  7. Goto (1)

However, this feels terribly inefficient and stupid to my mind 😅


To automate parts of this process, I tried gdg and grafana-backup-tool.

I couldn't get the former to work w/ my workflow (local storage) as it barfed at the very start w/ the infamous "invalid cross-device link" Go error.

The latter seems to work but only partially; for example organisations are not exported.


โ“ Given I may switch to PostgreSQL as Grafana's DB in the near future, my question is, what is the best way to automate my process short of stopping Grafana and copying database files.

3
 
 

cross-posted from: https://lemmy.ml/post/5287125

TLDR; The author argues that free-form logging is quite useless/expensive to use. They also argue that structured logging is less effective than tracing b/c of mainly the difficulty of inferring timelines and causality.
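To make the causality point concrete, here is a toy sketch (not taken from the linked article). Each span carries a reference to its parent, so the call tree can be rebuilt mechanically even when the records arrive out of order. Log lines that carry only a timestamp and a message do not offer that link. The field names `span_id`, `parent_id`, `start_ms` and `end_ms` are made up for illustration and do not follow any particular tracing library.

```python
from collections import defaultdict


def build_trace_tree(spans):
    """Group spans by parent_id; root spans end up under the key None."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start_ms"])  # timeline order per level
    return children


def render(children, parent_id=None, depth=0):
    """Return the tree as indented lines, one per span, with durations."""
    lines = []
    for span in children.get(parent_id, []):
        duration = span["end_ms"] - span["start_ms"]
        lines.append(f"{'  ' * depth}{span['name']} ({duration} ms)")
        lines.extend(render(children, span["span_id"], depth + 1))
    return lines


# Spans deliberately listed out of order, as they might arrive from
# several services.
spans = [
    {"span_id": "c", "parent_id": "a", "name": "db.query", "start_ms": 12, "end_ms": 40},
    {"span_id": "a", "parent_id": None, "name": "GET /posts", "start_ms": 0, "end_ms": 55},
    {"span_id": "b", "parent_id": "a", "name": "auth.check", "start_ms": 2, "end_ms": 10},
]

print("\n".join(render(build_trace_tree(spans))))
```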


I find the arguments very plausible.

In fact, I very rarely use logs produced by several services b/c most of the time they just confuse me. The only time that I heavily use logs is when troubleshooting a single service and looking at its stdout (or kubectl logs.)

However I have very little experience w/ tracing (I've used it in my hobby projects but, obviously, they never represent the reality of complex distributed systems.)

Have you got real world experience w/ tracing in larger systems? Care to share your take on the topic?

4
 
 

Update

Turned out I didn't need to convert any series to gauges at all!

The problem was that I had botched my Prometheus configuration and it wasn't ingesting the probe results properly 🤦‍♂️ Once I fixed that, I got all the details I needed.

For posterity, you can view lemmy-meter's configuration on GitHub.
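For anyone landing here with the same question: blackbox_exporter reports a `probe_success` gauge for every probe (1 when the probe succeeded, 0 when it failed), so an up/down signal exists as soon as Prometheus scrapes the probes correctly. The sketch below is only an illustration and is not taken from lemmy-meter. It reads that gauge through the Prometheus HTTP API, and it assumes the Prometheus address is passed in by the caller and that the scrape config relabels the probed URL into the `instance` label.

```python
import json
import urllib.parse
import urllib.request


def parse_up_down(response: dict) -> dict:
    """Map each probed target to True (up) or False (down), given the
    decoded JSON body of an instant query for probe_success."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    status = {}
    for sample in response["data"]["result"]:
        # Assumption: the probed URL was relabelled into `instance`.
        instance = sample["metric"].get("instance", "<unknown>")
        _timestamp, value = sample["value"]  # the value arrives as a string
        status[instance] = float(value) == 1.0
    return status


def fetch_up_down(prometheus_url: str) -> dict:
    """Run the instant query `probe_success` against /api/v1/query."""
    query = urllib.parse.urlencode({"query": "probe_success"})
    url = f"{prometheus_url.rstrip('/')}/api/v1/query?{query}"
    with urllib.request.urlopen(url) as response:
        return parse_up_down(json.load(response))
```

Calling `fetch_up_down("http://localhost:9090")` would return a dictionary keyed by target, assuming Prometheus listens on that address.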


Original post

I'm using blackbox_exporter to monitor the performance of a dozen websites, and that is working just fine for measuring RTT and error rates.

I'm thinking about creating a single gauge for each website indicating whether it is up or down.


I haven't been able to find any convincing resource as to whether it is mathematically correct to convert such series to gauges/counters, let alone how to do that.

So my questions are:

  • Have I missed a relevant option in blackbox_exporter configurations?
  • Do you recommend converting series to gauges/counters? If yes, can you point me to a resource so that I can educate myself on how to do it?