this post was submitted on 01 Aug 2023
51 points (98.1% liked)

Selfhosted


Note: It seems my original post from last week didn't get posted on lemmy.world from kbin (I can't seem to find it) so I'm reposting it. Apologies to those who may have already seen this.

I'm looking to deploy some form of monitoring across my self-hosted servers and I'm a bit confused about the different options.

I have a small network of three machines that I would like to monitor. I am not looking for a solution that lets me monitor tens, hundreds, or thousands of nodes. Furthermore, I am more interested in being able to observe metrics for each node individually rather than in aggregate. Each of these machines performs a different task so aggregate metrics from these machines are not particularly meaningful. However, collecting all the metrics centrally so that I can have a single dashboard to view them all in one convenient place is definitely something I would like.

With that said, I have been trying to understand the different (popular) options that are available and I would like to hear what the community's experience is with these options and if anybody has any advice on any of these in light of my requirements above.

Prometheus seems like the default go-to for monitoring. This would require deploying a node_exporter on each node, a prometheus service, and a grafana dashboard. That's all fine, I can do that. However, from all that I'm reading it doesn't seem like Prometheus is optimised for my use case of monitoring each node individually. I'm sure it's possible, but I'm concerned that because this is not what it's meant for, it would take me ages to set it up such that I'm happy with it.
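For what it's worth, Prometheus attaches an instance label to every scraped target, so per-node views mostly come down to filtering on that label. Below is a minimal sketch of building one query per machine. It assumes node_exporter on its default port 9100; the hostnames and the helper name cpu_busy_percent are made up for illustration, and the query is just one common way to express CPU usage, not the only one.

```python
# Hypothetical node_exporter targets; replace with your own hosts.
NODES = ["nas.lan:9100", "media.lan:9100", "backup.lan:9100"]


def cpu_busy_percent(instance: str, window: str = "5m") -> str:
    """Build a PromQL expression for the CPU busy percentage of one node.

    It filters node_exporter's node_cpu_seconds_total counter on the
    instance label, so the result covers that single machine only.
    """
    selector = f'node_cpu_seconds_total{{mode="idle",instance="{instance}"}}'
    return f"100 * (1 - avg(rate({selector}[{window}])))"


# One query per machine, e.g. to paste into separate Grafana panels.
for node in NODES:
    print(cpu_busy_percent(node))
```

In Grafana the same effect is usually achieved with a dashboard variable on the instance label, so a single dashboard can switch between the three machines.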

Netdata seems like a comprehensive single-device monitoring solution. It also appears that it is possible to run your own registry to help with distributed monitoring. Not gonna lie, the netdata dashboard looks slick. An important additional advantage is that it comes packaged on Debian (all my machines run Debian). However, it looks like it does not store the metrics for very long. To solve that I could also set up InfluxDB and Grafana for long-term metrics. I could use Prometheus instead of InfluxDB in this arrangement, but I'm more likely to deploy a bunch of IoT devices than I am to deploy servers needing monitoring which means InfluxDB is a bit more future-proof for me as it could be reused for IoT data.

Cockpit is another single-device solution which additionally provides direct control of the system. The direct control is probably not much of a plus, since it means I would never let Cockpit be accessible from outside my home network, whereas I wouldn't mind that so much for dashboards with read-only data (still behind some authentication of course). It's also probably not built for monitoring specifically, but I included it in the list in case somebody has something interesting to say about it.

What's everybody's experience with the above solutions and does anybody have advice specific to my situation? I'm currently leaning towards Netdata with my own registry at first, then adding InfluxDB and Grafana later for long-term metrics.

[–] dditty@lemmy.world 2 points 1 year ago* (last edited 1 year ago) (1 children)

I'm running netdata on each of my servers and it has every feature I need. If you choose netdata, make sure not to install the nightly builds since they get updated all the time and sometimes break features. One annoying thing with netdata is you have to pay a subscription for the option to disable individual alert types. I have a nearly full hard drive and there's an alert for that which won't go away. Same thing for temporary inbound packet drops, which seem to happen every time one particular Plex user forcibly transcodes content (they're old and remote and won't change their Plex client settings 😑). They send you an email for each alert.

[–] vegetaaaaaaa@lemmy.world 3 points 1 year ago (1 children)

you have to pay a subscription for the option to disable individual alert types

Never heard of that. You can disable individual alarms by setting "to: silent" in the relevant health.d configuration file (e.g. health.d/disks.conf). This is exactly what I do for packet drops.
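To make the tip above concrete, here is a rough sketch of what that change looks like, written as a small Python helper that rewrites the "to:" line of one stanza. The sample stanzas, their names, and the function silence_alert are invented for illustration and are not copied from the files Netdata ships, so check your own health.d files for the real alarm names. In practice you would just edit the file by hand; the script only shows which line changes.

```python
import re

# Illustrative health.d-style stanzas (hypothetical names and thresholds).
SAMPLE = """\
template: example_inbound_drops
      on: net.drops
   every: 1m
    warn: $this > 0
      to: sysadmin

template: example_disk_full
      on: disk.space
   every: 1m
    warn: $this > 90
      to: sysadmin
"""


def silence_alert(conf_text: str, name: str) -> str:
    """Return conf_text with `to: silent` set on the stanza called `name`.

    Stanzas are assumed to start with an `alarm:` or `template:` line.
    A stanza without a `to:` line is left unchanged.
    """
    out = []
    in_target = False
    for line in conf_text.splitlines():
        header = re.match(r"\s*(alarm|template)\s*:\s*(\S+)", line)
        if header:
            # A new stanza begins; remember whether it is the one we want.
            in_target = header.group(2) == name
        elif in_target and re.match(r"\s*to\s*:", line):
            indent = line[: line.index("to")]
            line = f"{indent}to: silent"
        out.append(line)
    return "\n".join(out) + "\n"


print(silence_alert(SAMPLE, "example_inbound_drops"))
```

Only the recipient line of the named stanza changes; the other stanza keeps sending notifications to its original recipient.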

[–] dditty@lemmy.world 1 points 1 year ago

Ooh sweet thnx 4 the tip - will try this later! I was trying via the web dashboard which I'm pretty sure requires a subscription