this post was submitted on 10 Nov 2023
171 points (97.8% liked)

Selfhosted


A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.


So, I moved my Nextcloud directory from a local SATA drive to an NFS mount backed by an NVMe array on a 10G network.

"I just need to change /docker/nextcloud to /mnt/nfs/nextcloud in the docker-compose.yml. What's the issue? I'll do it live," I tell myself.

So I stop the container, copy /docker/nextcloud to /mnt/nfs/nextcloud, then edit the docker-compose.yml... and, because I'm doing it during a phone call without paying much attention, I change the main directory to /docker by mistake.

I rebuild the container and immediately hear a flood of Telegram notifications from my Uptime Kuma bot... uh oh...

It turns out the Nextcloud Docker image has an initialization script that, if it doesn't find the expected files in the directory, deletes everything in it and installs a fresh copy of Nextcloud... so it deleted everything on my server.

Luckily I had a very recent full Borg backup and I'm restoring it (I kind of love-hate Borg: I always forget the restore commands when I'm panicking, and the docs are a bit cryptic for me).
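Since the restore commands are the part that gets forgotten in a panic, here is a minimal cheat-sheet sketch that builds them. It assumes Borg 1.x syntax (`REPO::ARCHIVE`; Borg 2 changed this), and the repository path and archive name are hypothetical placeholders. It only prints commands; nothing is executed.

```python
import shlex

# Hypothetical repository path and archive name: substitute your own.
REPO = "/mnt/backup/borg-repo"
ARCHIVE = "ARCHIVE_NAME"


def list_archives_cmd(repo: str) -> list[str]:
    """Command listing the archives in a Borg 1.x repository."""
    return ["borg", "list", repo]


def extract_cmd(repo: str, archive: str, paths: list[str], dry_run: bool = True) -> list[str]:
    """Command extracting `paths` from `repo::archive`.

    borg extract writes into the current working directory, so run it from
    the directory you want to restore into. dry_run=True only lists what
    would be extracted, a cheap sanity check before the real run.
    """
    cmd = ["borg", "extract", "--list"]
    if dry_run:
        cmd.append("--dry-run")
    cmd.append(f"{repo}::{archive}")
    cmd.extend(paths)
    return cmd


# Print the cheat sheet. Borg 1.x stores paths without the leading slash.
print(shlex.join(list_archives_cmd(REPO)))
print(shlex.join(extract_cmd(REPO, ARCHIVE, ["docker/nextcloud"])))
print(shlex.join(extract_cmd(REPO, ARCHIVE, ["docker/nextcloud"], dry_run=False)))
```

Keeping the dry run as the default means the first command you copy in a hurry can't overwrite anything.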

Lessons learned:

  1. Always double-check everything.

  2. Offsite backups are a must (if I had accidentally written / as the path, I would have lost the Borg backups too!).

  3. Offsite backups should not be permanently mounted, otherwise they would have been wiped as well.

  4. Learn how to use and schedule filesystem snapshots, so recovery doesn't take ages like it's taking right now (2+ hours and I'm not even halfway...).
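Lesson 1 can be partly automated with a preflight check run before `docker compose up`. This is a minimal sketch assuming the host directory is bind-mounted to the container's webroot (/var/www/html) and that an existing install is recognizable by `version.php` and `config/config.php` at its top level; the function names are hypothetical. A fresh install would also be refused, so skip the check the very first time.

```python
import sys
from pathlib import Path


def looks_like_nextcloud_webroot(path: Path) -> bool:
    """Heuristic: an existing install has version.php and config/config.php at its top level."""
    return (path / "version.php").is_file() and (path / "config" / "config.php").is_file()


def preflight(host_dir: str) -> None:
    """Exit with an error instead of letting the container start on the wrong directory."""
    path = Path(host_dir)
    if not path.is_dir():
        sys.exit(f"refusing to start: {path} does not exist")
    if not looks_like_nextcloud_webroot(path):
        sys.exit(f"refusing to start: {path} does not look like a Nextcloud webroot")
    print(f"ok: {path} looks like an existing Nextcloud install")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 preflight.py /mnt/nfs/nextcloud && docker compose up -d
    preflight(sys.argv[1])
```

With the typo from this post, pointing it at /docker would stop before the container ever started, because /docker has no version.php at its top level.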

all 38 comments
[–] Octavius@lemmy.world 46 points 1 year ago* (last edited 1 year ago) (1 children)
  1. Test your backup/restore procedures regularly. An untested backup is as good as no backup.
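One way to test a restore is a periodic drill: restore into a scratch directory, then compare it with the live data. Below is a minimal sketch of the comparison step using SHA-256 digests; the function names are hypothetical, and files changed since the backup was taken will legitimately show up as differences.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large files don't fill memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def compare_trees(original: Path, restored: Path) -> dict[str, list[str]]:
    """Report files missing from, extra in, or different in the restored copy."""
    a, b = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

An empty "missing" list on a restore of a recent archive is a good sign that the backup actually contains what you think it does.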
[–] Moonrise2473@feddit.it 25 points 1 year ago

Yes, I should keep a text document with the recovery plan and all the commands I have to type, on my Nextcloud. Oh, wait... :D

[–] plague_sapiens@lemmy.world 33 points 1 year ago* (last edited 1 year ago) (2 children)

Some years ago, as a Linux noob, I created a VM to set up a Bitcoin Lightning node. The blockchain is huge, so my idea was to pass through a 2 TB drive (/dev/sdc). I had to restart my home server because of some host settings I'd changed, and I didn't notice that the device names had shifted: sdc became sda, and sdb (an 8 TB fully encrypted drive with my SMB shares on it, used by a separate VM) became sdc. So far, no problem. But because I didn't know the device names had changed, I started the initialization process, which formats the passed-through HDD. Oh boy, when I heard the 8 TB HDD spin up and do its thing while the 2 TB HDD was still spun down, I panicked and shut down the server. End of story: the 8 TB of data was unrecoverable. I lost all of my photos since I was a kid (~100k) and lots of re-downloadable stuff; luckily, everything sensitive (private seeds, work stuff, documents, ...) was backed up.

Never use /dev/sdX device paths, use UUIDs. They exist for a reason.

[–] Bakkoda@sh.itjust.works 41 points 1 year ago* (last edited 1 year ago) (2 children)

Never use /dev/sdX device paths, use UUIDs. They exist for a reason.

This is absolutely fantastic advice.

[–] yiliu@informis.land 15 points 1 year ago

You can label your devices. When formatting, run mkfs.ext4 -L my-descriptive-name /dev/whatever. Then refer to it exclusively as /dev/disk/by-label/my-descriptive-name. It's much harder to mix up home and swap than sdc2 and sdc3 (or, for that matter, two UUIDs).
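Before running anything destructive, it helps to see which kernel device a label or UUID currently points at. The /dev/disk/by-label and /dev/disk/by-uuid entries are symlinks, so resolving them is enough. This is a minimal sketch; `resolve_stable` is a hypothetical helper that raises instead of silently falling back to a /dev/sdX guess.

```python
import os
from pathlib import Path

BY_LABEL = Path("/dev/disk/by-label")
BY_UUID = Path("/dev/disk/by-uuid")


def resolve_stable(link_dir: Path, name: str) -> str:
    """Resolve a /dev/disk/by-* symlink to the device node it points at right now.

    Raises FileNotFoundError if no such label/UUID exists.
    """
    link = link_dir / name
    if not link.is_symlink():
        raise FileNotFoundError(f"no entry {name!r} in {link_dir}")
    # The udev symlinks are relative (../../sdX1); realpath resolves them.
    return os.path.realpath(link)
```

For example, `resolve_stable(BY_LABEL, "my-descriptive-name")` returns whichever /dev/sdX node the label maps to after the latest reboot, which is worth printing before a format or passthrough.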

[–] plague_sapiens@lemmy.world 3 points 1 year ago* (last edited 1 year ago)

That and permissions are probably the main problems; dependencies are likely next xD

[–] AnUnusualRelic@lemmy.world 6 points 1 year ago (1 children)

We all went through some educational episodes like yours.

Wisdom has to be earned the hard way. If we're lucky, we're just given a good scare.

[–] plague_sapiens@lemmy.world 2 points 1 year ago

Wise words!

[–] Moonrise2473@feddit.it 26 points 1 year ago (1 children)

If anyone else is reading this in the future:

After 8 hours the backup was restored (of course I had stored it on a WD Green...), but then Nextcloud gave an HTTP 500 error, with lots of errors like:

Doctrine\DBAL\Exception: Failed to connect to the database: An exception occurred in the driver: SQLSTATE[HY000] [1129] Host '172.26.0.1' is blocked because of many connection errors; unblock with 'mariadb-admin flush-hosts' in /var/www/html....

DON'T PANIC. You didn't screw up the restore!

Just connect to your database with HeidiSQL or your favorite tool and run FLUSH HOSTS;
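MariaDB blocks a client host after too many failed connection attempts (the max_connect_errors limit), which plausibly happened while the database was unavailable during the restore. If you'd rather not install a GUI client, the same flush can be done from the host with the `mariadb-admin flush-hosts` command named in the error. A minimal sketch, assuming the database runs in a container whose name (`nextcloud-db` here) is hypothetical; older images may ship only `mysqladmin`.

```python
import shlex

# Hypothetical name of the MariaDB container: use the one from your compose file.
DB_CONTAINER = "nextcloud-db"


def flush_hosts_cmd(container: str, user: str = "root") -> list[str]:
    """docker exec command running `mariadb-admin flush-hosts` inside the DB container.

    A bare -p makes mariadb-admin prompt for the password, hence -it for a terminal.
    """
    return ["docker", "exec", "-it", container, "mariadb-admin", "-u", user, "-p", "flush-hosts"]


# Print the command to copy into a shell; nothing is executed here.
print(shlex.join(flush_hosts_cmd(DB_CONTAINER)))
```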

[–] WhyAUsername_1@lemmy.world 22 points 1 year ago (1 children)

If anyone else is reading this in the future

My guy is just documenting for his future self, in case it ever happens again.

Haha

Been there 🤣

[–] computergeek125@lemmy.world 8 points 1 year ago

Can't tell you how many times I've googled things and found my own posts and bug reports.

[–] guitarsarereal@sh.itjust.works 8 points 1 year ago (2 children)

Come back after you rm -rf / or remove glibc, you whipper-snapper! shakes cane

[–] dandroid@dandroid.app 10 points 1 year ago

I watched a coworker run rm -rf * from / as root the other day. He started wondering why things weren't working. I told him what he'd just done, but he didn't get it at all. Luckily it was a VM that could be recreated from a template. He probably lost 30 minutes of time. But it could have been waaaay worse if it wasn't a disposable VM.

[–] Pete90@feddit.de 4 points 1 year ago (2 children)

I did that when I started working with Linux. I thought / meant the current directory, boy was I wrong!

Hey, it can be the current directory!

[–] 4am@lemm.ee 7 points 1 year ago (2 children)
  1. Always run prod services in a VM or LXC
  2. Snapshot before touching anything

Fucking up in EZ mode just becomes an hour wasted.

Having full backups is good too, of course.

[–] Pete90@feddit.de 1 points 1 year ago

I'm currently setting up Proxmox just for that. Since I'm still quite new to self-hosting, I fuck up from time to time. I deleted my root file system once. I updated Nginx Proxy Manager and took down my services with it. I once fucked up iptables; scary stuff.

In the future, it'll be one click and everything works again. It's so easy on novices, once you get everything going.

[–] nik282000@lemmy.ca 1 points 1 year ago (1 children)

I ham-fistedly use LXC to keep my services separate and out of dependency hell, but would you go as far as putting docker run services in them as well, just to keep them away from the host?

[–] LufyCZ@lemmy.world 2 points 1 year ago (1 children)

I do that, each separate docker stack has its own unprivileged LXC as a base

[–] nik282000@lemmy.ca 1 points 1 year ago (1 children)
[–] 4am@lemm.ee 2 points 1 year ago

Be aware that, in the past anyway, Docker didn’t like some storage mediums when running in LXC (I think there are [were?] issues if you snapshot the LXC image on ZFS and you’re using the Overlay2 driver for Docker), and that you could often find issues with networking that way as well (might be a problem if you are trying to cluster/swarm between multiple LXCs?). For those reasons I’ve kept all my Docker stuff in kvm rather than LXC, I wasn’t experienced enough to untangle it all.

[–] TORFdot0@lemmy.world 6 points 1 year ago (1 children)

I did the exact same thing 3 or so years ago. Thankfully I already had a backup, but it was a bit nerve-wracking to log in to Nextcloud and find it empty, and then browse the mount and find that empty too.

[–] Moonrise2473@feddit.it 7 points 1 year ago

There's user error in this case, but IMHO it's a bug that the initialization script deletes hundreds of GB without any warning or override option. The files weren't even owned by www-data! It's OK to copy new install files, but not to wipe everything clean...

If one day some web exploit manages to delete, rename, or move the file that the script uses to detect "installation done", it could lead to massive worldwide data loss when the servers restart.

[–] lemmyvore@feddit.nl 5 points 1 year ago (2 children)

Welp that's it, I'm never using Nextcloud.

[–] quackers@lemmy.blahaj.zone 9 points 1 year ago

Yeah, that's what I'm taking away from this too. You don't just rm -rf shit in any application without some good-ass verification from the user.

[–] anzo@programming.dev 1 points 1 year ago (2 children)

OP said he's been using the Docker image that's "official by Docker", not the AIO image that's official from Nextcloud. The issue here is a random contributor in the docker organization on GitHub. AFAIK those images aren't carefully tested: Docker, like many FLOSS organizations, relies primarily on contributors, and plenty of them are amateurs or students trying to build a profile.

[–] lemmyvore@feddit.nl 1 points 1 year ago

The script that killed OP's files (entrypoint.sh) also exists in the official Nextcloud AIO image, and the offending line is there as well:

rsync -rlD --delete --exclude-from=/upgrade.exclude /usr/src/nextcloud/ /var/www/html/

I believe the --delete option is the problem here: it deletes every file in the target dir that isn't in the source dir (files matched by the exclude list are spared unless --delete-excluded is also given).

Ironically, the script even has a directory_empty function it could use to double-check the target dir, but it doesn't use it for this particular dir. 😆
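Here is a minimal sketch of the kind of guard being described, written in Python rather than the entrypoint's shell. It is not the script's actual code: `is_empty_dir`, `guarded_sync_cmd` and the use of `version.php` as the "existing install" marker are assumptions. The idea is to allow the same `rsync -rlD --delete` only when the target is empty or already looks like a Nextcloud install, and to refuse otherwise.

```python
from pathlib import Path


def is_empty_dir(path: Path) -> bool:
    """True if `path` does not exist or contains no entries."""
    return not path.exists() or not any(path.iterdir())


def guarded_sync_cmd(src: str, dest: str, exclude_file: str, marker: str = "version.php") -> list[str]:
    """Build the rsync command, refusing --delete on a non-empty, unrecognized target.

    Assumption: a target containing `marker` is an existing install being
    upgraded; anything else that isn't empty is treated as foreign data.
    """
    target = Path(dest)
    if not is_empty_dir(target) and not (target / marker).is_file():
        raise RuntimeError(f"{dest} is non-empty and has no {marker}; refusing to --delete")
    return ["rsync", "-rlD", "--delete", f"--exclude-from={exclude_file}", src, dest]
```

With OP's typo, the target (/docker mounted as the webroot) is non-empty and has no version.php, so this raises instead of mirroring a fresh install over the data.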

So, bottom line, a Nextcloud install will wipe out the target dir if you're not careful and I stand by my decision to not touch it with a ten-foot pole.

[–] lemmyvore@feddit.nl 1 points 1 year ago

On an unrelated note: yeah it's confusing that the official Nextcloud AIO docker image is not on Docker Hub and what you get when you search for nextcloud is a "Docker official image" that's actually community-maintained. But as I said in my other comment in this particular case the problem exists in both images.

[–] piet@feddit.de 4 points 1 year ago

Using snapshots on a copy-on-write filesystem such as ZFS or Btrfs is actually a very good idea. There are auto-snapshot services that are quite easy to set up; they take snapshots at different granularities and keep a maximum number of each, e.g. every 15 minutes, hour, day, and week.
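The retention part of such a service can be sketched as: for each granularity, keep the newest snapshot in each of the N most recent periods, and prune the rest. This is a minimal sketch of that decision only (it doesn't create or destroy snapshots), with naive timestamps and hypothetical function names; real tools may align periods differently.

```python
from datetime import datetime, timedelta

# Buckets are counted from a Monday midnight, so daily buckets start at 00:00
# and weekly buckets start on Mondays.
EPOCH = datetime(1970, 1, 5)


def snapshots_to_keep(snapshots: list[datetime], policy: list[tuple[timedelta, int]]) -> set[datetime]:
    """For each (period, count), keep the newest snapshot in each of the
    `count` most recent periods that contain a snapshot."""
    keep: set[datetime] = set()
    newest_first = sorted(snapshots, reverse=True)
    for period, count in policy:
        buckets_seen: set[int] = set()
        for ts in newest_first:
            if len(buckets_seen) == count:
                break
            bucket = (ts - EPOCH) // period  # integer index of the period
            if bucket not in buckets_seen:
                buckets_seen.add(bucket)
                keep.add(ts)
    return keep


def snapshots_to_prune(snapshots: list[datetime], policy: list[tuple[timedelta, int]]) -> list[datetime]:
    """Everything the policy does not keep, oldest first."""
    keep = snapshots_to_keep(snapshots, policy)
    return sorted(ts for ts in snapshots if ts not in keep)
```

A policy like `[(timedelta(minutes=15), 4), (timedelta(hours=1), 24), (timedelta(days=1), 7), (timedelta(weeks=1), 4)]` matches the 15m/hour/day/week idea; the counts are only an example.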

Please note that even snapshots and RAID never replace an off-site backup. When setting up Nextcloud, I was paranoid enough to configure the backups to be pulled by the remote machine where they are stored (the Nextcloud machine doesn't even have credentials to access it).

[–] TechAdmin@lemmy.world 4 points 1 year ago (2 children)

Was it the official container image or a 3rd-party one? Whichever it was, they should be notified so the init script can be fixed to prevent the same thing happening to others.

[–] Moonrise2473@feddit.it 2 points 1 year ago* (last edited 1 year ago)

Official image

Edit: official from Docker, not official from Nextcloud, because I don't like AIO images; I like having everything separate.

Edit 2: the documentation says to use named Docker volumes. I don't like using volumes because I feel they're harder to back up; I want individual file control, so I used bind mounts. Because they assume that everyone is using named volumes, they assume they can wipe without problems. But they don't say to avoid bind mounts, or that doing so is dangerous because of those assumptions.

[–] lemmyvore@feddit.nl 1 points 1 year ago

3rd party, but the official image will do the same (rsync --delete). The 3rd party project has an issue open for it. I couldn't find a similar issue for the AIO image (but maybe I didn't search for the right thing).

[–] Decronym@lemmy.decronym.xyz 2 points 1 year ago* (last edited 8 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters | More Letters
IP | Internet Protocol
LXC | Linux Containers
RAID | Redundant Array of Independent Disks for mass storage

3 acronyms in this thread; the most compressed thread commented on today has 6 acronyms.

[Thread #273 for this sub, first seen 11th Nov 2023, 02:20]

[–] JC1@lemmy.ca 1 points 1 year ago

The worst thing I did was trying to replace the WAN interface on my OPNsense router. I didn't check properly and replaced my LAN interface instead, rendering the router inaccessible and fucking up my network. Luckily, it's a VM on Proxmox, which was still reachable by IP. I just opened a console to the VM and found out that the whole configuration is in a file. Also, a copy is saved with every configuration change. I just found the right one to restore and voilà! My network was back up.

[–] raldone01@lemmy.world 1 points 1 year ago

Borg supports a remote append-only mode, but the remote machine needs Borg installed (it runs borg serve).
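On the machine holding the repository, append-only mode is typically enforced through the client key's authorized_keys entry, so a compromised Nextcloud host can add archives but not run arbitrary commands. A minimal sketch that builds such a line; the key and repository path are hypothetical, and it rejects paths that would need extra quoting.

```python
def borg_authorized_keys_line(pubkey: str, repo_path: str) -> str:
    """authorized_keys entry that only lets this key run `borg serve` in append-only mode."""
    if any(c.isspace() or c in "\"'\\" for c in repo_path):
        raise ValueError("repo_path must not contain whitespace, quotes or backslashes")
    forced = f"borg serve --append-only --restrict-to-path {repo_path}"
    # command="..." forces this command whatever the client asks for;
    # restrict disables port, agent and X11 forwarding and PTY allocation.
    return f'command="{forced}",restrict {pubkey}'
```

Append this line to ~/.ssh/authorized_keys of the backup user on the remote machine. The pull-based setup described earlier in the thread is an alternative that avoids giving the Nextcloud host any write access at all.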

[–] dartanjinn@lemm.ee 1 points 1 year ago

This is why I use OMV and Nextcloud. A daily backup job duplicates everything to OMV. A weekly OMV backup job goes into Skiff drive. Fool me once...