Stop services during backup when using snapshots?
https://lemmy.ca/post/26151124
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
Stop the whole VM during snapshots.
=> More informations about this toot | More toots from solrize@lemmy.world
Not a VM. Consider the service just a service running on the host OS.
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
This is one of the reasons Docker exists.
=> More informations about this toot | More toots from null@slrpnk.net
And I’m using Docker, but Docker isn’t helping with the stopping/running conundrum.
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
It should work that way. If you use the recommended Docker Compose scripts, you’ll notice that only a few volumes are mounted to store your data. These volumes don’t include information about running instances. If you take snapshots of these volumes, back them up, remove the containers and volumes, then restore the data and rerun the Compose scripts, you should be right where you left off, without any remnants from previous processes. That’s a pro of container process isolation.
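For example (a rough sketch; the volume name nextcloud_data and the compose project are made up), data in a named volume survives removing and recreating the containers:

```
# Stop and remove the containers and networks, but keep the named volumes.
docker compose down

# Copy the named volume's contents out through a throwaway container.
docker run --rm -v nextcloud_data:/data -v "$PWD":/backup alpine \
  tar czf /backup/nextcloud_data.tgz -C /data .

# Recreate everything from the same compose file; the service resumes from its data.
docker compose up -d
```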
=> More informations about this toot | More toots from Hansie211@lemmy.world
Why not?
=> More informations about this toot | More toots from null@slrpnk.net
Docker doesn’t change the relationship between a running process and its data. At the end of the day you have a process running in memory that opens, reads, writes and closes files that reside on some filesystem. What happens to the files when the process is killed instantly, and what happens when it’s started afterwards and re-reads them, doesn’t change based on where the files reside or where the process runs. You could run it in Docker, in a VM, on Linux, Unix, or Windows. You could store the files in a Docker volume, bind-mount them in, or have them on NFS; in the end they’re available to the process via filesystem calls. Either way, the effects are limited to the interactions between the process and its data.
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
docker stop container
Make your snapshot
docker start container
What am I missing?
=> More informations about this toot | More toots from null@slrpnk.net
That’s the trivial scenario that we know won’t fail - stopping the service during snapshot. The scenario that I was asking people’s opinions on is not stopping the service during snapshot and what restoring from such backup would mean.
Let me contrast the two by completing your example:
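Something like this, with a placeholder snapshot command since the filesystem doesn’t matter here:

```
docker stop container
zfs snapshot tank/appdata@backup   # placeholder: any snapshot mechanism works
docker start container
```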
Now here’s the interesting scenario:
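Again with placeholder commands, the only difference being that the container is never stopped:

```
# the container keeps running the whole time
zfs snapshot tank/appdata@backup   # placeholder: any snapshot mechanism works
# later, to restore: copy the data out of that snapshot, then
docker start container
```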
Notice that in the second scenario we are not stopping the container. The snapshot is taken while it’s live. This means databases and other files are open, likely actively being written to. Some files are likely only partially written. There are also likely various temporary lock files present. All of that is stored in the snapshot, and when we restore from it and start the service, the service will see all of it.

Contrast this with the trivial scenario where the service is stopped. Upon stopping it, all data is synced to disk, in-flight database operations are completed or cancelled, partial writes are completed or discarded, and lock files are cleaned up. When we restore from such a snapshot and start the service, it will “think” it’s starting from a clean stop, with nothing extra to do.

In the live-snapshot scenario the service will have to do cleanup. For example, it will have to decide what to do with existing lock files. Are they there because another instance of the service is running and writing to the database, or because someone killed its process before it had the chance to go through its shutdown cleanup? In one case it might have to log an error and quit; in the other it would have to remove the lock files. And so on and so forth.
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
Oh I see – you’re asking a hypothetical.
The simple answer is that it’s a bad idea to take snapshots of running databases because at best they could be missing info and at worst they can end up corrupted.
The short answer: Don’t.
=> More informations about this toot | More toots from null@slrpnk.net
I don’t bother stopping services during backup; each service is contained to a single LVM volume, so snapshotting is exactly the same as yanking the plug. I haven’t had any issues yet, either with actual power failures or data restores.
=> More informations about this toot | More toots from butitsnotme@lemmy.world
And this implies you have actually tested those backups, right?
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
Yes, I have. I should probably test them again though, as it’s been a while, and Immich at least has had many potentially significant changes.
LVM snapshots are virtually instant, and there is no merge operation, so deleting the snapshot is also virtually instant. It works by creating a new space where the differences from the main volume are written, so each time the application writes to the main volume, the old block is copied to the snapshot first. This does mean that disk performance will be somewhat lower than without snapshots, but I’ve not really noticed any practical implications. (I believe LVM typically creates my snapshots on a different physical disk from where the main volume lives, though.)
You can find my backup script here.
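For reference, a rough sketch of that flow (the volume group, volume and paths are made up, not taken from the script):

```
# Create a copy-on-write snapshot of the live volume; this is near-instant.
lvcreate --snapshot --name data-snap --size 10G /dev/vg0/data

# Mount it read-only and back it up while the service keeps running.
mkdir -p /mnt/data-snap
mount -o ro /dev/vg0/data-snap /mnt/data-snap
rsync -a /mnt/data-snap/ /backups/data/

# Removing the snapshot is also near-instant; there is no merge step.
umount /mnt/data-snap
lvremove -y /dev/vg0/data-snap
```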
=> More informations about this toot | More toots from butitsnotme@lemmy.world
Oh interesting. I was under the impression that deletion in LVM was actually merging which took some time but I guess not. Thanks for the info!
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
I ran into a similar problem with snapshots of a forum and email server – if there are scheduled emails pending when you take the snapshot, they get sent out again when you create a new test server from the snapshot. And similarly for the forum.
I’m not sure what the solution is either.
=> More informations about this toot | More toots from MaximilianKohler@lemmy.world
Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

LVM: (Linux) Logical Volume Manager for filesystem mapping
SMTP: Simple Mail Transfer Protocol
ZFS: Solaris/Linux filesystem focusing on data integrity
[Thread #902 for this sub, first seen 1st Aug 2024, 21:25]
=> More informations about this toot | More toots from Decronym@lemmy.decronym.xyz
> You start the backup, db is backed up, now image assets are being copied. That could take an hour.
For the initial backup maybe, but subsequent incrementals should only take a minute or two.
I don’t bother stopping services, it’s too time intensive to deal with setting that up.
I’ve yet to meet any service that can’t recover smoothly from a kill -9 equivalent, any that did sure wouldn’t be in my list of stuff I run anymore.
=> More informations about this toot | More toots from MangoPenguin@lemmy.blahaj.zone
It depends on the dataset. If the dataset itself is very large, just walking it to figure out what the incremental part is can take a while on spinning disks. Concrete example - Immich instance with 600GB of data, hundreds of thousands of files, sitting on a 5-disk RAIDz2 of 7200RPM disks. Just walking the directory structure and getting the ctimes takes over an hour. Suboptimal hardware, suboptimal workload.
> I’ve yet to meet any service that can’t recover smoothly from a kill -9 equivalent, any that did sure wouldn’t be in my list of stuff I run anymore.
My thoughts precisely.
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
Oooh yeah I can imagine RAIDz2 on top of using spinning disks would be very slow, especially with access times enabled on ZFS.
What backup software are you using? I’ve found restic to be reasonably fast.
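If atime is part of the problem, turning it off is a one-liner (the dataset name is made up):

```
# Check whether access-time updates are enabled on the dataset.
zfs get atime tank/immich

# Disable them so a full directory walk doesn't generate extra writes.
zfs set atime=off tank/immich
```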
=> More informations about this toot | More toots from MangoPenguin@lemmy.blahaj.zone
Currently duplicity, but rsync took a similar amount of time. The incremental change is typically tens or hundreds of files, hundreds of megabytes total. They take very little time to transfer.
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
Yeah, sounds like snapshots are the way to go!
=> More informations about this toot | More toots from MangoPenguin@lemmy.blahaj.zone
Check out the “blue-green” deployment strategy. This is done by many businesses, where an interrupted service might mean losing a sale, or a client, forever… I tried it at some point with Nginx, but it was more pain than gain (for my personal use).
=> More informations about this toot | More toots from anzo@programming.dev
Good suggestion. I’ve done blue-green professionally with services that are built for high availability and in cloud environments. If I were to actually set up some form of that, I’d probably use ZFS send/recv to keep a backup server always 15 minutes behind and ready to go. I wouldn’t deal with file-based backups that take an hour just to walk the dataset and figure out what’s new. 😅 Probably not happening for now.
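A rough sketch of what that could look like (pool, dataset and host names are made up):

```
# On the primary, take a timestamped snapshot every 15 minutes (e.g. from cron).
zfs snapshot tank/data@2024-08-01_21.15

# First run: replicate a full snapshot to the standby machine.
zfs send tank/data@2024-08-01_21.00 | ssh standby zfs recv backup/data

# Every run after that: send only the delta between the previous and newest snapshot.
zfs send -i tank/data@2024-08-01_21.00 tank/data@2024-08-01_21.15 | ssh standby zfs recv -F backup/data
```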
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
Modern image snapshot backups stop the service for an instant, create a local snapshot to back up while the service keeps running and writes to a delta, then apply the delta back to the running image.
=> More informations about this toot | More toots from Evotech@lemmy.world
When you say stopping the service for an instant you must mean pausing its execution, or at least its I/O. Actually stopping the service can’t be guaranteed to take an instant. It can’t be guaranteed to start in an instant. Worst of all, it can’t even be guaranteed that it’ll be able to start again. When I say stopping I mean systemctl stop, docker stop or pkill, in other words delivering an orderly, graceful kill signal and waiting for the process(es) to stop execution.
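For reference, the difference in Docker terms (the container name is a placeholder):

```
# Graceful stop: sends SIGTERM and waits (here up to 30 seconds) for a clean shutdown.
docker stop --time 30 container

# Instant kill: SIGKILL, no chance to flush or clean up. A live snapshot captures
# the data in roughly the state this would leave it in.
docker kill --signal SIGKILL container
```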
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
Correct, just pausing it on the underlying platform
=> More informations about this toot | More toots from Evotech@lemmy.world
If you’re worried about a database being corrupted, I’d recommend doing an actual backup dump of the database and not only backing up the raw disk files for it.
That should help provide some consistency. Of course it takes longer too if it’s a big db.
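For example, with PostgreSQL running in a container (the container, user and database names are made up):

```
# pg_dump produces a transactionally consistent dump even while the service keeps writing.
docker exec db pg_dump -U app appdb | gzip > /backups/appdb-$(date +%F).sql.gz

# Restoring later into a fresh, empty database:
gunzip -c /backups/appdb-2024-08-01.sql.gz | docker exec -i db psql -U app appdb
```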
=> More informations about this toot | More toots from johntash@eviltoast.org
I dump the db too.
With that said, if backing up the raw files of a db while the service is stopped can produce a bad backup, I think we have bigger problems. That’s because restoring the raw files and starting the service is functionally equivalent to just starting the service with its existing raw files. If that could cause a problem, then the service can’t be trusted to be stopped and restarted either. Am I wrong?
=> More informations about this toot | More toots from avidamoeba@lemmy.ca
I was talking about dumping the database as an alternative to backing up the raw database files without stopping the database first. Taking a filesystem-level snapshot of the raw database without stopping the database first also isn’t guaranteed to be consistent. Most databases are fairly resilient now though and can recover themselves even if the raw files aren’t completely consistent.
Stopping the database first and then backing up the raw files should be fine.
The important thing is to test restoring :)
=> More informations about this toot | More toots from johntash@eviltoast.org
Now this makes perfect sense.
=> More informations about this toot | More toots from avidamoeba@lemmy.ca