I want to set up a database with high durability on Azure. I’ve previously relied on database-as-a-service offerings, but can’t do that in this case, so I’d like your feedback on the plan below. Is it enough to ensure reliable storage of data?
- An Azure Web App takes in metric data from the web, does some minor processing and sampling, and sends the data in batches to VM2.
- VM2 runs the ClickHouse database and stores its data on an Azure Managed Disk.
- A periodic job backs up the database using ClickHouse’s built-in backup functionality and stores the backups in cold storage.
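As an illustration of the first bullet, here is a minimal sketch of how the Web App might ship a batch to ClickHouse through its HTTP interface (default port 8123) using the `JSONEachRow` input format. The host name `vm2.internal`, the table name `metrics`, and the row fields are hypothetical; retries, authentication, and buffering during a VM2 outage are left out.

```python
import json
import urllib.request
from typing import Iterable, Iterator
from urllib.parse import urlencode


def chunk(rows: Iterable[dict], size: int) -> Iterator[list]:
    """Group incoming rows into lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_insert_request(host: str, table: str, rows: list, port: int = 8123):
    """Return (url, body) for one batched INSERT over ClickHouse's HTTP interface."""
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = f"http://{host}:{port}/?" + urlencode({"query": query})
    # JSONEachRow expects one JSON object per line.
    body = "\n".join(json.dumps(r, separators=(",", ":")) for r in rows).encode("utf-8")
    return url, body


def send_batch(host: str, table: str, rows: list, timeout: float = 10.0) -> None:
    """POST one batch; urlopen raises on HTTP errors, so a failed insert is not silent."""
    url, body = build_insert_request(host, table, rows)
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()
```

A batch should only be treated as stored once `send_batch` returns without raising; data the Web App drops before that point is not protected by the durability of the disk.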
The periodic backup is meant to mitigate human error, e.g. accidentally running "DROP TABLE xx" on the wrong data.
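To cover cases like an accidental `DROP TABLE`, the periodic job could issue ClickHouse’s `BACKUP` statement on a schedule. The sketch below only builds the SQL text. It assumes a backup disk named `backups` has been configured on the server and allowed as a backup destination, and the database name `metrics` is hypothetical. Copying the resulting archive to cold storage is a separate step not shown here.

```python
from datetime import datetime, timezone


def backup_statement(database: str, when: datetime, disk: str = "backups") -> str:
    """Build a BACKUP statement whose archive name carries a UTC timestamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"BACKUP DATABASE {database} TO Disk('{disk}', '{database}-{stamp}.zip')"


def restore_statement(database: str, archive: str, disk: str = "backups") -> str:
    """Build the matching RESTORE statement for a previously written archive."""
    return f"RESTORE DATABASE {database} FROM Disk('{disk}', '{archive}')"


if __name__ == "__main__":
    # Print the statement a scheduler (cron, a timer, etc.) would send to ClickHouse.
    print(backup_statement("metrics", datetime.now(timezone.utc)))
```

Timestamped archive names keep older backups from being overwritten, which matters if a mistake is only noticed after the next scheduled run.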
The big question is whether managed disks are an acceptable substitute for database replication when it comes to ensuring data durability. Azure Managed Disks are advertised as a very durable form of storage, with built-in triple-redundant replication, and as suitable for database use. It seems that this should be enough to remove any concern about data loss due to hardware failure. Is this correct? Do you see any potential problems with this?
The recovery plan is that if VM2 fails, some monitoring process catches this and spins up a new VM2 instance attached to the same managed disk. The Web App similarly restarts if it fails.
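The monitoring process described above could be as simple as a loop that polls ClickHouse’s `/ping` HTTP endpoint (which answers `Ok.` when the server is up) and triggers recovery only after several consecutive failures, so a single slow response does not cause a failover. The `recover` callback is a hypothetical placeholder for whatever provisions the replacement VM and attaches the disk; it is not implemented here. Because a managed disk that is not configured as a shared disk can only be attached to one VM at a time, that step would also have to detach the disk from the failed VM before attaching it to the new one.

```python
import time
import urllib.request
from typing import Callable


def clickhouse_ping(host: str, port: int = 8123, timeout: float = 2.0) -> bool:
    """Return True if ClickHouse's /ping endpoint answers 'Ok.'."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/ping", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"Ok."
    except OSError:  # covers URLError, HTTPError, refused connections and timeouts
        return False


class Watchdog:
    """Call `recover` after `threshold` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], recover: Callable[[], None], threshold: int = 3):
        self.probe = probe
        self.recover = recover
        self.threshold = threshold
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if recovery was triggered on this tick."""
        if self.probe():
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0
            self.recover()
            return True
        return False


def run_forever(watchdog: Watchdog, interval_s: float = 10.0) -> None:
    """Probe on a fixed interval until the process is stopped."""
    while True:
        watchdog.tick()
        time.sleep(interval_s)
```

Usage would look like `run_forever(Watchdog(lambda: clickhouse_ping("vm2.internal"), recover=replace_vm2))`, where `replace_vm2` is the hypothetical recovery routine.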
I understand that this setup isn’t highly available: if a VM fails, there will be a window of time before the system can store new data again. This is acceptable to me. But I want to ensure that data that gets stored will not be lost, i.e. is durably stored with very high probability. Is this enough to ensure that? Do you see any problems?
Answer:
> But I want to ensure that data that gets stored will not be lost, i.e. is durably stored with very high probability.
Yes. Data loss could still happen, but it’s an extremely low-probability event, and there are many other low-probability events that are more likely. Your data is vastly more likely to be lost to guest OS driver problems, human error, or malware than to Azure losing data on a disk.
Azure’s documentation states that locally redundant storage (LRS) "provides at least 99.999999999% (11 nines) durability of objects over a given year."
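Taking the quoted figure at face value, and additionally assuming (for illustration only) that each year is an independent trial, the loss probability for a single object works out roughly as follows:

```latex
\begin{aligned}
P_{\text{loss}}(1\,\text{yr}) &\le 1 - 0.99999999999 = 10^{-11} \\
P_{\text{loss}}(n\,\text{yr}) &\le 1 - \left(1 - 10^{-11}\right)^{n} \approx n \times 10^{-11} \\
P_{\text{loss}}(10\,\text{yr}) &\lesssim 10^{-10}
\end{aligned}
```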
Whatever that figure means in practice, a lot of things would have to go very wrong for you to lose a disk.
If you are happy recovering from backups in those other scenarios, you should be fine.