Description
I just finished reading the "Backup and Restore" page in the ClickHouse docs. This was my first time reading it, so I experienced it the way a new user would.
Issue 1
The beginning of the page shows the syntax of BACKUP and RESTORE statements. The TO and FROM clauses allow writing to / reading from a file, a disk, or S3.
The page then explains writing to / reading from disk or S3 in detail but it provides zero examples of writing to a file.
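A minimal sketch of the kind of file example the page could add. The path `/backups/1.zip` is hypothetical, and I am assuming the directory has to be permitted in the server configuration (via `allowed_path` in the `backups` section) in the same way a disk has to be allowed:

```sql
-- Hypothetical path; assumes /backups/ is an allowed backup path on the server.
BACKUP TABLE test.table TO File('/backups/1.zip');

-- Restore from the same file. The page notes that RESTORE fails
-- if test.table already contains data.
RESTORE TABLE test.table FROM File('/backups/1.zip');
```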
Issue 2
Sub-section "Parameters" in section "Backup to a local disk" mentions an ASYNC keyword. That keyword is missing in the syntax definition further up on the page.
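For illustration, this is the form I would expect the syntax definition to cover. It is a sketch that reuses the disk name and file name from the page's own examples:

```sql
-- Starts the backup in the background and returns immediately
-- instead of waiting for the backup to finish.
BACKUP TABLE test.table TO Disk('backups', '1.zip') ASYNC;
```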
Issue 3
The same sub-section as in Issue 2 mentions the PARTITIONS clause. It is weird to mention it in the section about "Backups to local disk". The clause is independent of the backup location and it should be explained in a general section.
Issue 4
The syntax definition at the beginning of the page includes this part:
TO|FROM File('<path>/<filename>') | Disk('<disk_name>', '<path>/') | S3('<S3 endpoint>/<path>', '<Access key ID>', '<Secret access key>')
[SETTINGS base_backup = File('<path>/<filename>') | Disk(...) | S3('<S3 endpoint>/<path>', '<Access key ID>', '<Secret access key>')]
The reader is confused why some settings (the ones starting with Disk and S3) look exactly like backup destinations. Is this intended?
EDIT: I am wrong, sorry.
Issue 5
Virtually all SETTINGS from section Backup to local disk, sub-section Parameters are missing in the syntax definition.
Issue 6
SETTING id: The description covers what happens if there is already a running operation with the same id, that is, the runtime behavior of the id. It is not clear whether the id is persisted (becomes part of the backup).
Issue 7
SETTING compression_method: Does not enumerate or link possible compression methods. Users need to guess.
Issue 8
SETTING compression_method does not mention the default compression method. It also does not mention if users should mess with this setting at all and what the tradeoffs are.
Issue 9
SETTING compression_method makes a mention of compression_level but it is not clear if that's a separate setting or not, what the default is, and if users should tune it.
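As far as I can tell from the page, both are passed as separate entries in the SETTINGS clause. A sketch using the values from the docs; which other methods and levels are valid is exactly what is missing:

```sql
-- compression_method and compression_level given as two separate settings.
-- 'lzma' and 3 are the values used on the docs page, not recommendations.
BACKUP TABLE test.table
    TO Disk('backups', 'filename.zip')
    SETTINGS compression_method = 'lzma', compression_level = 3;
```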
Issue 10
The SETTING documentation is currently part of the "Backup to a local disk" section. This seems wrong. Some settings seem generic (e.g. "id", "compression_method" and others). They should be in a generic section. Some settings are S3- or Azure-specific. They should be in the corresponding sections.
Issue 11
SETTING password: could mention the encryption algorithm so users can decrypt an encrypted backup with external tools if necessary.
Issue 12
SETTING base_backup: from the explanation alone, users will not understand what this setting means.
Issue 13
Same problem as issue 12 for SETTING use_same_s3_credentials_for_base_backup.
Issue 14
Same problem as issue 12 for SETTING use_same_password_for_base_backup.
Issue 15
SETTING storage_policy ideally needs an example.
Issue 16
SETTING s3_storage_class needs more explanation and ideally an example.
Issue 17
SETTING azure_attempt_to_create_container. Not clear what "container" means in the description. What bad things will happen if the user sets this to "false"?
Issue 18
There is a sentence "core settings can be used here too". (1) This should read "session settings". (2) Please be more specific: does this mean all session settings (which makes no sense) or only the ones starting with backup_?
Issue 19
Sub-section "Usage examples" in section "Backup to a local disk" gives this example:
BACKUP TABLE test.table TO Disk('backups', '1.zip')
The reader wonders if other file types than .zip work (if yes, which) and how the zip format is related to SETTING compression (see issue 9). E.g. what happens if both conflict?
Issue 20
There is a note "The above RESTORE would fail if the table test.table contains data, you would have to drop the table in order to test the RESTORE, or use the setting allow_non_empty_tables=true:".
The setting should also be included in the SETTINGS section further up.
Issue 21
Re issue 20, it is not mentioned explicitly if a RESTORE with allow_non_empty_tables = true replaces the existing content or adds to the existing content. Probably the former but please document.
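One way a reader could check this for themselves is to compare row counts before and after the restore. This is a sketch using the table and backup names from the page; I am deliberately not stating the expected result, since that is what the docs should say:

```sql
-- Row count before restoring into the non-empty table.
SELECT count() FROM test.table;

RESTORE TABLE test.table
    FROM Disk('backups', '1.zip')
    SETTINGS allow_non_empty_tables = true;

-- Row count afterwards: unchanged size suggests replacement,
-- a larger count suggests the backup data was added.
SELECT count() FROM test.table;
```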
Issue 22
Minor: Sub-section "Assign a password to the backup". That is an awkward way to phrase this. Maybe "Encrypting the backup"?
Issue 23
Sub-section Restore specific partitions talks about restoring partitions 1 and 4 but the example lists partitions 2 and 3.
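For the text and the example to agree, the statement would have to look roughly like this (assuming the prose is right and partitions 1 and 4 are the ones meant):

```sql
-- Restores only partitions 1 and 4, matching the sub-section's description.
RESTORE TABLE test.table PARTITIONS '1', '4'
    FROM Disk('backups', 'filename.zip');
```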
Issue 24
Sub-section Backups as tar archives says "Backups can also be stored as tar archives.". It does not say why one would do that (compared to zip files).
Also, there is a similar problem here as in issue 19.
Issue 24a
The reader wonders why backups are or should be compressed at all. Don't they export a binary dump of the MergeTree table format, which is already compressed? It would be nice to explain what the backup format looks like (also for introspection and debugging purposes).
Issue 25
Sub-section Check the status of backups mentions system table system.backups.
Problems:
- Please add a link to the official documentation of system.backups.
- When you do that, you will notice that there is no such documentation (see here), so please add that first.
- The sub-section could be renamed to "Administration and Troubleshooting".
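A troubleshooting-oriented example could look like the sketch below. The column names are the ones I believe system.backups exposes; they should be checked against the (currently missing) reference page:

```sql
-- List backup and restore operations with their status and any error message.
SELECT id, name, status, error
FROM system.backups;
```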
Issue 22
Sub-section Create a base (initial) backup is part of section Configuring BACKUP/RESTORE to use an S3 Endpoint but the sub-section is generic (I assume) and should be moved elsewhere. If incremental backups are really S3-specific, then please mention this explicitly.
Issue 23
Before discussing how to do incremental backups, there first needs to be a beginner-level explanation of the difference between full and incremental backups. What are the pros and cons of each?
Issue 24
Minor: Please rename example directories my_backup and my_incremental to base_backup and incremental_backup.
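With the suggested names, the base/incremental pair would read roughly as follows. This is a sketch: the table name `data` and the restore target `data_restored` are illustrative, and the endpoint and credentials are the placeholders from the syntax definition:

```sql
-- Full (base) backup.
BACKUP TABLE data
    TO S3('<S3 endpoint>/base_backup', '<Access key ID>', '<Secret access key>');

-- Incremental backup that stores only what is not already in the base backup.
BACKUP TABLE data
    TO S3('<S3 endpoint>/incremental_backup', '<Access key ID>', '<Secret access key>')
    SETTINGS base_backup = S3('<S3 endpoint>/base_backup', '<Access key ID>', '<Secret access key>');

-- Restoring from the incremental backup into a new table.
RESTORE TABLE data AS data_restored
    FROM S3('<S3 endpoint>/incremental_backup', '<Access key ID>', '<Secret access key>');
```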
Issue 25
Section BACKUP/RESTORE Using an S3 Disk. Please link adding a file to /etc/clickhouse-server/config.d to https://clickhouse.com/docs/operations/configuration-files
Issue 26
I feel that section "BACKUP/RESTORE Using an S3 Disk" should really be a sub-section of section "Backup to a local disk" (which we could rename to "Backup to disk"). The point is that in both cases a disk is used as destination.
Issue 27
There is a note "If your tables are backed by S3 storage and types of the disks are different". It is not clear to the average user what "types of the disks" means here. Is this <type> in the disk configuration or something else?
Issue 28
Section "Settings to disallow concurrent backup/restore" talks about server setting allow_concurrent_backups. Looking at programs/server/config.xml, there are at least two more server settings related to backups. They are documented in https://clickhouse.com/docs/operations/server-configuration-parameters/settings#backups already. Please link them from the backup overview page.
Issue 29
Related to that, there are more backup-related server settings outside XML tag <backup>, for example max_backup_bandwidth_for_server (here). It will make sense to mention or link them as well.
Issue 30
Minor: section "Configuring BACKUP/RESTORE to use an AzureBlobStorage Endpoint" was appended to the end of the document. It should be moved up, after the backup-to-S3 section.