"Backup and Restore" documentation can be improved #3968

@rschu1ze

I just finished reading the "Backup and Restore" page in the ClickHouse docs. I read it for the first time, so I experienced what a new user would experience.

Issue 1

The beginning of the page shows the syntax of BACKUP and RESTORE statements. The TO and FROM clauses allow writing to / reading from a file, a disk, or S3.

The page then explains writing to / reading from disk or S3 in detail but it provides zero examples of writing to a file.
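For illustration, a file-based example might look roughly like the sketch below. The path is hypothetical, and I assume the server configuration has to permit it as a backup location.

```sql
-- Sketch only: '/backups/1.zip' is a hypothetical path.
BACKUP TABLE test.table TO File('/backups/1.zip');

-- Restore from the same file.
RESTORE TABLE test.table FROM File('/backups/1.zip');
```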

Issue 2

Sub-section "Parameters" in section "Backup to a local disk" mentions an ASYNC keyword. That keyword is missing in the syntax definition further up on the page.

Issue 3

The same sub-section as in Issue 2 mentions the PARTITIONS clause. It is weird to mention it in the section about "Backups to local disk". The clause is independent of the backup location and it should be explained in a general section.

Issue 4

The syntax definition at the beginning of the page includes this part:

  TO|FROM File('<path>/<filename>') | Disk('<disk_name>', '<path>/') | S3('<S3 endpoint>/<path>', '<Access key ID>', '<Secret access key>')
  [SETTINGS base_backup = File('<path>/<filename>') | Disk(...) | S3('<S3 endpoint>/<path>', '<Access key ID>', '<Secret access key>')]

The reader is confused why some settings (the ones starting with Disk and S3) look exactly like backup destinations. Is this intended?

EDIT: I am wrong, sorry.

Issue 5

Virtually all SETTINGS from section Backup to local disk, sub-section Parameters are missing in the syntax definition.

Issue 6

SETTING id: The description talks about what happens if there is already a running operation with the same id, i.e. it covers the runtime behavior of the id. It is not clear whether the id is persisted (becomes part of the backup).

Issue 7

SETTING compression_method: Does not enumerate or link possible compression methods. Users need to guess.

Issue 8

SETTING compression_method does not mention the default compression method. It also does not mention if users should mess with this setting at all and what the tradeoffs are.

Issue 9

SETTING compression_method makes a mention of compression_level but it is not clear if that's a separate setting or not, what the default is, and if users should tune it.
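If they are separate settings, I would expect a statement like the following to be valid. An example of this kind next to the parameter description would answer the question. The values are illustrative only, not recommendations.

```sql
-- Sketch: both names given as independent entries in one SETTINGS clause.
BACKUP TABLE test.table TO Disk('backups', 'filename.zip')
SETTINGS compression_method = 'lzma', compression_level = 3;
```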

Issue 10

The SETTING documentation is currently part of the "Backup to a local disk" section. This seems wrong. Some settings seem generic (e.g. "id", "compression_method" and others). They should be in a generic section. Some settings are S3- or Azure-specific. They should be in the corresponding sections.

Issue 11

SETTING password: could mention the encryption algorithm so users can de-crypt an encrypted backup with external tools if necessary.

Issue 12

SETTING base_backup: From the explanation alone, users will not understand what this setting means.

Issue 13

Same problem as issue 12 for SETTING use_same_s3_credentials_for_base_backup.

Issue 14

Same problem as issue 12 for SETTING use_same_password_for_base_backup.

Issue 15

SETTING storage_policy ideally needs an example.

Issue 16

SETTING s3_storage_class needs more explanation and ideally an example.

Issue 17

SETTING azure_attempt_to_create_container. Not clear what "container" means in the description. What bad things will happen if the user sets this to "false"?

Issue 18

There is a sentence "core settings can be used here too". First, this should read "session settings". Second, please be more specific: does this mean all session settings (which makes no sense) or only the ones starting with backup_?

Issue 19

Sub-section "Usage examples" in section "Backup to a local disk" gives this example:

BACKUP TABLE test.table TO Disk('backups', '1.zip')

The reader wonders if file types other than .zip work (if yes, which) and how the zip format is related to SETTING compression_method (see issue 9). E.g. what happens if both conflict?

Issue 20

There is a note "The above RESTORE would fail if the table test.table contains data, you would have to drop the table in order to test the RESTORE, or use the setting allow_non_empty_tables=true:".

The setting should also be included in the SETTINGS section further up.

Issue 21

Re issue 20, it is not mentioned explicitly if a RESTORE with allow_non_empty_tables = true replaces the existing content or adds to the existing content. Probably the former but please document.
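For reference, the statement in question has roughly the shape below. A sentence or comment stating the resulting table contents would remove the ambiguity.

```sql
-- Assume test.table already contains rows before this runs.
RESTORE TABLE test.table FROM Disk('backups', '1.zip')
SETTINGS allow_non_empty_tables = true;
-- Not documented: are the existing rows kept (append) or replaced?
```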

Issue 22

Minor: Sub-section "Assign a password to the backup". That is an awkward way to phrase this. Maybe "Encrypting the backup"?

Issue 23

Sub-section Restore specific partitions talks about restoring partitions 1 and 4 but the example lists partitions 2 and 3.
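A consistent version of the example would name the same partitions as the prose. A sketch, where the disk name and file name are placeholders:

```sql
-- Restore only partitions 1 and 4, matching the surrounding text.
RESTORE TABLE test.table PARTITIONS '1', '4'
FROM Disk('backups', 'filename.zip');
```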

Issue 24

Sub-section Backups as tar archives says "Backups can also be stored as tar archives.". It does not say why one would do that (compared to zip files).

Also, the same problem as in issue 19 applies here.

Issue 24a

The reader wonders why backups are/should be compressed at all. Do they not export a binary dump of the MergeTree table format, which is already compressed? It would be nice to explain what the backup format looks like (also for introspection/debugging purposes).

Issue 25

Sub-section Check the status of backups mentions system table system.backups.

Problems:

  • Please add a link to the official documentation of system.backups.
  • When you do that, you will notice that there is no such documentation (see here), so please add that first.
  • The sub-section could be renamed to "Administration and Troubleshooting".
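A short end-to-end example would also help in this sub-section. The following is a sketch that assumes the ASYNC keyword from issue 2 and a start_time column in system.backups. The column name is an assumption on my side.

```sql
-- Start the backup without blocking the client.
BACKUP TABLE test.table TO Disk('backups', '1.zip') ASYNC;

-- Inspect the most recent backup operation.
SELECT *
FROM system.backups
ORDER BY start_time DESC  -- start_time is an assumed column name
LIMIT 1
FORMAT Vertical;
```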

Issue 22

Sub-section Create a base (initial) backup is part of section Configuring BACKUP/RESTORE to use an S3 Endpoint but the sub-section is generic (I assume) and should be moved elsewhere. If incremental backups are really S3-specific, then please mention this explicitly.

Issue 23

Before discussing how to do incremental backups, there first needs to be a beginner-level explanation of the difference between full and incremental backups. What are the pros and cons of each?

Issue 24

Minor: Please rename example directories my_backup and my_incremental to base_backup and incremental_backup.
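With the suggested names, the pair of statements might read as follows. This is a sketch that uses a Disk destination named 'backups' for brevity instead of the S3 endpoint from the page.

```sql
-- Full (base) backup.
BACKUP TABLE test.table TO Disk('backups', 'base_backup.zip');

-- Incremental backup: references the base backup instead of storing everything again.
BACKUP TABLE test.table TO Disk('backups', 'incremental_backup.zip')
SETTINGS base_backup = Disk('backups', 'base_backup.zip');
```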

Issue 25

Section BACKUP/RESTORE Using an S3 Disk. Please link the instruction about adding a file to /etc/clickhouse-server/config.d to https://clickhouse.com/docs/operations/configuration-files

Issue 26

I feel that section "BACKUP/RESTORE Using an S3 Disk" should really be a sub-section of section "Backup to a local disk" (which we could rename to "Backup to disk"). The point is that in both cases a disk is used as destination.

Issue 27

There is a note "If your tables are backed by S3 storage and types of the disks are different". It is not clear to the average user what "types of the disks" means here. Is this <type> in the disk configuration or something else?

Issue 28

Section "Settings to disallow concurrent backup/restore" talks about server setting allow_concurrent_backups. Looking at programs/server/config.xml, there are at least two more server settings related to backups. They are documented in https://clickhouse.com/docs/operations/server-configuration-parameters/settings#backups already. Please link them from the backup overview page.

Issue 29

Related to that, there are more backup-related server settings outside XML tag <backup>, for example max_backup_bandwidth_for_server (here). It would make sense to mention or link them as well.

Issue 30

Minor: section "Configuring BACKUP/RESTORE to use an AzureBlobStorage Endpoint" was appended to the end of the document. It should be moved up, after the backup-to-S3 section.
