74 changes: 55 additions & 19 deletions docs/en/alab/install.md
@@ -102,46 +102,82 @@ commands inside `annotationlab-installer.sh` and `annotationlab-updater.sh` file

### Backup and restore

#### Backup

You can enable daily backups by adding the following variables, with the `--set` option, to the helm command in `annotationlab-updater.sh`:

```bash
backup.enable=true
backup.files=true
backup.s3_access_key="<ACCESS_KEY>"
backup.s3_secret_key="<SECRET_KEY>"
backup.s3_bucket_fullpath="<FULL_PATH>"
```

`<ACCESS_KEY>` - your access key for AWS S3 access
`<SECRET_KEY>` - your secret key for AWS S3 access
`<FULL_PATH>` - full path to your backup directory in the S3 bucket (e.g. s3://example.com/path/to/my/backup/dir)
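
For illustration, here is a rough sketch of how these flags might be appended to the existing helm command in `annotationlab-updater.sh` (the release name, chart path, and namespace below are assumptions; keep whatever your script already uses):

```bash
# Sketch only: release name, chart path and namespace are placeholders.
helm upgrade annotationlab ./annotationlab \
  --namespace annotationlab \
  --set backup.enable=true \
  --set backup.files=true \
  --set backup.s3_access_key="<ACCESS_KEY>" \
  --set backup.s3_secret_key="<SECRET_KEY>" \
  --set backup.s3_bucket_fullpath="s3://example.com/path/to/my/backup/dir"
```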

*Notice:* File backups are enabled by default. If you don't need to back up files, change

```bash
backup.files=true
```

to

```bash
backup.files=false
```

**Configure Backup from the UI**

Backups can also be configured by an admin user from the UI. Go to Settings > Backup and set the parameters.

<img class="image image--xl" src="/assets/images/annotation_lab/3.1.0/backupRestoreUI.png" style="width:100%; align:center; box-shadow: 0 3px 6px rgba(0,0,0,0.16), 0 3px 6px rgba(0,0,0,0.23);"/>


#### Restore

**Database**

To restore Annotation Lab from a backup you need a new, clean installation of Annotation Lab. Create it with `annotationlab-install.sh`. Then download the latest backup from your S3 bucket and move the archive to the `restore/database/` directory. Next, go to the `restore/database/` directory and execute the `restore_all_databases.sh` script with the name of your backup archive as the argument.

For example:

```
cd restore/database/
sudo ./restore_all_databases.sh 2022-04-14-annotationlab-all-databases.tar.xz
```
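
If your backups live in S3, fetching the latest archive beforehand could look roughly like this (a sketch assuming the AWS CLI is installed and configured with the same credentials used for the backup; the bucket path reuses the example from above):

```bash
# List available backups, then copy the newest archive into restore/database/
aws s3 ls s3://example.com/path/to/my/backup/dir/
aws s3 cp s3://example.com/path/to/my/backup/dir/2022-04-14-annotationlab-all-databases.tar.xz restore/database/
```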

*Notice:* You need `xz` and `bash` installed to execute this script.

*Notice:* This script works only with backups created by the Annotation Lab backup system.

*Notice:* Run this script with the `sudo` command.

After the database restore completes, you can check the logs in the `restore_log` directory created by the restore script.

**Files**

Download your files backup and move it to the `restore/files` directory. Go to the `restore/files` directory and execute the `restore_files.sh` script with the name of your backup archive as the argument. For example:

```
cd restore/files/
sudo ./restore_files.sh 2022-04-14-annotationlab-files.tar
```

*Notice:* You need `bash` installed to execute this script.

*Notice:* This script works only with backups created by the Annotation Lab backup system.

*Notice:* Run this script with the `sudo` command.

**Reboot**

After restoring the database and files, reboot Annotation Lab:

```
sudo reboot
```

## Recommended Configurations

2 changes: 1 addition & 1 deletion docs/en/ocr.md
@@ -18,7 +18,7 @@ Spark OCR is another commercial extension of Spark NLP for optical character rec


Spark OCR is built on top of ```Apache Spark``` and offers the following capabilities:
- Image pre-processing algorithms to improve text recognition results:
- Adaptive thresholding & denoising
- Skew detection & correction
- Adaptive scaling
6 changes: 3 additions & 3 deletions docs/en/ocr_install.md
@@ -20,7 +20,7 @@ Currently, it supports 3.0.*, 2.4.* and 2.3.* versions of Spark.
It is recommended to have basic knowledge of the framework and a working environment before using Spark OCR. Refer to Spark [documentation](http://spark.apache.org/docs/2.4.4/index.html) to get started with Spark.


Spark OCR requires:
- Scala 2.11 or 2.12, matching the Spark version
- Python 3.7+ (if using PySpark)

@@ -47,7 +47,7 @@ You can start a spark REPL with Scala by running in your terminal a spark-shell
spark-shell --jars ####
```

The #### is a secret URL only available for license users. If you have purchased a license but did not receive it, please contact us at [email protected].

</div>

@@ -85,7 +85,7 @@ Install python package using pip:
pip install spark-ocr==1.8.0.spark24 --extra-index-url #### --ignore-installed
```

The #### is a secret URL only available for license users. If you have purchased a license but did not receive it, please contact us at [email protected].

</div><div class="h3-box" markdown="1">

6 changes: 3 additions & 3 deletions docs/en/ocr_object_detection.md
@@ -15,7 +15,7 @@ sidebar:
## ImageHandwrittenDetector

`ImageHandwrittenDetector` is a DL model for detecting handwritten text in images.
It's based on a Cascade Region-based CNN network.

The detector supports the following labels:
- 'signature'
@@ -139,8 +139,8 @@ display_images(data, "image_with_regions")

## ImageTextDetector

`ImageTextDetector` is a DL model for detecting text in images.
It's based on the CRAFT network architecture.


#### Input Columns
44 changes: 22 additions & 22 deletions docs/en/ocr_pipeline_components.md
@@ -33,8 +33,8 @@ Next section describes the transformers that deal with PDF files with the purpos
{:.table-model-big}
| Param name | Type | Default | Description |
| --- | --- | --- | --- |
| splitPage | bool | true | Whether to split the document into pages |
| textStripper | | TextStripperType.PDF_TEXT_STRIPPER | Extract unstructured text |
| sort | bool | false | Sort text during extraction with TextStripperType.PDF_LAYOUT_STRIPPER |
| partitionNum | int | 0 | Force repartition of the dataframe if set to a value greater than 0. |
| onlyPageNum | bool | false | Extract only page numbers. |
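
For orientation, a minimal PySpark sketch of how these params are typically set (this assumes the table belongs to `PdfToText` and that setters follow the usual `set<ParamName>` convention; treat the setter names as assumptions):

```python
from sparkocr.transformers import PdfToText

# Sketch only: setter names are assumed from the param names above.
pdf_to_text = PdfToText() \
    .setInputCol("content") \
    .setOutputCol("text") \
    .setSplitPage(True) \
    .setSort(False)
```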
@@ -117,8 +117,8 @@ data.select("pagenum", "text").show()

`PdfToImage` renders PDF to an image. To be used with scanned PDF documents.
The output dataframe contains a `total_pages` field with the total number of pages.
When processing a PDF with a large number of pages, prefer to split the PDF by setting the `splitNumBatch` param.
The number of partitions should be equal to the number of cores/executors.
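
As a hedged illustration of that advice (the `setSplitNumBatch` setter name is assumed from the param name; the path and core count are placeholders):

```python
from sparkocr.transformers import PdfToImage

num_cores = 8  # match the number of cores/executors in your cluster

# `spark` is an active SparkSession.
# Read scanned PDFs as binary files (example path) and repartition to match the cores.
df = spark.read.format("binaryFile").load("path/to/pdfs/*.pdf").repartition(num_cores)

pdf_to_image = PdfToImage() \
    .setInputCol("content") \
    .setOutputCol("image") \
    .setSplitNumBatch(num_cores)  # assumption: setter for the splitNumBatch param

result = pdf_to_image.transform(df)
```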

##### Input Columns

Expand Down Expand Up @@ -228,7 +228,7 @@ column and create multipage PDF document.

**Example:**

Read images and store them as single-page PDF documents.


<div class="tabs-box pt0" markdown="1">
@@ -289,8 +289,8 @@ pdf_df.select("content").show()

### TextToPdf

`TextToPdf` renders OCR results to a PDF document as a text layout. Each symbol will render to the same position
with the same font size as in the original image or PDF.
If the dataframe contains multiple records for the same origin path, it groups images by the origin
column and creates a multipage PDF document.

@@ -1088,7 +1088,7 @@ data.select("tables").show()

### PptToPdf

`PptToPdf` converts PPT and PPTX documents to PDF documents.

##### Input Columns

@@ -1364,14 +1364,14 @@ data.select("image").show()

`GPUImageTransformer` allows running image pre-processing operations on the GPU.

It supports the following operations:
- Scaling
- Otsu thresholding
- Huang thresholding
- Erosion
- Dilation

`GPUImageTransformer` allows adding multiple operations. To add operations, you need to call
one of the methods with params:

{:.table-model-big}
@@ -1474,7 +1474,7 @@ display_images(result, "transformed_image")

### ImageBinarizer

`ImageBinarizer` transforms an image to a binary color schema, based on a threshold.
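
A minimal sketch of how this might be used (the `setThreshold` setter and the output column name are assumptions for illustration):

```python
from sparkocr.transformers import ImageBinarizer

# Sketch only: setter and column names are assumptions.
binarizer = ImageBinarizer() \
    .setInputCol("image") \
    .setOutputCol("binary_image") \
    .setThreshold(128)  # pixels above/below this value map to white/black
```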

##### Input Columns

@@ -1559,11 +1559,11 @@ data.show()
### ImageAdaptiveBinarizer

Supported Methods:
- OTSU. Returns a single intensity threshold that separates pixels into two classes, foreground and background.
- Gaussian local thresholding. Thresholds the image using a locally adaptive threshold that is computed
using a local square region centered on each pixel. The threshold is equal to the Gaussian weighted sum
of the surrounding pixels times the scale.
- Sauvola. A local thresholding technique that is useful for images where the background is not uniform.


#### Input Columns
@@ -2147,12 +2147,12 @@ data.select("path", "noiselevel").show()

**python only**

`ImageRemoveObjects` removes background objects.
It supports removing:
- objects smaller than font elements of _minSizeFont_ size
- objects smaller than _minSizeObject_
- holes smaller than _minSizeHole_
- objects larger than _maxSizeObject_

#### Input Columns

@@ -2505,7 +2505,7 @@ data.show()

### ImageSplitRegions

`ImageSplitRegions` splits an image into regions.

#### Input Columns

@@ -3468,7 +3468,7 @@ Next section describes the extra transformers

### PositionFinder

`PositionFinder` finds the position of input text entities in the original document.

#### Input Columns

@@ -3759,7 +3759,7 @@ results.show()
### FoundationOneReportParser

`FoundationOneReportParser` is a transformer for parsing FoundationOne reports.
The current implementation supports parsing patient info, genomic and biomarker findings, and gene lists
from the appendix.
The output format is JSON.
