Binary file added docs/assets/images/ocr/text_detection.png
132 changes: 132 additions & 0 deletions docs/en/ocr_object_detection.md
**Output:**

![image](/assets/images/ocr/signature.png)



## ImageTextDetector

`ImageTextDetector` is a DL model for detecting text on images.
It is based on the CRAFT network architecture.


#### Input Columns

{:.table-model-big}
| Param name | Type | Default | Column Data Description |
| --- | --- | --- | --- |
| inputCol | string | image | image struct ([Image schema](ocr_structures#image-schema)) |

#### Parameters

{:.table-model-big}
| Param name | Type | Default | Description |
| --- | --- | --- | --- |
| scoreThreshold | float | 0.9 | Score threshold for output regions |
| sizeThreshold | int | 5 | Size threshold for detected text regions |
| textThreshold | float | 0.4 | Text confidence threshold |
| linkThreshold | float | 0.4 | Link (affinity) confidence threshold |
| width | int | 0 | Scale width to this value; if 0, use the original width |
| height | int | 0 | Scale height to this value; if 0, use the original height |
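
The two post-processing parameters interact as follows: `scoreThreshold` discards low-confidence regions, and `sizeThreshold` discards regions too small to be real text. A minimal pure-Python sketch of that filtering logic (illustrative only, not the library's implementation; the `Region` class and `filter_regions` helper are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Region:
    x: int
    y: int
    width: int
    height: int
    score: float

def filter_regions(regions, score_threshold=0.9, size_threshold=5):
    """Keep regions whose confidence and smaller side pass the thresholds."""
    return [
        r for r in regions
        if r.score >= score_threshold
        and min(r.width, r.height) >= size_threshold
    ]

candidates = [
    Region(10, 10, 120, 30, 0.95),   # kept
    Region(40, 80, 100, 25, 0.50),   # dropped: score below threshold
    Region(200, 5, 4, 3, 0.99),      # dropped: smaller side below threshold
]
print(len(filter_regions(candidates)))  # 1
```

Lowering `scoreThreshold` returns more (noisier) regions; raising `sizeThreshold` suppresses specks that OCR would fail on anyway.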

#### Output Columns

{:.table-model-big}
| Param name | Type | Default | Column Data Description |
| --- | --- | --- | --- |
| outputCol | string | text_regions | Array of [Coordinates](ocr_structures#coordinate-schema) |


**Example:**

<div class="tabs-box pt0" markdown="1">

{% include programmingLanguageSelectScalaPython.html %}

```scala
import com.johnsnowlabs.ocr.transformers._
import com.johnsnowlabs.ocr.OcrContext.implicits._

val imagePath = "path to image"

// Read image file as binary file
val df = spark.read
.format("binaryFile")
.load(imagePath)
.asImage("image")

// Define transformer for text detection
val text_detector = ImageTextDetector
  .pretrained("text_detection_v1", "en", "clinical/ocr")
  .setInputCol("image")
  .setOutputCol("text_regions")
  .setSizeThreshold(10)
  .setScoreThreshold(0.9)
  .setLinkThreshold(0.4)
  .setTextThreshold(0.2)
  .setWidth(1512)
  .setHeight(2016)

// Draw detected regions on the image
val draw_regions = new ImageDrawRegions()
  .setInputCol("image")
  .setInputRegionsCol("text_regions")
  .setOutputCol("image_with_regions")

val data = draw_regions.transform(text_detector.transform(df))

data.storeImage("image_with_regions")
```

```python
from pyspark.ml import PipelineModel
from sparkocr.transformers import *

imagePath = "path to image"

# Read image file as binary file
df = spark.read \
    .format("binaryFile") \
    .load(imagePath)

binary_to_image = BinaryToImage() \
.setInputCol("content") \
.setOutputCol("image")

# Define transformer for text detection
text_detector = ImageTextDetector \
.pretrained("text_detection_v1", "en", "clinical/ocr") \
.setInputCol("image") \
.setOutputCol("text_regions") \
.setSizeThreshold(10) \
.setScoreThreshold(0.9) \
.setLinkThreshold(0.4) \
.setTextThreshold(0.2) \
.setWidth(1512) \
.setHeight(2016)

draw_regions = ImageDrawRegions() \
.setInputCol("image") \
.setInputRegionsCol("text_regions") \
.setOutputCol("image_with_regions")


pipeline = PipelineModel(stages=[
binary_to_image,
text_detector,
draw_regions
])

data = pipeline.transform(df)

display_images(data, "image_with_regions")
```

</div>

**Output:**

![image](/assets/images/ocr/text_detection.png)
2 changes: 2 additions & 0 deletions docs/en/ocr_pipeline_components.md
| Param name | Type | Default | Description |
| --- | --- | --- | --- |
| explodeCols | Array[string] | | Columns to explode |
| rotated | boolean | False | Support rotated regions |
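
Geometrically, supporting rotated regions means a region is no longer a plain axis-aligned box: its four corners come from rotating the box about its centre by the region's angle. A small illustrative sketch of that corner computation (pure Python, not the Spark OCR API; `rotated_corners` is a hypothetical helper):

```python
import math

def rotated_corners(cx, cy, w, h, angle_deg):
    """Corner points of a w x h rectangle centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    # Offsets of the four axis-aligned corners relative to the centre
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        # Standard 2D rotation about the centre
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners

# A 90-degree rotation effectively swaps the box's width and height
print(rotated_corners(0, 0, 4, 2, 90))
```

With `rotated` disabled, a transformer can treat every region as axis-aligned; enabling it requires splitting or drawing along these rotated corner points instead.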

#### Output Columns

| --- | --- | --- | --- |
| lineWidth | Int | 4 | Line width for drawing rectangles |
| fontSize | Int | 12 | Font size for rendering labels and scores |
| rotated | boolean | False | Support rotated regions |

#### Output Columns

33 changes: 33 additions & 0 deletions docs/en/ocr_release_notes.md
sidebar:
nav: spark-ocr
---


## 3.10.0

Release date: 10-01-2022


#### Overview

Form recognition using LayoutLMv2 and text detection.


#### New Features

* Added [VisualDocumentNERv2](ocr_visual_document_understanding#visualdocumentnerv2) transformer
* Added DL based [ImageTextDetector](ocr_object_detection#imagetextdetector) transformer
* Support rotated regions in [ImageSplitRegions](ocr_pipeline_components#imagesplitregions)
* Support rotated regions in [ImageDrawRegions](ocr_pipeline_components#imagedrawregions)


#### New Models

* LayoutLMv2 fine-tuned on FUNSD dataset
* Text detection model based on CRAFT architecture


#### New notebooks

* [Text Detection](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/3100-release-candidate/jupyter/TextDetection/SparkOcrImageTextDetection.ipynb)
* [Visual Document NER v2](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/3100-release-candidate/jupyter/SparkOCRVisualDocumentNERv2.ipynb)



## 3.9.1

Release date: 02-11-2021
Added preservation of original file formatting
* [Preserve Original Formatting](https://github.com/JohnSnowLabs/spark-ocr-workshop/blob/3.9.1/jupyter/SparkOcrPreserveOriginalFormatting.ipynb)



## 3.9.0

Release date: 20-10-2021
141 changes: 139 additions & 2 deletions docs/en/ocr_visual_document_understanding.md
document_ner = VisualDocumentNer() \
pipeline = PipelineModel(stages=[
binary_to_image,
ocr,
    document_ner,
])

result = pipeline.transform(df)
Output:

```
+-------------------------------------------------------------------------+
| B-COMPANY, [word -> AEON, token -> aeon], []], [entity, 0, 0, B-COMPANY,|
| [word -> CO., token -> co], ... |
+-------------------------------------------------------------------------+
```

## VisualDocumentNERv2

`VisualDocumentNERv2` is a DL model for named entity recognition on documents, an improved version of `VisualDocumentNER`. A pretrained model trained on the FUNSD dataset is available.

#### Input Columns

{:.table-model-big}
| Param name | Type | Default | Column Data Description |
| --- | --- | --- | --- |
| inputCols | Array[String] | | Column names for the tokens of the document and the image |


#### Parameters

{:.table-model-big}
| Param name | Type | Default | Description |
| --- | --- | --- | --- |
| maxSentenceLength | int | 512 | Maximum sentence length. |
| whiteList | Array[String] | | Whitelist of output labels |
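
The effect of `whiteList` is simply to restrict the output to the listed labels. A minimal sketch of that behaviour (illustrative only, not the transformer's internals; `apply_whitelist` is a hypothetical helper, and entities are reduced to `(label, word)` pairs):

```python
def apply_whitelist(entities, white_list=None):
    """Keep only entities whose label is in white_list; None/empty keeps all."""
    if not white_list:
        return list(entities)
    allowed = {label.lower() for label in white_list}
    return [(label, word) for label, word in entities if label.lower() in allowed]

ents = [("b-header", "Institution"), ("i-header", "Name"), ("b-question", "Address")]
print(apply_whitelist(ents, ["B-HEADER", "I-HEADER"]))
```

Leaving `whiteList` unset returns all labels the model predicts.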

#### Output Columns

{:.table-model-big}
| Param name | Type | Default | Column Data Description |
| --- | --- | --- | --- |
| outputCol | string | entities | Name of output column with entities Annotation. |


**Example:**


<div class="tabs-box pt0" markdown="1">

{% include programmingLanguageSelectScalaPython.html %}

```scala
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.ocr.transformers._
import com.johnsnowlabs.ocr.OcrContext.implicits._

val imagePath = "path to image"

val dataFrame = spark.read.format("binaryFile").load(imagePath)

val bin2imTransformer = new BinaryToImage()
  .setImageType(ImageType.TYPE_3BYTE_BGR)

val ocr = new ImageToHocr()
.setInputCol("image")
.setOutputCol("hocr")
.setIgnoreResolution(false)
.setOcrParams(Array("preserve_interword_spaces=0"))

val tokenizer = new HocrTokenizer()
.setInputCol("hocr")
.setOutputCol("token")

val visualDocumentNER = VisualDocumentNERv2
.pretrained("layoutlmv2_funsd", "en", "clinical/ocr")
.setInputCols(Array("token", "image"))

val pipeline = new Pipeline()
.setStages(Array(
bin2imTransformer,
ocr,
tokenizer,
visualDocumentNER
))

val results = pipeline
  .fit(dataFrame)
  .transform(dataFrame)
  .select("entities")
  .cache()

results.show()
```

```python
from pyspark.ml import PipelineModel
from sparkocr.transformers import *

imagePath = "path to image"

# Read image file as binary file
df = spark.read \
    .format("binaryFile") \
    .load(imagePath)

binToImage = BinaryToImage() \
.setInputCol("content") \
.setOutputCol("image")

ocr = ImageToHocr()\
.setInputCol("image")\
.setOutputCol("hocr")\
.setIgnoreResolution(False)\
.setOcrParams(["preserve_interword_spaces=0"])

tokenizer = HocrTokenizer()\
.setInputCol("hocr")\
.setOutputCol("token")

ner = VisualDocumentNerV2 \
.pretrained("layoutlmv2_funsd", "en", "clinical/ocr")\
.setInputCols(["token", "image"])\
.setOutputCol("entities")

pipeline = PipelineModel(stages=[
binToImage,
ocr,
tokenizer,
ner
])

import pyspark.sql.functions as f

result = pipeline.transform(df)

# Extract the filename from the full path of the input file
path_array = f.split(result["path"], "/")

result.withColumn("filename", path_array.getItem(f.size(path_array) - 1)) \
    .withColumn("exploded_entities", f.explode("entities")) \
    .select("filename", "exploded_entities") \
    .show(truncate=False)
```

</div>

Output sample:

```
+---------+-------------------------------------------------------------------------------------------------------------------------+
|filename |exploded_entities |
+---------+-------------------------------------------------------------------------------------------------------------------------+
|form1.jpg|[entity, 0, 6, i-answer, [x -> 1027, y -> 89, height -> 19, confidence -> 96, word -> Version:, width -> 90], []] |
|form1.jpg|[entity, 25, 35, b-header, [x -> 407, y -> 190, height -> 37, confidence -> 96, word -> Institution, width -> 241], []] |
|form1.jpg|[entity, 37, 40, i-header, [x -> 667, y -> 190, height -> 37, confidence -> 96, word -> Name, width -> 130], []] |
|form1.jpg|[entity, 42, 52, b-question, [x -> 498, y -> 276, height -> 19, confidence -> 96, word -> Institution, width -> 113], []]|
|form1.jpg|[entity, 54, 60, i-question, [x -> 618, y -> 276, height -> 19, confidence -> 96, word -> Address, width -> 89], []] |
+---------+-------------------------------------------------------------------------------------------------------------------------+
```
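
The labels above follow the usual BIO convention (`b-` opens an entity, `i-` continues it), so a common post-processing step is to merge consecutive tokens into multi-word spans. A hedged sketch of that merging (standard BIO logic, not a Spark OCR API; tokens are reduced to `(tag, word)` pairs):

```python
def merge_bio(tokens):
    """Merge (tag, word) pairs with 'b-'/'i-' prefixed tags into labelled spans."""
    spans, current = [], None
    for tag, word in tokens:
        prefix, _, label = tag.partition("-")
        if prefix == "b" or current is None or current[0] != label:
            # A 'b-' tag, or an 'i-' tag with no matching open span, starts a new span
            if current:
                spans.append(current)
            current = (label, [word])
        else:
            # An 'i-' tag continuing the current span
            current[1].append(word)
    if current:
        spans.append(current)
    return [(label, " ".join(words)) for label, words in spans]

rows = [("b-header", "Institution"), ("i-header", "Name"),
        ("b-question", "Institution"), ("i-question", "Address")]
print(merge_bio(rows))  # [('header', 'Institution Name'), ('question', 'Institution Address')]
```

Applied to the sample output, this would reconstruct "Institution Name" as one header entity and "Institution Address" as one question entity.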