
colossyan/TextBPN-MLOCR


TextBPN-MLOCR: Advanced Multi-Lingual Scene Text Detection


An enhanced version of TextBPN++ for robust scene text detection across multiple languages and artistic fonts. The model is trained on large-scale synthetic and real-world text datasets to handle a wide range of scenes.

✨ Key Features

  • Multi-Lingual Support: Detects Arabic, Bangla, Chinese, Japanese, Korean, Hindi, and Latin-script text
  • Artistic Text Handling: Handles stylized and decorative fonts
  • GPU Acceleration: Runs on modern NVIDIA GPUs with CUDA
  • Large-scale Training:
    • 🧪 1.5M+ synthetic text samples
    • 📸 500K+ real-world text samples

🛠️ Hardware Requirements

| Component | Requirement |
|-----------|-------------|
| GPU | NVIDIA GPU (CUDA-capable) |
| CUDA | 12.2 |
| Python | ≥ 3.9 |
| OS | Linux (recommended) |
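Before installing, it can help to confirm that the machine matches the table. The sketch below is a hypothetical pre-flight check, not part of this repository: it assumes PyTorch is the framework in use (the Quick Start imports `torch`) and treats a CUDA build other than 12.2 as a warning rather than a hard failure, since that strictness is an assumption.

```python
import importlib.util
import sys

# Values taken from the Hardware Requirements table
MIN_PYTHON = (3, 9)
EXPECTED_CUDA = "12.2"


def check_environment() -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python >= 3.9 required, found {found}")

    # Import PyTorch only if it is installed, so the check itself never crashes
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems

    import torch

    if not torch.cuda.is_available():
        problems.append("PyTorch cannot see a CUDA device (check the NVIDIA driver)")
    elif torch.version.cuda != EXPECTED_CUDA:
        # Assumption: a different CUDA build may still work, so only report it
        problems.append(
            f"PyTorch was built with CUDA {torch.version.cuda}, the table lists {EXPECTED_CUDA}"
        )
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```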

🔽 Model Download

Download pre-trained models from HuggingFace Hub:
https://huggingface.co/somos99/TextBPN-MLOCR
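The weights can also be fetched programmatically. This is a minimal sketch using `hf_hub_download` from the `huggingface_hub` package (install it separately with pip). The checkpoint file name is an assumption: it reuses the path loaded in the Quick Start, so check the file list on the Hub page and adjust `CHECKPOINT` if it differs. `download_checkpoint` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Repository ID from the HuggingFace link above
REPO_ID = "somos99/TextBPN-MLOCR"
# Assumption: the checkpoint is published under the same name the Quick Start loads
CHECKPOINT = "TextBPN_deformable_resnet50_best2.pth"


def download_checkpoint(filename: str = CHECKPOINT, local_dir: str = "./models") -> str:
    """Download one file from the Hub into local_dir and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed
    from huggingface_hub import hf_hub_download

    Path(local_dir).mkdir(parents=True, exist_ok=True)
    return hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)
```

Calling `download_checkpoint()` once before running the Quick Start should place the file under `./models/`, which is the directory the Quick Start reads from.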

📦 Installation

Install the Python dependencies:

pip install -r requirements.txt

Build the deformable convolution (DCN) extension with CUDA:

sh make.sh

🚀 Quick Start

import cv2
import torch
from ocr.ocr_detection import FrameOCR


if __name__ == "__main__":
    # Select the first GPU and load the detector with the deformable ResNet-50 backbone
    torch.cuda.set_device(0)
    model_path = './models/TextBPN_deformable_resnet50_best2.pth'
    detect_model = FrameOCR(model_path, backbone="deformable_resnet50", use_gpu=True, need_layout=True, test_speed=False)

    # Read the test image as BGR; cv2.imread returns None if the file cannot be read
    test_img = "test.jpg"
    image = cv2.imread(test_img)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {test_img}")
    if image.ndim == 2:
        # Expand single-channel images to three channels
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)

    # detect() is called with a list of images
    outputs = detect_model.detect([image])
    print(outputs)
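The structure returned by `detect_model.detect(...)` is not documented here. Assuming each detected text instance can be expressed as a polygon of (x, y) vertices in pixel coordinates, a common next step is to crop each region for a downstream text recognizer. The helpers `polygon_to_box` and `crop_text_regions` below are hypothetical and only need NumPy; inspect the real output of `detect` and convert it to point lists before using them.

```python
import numpy as np


def polygon_to_box(polygon, image_shape):
    """Axis-aligned (x0, y0, x1, y1) box around a polygon, clipped to the image bounds."""
    # Accept either [[x, y], ...] or a flat [x0, y0, x1, y1, ...] sequence
    pts = np.asarray(polygon, dtype=np.float32).reshape(-1, 2)
    h, w = image_shape[:2]
    x0, y0 = np.floor(pts.min(axis=0)).astype(int)
    x1, y1 = np.ceil(pts.max(axis=0)).astype(int)
    # Clip so slicing never wraps around through negative indices
    x0, x1 = np.clip([x0, x1], 0, w)
    y0, y1 = np.clip([y0, y1], 0, h)
    return int(x0), int(y0), int(x1), int(y1)


def crop_text_regions(image, polygons):
    """Crop the bounding box of each polygon; empty boxes are skipped."""
    crops = []
    for polygon in polygons:
        x0, y0, x1, y1 = polygon_to_box(polygon, image.shape)
        if x1 > x0 and y1 > y0:
            crops.append(image[y0:y1, x0:x1])
    return crops
```

Axis-aligned crops are the simplest choice; for curved or rotated text they include background pixels, so a perspective or polygon-aware warp may suit recognition better.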

🎨 Gradio

📖 References

@inproceedings{zhang2021adaptive,
  title={Adaptive boundary proposal network for arbitrary shape text detection},
  author={Zhang, Shi-Xue and Zhu, Xiaobin and Yang, Chun and Wang, Hongfa and Yin, Xu-Cheng},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={1305--1314},
  year={2021}
}

@article{zhang2023arbitrary,
  title={Arbitrary shape text detection via boundary transformer},
  author={Zhang, Shi-Xue and Yang, Chun and Zhu, Xiaobin and Yin, Xu-Cheng},
  journal={IEEE Transactions on Multimedia},
  volume={26},
  pages={1747--1760},
  year={2023},
  publisher={IEEE}
}

⚖️ License

This project is licensed under the MIT License.

🙏 Acknowledgements

This project extends the original work from:

  • TextBPN++: GitHub Repository
  • Contributors to the TextBPN project

Contribute & Support

🌟 Star us on GitHub → https://github.com/somos99/TextBPN-MLOCR
🐛 Report issues → https://github.com/somos99/TextBPN-MLOCR/issues
📥 Pull requests welcome!
