A lightweight Flask-based API that accepts the file types below, extracts the text, and returns it in a clean JSON format.

- `.doc` (Word 97-2003)
- `.docx`
- `.odt`
- `.rtf`
- `.pdf` (returns a fixed and smaller PDF to get text elsewhere)

Includes Swagger UI documentation for ease of testing and integration.

- Upload a `.doc` file via POST request
- Extracts and beautifies plain text from the document
- Returns structured JSON
- Swagger UI (`/apidocs`) for testing and documentation
- Fully containerized with Docker
- Easy to extend for HTML/Markdown output or other formats

| Tech | Purpose |
|---|---|
| Python 3.11 | Backend language |
| Flask | Web framework |
| Flasgger | Swagger/OpenAPI documentation for Flask |
| antiword | CLI tool to extract text from .doc files |
| Docker | Containerization |
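
To illustrate how the pieces in the table fit together, here is a minimal sketch of the extraction step, assuming `antiword` is installed and on the `PATH`. The function names `extract_text` and `beautify` are hypothetical and are not taken from `app.py`; the actual application may clean the text differently.

```python
import re
import subprocess


def beautify(raw: str) -> str:
    """Tidy extracted text: trim line ends and collapse runs of blank lines."""
    lines = [line.rstrip() for line in raw.splitlines()]
    text = "\n".join(lines)
    # Collapse three or more consecutive newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def extract_text(path: str) -> dict:
    """Run antiword on a .doc file and return a JSON-serializable dict."""
    # antiword writes the extracted text to stdout; check=True raises
    # CalledProcessError if antiword exits with a non-zero status.
    result = subprocess.run(
        ["antiword", path], capture_output=True, text=True, check=True
    )
    return {"text": beautify(result.stdout)}
```

In a Flask view, the uploaded file would first be saved to a temporary path and the returned dict passed to `jsonify`.
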
Clone the repository and build the image:

    git clone https://github.com/thyarles/os-tools-api.git
    cd os-tools-api
    make all
    make run

`make run` calls Docker and serves the API on port 5000. The API is then available at: http://localhost:5000

Form field: `file`, which must be a `.doc` file (old Word format).

    curl -X POST http://localhost:5000/doc \
      -F "file=@/path/to/your/file.doc"

Example response:

    {
      "text": "This is the extracted text from the document."
    }

Once running, go to http://localhost:5000/apidocs

You’ll see an interactive Swagger interface to test and explore the API.
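
flasgger builds this interface from YAML embedded in route docstrings, placed after a `---` separator. The sketch below shows what such a docstring might look like for the `/doc` endpoint. The function name `extract_doc` and the exact spec fields are assumptions for illustration, not copied from `app.py`, and the route decorator is left as a comment so the snippet runs without Flask installed.

```python
def extract_doc():
    """Extract plain text from an uploaded .doc file.
    ---
    consumes:
      - multipart/form-data
    parameters:
      - name: file
        in: formData
        type: file
        required: true
        description: A Word 97-2003 (.doc) document.
    responses:
      200:
        description: JSON object with the extracted text.
    """
    # In a real app this function would be registered with
    # @app.route("/doc", methods=["POST"]) and return the JSON response.


# Split the docstring the way the '---' convention suggests:
# human-readable text first, then the YAML operation spec.
summary, spec = extract_doc.__doc__.split("---", 1)
print(summary.strip())
```
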

    doc-text-api/
    │
    ├── app.py             # Main Flask application
    ├── requirements.txt   # Python dependencies
    ├── Dockerfile         # Container setup
    └── README.md          # You're here!

- Install `antiword`:

      sudo apt install antiword  # Debian/Ubuntu

- Set up a Python virtual environment:

      python3 -m venv venv
      source venv/bin/activate
      pip install -r requirements.txt

- Run the app:

      python app.py

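
With the app running locally, a quick smoke test from Python might look like the sketch below. It sends the same multipart request as the curl example, using only the standard library. The helper names `build_multipart` and `post_doc` are hypothetical, and the `application/msword` content type is an assumption about what the server will accept.

```python
import json
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/msword\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_doc(path: str, url: str = "http://localhost:5000/doc") -> dict:
    """Upload a .doc file to the API and return the parsed JSON response."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", "upload.doc", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `post_doc("/path/to/your/file.doc")["text"]` should then give the extracted text, provided the server is reachable on port 5000.
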
Swagger uses in-code Python docstrings via flasgger. If you want to customize the Swagger UI or add global info:

In `app.py`:

    swagger = Swagger(app, template={
        "info": {
            "title": "DOC Text Extractor API",
            "description": "API to extract text from .doc (Word 97-2003) files using antiword.",
            "version": "1.0.0"
        }
    })

- No authentication is enabled by default.
- Consider wrapping this API behind a gateway or firewall in production environments.
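
Because the endpoint is unauthenticated by default, one inexpensive extra safeguard is to reject uploads that are clearly not `.doc` files before handing them to `antiword`. The sketch below is an assumption about how that could be done, not part of the current code: `looks_like_doc` and `MAX_UPLOAD_BYTES` are hypothetical names, and the 10 MiB limit is arbitrary. It relies on the fact that Word 97-2003 files are OLE2 compound documents, which begin with a fixed 8-byte signature.

```python
# Signature at the start of every OLE2 compound file (used by Word 97-2003).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical upload limit; tune to your deployment.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def looks_like_doc(filename: str, data: bytes) -> bool:
    """Cheap pre-check: .doc extension, size within limit, OLE2 signature."""
    if not filename.lower().endswith(".doc"):
        return False
    if len(data) > MAX_UPLOAD_BYTES:
        return False
    return data.startswith(OLE2_MAGIC)
```

This is only a coarse filter: other OLE2 formats share the same signature, so a renamed legacy spreadsheet would still pass and be left for `antiword` to reject.
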
MIT License — free for personal and commercial use.
Pull requests are welcome! For major changes, open an issue first to discuss what you’d like to change or add.
Right now, it returns clean plain text. Let us know if you want to support formatting or downloadable outputs!