Structify-AI

Overview

Unstructured Data Extraction Tool. Last updated June 16th 2025 - V 0.2 - Streamlit App

This service applies advanced machine learning to extract text, key-value pairs, tables, and structures from documents automatically and accurately. The prototype focuses on converting unstructured data into a structured format suitable for LLMs, turning documents into usable data and shifting the focus to acting on information rather than compiling it. It explores either adopting external tools or building a custom solution tailored to your documents. A key feature is ensuring compatibility with diverse data types and formats for smooth integration within existing systems, both on-premises and in the cloud.

Architecture

The prototype is built around a core processing engine that leverages advanced NLP techniques to maximize the accuracy of information extraction. Machine learning algorithms are incorporated to enhance the tool's adaptability and optimize its performance over time.

Data Ingestion: Handles various document types and formats.

Extraction Core: The engine that uses NLP and ML models to extract text, key-value pairs, and structures. It can be configured to use prebuilt models or custom models.

Structuring & Output: Formats the raw extracted information into clean, usable data suitable for LLMs and other downstream systems.

Frontend: A Streamlit application that provides the user interface.
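The flow from ingestion through extraction to structured output can be illustrated with a minimal sketch. It is not the project's actual code: every name here (`ingest`, `extract`, `structure`, `run_pipeline`, `ExtractionResult`) is hypothetical, plain UTF-8 text stands in for real document formats, and a regular expression stands in for the NLP/ML extraction core.

```python
import json
import re
from dataclasses import asdict, dataclass, field


@dataclass
class ExtractionResult:
    """Structured output for one document (hypothetical schema)."""
    source: str
    text: str
    key_values: dict = field(default_factory=dict)


def ingest(raw: bytes) -> str:
    """Data Ingestion stand-in: decode raw bytes as UTF-8 text."""
    return raw.decode("utf-8", errors="replace")


# Extraction Core stand-in: a regex that matches "Key: value" lines.
# A real engine would call a prebuilt or custom NLP/ML model here.
KEY_VALUE_PATTERN = re.compile(
    r"^[ \t]*([A-Za-z][\w ]*?)[ \t]*:[ \t]*(.+?)[ \t]*$", re.MULTILINE
)


def extract(text: str) -> dict:
    """Return the key-value pairs found in the text."""
    return {key: value for key, value in KEY_VALUE_PATTERN.findall(text)}


def structure(source: str, text: str, key_values: dict) -> str:
    """Structuring & Output: serialize the result as JSON."""
    result = ExtractionResult(source=source, text=text, key_values=key_values)
    return json.dumps(asdict(result), indent=2)


def run_pipeline(source: str, raw: bytes) -> str:
    """Chain the three stages for a single document."""
    text = ingest(raw)
    return structure(source, text, extract(text))


if __name__ == "__main__":
    sample = b"Invoice Number: 12345\nTotal: 99.50 EUR\nThank you."
    print(run_pipeline("invoice.txt", sample))
```

Keeping each stage as a separate function means the regex stand-in could be swapped for a model call without touching ingestion or output formatting.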

Branching model

This project utilizes the Gitflow branching model for a structured development workflow. The main branch reflects the latest stable release. All development happens on the develop branch. When a release is planned, a release/* branch is created from develop for final preparations. Once ready, it is merged into both main and back into develop. Feature development occurs in separate feature/* branches.

Technologies

Python: Core programming language.

Streamlit: For the user interface application.

Natural Language Processing (NLP): For information extraction.

Machine Learning: To enhance adaptability and performance. Can be used with SDKs for integration.

Setup

Clone the repository:

    git clone https://github.com/mfbcat/structify-AI.git

Navigate to the project directory and install dependencies:

    cd structify-AI
    pip install -r requirements.txt

Run the Streamlit application:

    streamlit run app.py

Testing

The application includes unit tests to ensure the quality and stability of the extraction engine.

Unit Tests: Located in the src/test/java/ or tests/ directories. They focus on testing individual components.

To run the tests, navigate to the directory containing the tests and run the test suite with the designated test runner (e.g., pytest).
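As an illustration of what a component-level test might look like under pytest, here is a minimal sketch. The function `extract_key_values` is a hypothetical stand-in defined inline so the example is self-contained. It is not taken from the project's code, and the real components and their names may differ.

```python
def extract_key_values(text: str) -> dict:
    """Hypothetical component under test: parse 'Key: value' lines."""
    pairs = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            pairs[key.strip()] = value.strip()
    return pairs


def test_extracts_simple_pairs():
    text = "Name: Ada\nRole: Engineer"
    assert extract_key_values(text) == {"Name": "Ada", "Role": "Engineer"}


def test_ignores_lines_without_separator():
    assert extract_key_values("no separator here") == {}


def test_keeps_colons_inside_values():
    assert extract_key_values("Time: 10:30") == {"Time": "10:30"}
```

Saved in a file whose name starts with `test_`, these functions are collected automatically when `pytest` is run from that directory.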

Contributing

Contributions are welcome! Please follow these guidelines:

Fork the repository.

Create a new branch for your feature or bug fix:

    git checkout -b feature/my-new-feature

Make your changes and commit them with clear, descriptive commit messages.

Test your changes thoroughly.

Push your branch to your forked repository:

    git push origin feature/my-new-feature

Create a pull request against the develop branch of the original repository.
