@danilojsl danilojsl commented May 29, 2025

Description

This PR introduces Semantic Chunking, which enhances the Partition and PartitionTransformer components by dividing content into meaningful units based on the document's structure and content.

The first strategy implemented is the basic one, which limits chunks by number of characters.
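To illustrate what a character-limited strategy might look like, here is a minimal sketch in plain Python. It is not the code in this PR and not the Spark NLP API: the function name `basic_chunks` and the parameter `max_characters` are hypothetical. It assumes consecutive text elements are packed greedily into a chunk until the limit would be exceeded, and that a single element longer than the limit is cut into fixed-size pieces.

```python
from typing import List


def basic_chunks(elements: List[str], max_characters: int = 100) -> List[str]:
    """Greedily pack consecutive text elements into chunks of at most max_characters.

    Hypothetical helper for illustration only; not part of Spark NLP.
    """
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")

    chunks: List[str] = []
    current = ""
    for text in elements:
        text = text.strip()
        if not text:
            continue  # ignore empty elements

        # An element longer than the limit is flushed separately in fixed-size pieces.
        if len(text) > max_characters:
            if current:
                chunks.append(current)
                current = ""
            for start in range(0, len(text), max_characters):
                chunks.append(text[start:start + max_characters])
            continue

        # Otherwise try to append the element to the chunk being built.
        candidate = text if not current else current + " " + text
        if len(candidate) <= max_characters:
            current = candidate
        else:
            chunks.append(current)
            current = text

    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = ["Title", "First paragraph.", "Second paragraph."]
    for chunk in basic_chunks(parts, max_characters=25):
        print(repr(chunk))
```

Packing whole elements first keeps structural units such as titles and paragraphs together where the limit allows, and the hard split is only a fallback for oversized elements.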

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@danilojsl danilojsl self-assigned this May 29, 2025
@danilojsl danilojsl merged commit 5ae05b9 into feature/SPARKNLP-1125-Implement-Chunking-Strategies May 29, 2025
@DevinTDHa DevinTDHa mentioned this pull request Jun 10, 2025
10 tasks