
Conversation

@devin-ai-integration
Contributor

Fix TaskEvaluation validation errors for missing quality and dict suggestions

Summary

Fixes #3915 by making the TaskEvaluation Pydantic model more resilient to malformed LLM output. The issue reported validation errors when:

  1. LLM omits the quality field entirely
  2. LLM returns suggestions as [{'point': '...', 'priority': 'high'}] instead of list[str]
  3. LLM uses score instead of quality as the field name

Changes:

  • Made quality, suggestions, and entities fields optional with sensible defaults
  • Added ConfigDict(extra="ignore") to ignore unexpected fields like relationships
  • Added @model_validator to map score → quality when quality is missing (see the model sketch after this list)
  • Added @field_validator for suggestions to extract point values from dict entries
  • Added @field_validator for quality to coerce int/string to float
  • Created 16 comprehensive unit tests covering all edge cases from the issue
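A minimal sketch of the revised model is below, assuming Pydantic v2 and a simplified placeholder Entity class; the exact field types, validator names, and docstrings in the PR diff may differ.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, field_validator, model_validator


class Entity(BaseModel):
    """Simplified placeholder for the existing entity schema."""
    name: str
    type: str


class TaskEvaluation(BaseModel):
    # Unexpected keys such as 'relationships' are dropped instead of failing validation.
    model_config = ConfigDict(extra="ignore")

    suggestions: list[str] = []
    quality: float | None = None
    entities: list[Entity] = []

    @model_validator(mode="before")
    @classmethod
    def map_score_to_quality(cls, data: Any) -> Any:
        # Some LLMs emit 'score' instead of 'quality'; copy it over when quality is absent.
        if isinstance(data, dict) and "quality" not in data and "score" in data:
            data["quality"] = data["score"]
        return data

    @field_validator("suggestions", mode="before")
    @classmethod
    def normalize_suggestions(cls, value: Any) -> list[str]:
        # Accept None, a single string/dict, or a mixed list; pull 'point' out of dict entries.
        if value is None:
            return []
        if not isinstance(value, list):
            value = [value]
        normalized: list[str] = []
        for item in value:
            if isinstance(item, dict) and "point" in item:
                normalized.append(str(item["point"]))
            else:
                normalized.append(str(item))
        return normalized

    @field_validator("quality", mode="before")
    @classmethod
    def coerce_quality(cls, value: Any) -> float | None:
        # Coerce ints and numeric strings to float; pass None through unchanged.
        if value is None:
            return None
        return float(value)
```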

Backward compatibility: LongTermMemoryItem already accepts quality=None, and the strict crew evaluation path uses a separate TaskEvaluationPydanticOutput model that remains unchanged.

Review & Testing Checklist for Human

  • Critical: Search codebase for .quality usages - Verify no code assumes TaskEvaluation.quality is always non-None (I checked the main ones but may have missed some)
  • Test with real LLM output - Run a crew with memory=True and verify the fix works end-to-end with actual LLM responses (my tests are unit tests only)
  • Verify suggestions normalization - Check that the normalize_suggestions validator correctly handles all dict formats you've seen in production logs, especially the point/priority structure (a quick check is sketched after this list)
  • Review uv.lock changes - I had to regenerate the lock file due to corruption; verify CI passes and no dependency issues arise
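As a quick check of the dict handling, the payload shape seen in production logs can be run through the model sketch above (illustrative only; the suggestion text is made up and the real validator is the one in the diff):

```python
# Hypothetical payload in the shape reported in #3915.
raw = {
    "suggestions": [
        {"point": "Cite the source document", "priority": "high"},
        "Keep answers concise",
    ],
    "quality": 7,
}
evaluation = TaskEvaluation.model_validate(raw)
assert evaluation.suggestions == ["Cite the source document", "Keep answers concise"]
assert evaluation.quality == 7.0
```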

Test Plan

  1. Create a crew with memory=True and external_memory=ExternalMemory(...)
  2. Run tasks and monitor logs for the "Failed to parse structured output" error
  3. Verify long-term memory saves succeed without validation errors
  4. Check that quality scores are properly recorded (or None when missing); spot checks are sketched below
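For step 4, a couple of spot checks against the model sketch above (these complement, not replace, the end-to-end run with a real LLM):

```python
# The 'score' alias is mapped and coerced to float; a missing quality stays None.
assert TaskEvaluation.model_validate({"score": 8, "suggestions": []}).quality == 8.0
assert TaskEvaluation.model_validate({"suggestions": ["Add concrete examples"]}).quality is None
```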

Notes

  • Pre-existing mypy errors in the file (lines 156, 185, 198) are unrelated to this PR
  • All 18 tests pass (16 new + 2 existing)
  • Ruff linter passes

Devin session: https://app.devin.ai/sessions/8dc1309c760a4898bc9d347c1af9f702
Requested by: João ([email protected])

…gestions

Fixes #3915

This commit addresses Pydantic validation errors that occur when the LLM
output doesn't match the expected TaskEvaluation schema:

1. Missing 'quality' field - LLM sometimes omits this field
2. 'suggestions' as list of dicts - LLM returns [{'point': '...', 'priority': 'high'}]
   instead of list[str]
3. 'score' field instead of 'quality' - LLM uses 'score' as alternate field name

Changes:
- Make 'quality' field optional (float | None) with default None
- Make 'suggestions' field optional with default empty list
- Make 'entities' field optional with default empty list
- Add ConfigDict(extra='ignore') to ignore unexpected fields
- Add model_validator to map 'score' to 'quality' when quality is missing
- Add field_validator for 'suggestions' to normalize dict format to list[str]
  - Extracts 'point' value from dicts with 'point' key
  - Handles single dict, single string, list of mixed types, and None
- Add field_validator for 'quality' to coerce int/str to float

The fix is backward compatible - LongTermMemoryItem already accepts
quality=None, and the strict crew evaluation path uses a separate
TaskEvaluationPydanticOutput model that remains unchanged.

Tests:
- Added 16 comprehensive unit tests covering all edge cases
- All existing tests continue to pass
- Tests replicate exact error scenarios from issue #3915

Co-Authored-By: João <[email protected]>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Co-Authored-By: João <[email protected]>

Development

Successfully merging this pull request may close these issues.

[BUG] ERROR:root:Failed to parse structured output from stream: 1 validation error for TaskEvaluation quality
