feat(http): add error handling for exporting #4709
Conversation
Force-pushed d3d4031 to afb76f2
try:
    resp = self._export(serialized_data, deadline_sec - time())
except Exception as error:
Would you mind opening an issue or pointing to an existing issue for the behavior you're encountering? We can discuss possible fixes there.
I think there is a bug in _export() on L157, where the code assumes the connection will succeed, so the retry loop here is never executed. However, the raised exception should ultimately be caught here:
opentelemetry-python/opentelemetry-sdk/src/opentelemetry/sdk/_shared_internal/__init__.py
Lines 167 to 170 in d327927
except Exception:  # pylint: disable=broad-exception-caught
    self._logger.exception(
        "Exception while exporting %s.", self._exporting
    )
If you just want the logs to go away, you can set a filter on the opentelemetry.sdk._shared_internal logger or disable it
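For reference, here is a minimal sketch of that approach using the standard library logging module; the logger name comes from the path above, and the filter condition is only an example based on the message in the snippet:
import logging

# Option 1: silence the SDK's internal export-error logger entirely.
logging.getLogger("opentelemetry.sdk._shared_internal").disabled = True

# Option 2: keep the logger but drop only the "Exception while exporting" records.
class DropExportExceptionRecords(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        return "Exception while exporting" not in record.getMessage()

logging.getLogger("opentelemetry.sdk._shared_internal").addFilter(
    DropExportExceptionRecords()
)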
Yeah, so the issue for me was that the logger wrote the entire error traceback to stdout, which totally makes sense given the code you shared.
My goal was to still know when something fails, not just to disable the logger, as I feel the OTLP exporters should handle this internally.
On top of that, I'd like to comment more generally that doing the retry via a try/except around _export() feels kind of odd to me.
Thanks for taking the time to look into this! :)
Do you still think I should open an issue for this in general?
@aabmass, from what I understand, this basically happens when the HTTP endpoint is not available and the urllib call is unable to complete the request, so the full urllib stack trace with the connection-refused error gets printed.
I think an appropriate fix could be to fix the retry logic in export() by putting the call under a try/except, like:
try:
    resp = self._export(serialized_data, remaining_time)
except Exception as exc:
    is_retryable = True
else:
    is_retryable = _is_retryable(resp)
and use that variable in the if at L183. Maybe it needs more adjustments. wdyt @aabmass @DylanRussell?
@pafi-code if you are open to doing that, it would be good; if not, please lmk so I can file a PR. I believe the correct approach is to fix the bug of the request never reaching the retry loop.
PR is updated. Two questions:
- Are you fine with me catching Exception in general, or should I only handle ConnectionError? I assume ConnectionError is preferable.
- Are you fine with the log message in case of a timeout or max retries?
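As a small aside on the first question: in requests, ConnectionError is a subclass of RequestException, so catching RequestException also covers refused connections, and it is not the same class as Python's builtin ConnectionError. A quick check:
# Sanity check of the requests exception hierarchy mentioned above.
import requests

assert issubclass(
    requests.exceptions.ConnectionError, requests.exceptions.RequestException
)
# requests' ConnectionError is a different class from Python's builtin ConnectionError.
assert requests.exceptions.ConnectionError is not ConnectionError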
 
- @pafi-code let's catch requests.exceptions.RequestException, wdyt?
- My first thought on this fix was:
 
        try:
            resp = self._export(serialized_data, remaining_time)
            if resp.ok:
                return SpanExportResult.SUCCESS
        except requests.exceptions.RequestException as exc:
            reason = str(exc)
            retryable = True
            status_code = None
        else:
            retryable = _is_retryable(resp)
            status_code = resp.status_code
            reason = resp.reason
        # Compute jittered exponential backoff
        backoff = (2 ** retry_num) * random.uniform(0.8, 1.2)
        remaining_time = deadline - time()
        if (
            not retryable
            or retry_num + 1 == _MAX_RETRYS
            or backoff > remaining_time
            or self._shutdown
        ):
            if status_code is None:
                _logger.error("Failed to export span batch: %s", reason)
            else:
                _logger.error(
                    "Failed to export span batch code: %s, reason: %s",
                    status_code,
                    reason,
                )
            return SpanExportResult.FAILURE
        _logger.warning(
            "Transient error %s encountered while exporting span batch, retrying in %.2fs.",
            reason,
            backoff,
        )
        if is_shutdown_in_progress.wait(backoff):
            _logger.warning("Shutdown in progress, aborting retry.")
            break
    return SpanExportResult.FAILURE
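As a quick illustration of the jittered backoff in this sketch: the nominal delay doubles with each attempt and each value gets ±20% jitter, until the deadline or _MAX_RETRYS cuts the loop short.
# Worked example of the backoff formula (2 ** retry_num) * random.uniform(0.8, 1.2).
for retry_num in range(4):
    nominal = 2 ** retry_num
    print(
        f"attempt {retry_num}: ~{nominal}s, "
        f"jittered between {nominal * 0.8:.1f}s and {nominal * 1.2:.1f}s"
    )
# attempt 0: ~1s, jittered between 0.8s and 1.2s
# attempt 1: ~2s, jittered between 1.6s and 2.4s
# attempt 2: ~4s, jittered between 3.2s and 4.8s
# attempt 3: ~8s, jittered between 6.4s and 9.6s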
- Ok, will do!
- Well, my issue with this is that these log messages are not really meaningful. If we can't export because max retries were reached, for example, then the log message should reflect this. In your implementation, every retry produces a warning, and on the last attempt you just log the error with the same reasoning, even though the actual reason the export failed is that max retries were reached.

Does that argumentation make sense to you?
Okay but I've just noticed that my implementation is also not ideal for the log messages... I will try to find a solution here 😄
Okay, so I know that the pipeline is failing and I assume this PR should be rebased, but for me there is still the question of whether my proposed fix is actually how this should be handled. I would appreciate some more feedback here. Also, I've created an issue related to this PR: #4712
    
Force-pushed afb76f2 to f581f22
@pafi-code can we have tests for this?
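For example, a test could simulate the endpoint being unreachable. This is only a rough sketch; it assumes the HTTP span exporter keeps its requests session in _session and posts through it, and the exact setup may need adjusting to the final implementation:
from unittest import mock

import requests

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SpanExportResult


def test_export_connection_error_returns_failure():
    exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
    # Simulate the endpoint being unreachable instead of spinning up a server.
    with mock.patch.object(
        exporter._session,
        "post",
        side_effect=requests.exceptions.ConnectionError("connection refused"),
    ):
        result = exporter.export([])
    # With the new error handling, the exception should be caught and reported
    # as a failed export rather than bubbling up as a full traceback.
    assert result == SpanExportResult.FAILURE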
Description
When using the OTel HTTP exporters, stdout gets polluted by unhandled error traces. I don't think this should be the desired result. It happens when the endpoint is not available at export time, which can be caused by different events, but in general I think it should be handled better.
Therefore I added error handling with more specific and shorter error messages.
Type of change
I'm not sure whether the change type is a fix or a feature, as the service was still running.
How Has This Been Tested?
Does This PR Require a Contrib Repo Change?
Checklist: