Skip to content

Conversation

@michael-zhao459
Copy link
Contributor

@michael-zhao459 michael-zhao459 commented Jul 28, 2025

What does this PR do?

This PR adds lambda support for Data Streams Monitoring (DSM), sets a checkpoint after extracting context from trace propagation headers, DSM context gets packaged in with the trace prop headers. Queues supported by DSM include SQS, SNS, SNS->SQS, and Kinesis

Motivation

Consume calls of lambdas are set as triggers to queue and do not have any explicit calls the tracers can hook into to set a checkpoint

Testing Guidelines

Tried all configurations with sandbox AWS account. Wrote unit tests ensuring all cases were covered.

Additional Notes

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

…t of the loop of records earlier when data streams is not enabled
@ericfirth ericfirth requested a review from purple4reina August 21, 2025 16:36
src/index.ts Outdated
*/
export function getDataStreamsEnabled(): boolean {
return getConfig().dataStreamsEnabled;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not what we want. Calling getConfig will read each and every environment variable each and every time it is called.

Instead, you will want to use the global config that is created just once at start up.

Pass this.config to the constructor of the trace extractor

if (EventValidator.isSNSEvent(event)) return new SNSEventTraceExtractor(this.tracerWrapper);

then you can use it during extraction without having to read environment variables each time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for clarifying. i am not super-familiar with this repo. made this change as well

// If we already have a context and dsm is not enabled, we can break out of the loop early
if (!getDataStreamsEnabled() && context) {
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the behavior of the extractor and in the case where there is no trace propagation context, will loop through each and every record attempting to look for one. That will potentially add significant overhead to the function execution time.

Instead, after the context has been extracted (or not), we should break if dsm is not enabled, whether the context exists or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. i changed the extractors to not change anything about the context fetching from main, but add the loop on records only when dsm is enabled

Copy link
Contributor

@purple4reina purple4reina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two requested changes:

  1. Read from global config object
  2. Always break from loop when dsm not enabled

@ericfirth ericfirth force-pushed the michael.zhao/lambda-dsm branch from c94ecbb to adce22d Compare August 22, 2025 18:12
@ericfirth ericfirth force-pushed the michael.zhao/lambda-dsm branch from adce22d to e00c8e1 Compare August 22, 2025 18:23

if (!this.config.dataStreamsEnabled) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this before the arn check, that way when dsm is disabled, users will never see dsm related messages in their logs.

src/index.ts Outdated
};
const tracerWrapper = new TracerWrapper();
const config = getConfig();
const tracerWrapper = new TracerWrapper(config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why is this change necessary? This method is only called when there is an error attempting to initialize the handler.

coldStartTraceSkipLib: "",
addSpanPointers: true,
dataStreamsEnabled: false,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to have all of this testing data in the code. Is there not a way we can move it to the tests? We don't want to have to maintain two separate locations for managing default configuration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, we don't need to add the config to this class in the first place. We should just rely on the config on the extractor alone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 true. I think I was being too deferential to the change Michael made where there was a check in the tracer wrapper because set consume checkpoint was always run. But with the optimizations where we are only setting checkpoints when dsm is enabled, so there is no need for the second check

}
}

headers = headers ? JSON.parse(headers) : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dsm is enabled, we'll now json parse the headers twice. I feel like that can and should be avoided. This method is in the hotpath of people's invocations and needs to be as performant as possible.

}

try {
this.tracer.dataStreamsCheckpointer.setConsumeCheckpoint(eventType, arn, contextJson, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: is there a particular version of the tracer that this requires?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meaning the 4th argument (whether its a manually instrumented or not)? that is on 5.62.0, https://github.com/DataDog/dd-trace-js/releases/tag/v5.62.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, you will need to bump the dependency version of ddtrace.

"dd-trace": "^5.58.0",

Comment on lines -18 to -20
// Then try to get from binary value. This happens when SNS->SQS, but SNS has raw message delivery enabled.
// In this case, SNS maps any messageAttributes to the SQS messageAttributes.
// We can at least get trace context from SQS, but we won't be able to create the SNS inferred span.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this comment please?

@ericfirth ericfirth force-pushed the michael.zhao/lambda-dsm branch from 72680f4 to bc3648b Compare August 25, 2025 17:02
expect(traceContext).toBeNull();
});

it("extracts trace context from multiple records when DSM is disabled but does not call setConsumeCheckpoint", () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be wrong now, we only extract from the first record, yeah?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. the test (trace context still works when no dsm) is still relevant but i updated the description to be accurate

@ericfirth ericfirth force-pushed the michael.zhao/lambda-dsm branch from e46b50e to e861214 Compare August 26, 2025 13:06
@ericfirth ericfirth merged commit 5d3da3d into main Aug 26, 2025
26 checks passed
@ericfirth ericfirth deleted the michael.zhao/lambda-dsm branch August 26, 2025 15:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants