@xusheng6 (Contributor) commented Dec 2, 2024

This is part of the effort to optimize the binja extractor's performance (#1414). As I mentioned in #2509 (comment), one of the outstanding issues that drags down the binja extractor's performance is the regeneration of IL functions during feature extraction. The IL is, however, necessary to ensure the most accurate results.

That said, I found that we can retrieve a single LLIL instruction instead of requesting the entire IL of the function. Getting a single LLIL instruction is extremely fast compared to generating the entire IL function and then taking the particular instruction from it.
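
To make the difference concrete, here is a toy cost model in plain Python. It does not use the Binary Ninja API: `ToyFunction`, `lift_instruction`, and `lift_function` are hypothetical names, and the `lifts` counter merely stands in for real lifting work. It is only a sketch of why asking for one instruction scales better than building the whole IL function and indexing into it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFunction:
    """Hypothetical stand-in for a disassembled function (not a Binary Ninja type)."""

    start: int
    size: int                      # number of one-byte "instructions"
    lifts: int = field(default=0)  # how many instructions have been lifted so far

    def lift_instruction(self, addr: int) -> str:
        """Lift exactly one instruction; the cost is constant."""
        if not self.start <= addr < self.start + self.size:
            raise ValueError(f"address {addr:#x} is outside the function")
        self.lifts += 1
        return f"llil@{addr:#x}"

    def lift_function(self) -> list[str]:
        """Lift every instruction; the cost grows with the function size."""
        return [self.lift_instruction(a) for a in range(self.start, self.start + self.size)]


def via_whole_function(func: ToyFunction, addr: int) -> str:
    # Old approach: build the whole IL, then pick one instruction out of it.
    return func.lift_function()[addr - func.start]


def via_single_instruction(func: ToyFunction, addr: int) -> str:
    # New approach: ask only for the instruction at the address of interest.
    return func.lift_instruction(addr)


slow = ToyFunction(start=0x1000, size=500)
fast = ToyFunction(start=0x1000, size=500)

# Both approaches return the same instruction; only the amount of work differs.
assert via_whole_function(slow, 0x1010) == via_single_instruction(fast, 0x1010)
print(slow.lifts, fast.lifts)  # 500 1
```

In this model the whole-function path does work proportional to the function size for every lookup, while the single-instruction path does a constant amount.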

Here is the difference I measured:

  1. For a small file (321338196a46b600ea330fc5d98d0699.exe_, 486 KB in size), the feature extraction time (excluding the initial analysis time) is down from 45 seconds to 32 seconds.
  2. For a large file (2f7f5fb5de175e770d7eae87666f9831.elf_, 4.1 MB in size), the feature extraction time is down from 15 minutes to 5 minutes, which is a 3x speedup! Apparently the IL regeneration issue becomes more severe as the file grows bigger.
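
As a sanity check on these numbers, the arithmetic can be written out. The helper names below are illustrative only; the timings are the ones reported in the list above.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the new run is."""
    return before / after


def time_saved_percent(before: float, after: float) -> float:
    """Share of the original run time that was eliminated, in percent."""
    return (before - after) / before * 100


# Timings from the list above: seconds for the small file, minutes for the large one.
measurements = [("small", 45, 32), ("large", 15, 5)]

for name, before, after in measurements:
    print(
        f"{name}: {speedup(before, after):.2f}x faster, "
        f"{time_saved_percent(before, after):.0f}% less time"
    )
# small: 1.41x faster, 29% less time
# large: 3.00x faster, 67% less time
```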

With the change, the extractor accesses the functions strictly in sequential order and never requests the IL of a different function, so there is no regeneration of the IL. I also looked at the MLIL basic block handling: even completely disabling the stack string check yields no noticeable performance gain. So there is not much motivation to pursue that part.
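
The effect of access order on regeneration can be sketched with a toy cache. This is not how Binary Ninja manages IL internally; the single-slot `ToyILCache` and the function names `f1` and `f2` are assumptions made only to keep the illustration small.

```python
from typing import Optional


class ToyILCache:
    """Hypothetical cache that keeps the IL of only the most recently used function."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        self.generations = 0  # how many times IL had to be (re)generated

    def get_il(self, func_name: str) -> str:
        if func_name != self.current:
            # Cache miss: the IL for this function is generated again.
            self.current = func_name
            self.generations += 1
        return f"il({func_name})"


def count_generations(access_order: list[str]) -> int:
    """Replay a sequence of IL requests and count the generations it triggers."""
    cache = ToyILCache()
    for name in access_order:
        cache.get_il(name)
    return cache.generations


# Sequential: finish all requests for one function before moving to the next.
sequential = ["f1", "f1", "f1", "f2", "f2", "f2"]
# Interleaved: while working on f1, request the IL of f2, then come back.
interleaved = ["f1", "f2", "f1", "f2", "f1", "f2"]

print(count_generations(sequential))   # 2
print(count_generations(interleaved))  # 6
```

Under this assumption, sequential access generates each function's IL once, while interleaved access regenerates it on every switch.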

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@github-actions github-actions bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@github-actions github-actions bot dismissed their stale review December 2, 2024 09:07

CHANGELOG updated or no update needed, thanks! 😄

@mr-tz (Collaborator) left a comment

Nice, the improvements look good to me.

@mr-tz mr-tz merged commit abe8084 into mandiant:master Dec 2, 2024
28 checks passed
@xusheng6 xusheng6 deleted the fix_llil_access branch December 2, 2024 15:33