GSoC - add Frida dynamic analysis for Android #2712
base: frida-gsoc
Add Frida log to capa analysis workflow
Accidentally merged unreviewed commit, reverting. This reverts commit 8ed3cd1.
Co-authored-by: Mike Hunhoff <[email protected]>
Co-authored-by: Mike Hunhoff <[email protected]>
Implement basic Frida JSONL output and parser
Integrate FridaExtractor into capa and add arguments
Auto-generate Frida hooks from APIs JSON file
Co-authored-by: Mike Hunhoff <[email protected]>
add Java native & static method support and update model with Pydantic
Add complete script generation and update README
Add APK hashes extraction and Add automation workflow
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased)
section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed
Summary of Changes
Hello @xukunzh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly expands capa's analytical capabilities by adding support for dynamic analysis of Android applications via Frida. It provides a robust system to ingest and interpret runtime behavioral data, allowing for the detection of capabilities that are only observable during execution. The changes streamline the process of setting up an analysis environment, capturing API interactions, and integrating this rich dynamic information into capa's existing feature extraction and rule matching engine.
Highlights
- Frida Dynamic Analysis Integration: Introduced a comprehensive framework for integrating Frida dynamic analysis reports into capa. This enables capa to analyze behavioral data from Android applications, complementing static analysis.
- New Frida Extractor and Data Models: Added a new FridaExtractor to parse JSONL reports generated by Frida, extracting features such as OS, architecture, package name, and detailed API calls with arguments. New Pydantic models (FridaReport, Call, Process, etc.) are defined to structure this dynamic data.
- Automated Frida Analysis Workflow: Provided a suite of Python scripts (scripts/frida/) to automate the entire dynamic analysis process. This includes tools for Android emulator creation and setup, APK metadata extraction, dynamic Frida script generation using Jinja2 templates, and orchestration of Frida execution and result retrieval.
- Core Capa Integration and Dependencies: Updated core capa components (capa/features/common.py, capa/helpers.py, capa/loader.py, capa/main.py) to recognize and process Frida reports as a new input format and backend. New Python dependencies (frida, jinja2) have been added to pyproject.toml to support these capabilities.
Code Review
This pull request introduces significant new functionality by adding support for Frida-based dynamic analysis of Android applications. It includes a new Frida extractor in capa, along with a suite of scripts for generating Frida trace data. The changes are extensive and well-structured. I've identified a couple of areas for improvement related to performance and usability in the new scripts and models.
with open(jsonl_path, "r") as f:
    content = f.read()
    for line in content.splitlines():
        record = json.loads(line)

        if "metadata" in record:
            metadata = Metadata(**record["metadata"])
        elif "api" in record:
            if "java_api" in record["api"]:
                call = Call(**record["api"]["java_api"])
                api_calls.append(call)
            elif "native_api" in record["api"]:
                call = Call(**record["api"]["native_api"])
                api_calls.append(call)
Reading the entire file into memory with f.read() can be inefficient for large JSONL files. It's better to iterate over the file line by line to reduce memory consumption. This change also adds encoding='utf-8' for robustness and handles empty or malformed JSON lines.
with open(jsonl_path, "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if "metadata" in record:
            metadata = Metadata(**record["metadata"])
        elif "api" in record:
            if "java_api" in record["api"]:
                call = Call(**record["api"]["java_api"])
                api_calls.append(call)
            elif "native_api" in record["api"]:
                call = Call(**record["api"]["native_api"])
                api_calls.append(call)
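For readers who want to try the streaming approach in isolation, here is a minimal, self-contained sketch. It keeps records as plain dictionaries instead of the PR's Pydantic Metadata and Call models, and the field names inside each record (package_name, api_name) are assumptions made for illustration, not the schema used in the PR.

```python
import io
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def parse_frida_jsonl(
    lines: Iterable[str],
) -> Tuple[Optional[Dict[str, Any]], List[Dict[str, Any]]]:
    """Stream JSONL lines and return (metadata, api_calls).

    Blank lines and lines that are not valid JSON are skipped.
    """
    metadata: Optional[Dict[str, Any]] = None
    api_calls: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        if "metadata" in record:
            metadata = record["metadata"]
        elif "api" in record:
            api = record["api"]
            # Same precedence as the snippet above: java_api first, then native_api.
            for kind in ("java_api", "native_api"):
                if kind in api:
                    api_calls.append(api[kind])
                    break
    return metadata, api_calls


# Hypothetical sample input; the inner field names are illustrative only.
sample = io.StringIO(
    '{"metadata": {"package_name": "com.example.app"}}\n'
    "\n"
    "not json\n"
    '{"api": {"java_api": {"api_name": "java.io.File.delete"}}}\n'
    '{"api": {"native_api": {"api_name": "open"}}}\n'
)
metadata, api_calls = parse_frida_jsonl(sample)
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, so only one line is held in memory at a time.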
if not has_connected_device():
    logger.info("Found no devices. Make sure emulator is running")
    response = input("Auto-create an emulator? (y/n): ")
    if response == "y":
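The reviewer's comment on this snippet is not preserved here. Independently of it, one possible refinement is to normalize the answer before comparing, so that inputs such as "Y" or " yes " are also accepted. The helper name is_affirmative is hypothetical and does not appear in the PR.

```python
def is_affirmative(response: str) -> bool:
    """Return True for common affirmative answers, ignoring case and whitespace."""
    return response.strip().lower() in {"y", "yes"}
```

With such a helper, the check could read `if is_affirmative(response):` instead of the exact comparison against "y".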
CHANGELOG updated or no update needed, thanks! 😄
This PR is the result of @xukunzh's Google Summer of Code (GSoC) 2025 project, which integrates capa with Frida for Android dynamic analysis.
This project provides a complete automation framework: it automatically generates Frida monitoring scripts from an API configuration JSON file, executes dynamic analysis on Android devices, outputs API call data in JSONL format, and then converts this behavioral data into features through FridaExtractor for malware capability detection.
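The script-generation step can be illustrated with a rough sketch. It uses Python's standard-library string.Template as a stand-in for the Jinja2 templates the PR uses, and the JSON configuration layout (a "java" list of class and method entries) and the api_name field are assumptions for illustration, not the PR's actual schema. The emitted JavaScript relies on Frida's Java.perform, Java.use, and send; methods with several overloads would additionally need an explicit .overload(...) selection, which this sketch omits.

```python
import json
from string import Template

# One hook per configured Java method. $index keeps variable names unique.
HOOK_TEMPLATE = Template(
    """\
    var cls_$index = Java.use("$class_name");
    cls_$index.$method.implementation = function () {
        send({api: {java_api: {api_name: "$class_name.$method"}}});
        return this.$method.apply(this, arguments);
    };
"""
)


def generate_script(config_json: str) -> str:
    """Render a Frida script from a (hypothetical) API configuration."""
    config = json.loads(config_json)
    hooks = [
        HOOK_TEMPLATE.substitute(
            index=i, class_name=entry["class"], method=entry["method"]
        )
        for i, entry in enumerate(config["java"])
    ]
    return "Java.perform(function () {\n" + "".join(hooks) + "});\n"


# Hypothetical configuration; the layout is illustrative only.
example_config = '{"java": [{"class": "java.io.File", "method": "delete"}]}'
script = generate_script(example_config)
```

The generated text would then be loaded into the target process by the Frida tooling, with each send call producing one record on the host side that can be written out as a JSONL line.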
A summary of this project: GSoC report.