GSoC - add Frida dynamic analysis for Android #2712

xukunzh · 2025-08-29T00:48:10Z

This PR is the result of @xukunzh's Google Summer of Code (GSoC) 2025 project, which integrates capa with Frida for Android dynamic analysis.

This project provides a complete automation framework that generates Frida monitoring scripts from an API configuration JSON file, executes dynamic analysis on Android devices, outputs API call data in JSONL format, and converts this behavioral data into features through FridaExtractor for malware capability detection.

A summary of this project: GSoC report.

Add Frida log to capa analysis workflow

Accidentally merged unreviewed commit, reverting. This reverts commit 8ed3cd1.

Co-authored-by: Mike Hunhoff <[email protected]>

Implement basic Frida JSONL output and parser

Integrate FridaExtractor into capa and add arguments

Auto-generate Frida hooks from APIs JSON file

Co-authored-by: Mike Hunhoff <[email protected]>

add Java native & static method support and update model with Pydantic

Add complete script generation and update README

Add APK hashes extraction and Add automation workflow

google-cla · 2025-08-29T00:48:14Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

github-actions

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

gemini-code-assist

Summary of Changes

Hello @xukunzh, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands capa's analytical capabilities by adding support for dynamic analysis of Android applications via Frida. It provides a robust system to ingest and interpret runtime behavioral data, allowing for the detection of capabilities that are only observable during execution. The changes streamline the process of setting up an analysis environment, capturing API interactions, and integrating this rich dynamic information into capa's existing feature extraction and rule matching engine.

Highlights

Frida Dynamic Analysis Integration: Introduced a comprehensive framework for integrating Frida dynamic analysis reports into capa. This enables capa to analyze behavioral data from Android applications, complementing static analysis.
New Frida Extractor and Data Models: Added a new FridaExtractor to parse JSONL reports generated by Frida, extracting features such as OS, architecture, package name, and detailed API calls with arguments. New Pydantic models (FridaReport, Call, Process, etc.) are defined to structure this dynamic data.
Automated Frida Analysis Workflow: Provided a suite of Python scripts (scripts/frida/) to automate the entire dynamic analysis process. This includes tools for Android emulator creation and setup, APK metadata extraction, dynamic Frida script generation using Jinja2 templates, and orchestration of Frida execution and result retrieval.
Core Capa Integration and Dependencies: Updated core capa components (capa/features/common.py, capa/helpers.py, capa/loader.py, capa/main.py) to recognize and process Frida reports as a new input format and backend. New Python dependencies (frida, jinja2) have been added to pyproject.toml to support these capabilities.
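To make the "script generation from an API configuration" idea concrete, here is a minimal sketch. It is not the PR's implementation: the PR uses Jinja2 templates, while this sketch swaps in the standard library's `string.Template` so it has no third-party dependencies. The configuration keys (`java_apis`, `class`, `method`) and the `api_name` field are hypothetical; only the `api` / `java_api` nesting is taken from the parser shown in this PR.

```python
import json
from string import Template

# Hypothetical hook template. Java.use, Java.perform and send are Frida's
# JavaScript API; everything else here is illustrative only.
# Note: assigning .implementation directly works only for methods without
# overloads; overloaded methods would need .overload(...) first.
HOOK_TEMPLATE = Template(
    'Java.use("$cls").$method.implementation = function () {\n'
    '  send(JSON.stringify({api: {java_api: {api_name: "$cls.$method"}}}));\n'
    "  return this.$method.apply(this, arguments);\n"
    "};\n"
)


def build_hook_script(api_config_json: str) -> str:
    """Render one hook per configured Java API and wrap them in Java.perform."""
    config = json.loads(api_config_json)
    snippets = [
        HOOK_TEMPLATE.substitute(cls=entry["class"], method=entry["method"])
        for entry in config["java_apis"]  # hypothetical config layout
    ]
    return "Java.perform(function () {\n" + "\n".join(snippets) + "});\n"


if __name__ == "__main__":
    example_config = json.dumps(
        {"java_apis": [{"class": "android.telephony.TelephonyManager", "method": "getDeviceId"}]}
    )
    print(build_hook_script(example_config))
```

A template-based generator keeps the list of monitored APIs in data rather than in hand-written JavaScript, so adding a hook means editing the JSON file only.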

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces significant new functionality by adding support for Frida-based dynamic analysis of Android applications. It includes a new Frida extractor in capa, along with a suite of scripts for generating Frida trace data. The changes are extensive and well-structured. I've identified a couple of areas for improvement related to performance and usability in the new scripts and models.

gemini-code-assist · 2025-08-29T00:50:14Z

capa/features/extractors/frida/models.py

+        with open(jsonl_path, "r") as f:
+            content = f.read()
+            for line in content.splitlines():
+                record = json.loads(line)
+
+                if "metadata" in record:
+                    metadata = Metadata(**record["metadata"])
+                elif "api" in record:
+                    if "java_api" in record["api"]:
+                        call = Call(**record["api"]["java_api"])
+                        api_calls.append(call)
+                    elif "native_api" in record["api"]:
+                        call = Call(**record["api"]["native_api"])
+                        api_calls.append(call)


Reading the entire file into memory with f.read() can be inefficient for large JSONL files. It's better to iterate over the file line by line to reduce memory consumption. This change also adds encoding='utf-8' for robustness and handles empty or malformed JSON lines.

        with open(jsonl_path, "r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue

                if "metadata" in record:
                    metadata = Metadata(**record["metadata"])
                elif "api" in record:
                    if "java_api" in record["api"]:
                        call = Call(**record["api"]["java_api"])
                        api_calls.append(call)
                    elif "native_api" in record["api"]:
                        call = Call(**record["api"]["native_api"])
                        api_calls.append(call)
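The line-by-line approach suggested above can also be tried in isolation. The sketch below is a standalone illustration, not code from the PR: it keeps records as plain dictionaries instead of the PR's `Metadata` and `Call` Pydantic models, the function name `parse_jsonl_records` is hypothetical, and the sample field names (`package_name`, `api_name`) are assumed for the example.

```python
import json
from typing import Any, Dict, Iterable, List, Optional, Tuple


def parse_jsonl_records(
    lines: Iterable[str],
) -> Tuple[Optional[Dict[str, Any]], List[Dict[str, Any]]]:
    """Consume JSONL one line at a time and return (metadata, api_calls)."""
    metadata: Optional[Dict[str, Any]] = None
    api_calls: List[Dict[str, Any]] = []

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the parse
        if not isinstance(record, dict):
            continue  # valid JSON, but not a record object

        if "metadata" in record:
            metadata = record["metadata"]
        elif isinstance(record.get("api"), dict):
            # Same dispatch as the reviewed code: Java calls first, then native.
            for kind in ("java_api", "native_api"):
                if kind in record["api"]:
                    api_calls.append(record["api"][kind])
                    break

    return metadata, api_calls


if __name__ == "__main__":
    sample = [
        '{"metadata": {"package_name": "com.example.app"}}',
        "",
        "not valid json",
        '{"api": {"java_api": {"api_name": "java.io.File.delete"}}}',
        '{"api": {"native_api": {"api_name": "open"}}}',
    ]
    print(parse_jsonl_records(sample))
```

Because the function accepts any iterable of strings, an open file object can be passed directly, so only one line is held in memory at a time. Silently skipping malformed lines hides trace corruption, so logging the skipped lines may be preferable in practice.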

gemini-code-assist · 2025-08-29T00:50:14Z

scripts/frida/main.py

+        if not has_connected_device():
+            logger.info("Found no devices. Make sure emulator is running")
+            response = input("Auto-create an emulator? (y/n): ")
+            if response == "y":


The check for user input y is case-sensitive. It's better to convert the input to lowercase to handle both 'y' and 'Y', which provides a better user experience.

Suggested change

-            if response == "y":
+            if response.lower() == "y":
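A slightly more forgiving variant of the suggested check is sketched below. The helper name `is_affirmative` is hypothetical, and stripping whitespace and accepting "yes" go beyond what the review comment asks for.

```python
def is_affirmative(response: str) -> bool:
    """Return True for 'y' or 'yes' in any letter case, ignoring surrounding whitespace."""
    return response.strip().lower() in {"y", "yes"}


if __name__ == "__main__":
    answer = input("Auto-create an emulator? (y/n): ")
    print("creating emulator" if is_affirmative(answer) else "aborting")
```

Keeping the normalization in one small function means every prompt in the script treats user input the same way.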

CHANGELOG updated or no update needed, thanks! 😄

xukunzh and others added 30 commits May 29, 2025 15:05

add basic Android dynamic extractor framework

cc06df8

Add Frida log to capa analysis workflow

5415459

Implement basic Frida JSONL output and parser

8ed3cd1

Merge pull request #2 from xukunzh/FridaExtractor

afe17ed

Add Frida log to capa analysis workflow

Implement basic Frida JSONL output and parser

3c3fce1

Revert "Implement basic Frida JSONL output and parser"

31fad02

Accidentally merged unreviewed commit, reverting. This reverts commit 8ed3cd1.

Add FROMAT_ANDROID

fda4892

Merge Mike's commit suggestion

c843822

Co-authored-by: Mike Hunhoff <[email protected]>

Merge Mike's commit suggestion

b03c7bd

Co-authored-by: Mike Hunhoff <[email protected]>

Integrate FridaExtractor into Capa

53d75ff

Use Pydantic models to validate these JSON blobs

4c681df

Add arguments handling

7886446

Change to FORMAT_APK in common.py

28d28f9

Change to FORMAT_APK in extractor.py

24fe942

Fix a AttributeError bug

3c1bae7

Merge pull request #3 from xukunzh/FridaExtractor

20839a0

Implement basic Frida JSONL output and parser

Squash fix commits into one

a1b8b11

Merge branch 'master' into FridaExtractor

191bf03

Update the value type in Argument model

c28d1e2

Merge pull request #4 from xukunzh/FridaExtractor

25bd5c0

Integrate FridaExtractor into capa and add arguments

Auto-generate Frida hooks from Capa rules

25696a9

Switch to use APIs JSON file

98391f3

Merge pull request #5 from xukunzh/FridaExtractor

1da2435

Auto-generate Frida hooks from APIs JSON file

add Java native & static method support and update model with Pydantic

63304d2

Update scripts/frida/hook_builder.py

4a015df

Co-authored-by: Mike Hunhoff <[email protected]>

Merge pull request #6 from xukunzh/FridaExtractor

14a60a2

add Java native & static method support and update model with Pydantic

Add native API hooking support

0752417

Merge branch 'master' into FridaExtractor

601037c

Add missing changes from last PR

5aeb03f

Apply pre-commit formatting to existing code

346b0e3

xukunzh and others added 22 commits August 6, 2025 11:15

Merge pull request #8 from xukunzh/FridaExtractor

8a3dab7

Add complete script generation and update README

Add APK hashes support to Frida extractor

69e2179

Require all fields in models

14a2ab1

Changed to get package_name from input

f509479

Automate Frida analysis workflow with frida-compile

94cf914

Change to generate .ts script for now in manual workflow

e88b4f7

Update error handling

a5e144f

Fix a format issue with pre-commit and update gitignore

69286a9

Delete previous JS main templete

b70d798

Add APK installation process

e4ff271

Add auto emulator creation

0211407

Fix create_emulator

4e0f64c

Fix directory creation issue

d6438ab

Fix root access detection issue

47a4dae

Merge pull request #9 from xukunzh/FridaExtractor

edbf385

Add APK hashes extraction and Add automation workflow

Change to exception raising and Replace print with logging

3b4ac24

Update frida_api.json with lastest API list

0edbdc2

Simplify and fix emulator setup details in setup.md

5db7cf6

Make package name optional and add aapt APK extraction

09915d3

Reorganize README and setup.md

6bd19e5

Keep only automated setup in README

a38c397

Update capa and dependencies installation instructions

cc40e39

github-actions bot previously requested changes Aug 29, 2025

gemini-code-assist bot reviewed Aug 29, 2025

Update CHANGELOG

4de0aa7

xukunzh force-pushed the master branch from 749df53 to 4de0aa7 Compare August 29, 2025 20:56

Merge branch 'frida-gsoc' into master

ac50435
