ASTRA - Authorization with Semantic Task-based Restricted Access

You can find out more on the overview page here, or read the full paper here.

This repository contains an open-source dataset for task-tool matching in the context of delegated authorization flows, as described in our paper. The core data resides in the data/ directory, which is divided by task complexity: 01_tool, 02_tools, and 03_tools contain datasets for tasks requiring one, two, or three tools, respectively. Each of these directories is further split into ASTRA (our generated data) and TOUCAN (processed TOUCAN data). The ASTRA folders hold the generated tasks plus validation and test splits; the TOUCAN folders hold the processed tasks and test data. The mcp_servers/ folder, also under data/, holds the MCP Server configuration files used in data generation, separated into ASTRA and TOUCAN sources, with one JSON file per server.

Key Features

Synthetic Multi-Tool Tasks: Agentic tasks are generated using real-world MCP Servers (e.g., Wikipedia, GitHub) with sets of $N$ tools ($N \in \{1, 2, 3\}$), ensuring semantic coherence and realism.
Simulated Tool Matching: Includes both correct and simulated incorrect tool matches:
- Wrong matches: Tools from the same MCP Server
- Null matches: Tools from different MCP Servers
TOUCAN Data Integration: Curated and pre-processed subset of the TOUCAN dataset for direct comparison, with consistent formatting and quality controls.
Comprehensive Metadata: All tool names, descriptions, and server metadata are included.
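The distinction between correct, wrong, and null matches described above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the code used to build the dataset: the function name `classify_match`, its parameters, the label strings, and the assumption that every tool of a task belongs to a single MCP Server are all hypothetical and are not taken from the data files.

```python
def classify_match(task_server, task_tools, candidate_server, candidate_tool):
    """Label a candidate tool relative to a task (hypothetical helper).

    Assumption: all tools required by the task live on one MCP Server,
    identified by `task_server`.
    """
    # Null match: the candidate tool comes from a different MCP Server.
    if candidate_server != task_server:
        return "null"
    # Correct match: the candidate is one of the tools the task requires.
    if candidate_tool in task_tools:
        return "correct"
    # Wrong match: same MCP Server, but not a tool the task requires.
    return "wrong"


# Illustrative values only; these server and tool names are made up.
required = {"create_issue"}
print(classify_match("github", required, "github", "create_issue"))
print(classify_match("github", required, "github", "list_commits"))
print(classify_match("github", required, "wikipedia", "search"))
```

Checking the server first keeps the null case independent of tool names, so two servers that happen to expose a tool with the same name are not confused.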

Data Overview

Enterprise MCP Servers: 12 high-quality, English-only servers, covering a range of 10-90 tools each.
Synthetic Tasks: $352 \times 3$ tasks per $N \in \{1, 2, 3\}$ for our dataset; 1,056 processed tasks per $N$ for TOUCAN.
Validation Ready: Processed, de-duplicated, and filtered for high data quality.

Citation

If you use this dataset, please cite our paper.

Repository Structure

ASTRA/
├── README.md                         # Project documentation
├── LICENSE                           # License information
├── data/                             # Dataset
│   │
│   ├── 01_tool/                      # Data for single tool tasks
│   │   ├── ASTRA/                    # ASTRA-generated data
│   │   │   ├── generated.json        # Generated tasks for MCP Server tools
│   │   │   ├── test.json             # Test data split
│   │   │   └── validation.json       # Validation data split
│   │   └── TOUCAN/                   # TOUCAN-processed data
│   │       ├── processed.json        # Processed tasks for MCP Server tools
│   │       └── test.json             # Test data
│   │
│   ├── 02_tools/                     # Data for tasks with two tools...
│   │   ├── ASTRA/                    # ...following the same structure as above
│   │   │   ├── generated.json
│   │   │   ├── test.json
│   │   │   └── validation.json
│   │   └── TOUCAN/
│   │       ├── processed.json
│   │       └── test.json
│   │
│   ├── 03_tools/                     # Data for tasks with three tools...
│   │   ├── ASTRA/                    # ...following the same structure as above
│   │   │   ├── generated.json
│   │   │   ├── test.json
│   │   │   └── validation.json
│   │   └── TOUCAN/
│   │       ├── processed.json
│   │       └── test.json
│   │
│   └── mcp_servers/                  # MCP Server configurations
│       ├── ASTRA/                    # ASTRA MCP Server configs
│       │   ├── atlassian.json
│       │   ├── azure.json
│       │   └── ... (additional servers)
│       └── TOUCAN/                   # TOUCAN MCP Server configs
│           ├── After Effects MCP Server.json
│           ├── AI Research Assistant - Semantic Scholar.json
│           └── ... (additional servers)
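As a minimal sketch of how the layout above might be navigated, the following Python helper builds the path to one dataset file and loads it with the standard `json` module. The helper names `dataset_path` and `load_split` are hypothetical, and the sketch makes no assumption about the schema inside each JSON file; it only relies on the directory and file names shown in the tree.

```python
import json
from pathlib import Path

# Directory name for each number of tools, copied from the tree above.
TOOL_DIRS = {1: "01_tool", 2: "02_tools", 3: "03_tools"}

# File stems available for each source, copied from the tree above.
SPLITS = {
    "ASTRA": ("generated", "test", "validation"),
    "TOUCAN": ("processed", "test"),
}


def dataset_path(root, n_tools, source, split):
    """Return the path of one dataset file under the repository root."""
    if n_tools not in TOOL_DIRS:
        raise ValueError(f"n_tools must be one of {sorted(TOOL_DIRS)}")
    if source not in SPLITS:
        raise ValueError(f"source must be one of {sorted(SPLITS)}")
    if split not in SPLITS[source]:
        raise ValueError(f"{source} has no '{split}' file")
    return Path(root) / "data" / TOOL_DIRS[n_tools] / source / f"{split}.json"


def load_split(root, n_tools, source, split):
    """Parse one dataset file; the returned structure is whatever the JSON holds."""
    with dataset_path(root, n_tools, source, split).open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `load_split(".", 2, "ASTRA", "validation")` run from the repository root would read data/02_tools/ASTRA/validation.json, while asking for a `validation` file under TOUCAN raises a `ValueError` because the tree lists no such file.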

Roadmap

See open issues for a list of proposed features and known issues.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. For detailed contributing guidelines, please see CONTRIBUTING.md.

Copyright Notice

Distributed under the Apache 2.0 License. See LICENSE for more information.

Copyright Cisco Systems, Inc. and its affiliates.
