Alibaba-AAIG
Repositories
- Kelp Public
- SNCE Public
This repository contains the code and experimental materials for the paper "A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models"
- Safe-SAIL Public
A framework for interpreting SAE features within LLMs to advance mechanistic understanding in safety domains.
- Oyster Public
The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem.
- Strata-Sword Public
Strata-Sword is a hierarchical Chinese-English jailbreak safety benchmark developed in-house by Alibaba-AAIG. It treats reasoning complexity as a measurable safety dimension and introduces several attack methods specific to Chinese, so that the safety boundaries of LLMs and LRMs can be evaluated systematically at different levels of reasoning complexity, offering new directions for improving model safety.
- S-Eval Public (forked from IS2Lab/S-Eval)
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
- easyrobust Public (forked from alibaba/easyrobust)
EasyRobust: an easy-to-use library for state-of-the-art robust computer vision research with PyTorch.
- Beyond-ImageNet-Attack Public
Beyond ImageNet Attack (accepted at ICLR 2022): towards crafting adversarial examples for black-box domains.