Masala-CHAI: A Large-Scale SPICE Netlist Dataset for Analog Circuits by Harnessing AI

Abstract

Masala-CHAI is a fully automated framework that leverages large language models (LLMs) to generate Simulation Program with Integrated Circuit Emphasis (SPICE) netlists. It addresses a long-standing challenge in circuit design automation: automating netlist generation for analog circuits. Automating this workflow could accelerate the creation of fine-tuned LLMs for analog circuit design and verification. In this work, we identify key challenges in automated netlist generation and evaluate the multimodal capabilities of state-of-the-art LLMs, particularly GPT-4, in addressing them. We propose a three-step workflow to overcome existing limitations: labeling analog circuits, prompt tuning, and netlist verification. This approach enables end-to-end SPICE netlist generation from circuit schematic images. We use Masala-CHAI to collect a corpus of 7,500 schematics of varying complexity from 10 textbooks and to benchmark various open-source and proprietary LLMs. Models fine-tuned on Masala-CHAI, when used in LLM-agentic frameworks such as AnalogCoder, achieve a notable 46% improvement in Pass@1 scores. We open-source our dataset and code for community-driven development.

For the full paper, see: https://arxiv.org/abs/2411.14299

File and Folder Description

./hough/ : Folder containing scripts that use the Hough Transform for net detection. Because the files are large, download the contents of hough/ from this Google Drive link: https://drive.google.com/file/d/1mTwWWSMsYwhJW-GfKKVm21Lm5lVtMJ1y/view?usp=sharing
./models/ : Folder containing scripts for YOLOv8-based circuit component detection.
./sample-images/ : Folder containing sample images to run the Auto-SPICE netlist generator.
./trained_checkpoints/ : Contains checkpoint file for YOLOv8 model after training.
./utils/ : Supporting scripts for various components of the Auto-SPICE pipeline.
./Dataset/ : Folder containing dataset of the images with schematics.
- This contains images from AMSNet repo as well.
- Arranged across different data_* folder depending upon their sources.
main.py : Main script that runs the entire pipeline.
run.py : Script to be called for generating netlists for sample images.
environment.yml : Requirements file for creating the Conda environment.
visualize.ipynb : Jupyter notebook for visualizing the output of Auto-SPICE for a given circuit diagram.

Steps to Run the Framework

Clone the repository and navigate into the repository:
```
git clone <repository_url>
cd <repository_name>
```
Create a Conda environment:
```
conda env create -f environment.yml
```
Activate the Conda environment:
```
conda activate autospice_env
```
Add sample images: Place your sample images in the ./sample-images/ folder.

Run the pipeline:
```
python run.py --src ./sample-images/ --tgt ./sample-output --api_key <openai_api_key>
```
where:
- `--src` : Directory path to the sample images.
- `--tgt` : Output directory path for the generated netlists.
- `--api_key` : Your OpenAI API key for using GPT-4.
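
The same invocation can be driven from Python, for example to avoid typing the API key on the command line. The following is a minimal sketch that assumes only the three `run.py` flags listed above. The helper names `build_run_command` and `run_pipeline`, and the use of an `OPENAI_API_KEY` environment variable, are illustrative assumptions and not part of the repository.

```python
import os
import subprocess
import sys


def build_run_command(src, tgt, api_key):
    """Assemble the run.py invocation from the flags documented above."""
    return [
        sys.executable, "run.py",
        "--src", src,
        "--tgt", tgt,
        "--api_key", api_key,
    ]


def run_pipeline(src, tgt):
    """Launch run.py from the repository root (hypothetical wrapper)."""
    # Assumption: the key is stored in the OPENAI_API_KEY environment variable.
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before running the pipeline.")
    # check=True raises CalledProcessError if run.py exits with a non-zero status.
    subprocess.run(build_run_command(src, tgt, api_key), check=True)
```

Calling `run_pipeline("./sample-images/", "./sample-output")` from the repository root, with the Conda environment active, should be equivalent to the shell command above.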

Schematic Caption Generation using GPT-4o

Extract and Annotate Schematics:
- Use the utils/extract_page.py script to process textbook PDFs and automatically detect schematic images.
- This script will crop and annotate the images, saving them into separate folders in the same directory as the original PDF.
- The cropped images can be used to run the Masala-CHAI framework.
```
python utils/extract_page.py <path_to_your_pdf> 
```
Notes:
- <path_to_your_pdf> is the full path to the PDF file containing schematics.
- ./annotation_data.json is the JSON file that contains all information about the annotated pages, bounding boxes, etc.
- ./cropped_images/ is where the cropped circuit diagrams are saved.
Generate Captions for Annotated Images:
- Please rename ./annotation_data.json to ./annotation_data_pdfname.json. You will also need an OpenAI API Key for the next step.
- Once the images are annotated, you can run the utils/caption-generator.py script to utilize GPT-4o for generating captions.
- The captions are saved in a folder alongside the annotated images.
```
python utils/caption-generator.py <path_to_your_pdf> 
```
Notes:
- ./descriptions_short_<pdfname> is the folder containing the generated captions for all circuit diagrams.
- The descriptions can be paired with their corresponding SPICE netlists generated by the framework to fine-tune LLMs.
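
One way to do this pairing is to write one JSON record per circuit into a JSON Lines file. The sketch below is illustrative only: the file layout it assumes (one `<circuit_id>.txt` caption per circuit, and one `<circuit_id>/spice.txt` netlist per circuit) is a hypothetical convention, as are the function name and the record fields, so adjust them to match your own output folders.

```python
import json
from pathlib import Path


def pair_captions_with_netlists(captions_dir, outputs_dir, jsonl_path):
    """Write one JSON record per circuit that has both a caption and a netlist.

    Assumed (hypothetical) layout, adjust to your own files:
      captions_dir/<circuit_id>.txt        caption text
      outputs_dir/<circuit_id>/spice.txt   generated netlist
    Returns the number of records written.
    """
    written = 0
    with open(jsonl_path, "w", encoding="utf-8") as out:
        for caption_file in sorted(Path(captions_dir).glob("*.txt")):
            netlist_file = Path(outputs_dir) / caption_file.stem / "spice.txt"
            if not netlist_file.is_file():
                continue  # skip circuits without a generated netlist
            record = {
                "id": caption_file.stem,
                "description": caption_file.read_text(encoding="utf-8").strip(),
                "netlist": netlist_file.read_text(encoding="utf-8").strip(),
            }
            out.write(json.dumps(record) + "\n")
            written += 1
    return written
```

Circuits that lack either file are skipped rather than written with an empty field, so every record in the resulting file is a complete description and netlist pair.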

Descriptions of outputs obtained from Masala-CHAI framework

For each sample circuit, the output consists of a number of files to help the user understand the output of various components in the pipeline:

scanned_circuit.png: Copy of the original circuit diagram.
detected_components.png, component_removed_circuit.png, components_description.txt: Output of the YOLOv8 component detection module:
- detected_components.png: Components marked with bounding boxes.
- component_removed_circuit.png: Components replaced with white spaces.
- components_description.txt: Text file containing the description of the detected components.
nodes_terminals.png, connections_descriptions.txt, nodes_description.txt: Detected nodes in the circuit using Hough Transform:
- nodes_terminals.png: Detected nodes in the circuit.
- connections_descriptions.txt: Text file containing descriptions of various connections.
- nodes_description.txt: Text file containing the description of various nodes in the circuit.
text_and_comp_removed_circuit.png: Original circuit diagram after removing all text content and detected circuit components.
rebuilt_circuit.png: Original circuit diagram overlaid with components and nodes.
original_withComponentsAndLineLabels.png, original_withLineLabels.png: Used for better visualization of the model output.
sample_statistics.json: Dictionary describing types of components in the circuit, along with node and net information.
spice.txt: Final generated SPICE netlist for the circuit diagram.
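
As a quick sanity check after a run, the file names listed above can be compared against what is actually present in a sample's output folder. This is a minimal sketch: the file names come from the list above, while the function name `missing_outputs` and the assumption that all files for one circuit sit directly in a single folder are illustrative.

```python
from pathlib import Path

# File names taken from the output description above.
EXPECTED_OUTPUTS = [
    "scanned_circuit.png",
    "detected_components.png",
    "component_removed_circuit.png",
    "components_description.txt",
    "nodes_terminals.png",
    "connections_descriptions.txt",
    "nodes_description.txt",
    "text_and_comp_removed_circuit.png",
    "rebuilt_circuit.png",
    "original_withComponentsAndLineLabels.png",
    "original_withLineLabels.png",
    "sample_statistics.json",
    "spice.txt",
]


def missing_outputs(sample_dir):
    """Return the expected output files that are absent from sample_dir."""
    sample_dir = Path(sample_dir)
    return [name for name in EXPECTED_OUTPUTS if not (sample_dir / name).is_file()]
```

An empty list means every file described above was produced for that circuit. A non-empty list points to the pipeline stage worth inspecting first.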

We also provide a helpful visualization of the model output in a Jupyter notebook: visualize.ipynb

Dataset Link

We utilize Masala-CHAI to create the largest open-sourced corpus for parallel circuit descriptions and SPICE netlists. You can download the dataset here: https://drive.google.com/file/d/1t0Wqo7RIQqqpE3AcbLaIGB9sX5XDsSYm/view?usp=drive_link

Inference with Finetuned Code Llama 70B

Please refer to ./codellama-endpoints/ for detailed instructions and checkpoints.

Annotated dataset from AnalogGenie

AnalogGenie (https://github.com/xz-group/AnalogGenie/tree/main) provides a new dataset with SPICE netlists but does not include circuit description captions for fine-tuning LLMs. We used our Masala-CHAI framework on their dataset to generate captions for the respective schematics. You can find this dataset here: ./analoggenie.jsonl
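
The record schema of ./analoggenie.jsonl is not described here, so a reasonable first step is to inspect it. The sketch below assumes only that the file follows the usual JSON Lines convention of one JSON value per line. The helper names `read_jsonl` and `field_names` are illustrative.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def field_names(records):
    """Collect the sorted set of keys used across dictionary records."""
    keys = set()
    for record in records:
        if isinstance(record, dict):
            keys.update(record)
    return sorted(keys)
```

For example, `field_names(read_jsonl("./analoggenie.jsonl"))` lists the fields present in the file before you decide how to map them to fine-tuning inputs and targets.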

Citing Masala-CHAI

If you use Masala-CHAI or the shared dataset in your research, please cite using the following BibTeX entry:

@misc{bhandari2025masalachailargescalespicenetlist,
      title={Masala-CHAI: A Large-Scale SPICE Netlist Dataset for Analog Circuits by Harnessing AI}, 
      author={Jitendra Bhandari and Vineet Bhat and Yuheng He and Hamed Rahmani and Siddharth Garg and Ramesh Karri},
      year={2025},
      eprint={2411.14299},
      archivePrefix={arXiv},
      primaryClass={cs.AR},
      url={https://arxiv.org/abs/2411.14299}, 
}
