File identification library for Rust.
Given a file (or some information about a file), return a set of standardized tags identifying what the file is.
This is a Rust port of the Python `identify` library.
- 🚀 Fast: Built in Rust with Perfect Hash Functions (PHF) for compile-time optimization
- 📁 Comprehensive: Identifies 315+ file types and formats
- 🔍 Smart detection: Uses file extensions, content analysis, and shebang parsing
- 📦 Library + CLI: Use as a Rust library or command-line tool
- ⚡ Zero overhead: PHF tables are generated at compile time, giving O(1) lookups with no runtime table construction
- 🎯 Memory efficient: Static data structures with no lazy initialization
- ✅ Well-tested: Extensive test suite ensuring reliability
Add this to your `Cargo.toml`:

```toml
[dependencies]
file-identify = "0.2.0"
```
```rust
use file_identify::{tags_from_path, tags_from_filename, tags_from_interpreter};

fn main() {
    // Identify a file from its path. This touches the filesystem and returns a
    // `Result`; `unwrap` panics if the path cannot be read.
    let tags = tags_from_path("/path/to/file.py").unwrap();
    println!("{:?}", tags); // {"file", "text", "python", "non-executable"}

    // Identify from filename only
    let tags = tags_from_filename("script.sh");
    println!("{:?}", tags); // {"text", "shell", "bash"}

    // Identify from interpreter (tags are a set; printed order may differ)
    let tags = tags_from_interpreter("python3");
    println!("{:?}", tags); // {"python", "python3"}
}
```
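Conceptually, filename-based identification maps a name to tags without reading the file. The following is a minimal, std-only sketch of that idea, not the crate's implementation: the names `extension_tags` and `tags_for_filename` are hypothetical, the three-entry table is illustrative, and lowercasing the extension is an assumption. The real crate uses a much larger PHF table rather than a `match`.

```rust
use std::collections::HashSet;

// Hypothetical, heavily trimmed extension table. A `match` on string literals
// is static code, so nothing is built at runtime (the real crate uses PHF).
fn extension_tags(ext: &str) -> &'static [&'static str] {
    match ext {
        "py" => &["text", "python"],
        "json" => &["text", "json"],
        "png" => &["binary", "image", "png"],
        _ => &[],
    }
}

// Hypothetical filename-only lookup: take the text after the last '.',
// lowercase it (an assumption), and look it up. Unknown names yield an empty set.
fn tags_for_filename(filename: &str) -> HashSet<&'static str> {
    match filename.rsplit_once('.') {
        Some((_, ext)) => extension_tags(&ext.to_ascii_lowercase())
            .iter()
            .copied()
            .collect(),
        None => HashSet::new(),
    }
}

fn main() {
    let tags = tags_for_filename("setup.py");
    assert!(tags.contains("python") && tags.contains("text"));
    println!("{:?}", tags);
}
```

Because the lookup only inspects the name, a file without a recognized extension (for example `Makefile` in this sketch) simply produces an empty set and has to be identified some other way.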
```bash
# Install the CLI tool
cargo install file-identify

# Identify a file
file-identify setup.py
# ["file", "non-executable", "python", "text"]

# Use filename only (don't read file contents)
file-identify --filename-only setup.py
# ["python", "text"]

# Get help
file-identify --help
```
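The CLI output above appears to list tags in sorted order, while the library returns a set. If you want the same stable formatting in your own code, a small helper like this works; `format_tags` is hypothetical and not part of the crate.

```rust
use std::collections::HashSet;

// Hypothetical helper: render a tag set like the CLI output shown above.
// Sorting gives a stable order, since HashSet iteration order is unspecified.
fn format_tags(tags: &HashSet<&str>) -> String {
    let mut sorted: Vec<&str> = tags.iter().copied().collect();
    sorted.sort_unstable();
    // Debug formatting of a Vec<&str> produces ["a", "b", ...]
    format!("{:?}", sorted)
}

fn main() {
    let tags = HashSet::from(["text", "python", "file", "non-executable"]);
    // Prints: ["file", "non-executable", "python", "text"]
    println!("{}", format_tags(&tags));
}
```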
A call to `tags_from_path` does this:
- What is the type: file, symlink, directory? If it's not a file, stop here.
- Is it executable? Add the appropriate tag.
- Do we recognize the file extension? If so, add the appropriate tags (these include `binary` or `text`) and stop here.
- Read the first 1 KB of the file and use those bytes to determine whether it is binary or text; add the appropriate tag.
- If identified as text above, try to read and interpret the shebang, and add appropriate tags.

By design, this means we never need to read the contents of a file whose extension we recognize.
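The steps above can be sketched with the standard library alone. Everything here is illustrative rather than the crate's actual code: `tags_from_path_sketch`, the two tiny lookup tables, the byte-range text heuristic, and the shebang parsing (including how `/usr/bin/env` is followed) are all assumptions. The executable check reads Unix permission bits and, as a further assumption, reports non-executable on other platforms.

```rust
use std::collections::BTreeSet;
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

// Hypothetical trimmed tables; the real crate's tables are far larger.
fn extension_tags(ext: &str) -> &'static [&'static str] {
    match ext {
        "py" => &["text", "python"],
        "png" => &["binary", "image", "png"],
        _ => &[],
    }
}

fn interpreter_tags(interp: &str) -> &'static [&'static str] {
    match interp {
        "python3" => &["python", "python3"],
        "bash" => &["shell", "bash"],
        _ => &[],
    }
}

#[cfg(unix)]
fn is_executable(meta: &fs::Metadata) -> bool {
    use std::os::unix::fs::PermissionsExt;
    meta.permissions().mode() & 0o111 != 0 // any execute bit set
}

#[cfg(not(unix))]
fn is_executable(_meta: &fs::Metadata) -> bool {
    false // assumption: no execute bit to inspect on this platform
}

// Assumed heuristic: NUL and most other control bytes mean "binary".
fn looks_like_text(bytes: &[u8]) -> bool {
    bytes.iter().all(|&b| matches!(b, 7..=13 | 27 | 0x20..=0x7e | 0x80..=0xff))
}

// Extract the interpreter name from a "#!" first line, following `/usr/bin/env`.
fn shebang_interpreter(bytes: &[u8]) -> Option<String> {
    let rest = bytes.strip_prefix(b"#!")?;
    let line = String::from_utf8_lossy(rest.split(|&b| b == b'\n').next()?).into_owned();
    let mut words = line.split_whitespace();
    let mut cmd = words.next()?;
    if cmd.ends_with("/env") {
        cmd = words.next()?;
    }
    Some(cmd.rsplit('/').next()?.to_string())
}

fn tags_from_path_sketch(path: &Path) -> io::Result<BTreeSet<&'static str>> {
    let meta = fs::symlink_metadata(path)?;
    let mut tags = BTreeSet::new();
    // Step 1: only regular files go further (other special files: empty here).
    if meta.file_type().is_symlink() {
        tags.insert("symlink");
        return Ok(tags);
    }
    if meta.is_dir() {
        tags.insert("directory");
        return Ok(tags);
    }
    if !meta.is_file() {
        return Ok(tags);
    }
    tags.insert("file");
    // Step 2: executable bit.
    tags.insert(if is_executable(&meta) { "executable" } else { "non-executable" });
    // Step 3: a known extension settles it without reading the file.
    let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("").to_ascii_lowercase();
    let by_ext = extension_tags(&ext);
    if !by_ext.is_empty() {
        tags.extend(by_ext.iter().copied());
        return Ok(tags);
    }
    // Step 4: peek at (at most) the first 1 KB.
    let mut buf = Vec::with_capacity(1024);
    File::open(path)?.take(1024).read_to_end(&mut buf)?;
    if !looks_like_text(&buf) {
        tags.insert("binary");
        return Ok(tags);
    }
    tags.insert("text");
    // Step 5: text files may name their interpreter in a shebang.
    if let Some(interp) = shebang_interpreter(&buf) {
        tags.extend(interpreter_tags(&interp).iter().copied());
    }
    Ok(tags)
}

fn main() -> io::Result<()> {
    // Demo file in the temp dir: no extension, so steps 4 and 5 run.
    let path = std::env::temp_dir().join("file_identify_sketch_demo");
    fs::write(&path, "#!/usr/bin/env python3\nprint('hi')\n")?;
    println!("{:?}", tags_from_path_sketch(&path)?);
    fs::remove_file(&path)?;
    Ok(())
}
```

Step 3 returns before the file is opened, which is exactly the property described above: a recognized extension costs only a metadata call and a table lookup.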
```bash
# Clone the repository
git clone [email protected]:grok-rs/file-identify.git
cd file-identify

# Build the project
cargo build

# Run tests
cargo test

# Run the CLI
cargo run -- path/to/file
```
This project uses pre-commit hooks to ensure code quality:

```bash
pip install pre-commit
pre-commit install
```

```bash
# Run all tests
cargo test

# Run with coverage (requires cargo-tarpaulin)
cargo install cargo-tarpaulin
cargo tarpaulin --out html
```
Licensed under the MIT License.