File Cache

James Fantin-Hardesty edited this page Sep 3, 2025 · 5 revisions

Cloudfuse File Cache

Overview

The file_cache component accelerates access to cloud data by caching files on local disk. It improves:

  • Write-heavy workloads by buffering file changes locally before uploading.
  • Read workloads where files fit in the local cache and are reused.

file_cache is supported on Linux and Windows.

Enable File Cache

To enable file_cache, first list it in the components sequence between libfuse and attr_cache. Note that stream, block_cache, and file_cache currently cannot coexist.

components:
  - libfuse
  - file_cache
  - attr_cache
  - s3storage

or

components:
  - libfuse
  - file_cache
  - attr_cache
  - azstorage

How It Works

  • Local caching

    • Files are stored under a local cache directory (file_cache.path) using the same relative paths as in the bucket.
    • Reads are served from the local copy once present. Writes go to the local copy first for speed.
  • Download and upload

    • Files are downloaded on demand (first access that needs data).
    • Changes are uploaded to the cloud on flush/close. If ignore-sync is true, fsync/flush does not trigger an upload.
  • Eviction (space and time based)

    • Policy: Least Recently Used (LRU).
    • Inactivity timeout (timeout-sec): Files that haven’t been accessed for a full timeout window are eligible for eviction when the LRU “marker” rotates.
    • Disk pressure: Eviction also runs when cache usage crosses high-threshold and stops at low-threshold.
    • Batching: At most max-eviction files are removed per pass.
    • Safety: Open or recently validated files are skipped; eviction only removes files that are safe to delete.
  • Usage accounting

    • If max-size-mb is set, cache usage is the size of the cache directory relative to that limit.
    • If max-size-mb is 0, usage is taken from the underlying filesystem statistics for the cache path.
  • Persistence across restarts

    • The LRU state is snapshotted to a small hidden file in the cache directory and reloaded at startup so hot/cold priorities are preserved.
  • Platform notes

    • Works on Linux and Windows.
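
The eviction and usage-accounting behavior described above is controlled by a handful of options. The fragment below is a minimal sketch; the path and all values are illustrative assumptions, not recommendations. With max-size-mb set to 4096, the description above implies that disk-pressure eviction would start once the cache directory holds about 3277 MB (80% of 4096) and stop once usage falls to about 2458 MB (60% of 4096).

```yaml
file_cache:
  path: /home/myuser/tempcache   # illustrative cache directory
  max-size-mb: 4096              # usage is measured against this limit
  high-threshold: 80             # eviction starts at roughly 3277 MB
  low-threshold: 60              # eviction stops at roughly 2458 MB
  timeout-sec: 600               # files idle for a full 600 s window become eligible
  max-eviction: 1000             # at most 1000 files are removed per pass
```

A shorter timeout-sec frees space sooner but makes re-downloads more likely for files that are reused after a pause.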

Configuration Options

All options go under file_cache unless otherwise noted. Defaults reflect the current implementation.

  • path: Path to local disk cache. Default $HOME/.cloudfuse/file_cache
  • timeout-sec: Cache eviction timeout in seconds. Default 216000 (60 hours)
    • If libfuse.direct-io is true, timeout-sec is forced to 0 (no TTL caching).
  • max-eviction: Maximum number of files evicted in a single pass. Default 5000
  • max-size-mb: Maximum cache size in MB. Default 80% of free disk space on the cache filesystem
  • high-threshold: Percent full that triggers eviction. Default 80
  • low-threshold: Percent full at which eviction stops. Default 60
  • create-empty-file: true|false. Create an empty object in cloud on file create (useful for immutable containers). Default false
  • allow-non-empty-temp: true|false. Allow non-empty cache directory at startup; set to true to persist cache across reboots. Default false
  • cleanup-on-start: true|false. Clean up the cache directory on startup if it is not empty. Default false
  • policy-trace: true|false. Emit extra eviction logs for diagnostics. Default false
  • offload-io: true|false. If true, file_cache handles read/write calls (instead of libfuse), which can help in specific scenarios. Default false
  • sync-to-flush: true|false. If true, fsync triggers upload to cloud. Default true
  • ignore-sync: true|false. If true, fsync is ignored and local file is not deleted/purged because of sync. Default false
  • refresh-sec: Seconds after which a cached file is compared to cloud and refreshed if the cloud has a newer copy (only when the file is not open). Default 0 (disabled)
  • hard-limit: true|false. If true and max-size-mb is set, any read, write, or open that would exceed the configured size returns ENOSPC. Default false
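
As a sketch of how several of these options combine, the fragment below keeps an existing cache across restarts, enforces the size limit strictly, and periodically rechecks cached files against the cloud. The path and all values are illustrative assumptions, not recommended settings.

```yaml
file_cache:
  path: /home/myuser/tempcache   # illustrative cache directory
  max-size-mb: 4096
  hard-limit: true               # operations that would exceed 4096 MB return ENOSPC
  allow-non-empty-temp: true     # accept a non-empty cache directory at startup
  cleanup-on-start: false        # leave existing cache contents in place
  refresh-sec: 300               # after 300 s, compare a closed cached file with the cloud copy
```

hard-limit only has an effect here because max-size-mb is set explicitly.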

Sample Config

After adding the components, add the following section to your Cloudfuse config file. This example configures Cloudfuse to use /home/myuser/tempcache as the local cache directory, with a maximum size of 4096 MB on disk and an eviction timeout of 120 seconds.

file_cache:
  path: /home/myuser/tempcache
  timeout-sec: 120
  max-size-mb: 4096

Monitoring and Stats

When the health monitor is enabled, file_cache reports:

  • Cache Usage (MB)
  • Usage Percent
  • Files Downloaded
  • Files served from cache

See the Health Monitor documentation for setup and where to find the JSON output.
