Conversation

@tangledbytes commented Oct 13, 2025

Describe the Problem

Explain the Changes

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • Adds automated tape reclaim for Glacier with a configurable interval and a CLI reclaim action.
    • Logs reclaimable entries during uploads and deletions to support reclaim processing.
  • Improvements

    • Adds DMAPI tape UID support to improve reclaim accuracy and cluster coordination.
    • Cluster locking and timestamping to coordinate reclaim runs.
  • Style

    • Minor documentation formatting tweaks.


coderabbitai bot commented Oct 13, 2025

Walkthrough

Adds Glacier tape reclaim end-to-end: new config flags, CLI reclaim action, reclaim processing in manage_nsfs, Glacier reclaim flow and locks, TapeCloud reclaim execution, NamespaceFS reclaim WAL logging, and native DMAPI xattr (IBMUID) exposure for reclaim metadata.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration flags**<br>`config.js` | Adds `NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM` (default `false`) and `NSFS_GLACIER_RECLAIM_INTERVAL` (default `15 * 60 * 1000`). Minor JSDoc formatting changes. |
| **Manage CLI: reclaim action wiring**<br>`src/cmd/manage_nsfs.js`, `src/manage_nsfs/manage_nsfs_constants.js`, `src/manage_nsfs/manage_nsfs_glacier.js` | Adds `GLACIER_ACTIONS.RECLAIM` and a CLI option, routes reclaim to `process_reclaim()`, which mirrors the expiry flow (interval and low-space gating), and updates the reclaim timestamp file. |
| **Glacier SDK: reclaim flow and locks**<br>`src/sdk/glacier.js` | Adds constants (`RECLAIM_TIMESTAMP_FILE`, `GPFS_DMAPI_XATTR_TAPE_UID`, `RECLAIM_WAL_NAME`, `GLACIER_RECLAIM_CLUSTER_LOCK`), a new `reclaim(fs_context, log_file, failure_recorder)` method, and a `perform(..., "RECLAIM")` branch that acquires the cluster lock and processes the WAL. |
| **TapeCloud integration**<br>`src/sdk/glacier_tapecloud.js` | Adds `TapeCloudUtils.RECLAIM_SCRIPT` and `reclaim(file)`; extends `TapeCloudGlacier` with `reclaim(...)` and `_process_reclaimed(file)` to execute the reclaim script and log errors without throwing. |
| **NamespaceFS: reclaim WAL logging**<br>`src/sdk/namespace_fs.js` | Adds `append_to_reclaim_wal(fs_context, file_path, stat)`, a static getter `reclaim_wal`, and a `_reclaim_wal` field. Logs reclaim entries during upload finalize and delete/unlink flows when the flags are enabled; non-fatal handling of WAL errors. |
| **Native FS: DMAPI xattr (IBMUID)**<br>`src/native/fs/fs_napi.cpp` | Adds macros for the IBMUID/GPFS DMAPI tape UID and includes `GPFS_DMAPI_XATTR_TAPE_UID` in the exported DMAPI xattr list. |
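The two configuration flags might look roughly like this in `config.js` (names and defaults are taken from the summary above; the exact placement and surrounding JSDoc in the real file are assumptions):

```javascript
// Hedged sketch of the new Glacier reclaim flags described in the summary.
// Names and defaults come from the walkthrough; everything else about
// config.js (structure, neighboring keys) is assumed for illustration.
const config = {};

// Opt-in switch for DMAPI-based tape reclaim; the conservative default
// keeps the feature off unless explicitly enabled.
config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM = false;

// Minimum interval between reclaim runs, in milliseconds (15 minutes).
config.NSFS_GLACIER_RECLAIM_INTERVAL = 15 * 60 * 1000;
```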

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor App as App/Client
  participant NFS as NamespaceFS
  participant FS as Filesystem
  participant WAL as Reclaim WAL

  App->>NFS: finalize upload / delete
  NFS->>FS: stat & get xattrs (incl. IBMUID) [if DMAPI reclaim enabled]
  NFS->>WAL: append(full_path, logical_size, xattrs)
  WAL-->>NFS: ack
  NFS-->>App: operation complete
```
```mermaid
sequenceDiagram
  autonumber
  actor Op as Operator/Timer
  participant CLI as manage_nsfs
  participant MNG as manage_nsfs_glacier
  participant G as Glacier.perform("RECLAIM")
  participant Lock as ClusterLock
  participant TC as TapeCloudGlacier
  participant Utils as TapeCloudUtils

  Op->>CLI: run --glacier reclaim
  CLI->>MNG: manage_glacier_operations(RECLAIM)
  MNG->>G: perform(fs_context,"RECLAIM")
  G->>Lock: acquire GLACIER_RECLAIM_CLUSTER_LOCK
  alt interval and free-space checks pass
    G->>TC: reclaim(fs_context, log_file, failure_recorder)
    TC->>Utils: reclaim(WAL entry file)
    Utils-->>TC: returns (logs errors non-throwing)
    TC-->>G: processed entries
    G->>G: write reclaim timestamp
  else skipped
    G-->>MNG: noop
  end
  G->>Lock: release
  MNG-->>CLI: done
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • guymguym
  • dannyzaken
  • alphaprinz
  • jackyalbo

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is below the required threshold of 80.00%. | Run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped — CodeRabbit's high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly describes the primary change of adding tape reclaim support and clearly references the affected components (NSFS, NC, and GLACIER) without unnecessary detail or ambiguity, making the purpose of the pull request immediately clear to reviewers. |

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
src/sdk/namespace_fs.js (1)

2128-2140: Pass the stat to avoid duplicate filesystem calls.

In both deletion paths, the code calls append_to_reclaim_wal without passing the stat parameter, which causes the method to fetch the stat again from the filesystem. However, the stat is already available in the lifecycle deletion path from the verification step.

Optimize by passing the stat when available:

At line 2128, in the lifecycle deletion path:

```diff
 try {
     files = await this._open_files(fs_context, { src_path: file_path, delete_version: true });
+    const stat = await files.delete_version.src_file.stat(fs_context);
     await this._verify_lifecycle_filter_and_unlink(fs_context, params, file_path, files.delete_version);
-    await this.append_to_reclaim_wal(fs_context, file_path);
+    await this.append_to_reclaim_wal(fs_context, file_path, stat);
 } catch (err) {
```

Note: The non-lifecycle path at lines 2135-2140 doesn't have easy access to the stat, so it's acceptable to let the method fetch it.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69d598a and fa98cef.

📒 Files selected for processing (7)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
  • GLACIER_ACTIONS (25-30)
src/manage_nsfs/manage_nsfs_glacier.js (1)
src/util/native_fs_utils.js (2)
  • fs_context (549-554)
  • config (10-10)
src/sdk/glacier.js (3)
src/cmd/manage_nsfs.js (3)
  • fs_context (73-73)
  • type (57-57)
  • native_fs_utils (23-23)
src/manage_nsfs/manage_nsfs_glacier.js (5)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
  • native_fs_utils (9-9)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
src/sdk/glacier_tapecloud.js (3)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
src/manage_nsfs/manage_nsfs_glacier.js (4)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
  • fs_context (191-191)
  • fs_context (303-303)
  • fs_context (439-439)
src/sdk/namespace_fs.js (2)
src/manage_nsfs/manage_nsfs_glacier.js (7)
  • config (6-6)
  • nb_native (7-7)
  • nb_native (85-85)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
src/native/fs/fs_napi.cpp (2)
  • stat (2029-2033)
  • stat (2030-2030)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (6)
src/manage_nsfs/manage_nsfs_constants.js (1)

29-29: LGTM!

The RECLAIM action is correctly added to GLACIER_ACTIONS and properly configured in VALID_OPTIONS_GLACIER following the same pattern as existing glacier actions.

Also applies to: 76-76

src/cmd/manage_nsfs.js (1)

882-884: LGTM!

The RECLAIM action handler is correctly integrated following the same pattern as other glacier operations (MIGRATE, RESTORE, EXPIRY).

config.js (1)

931-931: LGTM!

The new configuration flag is appropriately placed in the GLACIER configuration section with a conservative default of false, requiring explicit opt-in for tape reclaim functionality.

src/sdk/glacier.js (1)

24-24: LGTM!

The reclaim infrastructure is correctly integrated:

  • New constants follow naming conventions (RECLAIM_TIMESTAMP_FILE, RECLAIM_WAL_NAME, GLACIER_RECLAIM_CLUSTER_LOCK)
  • Base reclaim() method properly throws 'Unimplemented' for subclass implementation
  • perform() correctly handles "RECLAIM" type with single-phase processing (no staging)

Note: The reclaim flow intentionally differs from MIGRATE/RESTORE by using direct log processing without a staging phase, which is appropriate for cleanup operations.

Also applies to: 89-89, 101-101, 197-209, 229-229, 296-299

src/sdk/namespace_fs.js (2)

3771-3780: LGTM - reclaim_wal getter follows established patterns.

The static getter for reclaim_wal is correctly implemented and follows the same pattern as migrate_wal and restore_wal.


3854-3855: LGTM - static field declaration is correct.

The static field declaration for _reclaim_wal is consistent with other WAL declarations in the class.

Comment on lines 61 to 73
```js
async function process_reclaim() {
    const fs_context = native_fs_utils.get_process_fs_context();
    const backend = Glacier.getBackend();

    if (
        await backend.low_free_space() ||
        !(await time_exceeded(fs_context, config.NSFS_GLACIER_RESTORE_INTERVAL, Glacier.RECLAIM_TIMESTAMP_FILE))
    ) return;

    await backend.perform(prepare_galcier_fs_context(fs_context), "RECLAIM");
    const timestamp_file_path = path.join(config.NSFS_GLACIER_LOGS_DIR, Glacier.RECLAIM_TIMESTAMP_FILE);
    await record_current_time(fs_context, timestamp_file_path);
}
```


⚠️ Potential issue | 🟠 Major

Use dedicated interval configuration for reclaim.

Line 67 uses config.NSFS_GLACIER_RESTORE_INTERVAL to control reclaim timing. This couples reclaim execution to the restore schedule, preventing independent control. Consider adding a dedicated NSFS_GLACIER_RECLAIM_INTERVAL configuration (similar to NSFS_GLACIER_MIGRATE_INTERVAL and NSFS_GLACIER_RESTORE_INTERVAL) to allow separate tuning of reclaim frequency based on operational requirements.

🤖 Prompt for AI Agents
In src/manage_nsfs/manage_nsfs_glacier.js around lines 61 to 73, the reclaim
logic incorrectly reuses config.NSFS_GLACIER_RESTORE_INTERVAL; add a dedicated
NSFS_GLACIER_RECLAIM_INTERVAL to the configuration (provide a sensible default,
e.g., equal to the current restore interval for backward compatibility), update
config defaults/env parsing and docs, then replace the time_exceeded call to use
config.NSFS_GLACIER_RECLAIM_INTERVAL so reclaim timing is controlled
independently.

Comment on lines +443 to +484
```js
/**
 *
 * @param {nb.NativeFSContext} fs_context
 * @param {LogFile} log_file log filename
 * @param {(entry: string) => Promise<void>} failure_recorder
 * @returns {Promise<boolean>}
 */
async reclaim(fs_context, log_file, failure_recorder) {
    try {
        return this._process_reclaimed(log_file.log_path);
    } catch (error) {
        dbg.error('unexpected error occured while running tapecloud.reclaim:', error);
    }
}
```


⚠️ Potential issue | 🟠 Major

Unused parameter: failure_recorder is never invoked.

The failure_recorder parameter is passed to reclaim() but never used, unlike migrate() and restore() which record failures for retry. The method delegates to _process_reclaimed() which calls TapeCloudUtils.reclaim(), but that method doesn't accept a failure recorder and always returns true even on error (lines 196-207).

If failure handling is intentionally omitted for reclaim (as suggested by the comment at line 190), consider either:

  1. Removing the unused failure_recorder parameter from the signature, or
  2. Implementing failure recording if retries are needed

Otherwise, failures during reclaim will be silently ignored with no retry mechanism.
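If option 2 is chosen, the recorder contract used by migrate/restore could be threaded through like this (the entry shape and recorder semantics below are assumptions for illustration, not the actual noobaa API):

```javascript
// Sketch of processing WAL entries with a failure recorder: entries that
// fail are handed to the recorder so they can be retried on the next run.
async function process_entries(entries, handle_entry, failure_recorder) {
    let all_ok = true;
    for (const entry of entries) {
        try {
            await handle_entry(entry);
        } catch (err) {
            all_ok = false;
            await failure_recorder(entry); // keep the entry for retry
        }
    }
    return all_ok; // false -> caller keeps the WAL for another pass
}

const failed = [];
process_entries(
    ['a', 'b', 'c'],
    async entry => { if (entry === 'b') throw new Error('reclaim failed'); },
    async entry => failed.push(entry),
).then(ok => console.log(ok, failed));
// -> false [ 'b' ]
```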

Comment on lines 3725 to 3750
```js
/**
 *
 * @param {nb.NativeFSContext} fs_context
 * @param {string} file_path
 * @param {nb.NativeFSStats} [stat]
 * @returns
 */
async append_to_reclaim_wal(fs_context, file_path, stat) {
    console.log('calling append_to_reclaim_wal:', file_path);
    if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;

    if (!stat) {
        stat = await nb_native().fs.stat(fs_context, file_path);
    }

    const data = JSON.stringify({
        full_path: file_path,
        logical_size: stat.size,
        ea: stat.xattr,
    });
    console.log('calling append_to_reclaim_wal data:', { file_path, data });
    await NamespaceFS.reclaim_wal.append(data);
}
```


⚠️ Potential issue | 🔴 Critical

Remove debug console.log statements and add error handling.

The method contains debug console.log statements on lines 3733 and 3745 that should be removed before merging to production. Additionally, the method lacks error handling for the stat operation and WAL append, which could cause unhandled promise rejections.

Apply this diff:

```diff
 async append_to_reclaim_wal(fs_context, file_path, stat) {
-    console.log('calling append_to_reclaim_wal:', file_path);
     if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;

-    if (!stat) {
-        stat = await nb_native().fs.stat(fs_context, file_path);
+    try {
+        if (!stat) {
+            stat = await nb_native().fs.stat(fs_context, file_path);
+        }
+
+        const data = JSON.stringify({
+            full_path: file_path,
+            logical_size: stat.size,
+            ea: stat.xattr,
+        });
+        await NamespaceFS.reclaim_wal.append(data);
+    } catch (err) {
+        // Log error but don't fail the operation
+        dbg.error('append_to_reclaim_wal: failed to log reclaim entry', file_path, err);
     }
-
-    const data = JSON.stringify({
-        full_path: file_path,
-        logical_size: stat.size,
-        ea: stat.xattr,
-    });
-    console.log('calling append_to_reclaim_wal data:', { file_path, data });
-    await NamespaceFS.reclaim_wal.append(data);
 }
```

The try-catch ensures that reclaim logging failures don't break the main operation flow.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

```js
/**
 *
 * @param {nb.NativeFSContext} fs_context
 * @param {string} file_path
 * @param {nb.NativeFSStats} [stat]
 * @returns
 */
async append_to_reclaim_wal(fs_context, file_path, stat) {
    console.log('calling append_to_reclaim_wal:', file_path);
    if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;
    if (!stat) {
        stat = await nb_native().fs.stat(fs_context, file_path);
    }
    const data = JSON.stringify({
        full_path: file_path,
        logical_size: stat.size,
        ea: stat.xattr,
    });
    console.log('calling append_to_reclaim_wal data:', { file_path, data });
    await NamespaceFS.reclaim_wal.append(data);
}
```

After:

```js
/**
 *
 * @param {nb.NativeFSContext} fs_context
 * @param {string} file_path
 * @param {nb.NativeFSStats} [stat]
 * @returns
 */
async append_to_reclaim_wal(fs_context, file_path, stat) {
    if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;
    try {
        if (!stat) {
            stat = await nb_native().fs.stat(fs_context, file_path);
        }
        const data = JSON.stringify({
            full_path: file_path,
            logical_size: stat.size,
            ea: stat.xattr,
        });
        await NamespaceFS.reclaim_wal.append(data);
    } catch (err) {
        // Log error but don't fail the operation
        dbg.error('append_to_reclaim_wal: failed to log reclaim entry', file_path, err);
    }
}
```
🤖 Prompt for AI Agents
In src/sdk/namespace_fs.js around lines 3725 to 3747, remove the two debug
console.log statements and wrap the body that performs stat and
NamespaceFS.reclaim_wal.append in a try-catch so failures don’t bubble up; if
stat fails, catch and log the error (using the module's logger or console.error)
and return quietly, and if append fails catch and log that error as well without
rethrowing so reclaim logging cannot break the main flow.

@tangledbytes force-pushed the utkarsh/feat/add-tape-reclaim branch from fa98cef to 1217008 on October 13, 2025 05:25

@coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
src/sdk/namespace_fs.js (2)

1408-1409: Logic inconsistency: stat destination before file is moved there, but then log wrong path.

This code has a critical flaw flagged in the previous review that remains unaddressed:

  1. You stat file_path (the destination) before the uploaded file has been moved there from upload_path
  2. Errors are silently suppressed with .catch(_.noop), potentially hiding real issues
  3. At line 1460, you log upload_path to the reclaim WAL, not file_path, which is inconsistent with the stat you just performed

If the intent is to detect tape reclaim on the existing destination file (before overwrite), then:

  • file_path might not exist yet (ENOENT for new uploads)
  • You should log file_path (the file being replaced), not upload_path
  • ENOENT should be handled explicitly

Apply this diff to fix the logic:

```diff
-const file_path_stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM &&
-    await nb_native().fs.stat(fs_context, file_path).catch(_.noop);
+let file_path_stat;
+if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM && !same_inode && !part_upload) {
+    try {
+        file_path_stat = await nb_native().fs.stat(fs_context, file_path);
+    } catch (err) {
+        if (err.code !== 'ENOENT') {
+            dbg.error('_finish_upload: failed to stat destination for reclaim check', file_path, err);
+        }
+        // Destination doesn't exist yet (new file), no reclaim needed
+    }
+}
```

And update line 1460 to use file_path:

```diff
 if (file_path_stat && file_path_stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) {
-    await this.append_to_reclaim_wal(fs_context, upload_path, file_path_stat);
+    await this.append_to_reclaim_wal(fs_context, file_path, file_path_stat);
 }
```

3725-3745: Add error handling to prevent reclaim logging failures from breaking main operations.

The method lacks error handling for the stat operation and WAL append, which could cause unhandled promise rejections. While debug console.log statements mentioned in the previous review appear to have been removed, the error handling issue remains critical.

Apply this diff:

```diff
 async append_to_reclaim_wal(fs_context, file_path, stat) {
     if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;

-    if (!stat) {
-        stat = await nb_native().fs.stat(fs_context, file_path);
+    try {
+        if (!stat) {
+            stat = await nb_native().fs.stat(fs_context, file_path);
+        }
+
+        const data = JSON.stringify({
+            full_path: file_path,
+            logical_size: stat.size,
+            ea: stat.xattr,
+        });
+        await NamespaceFS.reclaim_wal.append(data);
+    } catch (err) {
+        // Log error but don't fail the operation
+        dbg.error('append_to_reclaim_wal: failed to log reclaim entry', file_path, err);
     }
-
-    const data = JSON.stringify({
-        full_path: file_path,
-        logical_size: stat.size,
-        ea: stat.xattr,
-    });
-    await NamespaceFS.reclaim_wal.append(data);
 }
```

This ensures that reclaim logging failures don't break the main operation flow (upload, delete, etc.).

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa98cef and 1217008.

📒 Files selected for processing (9)
  • .editorconfig (0 hunks)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/native/fs/fs_napi.cpp (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (6 hunks)
💤 Files with no reviewable changes (1)
  • .editorconfig
🚧 Files skipped from review as they are similar to previous changes (2)
  • config.js
  • src/manage_nsfs/manage_nsfs_constants.js
🧰 Additional context used
🧬 Code graph analysis (2)
src/manage_nsfs/manage_nsfs_glacier.js (1)
src/util/native_fs_utils.js (2)
  • fs_context (549-554)
  • config (10-10)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
  • GLACIER_ACTIONS (25-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (3)
src/sdk/namespace_fs.js (3)

2128-2128: LGTM: Reclaim logging after lifecycle deletion.

The call to append_to_reclaim_wal here correctly logs files deleted via lifecycle policies. The error handling appropriately catches ENOENT in case the file was already deleted.


2135-2140: LGTM: Reclaim logging after regular deletion.

The call to append_to_reclaim_wal here correctly logs files after successful deletion in non-lifecycle flows. The try-catch properly handles expected errors (ENOENT, EISDIR) while allowing unexpected errors to propagate.


3769-3778: LGTM: Proper lazy initialization for reclaim WAL logger.

The static getter follows the established pattern used by migrate_wal and restore_wal, with appropriate configuration for shared locking and polling interval.

Comment on lines +196 to +207
```js
static async reclaim(file) {
    try {
        dbg.log1("Starting reclaim for file", file);
        const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true });
        dbg.log4("reclaim finished with:", out);
        dbg.log1("Finished reclaim for file", file);
    } catch (error) {
        dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
    }

    return true;
}
```


⚠️ Potential issue | 🟠 Major

Propagate reclaim failures so WAL entries aren’t lost.

TapeCloudUtils.reclaim() always resolves to true, even when exec() fails. That tells PersistentLogger.process() the reclaim batch succeeded, so the WAL file is deleted and the entries are never retried. Please return false (or rethrow) on failure so the WAL stays on disk for the next run.

```diff
     static async reclaim(file) {
         try {
             dbg.log1("Starting reclaim for file", file);
             const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true });
             dbg.log4("reclaim finished with:", out);
             dbg.log1("Finished reclaim for file", file);
-        } catch (error) {
-            dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
+            return true;
+        } catch (error) {
+            dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
+            return false;
         }
-
-        return true;
     }
```

Suggested change

Before:

```js
static async reclaim(file) {
    try {
        dbg.log1("Starting reclaim for file", file);
        const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true });
        dbg.log4("reclaim finished with:", out);
        dbg.log1("Finished reclaim for file", file);
    } catch (error) {
        dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
    }
    return true;
}
```

After:

```js
static async reclaim(file) {
    try {
        dbg.log1("Starting reclaim for file", file);
        const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true });
        dbg.log4("reclaim finished with:", out);
        dbg.log1("Finished reclaim for file", file);
        return true;
    } catch (error) {
        dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
        return false;
    }
}
```
🤖 Prompt for AI Agents
In src/sdk/glacier_tapecloud.js around lines 196-207 the async reclaim() always
returns true even when exec() throws, causing PersistentLogger to treat failures
as successes and delete WAL entries; change the catch to return false (or
rethrow) so failures propagate — keep the existing error logging, and ensure the
function only returns true on successful exec and returns false inside the catch
block so the WAL is retained for retry.

@tangledbytes force-pushed the utkarsh/feat/add-tape-reclaim branch from 1217008 to 0cd41c1 on October 13, 2025 07:24

@coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1217008 and 0cd41c1.

📒 Files selected for processing (8)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/native/fs/fs_napi.cpp (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • config.js
  • src/cmd/manage_nsfs.js
  • src/manage_nsfs/manage_nsfs_glacier.js
  • src/sdk/glacier_tapecloud.js
  • src/sdk/namespace_fs.js
🧰 Additional context used
🧬 Code graph analysis (1)
src/sdk/glacier.js (4)
src/cmd/manage_nsfs.js (3)
  • fs_context (73-73)
  • type (57-57)
  • native_fs_utils (23-23)
src/manage_nsfs/manage_nsfs_glacier.js (5)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
  • native_fs_utils (9-9)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
  • fs_context (191-191)
  • fs_context (303-303)
  • fs_context (439-439)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (9)
src/manage_nsfs/manage_nsfs_constants.js (2)

25-30: LGTM!

The addition of the RECLAIM action follows the existing pattern and is consistent with other Glacier actions.


72-77: LGTM!

The reclaim option is correctly configured with the same pattern as other Glacier operations.

src/native/fs/fs_napi.cpp (2)

53-57: LGTM!

The IBMUID DMAPI extended attribute macros follow the established pattern for other DMAPI xattrs (IBMObj, IBMPMig, IBMTPS) and are correctly constructed.


256-261: LGTM!

The GPFS_DMAPI_XATTR_TAPE_UID is correctly added to the GPFS_DMAPI_XATTRS vector, ensuring it will be retrieved alongside other DMAPI attributes.

src/sdk/glacier.js (5)

24-24: LGTM!

The RECLAIM_TIMESTAMP_FILE constant follows the established naming pattern for other timestamp files.


89-89: LGTM!

The RECLAIM_WAL_NAME constant follows the naming pattern of other WAL constants.


101-101: LGTM!

The GLACIER_RECLAIM_CLUSTER_LOCK constant follows the established naming pattern for cluster locks.


229-229: LGTM!

The perform method signature is correctly updated to include "RECLAIM" in the type union.


296-299: LGTM!

The RECLAIM handling correctly uses a cluster lock and processes the reclaim WAL logs. The simpler pattern (without staging) is appropriate for cleanup operations.

Comment on lines +75 to +83
```js
/**
 * GPFS_DMAPI_XATTR_TAPE_UID xattr contains UID which contains the unique ID of the UID
 *
 * Example: `1284427297506873931-5499940123615166566-1799306066-279655-0` (here 279655 is
 * the inode number)
 *
 * NOTE: If IBMUID EA exists, that means the file is either migrated or premigrated.
 */
static GPFS_DMAPI_XATTR_TAPE_UID = 'dmapi.IBMUID';
```


⚠️ Potential issue | 🟡 Minor

Fix redundant wording in JSDoc.

Line 76 has redundant wording: "contains UID which contains the unique ID of the UID". Consider simplifying to "contains the unique ID of the file".

Apply this diff to improve the documentation:

```diff
-    /**
-     * GPFS_DMAPI_XATTR_TAPE_UID xattr contains UID which contains the unique ID of the UID
-     * 
+    /**
+     * GPFS_DMAPI_XATTR_TAPE_UID xattr contains the unique ID of the file
+     * 
```

Suggested change

Before:

```js
/**
 * GPFS_DMAPI_XATTR_TAPE_UID xattr contains UID which contains the unique ID of the UID
 *
 * Example: `1284427297506873931-5499940123615166566-1799306066-279655-0` (here 279655 is
 * the inode number)
 *
 * NOTE: If IBMUID EA exists, that means the file is either migrated or premigrated.
 */
static GPFS_DMAPI_XATTR_TAPE_UID = 'dmapi.IBMUID';
```

After:

```js
/**
 * GPFS_DMAPI_XATTR_TAPE_UID xattr contains the unique ID of the file
 *
 * Example: `1284427297506873931-5499940123615166566-1799306066-279655-0` (here 279655 is
 * the inode number)
 *
 * NOTE: If IBMUID EA exists, that means the file is either migrated or premigrated.
 */
static GPFS_DMAPI_XATTR_TAPE_UID = 'dmapi.IBMUID';
```
🤖 Prompt for AI Agents
In src/sdk/glacier.js around lines 75 to 83, the JSDoc for
GPFS_DMAPI_XATTR_TAPE_UID contains redundant wording ("contains UID which
contains the unique ID of the UID"); update the comment to a concise description
such as "contains the unique ID of the file" (keep the example and NOTE lines
intact), replacing the redundant phrase so the doc reads clearly and succinctly.

Comment on lines +197 to +209
/**
* reclaim cleans up inindexed items in the underlying
* glacier storage
*
* NOTE: This needs to be implemented by each backend.
* @param {nb.NativeFSContext} fs_context
* @param {LogFile} log_file log filename
* @param {(entry: string) => Promise<void>} failure_recorder
* @returns {Promise<boolean>}
*/
async reclaim(fs_context, log_file, failure_recorder) {
throw new Error('Unimplementented');
}

⚠️ Potential issue | 🟡 Minor

Fix typo in JSDoc comment.

Line 198 contains a typo: "inindexed" should be "unindexed".

Apply this diff:

-    /**
-     * reclaim cleans up inindexed items in the underlying
-     * glacier storage
+    /**
+     * reclaim cleans up unindexed items in the underlying
+     * glacier storage

The method structure correctly follows the pattern of migrate and restore methods.

📝 Committable suggestion


Suggested change
/**
* reclaim cleans up inindexed items in the underlying
* glacier storage
*
* NOTE: This needs to be implemented by each backend.
* @param {nb.NativeFSContext} fs_context
* @param {LogFile} log_file log filename
* @param {(entry: string) => Promise<void>} failure_recorder
* @returns {Promise<boolean>}
*/
async reclaim(fs_context, log_file, failure_recorder) {
throw new Error('Unimplementented');
}
/**
* reclaim cleans up unindexed items in the underlying
* glacier storage
*
* NOTE: This needs to be implemented by each backend.
* @param {nb.NativeFSContext} fs_context
* @param {LogFile} log_file log filename
* @param {(entry: string) => Promise<void>} failure_recorder
* @returns {Promise<boolean>}
*/
async reclaim(fs_context, log_file, failure_recorder) {
throw new Error('Unimplementented');
}
🤖 Prompt for AI Agents
In src/sdk/glacier.js around lines 197 to 209, fix the JSDoc typo by changing
"inindexed" to "unindexed" in the reclaim method description; leave the method
signature and implementation (throwing Error('Unimplementented')) as-is to match
the migrate/restore pattern (optional: you may also correct the thrown Error
message spelling to 'Unimplemented' but the requested change is only the JSDoc
typo).
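The stub-then-override pattern the review describes (a base-class method that throws until a backend implements it) can be sketched as follows. Class and method names follow the PR; the bodies here are illustrative placeholders, not the real TapeCloud implementation.

```javascript
// Minimal sketch: the base Glacier class throws from reclaim(), and a
// backend such as TapeCloudGlacier overrides it with a real implementation.
class Glacier {
    async reclaim(fs_context, log_file, failure_recorder) {
        throw new Error('Unimplemented');
    }
}

class TapeCloudGlacier extends Glacier {
    async reclaim(fs_context, log_file, failure_recorder) {
        // A real backend would run the reclaim script over the WAL entries
        // and record failed entries via failure_recorder; we just succeed.
        return true;
    }
}

(async () => {
    await new Glacier().reclaim().catch(err => console.log(err.message)); // Unimplemented
    console.log(await new TapeCloudGlacier().reclaim()); // true
})();
```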

Signed-off-by: Utkarsh Srivastava <[email protected]>
@tangledbytes tangledbytes force-pushed the utkarsh/feat/add-tape-reclaim branch from 0cd41c1 to d747791 (October 13, 2025 12:44)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
src/sdk/glacier.js (2)

75-84: Tighten IBMUID JSDoc phrasing.

Redundant wording; simplify for clarity.

Apply:

-    /**
-     * GPFS_DMAPI_XATTR_TAPE_UID xattr contains UID which contains the unique ID of the UID
+    /**
+     * GPFS_DMAPI_XATTR_TAPE_UID xattr contains the unique ID of the file

197-209: Fix typos in reclaim stub.

  • JSDoc: “inindexed” → “unindexed”.
  • Error text: “Unimplementented” → “Unimplemented”.

Apply:

-    /**
-     * reclaim cleans up inindexed items in the underlying
+    /**
+     * reclaim cleans up unindexed items in the underlying
@@
-    async reclaim(fs_context, log_file, failure_recorder) {
-        throw new Error('Unimplementented');
+    async reclaim(fs_context, log_file, failure_recorder) {
+        throw new Error('Unimplemented');
     }
src/sdk/namespace_fs.js (1)

3730-3750: Make append_to_reclaim_wal non-failing and filter entries.

This must never break uploads/deletes. Wrap in try/catch, and log only when DMAPI tape xattr is present. Optionally, trim xattrs (privacy).

Apply:

-    async append_to_reclaim_wal(fs_context, file_path, stat) {
-        if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;
-
-        if (!stat) {
-            stat = await nb_native().fs.stat(fs_context, file_path);
-        }
-
-        const data = JSON.stringify({
-            full_path: file_path,
-            logical_size: stat.size,
-            ea: stat.xattr,
-        });
-        await NamespaceFS.reclaim_wal.append(data);
-    }
+    async append_to_reclaim_wal(fs_context, file_path, stat) {
+        if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;
+        try {
+            if (!stat) {
+                stat = await nb_native().fs.stat(fs_context, file_path);
+            }
+            if (!stat?.xattr?.[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) return;
+            const data = JSON.stringify({
+                full_path: file_path,
+                logical_size: stat.size,
+                // keep only DMAPI-related EAs to reduce noise/sensitivity
+                ea: {
+                    [Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR],
+                    [Glacier.GPFS_DMAPI_XATTR_TAPE_PREMIG]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_PREMIG],
+                    [Glacier.GPFS_DMAPI_XATTR_TAPE_TPS]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_TPS],
+                    [Glacier.GPFS_DMAPI_XATTR_TAPE_UID]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_UID],
+                },
+            });
+            await NamespaceFS.reclaim_wal.append(data);
+        } catch (err) {
+            dbg.warn('append_to_reclaim_wal: failed to log reclaim entry', file_path, err);
+        }
+    }
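The xattr-trimming idea in the suggestion above (keep only DMAPI-related EAs in the WAL entry) can be shown with a small standalone sketch. `buildReclaimEntry` is a hypothetical helper for illustration; the stat object and the non-DMAPI xattr name are made up, and only `dmapi.IBMUID` is a real key from the PR.

```javascript
// Illustrative sketch: build a reclaim WAL entry containing only a
// whitelisted set of DMAPI extended attributes, dropping everything else.
function buildReclaimEntry(file_path, stat, dmapiKeys) {
    const ea = {};
    for (const key of dmapiKeys) {
        if (stat.xattr && key in stat.xattr) ea[key] = stat.xattr[key];
    }
    return JSON.stringify({ full_path: file_path, logical_size: stat.size, ea });
}

const stat = {
    size: 4096,
    xattr: { 'dmapi.IBMUID': '1-2-3-279655-0', 'user.noise': 'drop-me' },
};
console.log(buildReclaimEntry('/gpfs/bucket/obj', stat, ['dmapi.IBMUID']));
// → {"full_path":"/gpfs/bucket/obj","logical_size":4096,"ea":{"dmapi.IBMUID":"1-2-3-279655-0"}}
```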
🧹 Nitpick comments (1)
src/sdk/namespace_fs.js (1)

2136-2145: Deletion path: WAL logging should gate on DMAPI indicator and avoid noisy warns.

  • Avoid logging all deletes; check DMAPI indicator before appending.
  • Wrap stat + append in try/catch; don’t warn on ENOENT.

Apply:

-            try {
-                const stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM &&
-                    await nb_native().fs.stat(fs_context, file_path).catch(dbg.warn.bind(this));
-                await nb_native().fs.unlink(fs_context, file_path);
-                if (stat) {
-                    await this.append_to_reclaim_wal(fs_context, file_path, stat);
-                }
-            } catch (err) {
+            try {
+                let stat;
+                if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) {
+                    try {
+                        stat = await nb_native().fs.stat(fs_context, file_path);
+                    } catch (e) {
+                        if (!['ENOENT', 'EISDIR'].includes(e.code)) dbg.warn('delete_object stat failed', file_path, e);
+                    }
+                }
+                await nb_native().fs.unlink(fs_context, file_path);
+                if (stat?.xattr?.[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) {
+                    await this.append_to_reclaim_wal(fs_context, file_path, stat).catch(dbg.warn.bind(this));
+                }
+            } catch (err) {
                 if (err.code !== 'ENOENT' && err.code !== 'EISDIR') throw err;
             }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd41c1 and d747791.

📒 Files selected for processing (8)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/native/fs/fs_napi.cpp (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/manage_nsfs/manage_nsfs_glacier.js
  • src/sdk/glacier_tapecloud.js
  • src/native/fs/fs_napi.cpp
  • config.js
🧰 Additional context used
🧬 Code graph analysis (2)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
  • GLACIER_ACTIONS (25-30)
src/sdk/glacier.js (1)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (6)
src/cmd/manage_nsfs.js (1)

882-884: Add RECLAIM action handling — looks good.

Switch case correctly wires to manage_nsfs_glacier.process_reclaim().

Optionally verify manage_nsfs_glacier exports process_reclaim and CLI help includes 'reclaim'.

src/manage_nsfs/manage_nsfs_constants.js (1)

29-30: RECLAIM action and options registered — good coverage.

GLACIER_ACTIONS and VALID_OPTIONS_GLACIER updated consistently.

Also applies to: 76-77

src/sdk/glacier.js (1)

24-25: RECLAIM constants and perform() branch — LGTM.

New timestamp/WAL/lock constants and perform() path align with existing patterns.

Also applies to: 89-90, 101-102, 296-300

src/sdk/namespace_fs.js (3)

3774-3783: Expose reclaim_wal — LGTM.

Consistent with migrate/restore WALs; SHARED writer lock matches usage.


3811-3813: Lifecycle deletion WAL append — OK, will be safe after making append non-throwing.

No further action if append_to_reclaim_wal is wrapped.

Ensure reclaim processing tolerates mixed entries with trimmed xattrs as proposed.


3860-3862: Static _reclaim_wal init — LGTM.

Comment on lines +1410 to 1412
const file_path_stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM &&
await nb_native().fs.stat(fs_context, file_path).catch(_.noop);
this._verify_encryption(params.encryption, this._get_encryption_info(stat));

⚠️ Potential issue | 🟠 Major

Reclaim WAL: stat timing and path mismatch in _finish_upload.

  • Stat destination (file_path) only when needed; handle ENOENT explicitly.
  • Append WAL for the destination path (matches checked stat), not upload_path.

Apply:

-        const file_path_stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM &&
-            await nb_native().fs.stat(fs_context, file_path).catch(_.noop);
+        let file_path_stat;
@@
-        if (!same_inode && !part_upload) {
-            // If the target file is already on tape then this is a candidate for tape reclaim
-            if (file_path_stat && file_path_stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) {
-                await this.append_to_reclaim_wal(fs_context, upload_path, file_path_stat);
-            }
+        if (!same_inode && !part_upload) {
+            if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) {
+                try {
+                    file_path_stat = await nb_native().fs.stat(fs_context, file_path);
+                } catch (err) {
+                    if (err.code !== 'ENOENT') throw err;
+                }
+            }
+            // If the target file is already on tape then this is a candidate for tape reclaim
+            if (file_path_stat && file_path_stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) {
+                await this.append_to_reclaim_wal(fs_context, file_path, file_path_stat);
+            }

Also applies to: 1459-1464

