fix(cloud-manager-client): replace adm-zip with zip-lib to preserve symlinks#1451
fix(cloud-manager-client): replace adm-zip with zip-lib to preserve symlinks#1451
Conversation
…ymlinks adm-zip's addLocalFolder uses fs.statSync which follows symlinks and crashes with ENOENT when targets can't be resolved on Lambda. Even when symlinks resolve, adm-zip dereferences them, destroying the symlink structure required by AEM dispatcher configs (enabled_farms/*.farm -> ../available_farms/*.farm). zip-lib properly handles symlinks in both directions: archiveFolder with followSymlinks: false stores them as standard ZIP symlink entries, and extract restores them as actual filesystem symlinks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
solaris007
left a comment
There was a problem hiding this comment.
Hey @rpapani,
Strengths
- Well-motivated fix for a real customer-impacting defect - AEM Dispatcher repos depend on symlink structure (
enabled_farms/*.farm->../available_farms/*.farm), and adm-zip'sfs.statSync()follows symlinks, causing ENOENT on Lambda and destroying symlink structure. Smoke-tested against Lexmark with 21+ symlinks. - Public API unchanged -
zipRepositorystill returnsPromise<Buffer>,unzipRepositorystill acceptsBuffer. Transparent to all downstream consumers. - zip-lib (built on yauzl/yazl) provides stronger zip-slip protection than adm-zip - two-layer defense via
validateFileName()+isOutsideTargetFolder()realpath checks. This is a security improvement. - Explicit
{ followSymlinks: false }(src/index.js:327) documents intent and protects against future default changes in the library. - Clean try/finally cleanup patterns in both methods, with tests verifying temp directory cleanup on both success and failure paths.
- Tests explicitly verify the
followSymlinks: falseoption is passed (test:~line 798), and the error cleanup test now verifies both temp directories are removed - an improvement over the old test which didn't check cleanup at all.
Issues
Important (Should Fix)
1. Temp directory leak if second mkdtempSync fails in unzipRepository
src/index.js:490-491
Two mkdtempSync calls happen before the try block:
const extractPath = mkdtempSync(...); // succeeds, creates dir on disk
const zipDir = mkdtempSync(...); // if this throws, extractPath leaksIf /tmp is nearly full (plausible on Lambda with 512 MB default), the second call fails after the first succeeds. The error propagates without entering the try/catch, so neither cleanup path runs. This is a new resource leak path not present in the old code.
Suggested fix - move zipDir inside the try block:
async unzipRepository(zipBuffer) {
const extractPath = mkdtempSync(path.join(os.tmpdir(), CLONE_DIR_PREFIX));
let zipDir;
try {
zipDir = mkdtempSync(path.join(os.tmpdir(), ZIP_DIR_PREFIX));
const zipFile = path.join(zipDir, 'repo.zip');
writeFileSync(zipFile, zipBuffer);
await extract(zipFile, extractPath);
this.log.info(`Repository extracted to ${extractPath}`);
return extractPath;
} catch (error) {
rmSync(extractPath, { recursive: true, force: true });
throw new Error(`Failed to unzip repository: ${error.message}`);
} finally {
if (zipDir) rmSync(zipDir, { recursive: true, force: true });
}
}2. Symlink targets not validated after extraction
src/index.js:496
zip-lib's extract() restores symlinks but does not validate symlink targets at creation time. A symlink entry like evil -> /etc/shadow would be created unconditionally. While zip-lib validates that no subsequent file write traverses through a previously extracted symlink, the symlink itself can point outside the extraction directory.
Exploitability is low-medium - the zip comes from S3 where it was written by zipRepository() from a Cloud Manager git clone, so the attacker would need to be a repo contributor who plants a malicious symlink among the legitimate dispatcher symlinks. On Lambda, blast radius is limited to /tmp contents and environment variables.
Suggestion: Add a post-extraction validation that checks all symlinks resolve within the extraction directory. This could also be tracked as a fast-follow if you want to ship the functional fix first.
Minor (Nice to Have)
3. Missing /tmp disk usage logging for zip operations
The class already calls #logTmpDiskUsage() after clone/push/pull, but neither zipRepository nor unzipRepository calls it. Since the new implementation writes intermediate files to /tmp (unlike the old in-memory approach), adding logging after archiveFolder and extract would provide operational visibility into the changed resource profile.
4. extract stub arguments not verified in unzip test
The test checks extractStub was called once but doesn't verify the source zip path or destination path it was called with. Compare with the zip test which properly checks all three archiveFolderStub arguments. A path construction bug would go undetected.
5. Unrelated package-lock.json changes
The diff adds "peer": true to 15+ unrelated packages and bumps versions of spacecat-shared-data-access and spacecat-shared-utils. These appear to be side effects of npm install on a different base. Not a correctness issue, but inflates the diff.
6. Inconsistent error wrapping between zip/unzip
unzipRepository wraps errors in a user-friendly message (Failed to unzip repository: ...), but zipRepository lets raw errors propagate. Minor inconsistency.
Recommendations
- Add a brief inline comment in both methods explaining why the temp file intermediary is needed (zip-lib is path-based, not buffer-based). Two years from now, someone will wonder why the code writes to disk just to read back into a buffer.
- Consider filing an issue on zip-lib requesting symlink target validation in its
extract()path.
Assessment
Ready to merge? With fixes
The dependency swap is well-justified, the implementation is clean, and the new library is a security improvement over adm-zip. The temp directory leak (issue #1) is a small structural fix worth addressing before merge. The symlink target validation (issue #2) is recommended defense-in-depth that could be addressed here or as a fast-follow. Everything else is polish.
Next Steps
| */ | ||
| async unzipRepository(zipBuffer) { | ||
| const extractPath = mkdtempSync(path.join(os.tmpdir(), CLONE_DIR_PREFIX)); | ||
| const zipDir = mkdtempSync(path.join(os.tmpdir(), ZIP_DIR_PREFIX)); |
There was a problem hiding this comment.
Temp dir leak: If this mkdtempSync throws (e.g., /tmp full), extractPath created on the line above leaks - the error propagates without entering the try/catch. Consider moving this inside the try block with let zipDir declared above.
| const zip = new AdmZip(zipBuffer); | ||
| zip.extractAllTo(extractPath, true); | ||
| writeFileSync(zipFile, zipBuffer); | ||
| await extract(zipFile, extractPath); |
There was a problem hiding this comment.
Symlink targets not validated: extract() restores symlinks but does not validate their targets. A malicious symlink like evil -> /etc/shadow would be created unconditionally. Low-medium risk given the trusted S3 source, but worth a post-extraction validation or a fast-follow.
…temp dir leak, and hardening - Add #validateSymlinks() that recursively checks all symlinks resolve within the repository root, called in both zipRepository (before archiving) and unzipRepository (after extraction) for defense-in-depth - Fix temp directory leak in unzipRepository when second mkdtempSync fails by moving zipDir creation inside the try block - Add #logTmpDiskUsage() calls to zip/unzip for operational visibility - Wrap zipRepository errors consistently with unzipRepository pattern - Verify extract stub arguments in tests - Add inline comments explaining why temp file intermediary is needed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
This PR will trigger a patch release when merged. |
|
@solaris007 I've incorporated all the feedback, can you please review again. Thank you!! |
solaris007
left a comment
There was a problem hiding this comment.
Hey @rpapani,
Strengths
Previously flagged issues now addressed (all 6/6):
- Temp dir leak - Fixed.
zipDirmoved inside the try block withlet zipDirpattern andif (zipDir)guard in finally. New ENOSPC test verifiesextractPathcleanup when the secondmkdtempSyncfails. - Symlink target validation - Fixed. New
#validateSymlinksmethod recursively checks all symlinks resolve within the repository root. Called pre-zip and post-extract for defense-in-depth. The containment check is sound - usesreadlinkSync(raw target, no follow) +path.resolve(lexical normalization) +path.relative(containment). Correctly handles absolute targets, relative..traversal, dangling symlinks, and chained symlinks. - Disk usage logging - Fixed.
#logTmpDiskUsage('zip')and#logTmpDiskUsage('unzip')added. - Extract stub args not verified - Fixed. Tests now assert both source and destination paths.
- Inconsistent error wrapping - Fixed.
zipRepositorynow wraps errors asFailed to zip repository: ..., matchingunzipRepository. - Inline comments - Fixed. Both methods document why the temp file round-trip is needed.
Test coverage is thorough - four new tests cover ENOSPC partial failure, absolute symlink escape, relative dispatcher-style symlinks (the real-world case), and pre-zip escape detection.
Assessment
Ready to merge? Yes
All prior findings are resolved cleanly. The #validateSymlinks implementation is correct, well-placed, and well-tested. The containment check conservatively uses lexical resolution (safe direction - false positives, not false negatives). Temp file lifecycle is tight. CI is green. Ship it.
Summary
adm-zipwithzip-libfor zip/unzip operations in the Cloud Manager client to preserve symlinks during code importProblem
AEM Dispatcher repositories use symlinks in
enabled_farms/andenabled_vhosts/directories to reference configs inavailable_farms/andavailable_vhosts/.adm-zip'saddLocalFolderinternally usesfs.statSync()which follows symlinks -- when a symlink target can't be resolved (e.g. on Lambda's filesystem), it throwsENOENTand the entire code import fails. Even when symlinks do resolve,adm-zipdereferences them and stores the target file content, destroying the symlink structure. On extraction, symlinks are never restored --adm-ziphas no symlink support in either direction.This breaks dispatcher functionality since the symlink structure (
enabled_farms/*.farm->../available_farms/*.farm) is required for AEM dispatcher to work properly.Solution
Replace
adm-zipwithzip-lib, which is the only Node.js ZIP library that properly handles symlinks in both directions:archiveFolder(path, zipFile, { followSymlinks: false })stores symlinks as standard ZIP symlink entries (Unix external attributes) instead of dereferencing themextract()restores symlinks as actual filesystem symlinks, and includes zip-slip path traversal protection viayauzl.validateFileName()+ realpath checksSince
zip-libis file-path-based (not buffer-based likeadm-zip), both methods use a temp file as an intermediary -- the zip is written to disk, read into a Buffer, then the temp file is cleaned up. The public API (zipRepositoryreturnsPromise<Buffer>,unzipRepositorytakesBuffer) is unchanged.Test plan