
Adding a new script#175

Open
riteshsaigal wants to merge 14 commits into mongodb:master from riteshsaigal:scripts

Conversation

@riteshsaigal
Collaborator

Added a new script `get-busiest-collections.js` under `migration/toolbox/getBusiestCollection`, which will help identify the busiest collections in terms of writes during the migration.

Contributor

Copilot AI left a comment


Pull request overview

Adds a new toolbox script to analyze mongosync JSONL logs and summarize the busiest namespaces/collections by write operations, with optional JSON and Markdown exports.

Changes:

  • Added get-busiest-collections.js to parse mongosync logs and aggregate write event counts by namespace.
  • Added a dedicated README for the new tool and linked it from the main toolbox README.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File | Description
---- | -----------
`migration/toolbox/getBusiestCollection/get-busiest-collections.js` | New Node.js log parser/aggregator producing console output plus JSON/Markdown exports.
`migration/toolbox/getBusiestCollection/README.md` | Usage and example output documentation for the new script.
`migration/toolbox/README.md` | Adds the new tool to the toolbox index.

```js
  input: fileStream,
  crlfDelay: Infinity,
});
```

Copilot AI Mar 2, 2026


fs.createReadStream(filePath) can emit an error event asynchronously (e.g., permission denied / file disappears). Right now there’s no listener, so the process can crash with an unhandled stream error. Attach an error handler and fail the current file (or reject processFiles) with a clear message including the path.

Suggested change

```js
fileStream.on('error', (err) => {
  const message = `Error reading file "${filePath}": ${err.message}`;
  if (!suppressConsole) {
    console.error('\x1b[31m%s\x1b[0m', message);
  }
  rl.close();
});
```

Copilot uses AI. Check for mistakes.
```js
if (lineCounter > MAX_LINES_PER_FILE) {
  if (!suppressConsole) {
    console.warn('\x1b[33m%s\x1b[0m', `Warning: Processing of "${filePath}" stopped after ${MAX_LINES_PER_FILE} lines for safety.`);
  }
```
Copilot AI Mar 2, 2026


When lineCounter > MAX_LINES_PER_FILE you break out of the for await (const line of rl) loop but don’t close the readline interface or destroy the underlying stream. In Node, breaking early can leave the stream open and keep reading in the background. Ensure you rl.close() and fileStream.destroy() (or otherwise stop/cleanup) before breaking.

Suggested change

```js
    }
  }
  rl.close();
  fileStream.destroy();
```

```js
  }
} catch (error) {
  if (!suppressConsole) {
    console.error('Error processing line:', line);
```
Copilot AI Mar 2, 2026


On JSON parse/processing errors this logs the full raw log line. Mongosync logs can contain sensitive data, and printing entire lines can leak PII into terminals/CI logs. Prefer logging file + line number (and maybe a truncated prefix) instead of the full line content.

Suggested change

```diff
-console.error('Error processing line:', line);
+const previewLength = 80;
+const linePreview =
+  typeof line === 'string'
+    ? line.slice(0, previewLength) + (line.length > previewLength ? '...' : '')
+    : '';
+console.error(
+  `Error processing line ${lineCounter} in file "${filePath}".` +
+  (linePreview ? ` Line preview: ${linePreview}` : '')
+);
```

```bash
node get-busiest-collections.js </path/to/mongosynclog/files-or-directory> [--markdown] [--no-console]
```

Copilot AI Mar 2, 2026


The usage text implies the script accepts wildcard patterns directly. The implementation checks fs.existsSync(inputPath) and doesn’t expand globs itself, so unexpanded patterns (common on Windows or when quoted) will fail. Either document that wildcards must be expanded by the shell, or implement glob expansion in the script.

Suggested change
The script expects an existing file or directory path. It does not expand wildcard/glob patterns (such as `*.log`)
itself, so any wildcards must be expanded by your shell before invoking the script (for example, by relying on
unquoted shell globs). On platforms or shells where patterns are not expanded automatically, pass explicit paths.

Comment on lines +16 to +22
```text
Namespace                        | Total Write Ops   | delete     | insert     | update
-------------------------------- | ----------------- | ---------- | ---------- | ----------
db0.test2                        | 29,847            | 5,503      | 9,419      | 14,925
db0.test5                        | 7,289             | 2,456      | 2,438      | 2,395
db0.test1                        | 7,253             | 2,476      | 2,450      | 2,327
db0.test4                        | 7,176             | 2,414      | 2,386      | 2,376
db0.test3                        | 7,076             | 2,352      | 2,360      | 2,364
```
Copilot AI Mar 2, 2026


The example output table doesn’t include the replace column, but the script initializes event types with delete/insert/replace/update and will output replace when present (and it alphabetically sorts event types, so column ordering may differ). Update the example output to match the script’s actual columns/order.

Suggested change

```diff
-Namespace                        | Total Write Ops   | delete     | insert     | update
--------------------------------- | ----------------- | ---------- | ---------- | ----------
-db0.test2                        | 29,847            | 5,503      | 9,419      | 14,925
-db0.test5                        | 7,289             | 2,456      | 2,438      | 2,395
-db0.test1                        | 7,253             | 2,476      | 2,450      | 2,327
-db0.test4                        | 7,176             | 2,414      | 2,386      | 2,376
-db0.test3                        | 7,076             | 2,352      | 2,360      | 2,364
+Namespace                        | Total Write Ops   | delete     | insert     | replace    | update
+-------------------------------- | ----------------- | ---------- | ---------- | ---------- | ----------
+db0.test2                        | 29,847            | 5,503      | 9,419      | 1,234      | 13,691
+db0.test5                        | 7,289             | 2,456      | 2,438      | 150        | 2,245
+db0.test1                        | 7,253             | 2,476      | 2,450      | 100        | 2,227
+db0.test4                        | 7,176             | 2,414      | 2,386      | 120        | 2,256
+db0.test3                        | 7,076             | 2,352      | 2,360      | 130        | 2,234
```
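
The alphabetical event-type ordering the comment describes can be reproduced with a small sketch (the counts here are illustrative, not from the PR):

```javascript
// Hedged sketch: if the script derives columns by sorting the event-type
// keys, the order is always delete, insert, replace, update regardless of
// insertion order in the counts object.
const eventCounts = { update: 14925, insert: 9419, delete: 5503, replace: 0 };
const columns = Object.keys(eventCounts).sort();
console.log(columns.join(' | ')); // prints: delete | insert | replace | update
```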

Comment on lines +1 to +5
```markdown
## Get Busiest Collection

**Script:** `get-busiest-collections.js`

Gets the busiest collections in terms of writes (delete/insert/replace/update) as recorded in the mongosync logs in the CEA phase
```
Copilot AI Mar 2, 2026


The doc title says “Get Busiest Collection” (singular) but the script/file name and output summarize multiple collections/namespaces. Consider renaming the heading (and/or directory name) to consistently use “collections” to avoid confusion.
