Conversation
…s and added README-busiest-collections.md
…he toolbox directory.
…t-collections.js for improved readability.
There was a problem hiding this comment.
Pull request overview
Adds a new toolbox script to analyze mongosync JSONL logs and summarize the busiest namespaces/collections by write operations, with optional JSON and Markdown exports.
Changes:
- Added
get-busiest-collections.jsto parse mongosync logs and aggregate write event counts by namespace. - Added a dedicated README for the new tool and linked it from the main toolbox README.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| migration/toolbox/getBusiestCollection/get-busiest-collections.js | New Node.js log parser/aggregator producing console output plus JSON/Markdown exports. |
| migration/toolbox/getBusiestCollection/README.md | Usage + example output documentation for the new script. |
| migration/toolbox/README.md | Adds the new tool to the toolbox index. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| input: fileStream, | ||
| crlfDelay: Infinity, | ||
| }); | ||
|
|
There was a problem hiding this comment.
fs.createReadStream(filePath) can emit an error event asynchronously (e.g., permission denied / file disappears). Right now there’s no listener, so the process can crash with an unhandled stream error. Attach an error handler and fail the current file (or reject processFiles) with a clear message including the path.
| fileStream.on('error', (err) => { | |
| const message = `Error reading file "${filePath}": ${err.message}`; | |
| if (!suppressConsole) { | |
| console.error('\x1b[31m%s\x1b[0m', message); | |
| } | |
| rl.close(); | |
| }); |
| if (lineCounter > MAX_LINES_PER_FILE) { | ||
| if (!suppressConsole) { | ||
| console.warn('\x1b[33m%s\x1b[0m', `Warning: Processing of "${filePath}" stopped after ${MAX_LINES_PER_FILE} lines for safety.`); | ||
| } |
There was a problem hiding this comment.
When lineCounter > MAX_LINES_PER_FILE you break out of the for await (const line of rl) loop but don’t close the readline interface or destroy the underlying stream. In Node, breaking early can leave the stream open and keep reading in the background. Ensure you rl.close() and fileStream.destroy() (or otherwise stop/cleanup) before breaking.
| } | |
| } | |
| rl.close(); | |
| fileStream.destroy(); |
| } | ||
| } catch (error) { | ||
| if (!suppressConsole) { | ||
| console.error('Error processing line:', line); |
There was a problem hiding this comment.
On JSON parse/processing errors this logs the full raw log line. Mongosync logs can contain sensitive data, and printing entire lines can leak PII into terminals/CI logs. Prefer logging file + line number (and maybe a truncated prefix) instead of the full line content.
| console.error('Error processing line:', line); | |
| const previewLength = 80; | |
| const linePreview = | |
| typeof line === 'string' | |
| ? line.slice(0, previewLength) + (line.length > previewLength ? '...' : '') | |
| : ''; | |
| console.error( | |
| `Error processing line ${lineCounter} in file "${filePath}".` + | |
| (linePreview ? ` Line preview: ${linePreview}` : '') | |
| ); |
| ```bash | ||
| node get-busiest-collections.js </path/to/mongosynclog/files-or-directory> [--markdown] [--no-console] | ||
| ``` | ||
|
|
There was a problem hiding this comment.
The usage text implies the script accepts wildcard patterns directly. The implementation checks fs.existsSync(inputPath) and doesn’t expand globs itself, so unexpanded patterns (common on Windows or when quoted) will fail. Either document that wildcards must be expanded by the shell, or implement glob expansion in the script.
| The script expects an existing file or directory path. It does not expand wildcard/glob patterns (such as `*.log`) | |
| itself, so any wildcards must be expanded by your shell before invoking the script (for example, by relying on | |
| unquoted shell globs). On platforms or shells where patterns are not expanded automatically, pass explicit paths. |
| Namespace | Total Write Ops | delete | insert | update | ||
| -------------------------------- | ----------------- | ---------- | ---------- | ---------- | ||
| db0.test2 | 29,847 | 5,503 | 9,419 | 14,925 | ||
| db0.test5 | 7,289 | 2,456 | 2,438 | 2,395 | ||
| db0.test1 | 7,253 | 2,476 | 2,450 | 2,327 | ||
| db0.test4 | 7,176 | 2,414 | 2,386 | 2,376 | ||
| db0.test3 | 7,076 | 2,352 | 2,360 | 2,364 |
There was a problem hiding this comment.
The example output table doesn’t include the replace column, but the script initializes event types with delete/insert/replace/update and will output replace when present (and it alphabetically sorts event types, so column ordering may differ). Update the example output to match the script’s actual columns/order.
| Namespace | Total Write Ops | delete | insert | update | |
| -------------------------------- | ----------------- | ---------- | ---------- | ---------- | |
| db0.test2 | 29,847 | 5,503 | 9,419 | 14,925 | |
| db0.test5 | 7,289 | 2,456 | 2,438 | 2,395 | |
| db0.test1 | 7,253 | 2,476 | 2,450 | 2,327 | |
| db0.test4 | 7,176 | 2,414 | 2,386 | 2,376 | |
| db0.test3 | 7,076 | 2,352 | 2,360 | 2,364 | |
| Namespace | Total Write Ops | delete | insert | replace | update | |
| -------------------------------- | ----------------- | ---------- | ---------- | ---------- | ---------- | |
| db0.test2 | 29,847 | 5,503 | 9,419 | 1,234 | 13,691 | |
| db0.test5 | 7,289 | 2,456 | 2,438 | 150 | 2,245 | |
| db0.test1 | 7,253 | 2,476 | 2,450 | 100 | 2,227 | |
| db0.test4 | 7,176 | 2,414 | 2,386 | 120 | 2,256 | |
| db0.test3 | 7,076 | 2,352 | 2,360 | 130 | 2,234 |
| ## Get Busiest Collection | ||
|
|
||
| **Script:** `get-busiest-collections.js` | ||
|
|
||
| Gets the busiest collections in terms of writes (delete/insert/replace/update) as recorded in the mongosync logs in the CEA phase |
There was a problem hiding this comment.
The doc title says “Get Busiest Collection” (singular) but the script/file name and output summarize multiple collections/namespaces. Consider renaming the heading (and/or directory name) to consistently use “collections” to avoid confusion.
Added a new script get-busiest-collections.js under migration/toolbox/getBusiestCollection which will help identify the busiest collection in terms of writes during the migration