Add generate-node-sdk-pages script for making polished Node SDK pages #3348
myarmolinsky wants to merge 3 commits into main
0234eef to 81cd5ab
3ec4cd2 to 451d8af
generate-node-sdk-pages script for making polished Node SDK pages
451d8af to 12e34d2
Change-type: patch
Change-type: minor
Change-type: patch
12e34d2 to c6ec2dc
const inputPath = path.resolve(nodeSDKDocsDir);
const outputDir = path.join(__dirname, '../pages/reference/sdk/node-sdk');
const outputPath = path.resolve(outputDir);
const semver = require('semver'); // If you have semver installed, otherwise use a regex split
  },
);

// Some auth model methods have a md links hardcoded as `#balena.models.auth.X`
// This should correct them to `#X`
sectionContent = sectionContent.replace(
  /\[([^\]]+)\]\(#balena\.(?:.*\.)?([^.)]+)\)/g,
  '[$1](#$2)',
The content splitting relies on hardcoded string markers (\n## Modules, \n## balena-sdk\n, <a name="balena"></a>, <a name="balena.errors"></a>). If the upstream SDK docs change the order of these sections, rename them, or alter the anchor formatting, the script will silently produce garbage output or crash.
Consider adding guards after each split to make failures explicit, e.g.:
  if (firstSplitParts.length < 2) {
    console.error(`Expected "## Modules" marker not found in ${file}`);
    process.exit(1);
  }

Same for secondSplitParts, thirdSplitParts, and fourthSplitParts.
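One way to apply the same guard to all four splits is a small shared helper. The sketch below is illustrative only: `splitAtMarker` is a hypothetical name that does not appear in the PR, and it throws instead of calling `process.exit(1)` so the caller decides how to fail.

```javascript
// Hypothetical helper (not part of the PR): split `content` at the first
// occurrence of `marker` and fail loudly when the marker is missing.
function splitAtMarker(content, marker, file) {
  const idx = content.indexOf(marker);
  if (idx === -1) {
    throw new Error(
      `Expected marker ${JSON.stringify(marker)} not found in ${file}`,
    );
  }
  // Return the text before and after the marker; the marker itself is dropped.
  return [content.slice(0, idx), content.slice(idx + marker.length)];
}

// Usage sketch with a stand-in document and file name.
const doc = 'intro\n## Modules\nmodule list';
const [beforeModules, afterModules] = splitAtMarker(doc, '\n## Modules', 'README.md');
console.log({ beforeModules, afterModules });
```

Each of the four markers could go through the same call, so a renamed or reordered upstream section stops the run with a message naming the missing marker.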
const inputPath = path.resolve(nodeSDKDocsDir);
const outputDir = path.join(__dirname, '../pages/reference/sdk/node-sdk');
const outputPath = path.resolve(outputDir);
const semver = require('semver'); // If you have semver installed, otherwise use a regex split
Beyond removing that, require('semver') should be moved to the top of the file with the other requires rather than being buried after variable declarations.
for (const item of items) {
  if (item.name === 'README.md') {
At level === 0, items.sort(sortVersions) runs over all directory entries. If a non-version entry (stray .DS_Store, etc.) appears at this level, semver.rcompare will throw and fall through to the localeCompare fallback. Harmless today, but worth either filtering to directories-only before sorting, or adding a comment noting the assumption.
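A minimal sketch of the directories-only filter, assuming `items` are `fs.Dirent`-like objects (as returned by `fs.readdirSync(dir, { withFileTypes: true })`). `versionDirsOnly` and the `VERSION_LIKE` pattern are hypothetical names; if the `semver` package is available, `semver.valid(item.name)` could replace the regex check.

```javascript
// Assumption: entries expose `name` and `isDirectory()`, like fs.Dirent.
// Hypothetical pattern; adjust it to the real version directory naming.
const VERSION_LIKE = /^v?\d+\.\d+\.\d+/;

// Keep only directories whose names look like versions, so the version
// comparator never sees stray files such as .DS_Store.
function versionDirsOnly(items) {
  return items.filter(
    (item) => item.isDirectory() && VERSION_LIKE.test(item.name),
  );
}

// Usage sketch with stand-in entries.
const fakeEntry = (name, isDir) => ({ name, isDirectory: () => isDir });
const entries = [
  fakeEntry('.DS_Store', false),
  fakeEntry('22.1.0', true),
  fakeEntry('notes', true),
  fakeEntry('21.7.3', true),
];
console.log(versionDirsOnly(entries).map((entry) => entry.name));
```

Only the filtered subset would be passed to the sort; entries such as README.md that the loop handles separately would still need to come from the unfiltered list.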
let lines = content.split('\n');

return lines
  .map((line) => {
    const headingMatch = line.match(/^(#{2,6})\s(.*)/);
    if (headingMatch) {
flattenHeadings converts anything deeper than ### into bold text. The comment says this is to stay within GitBook's hierarchy, but it means any #### (or deeper) headings in the SDK docs -- commonly used for method parameters and return values -- lose their semantic heading status and become unlinkable bold paragraphs.
Is that an acceptable tradeoff? If GitBook truly only supports 3 levels, it's fine, but worth confirming and documenting here.
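To make the tradeoff concrete, here is a sketch that approximates the behavior described above. It is not the PR's implementation: `flattenHeadingsSketch` and its `maxDepth` parameter are hypothetical, and only the heading regex is taken from the diff.

```javascript
// Sketch only: headings deeper than `maxDepth` become bold paragraphs,
// shallower headings are left untouched.
function flattenHeadingsSketch(content, maxDepth = 3) {
  return content
    .split('\n')
    .map((line) => {
      const headingMatch = line.match(/^(#{2,6})\s(.*)/);
      if (headingMatch && headingMatch[1].length > maxDepth) {
        // The heading text survives, but its anchor target does not.
        return `**${headingMatch[2]}**`;
      }
      return line;
    })
    .join('\n');
}

const input = ['## balena.auth', '### login', '#### Parameters', 'text'].join('\n');
console.log(flattenHeadingsSketch(input));
```

In this sketch the `#### Parameters` line comes out as `**Parameters**`, which is the loss of linkability the comment asks about.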
  "scripts": {
    "test": "npm run sync-external -- --dry-run && npm run renovate:validate",
-   "sync-external": "node tools/sync-external.js",
+   "sync-external": "node tools/sync-external.js && npm run generate-node-sdk-pages",
Chaining generate-node-sdk-pages into sync-external unconditionally means every sync-external invocation regenerates all SDK pages even if nothing in the SDK source docs changed.
Worth considering making it a separate CI step that only triggers if the target is the Node SDK.
// The regex captures the 3 specific pieces of data we need to build the new string
const anchorHeadingsPattern =
  /(?:\*\s\*\s\*\s*)?<a name="([^"]+)"><\/a>\s*(#+)\s*([^\n\r]+)/g;

// The replacement string references those captured groups using $1, $2, and $3
sectionContent = sectionContent.replace(
  anchorHeadingsPattern,
  '$2 $1\n**$3**',
);
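The pattern and replacement string from this hunk can be exercised on a sample input to see what they produce. The sample below is an assumption about what the upstream SDK docs look like (a separator, an anchor, then a heading); only the regex and the replacement string are taken from the diff.

```javascript
// Regex and replacement copied from the diff under review.
const anchorHeadingsPattern =
  /(?:\*\s\*\s\*\s*)?<a name="([^"]+)"><\/a>\s*(#+)\s*([^\n\r]+)/g;

// Assumed input shape: "* * *" separator, named anchor, then the heading.
const sample = [
  '* * *',
  '',
  '<a name="balena.auth.authenticate"></a>',
  '',
  '##### balena.auth.authenticate(credentials) ⇒ <code>Promise</code>',
].join('\n');

// $1 = anchor name, $2 = heading hashes, $3 = original heading text.
const rewritten = sample.replace(anchorHeadingsPattern, '$2 $1\n**$3**');
console.log(rewritten);
```

The separator and the anchor tag are consumed, the anchor name becomes the heading text, and the original signature moves to a bold line underneath.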
The content splitting here relies on hardcoded string markers (\n## Modules, \n## balena-sdk\n, <a name="balena"></a>, <a name="balena.errors"></a>). If the upstream SDK docs ever change the order, rename a section, or alter the anchor format, this will silently produce bad output or crash.
Would be good to add guards after each split, something like:
  if (firstSplitParts.length < 2) {
    console.error(`Expected "## Modules" marker not found in ${file}`);
    process.exit(1);
  }

At least then failures are obvious instead of generating broken pages.
    idx > startIndex &&
    line.trim().startsWith('* [') &&
    !line.startsWith('    '), // Not deeply indented
);
The endIndex logic for finding where the Node SDK section ends is fragile. It looks for the next line starting with * [ that isn't indented by 4+ spaces, but if someone changes the indentation style in SUMMARY.md or adds a non-standard line, this could eat content from adjacent sections (like Python SDK).
Consider matching more specifically, e.g. looking for the next sibling-level entry at the exact same indent depth as the Node SDK line rather than relying on "not deeply indented."
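A minimal sketch of the indent-based approach, assuming SUMMARY.md is already split into `lines` and `startIndex` points at the Node SDK entry. `findSectionEnd` is a hypothetical name. As an added assumption, it also stops at any shallower line (for example a heading), not only at an exact sibling.

```javascript
// Hypothetical helper: index of the first non-blank line after `startIndex`
// whose indentation is no deeper than the starting entry, or lines.length
// when the section runs to the end of the file.
function findSectionEnd(lines, startIndex) {
  const indentOf = (line) => line.match(/^\s*/)[0].length;
  const baseIndent = indentOf(lines[startIndex]);
  for (let idx = startIndex + 1; idx < lines.length; idx++) {
    const line = lines[idx];
    if (line.trim() === '') continue; // blank lines never end a section
    if (indentOf(line) <= baseIndent) return idx;
  }
  return lines.length;
}

// Usage sketch with a stand-in SUMMARY.md fragment.
const summaryLines = [
  '* [SDKs](sdk/README.md)',
  '  * [Node SDK](sdk/node-sdk/README.md)',
  '    * [v1](sdk/node-sdk/v1.md)',
  '    * [v2](sdk/node-sdk/v2.md)',
  '  * [Python SDK](sdk/python-sdk.md)',
];
console.log(findSectionEnd(summaryLines, 1));
```

Because the comparison is relative to the Node SDK line's own indent, the result does not depend on whether the file uses two or four spaces per level, as long as the style is consistent within the file.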
} else {
  // The regex captures the 3 specific pieces of data we need to build the new string
  const anchorHeadingsPattern =
    /(?:\*\s\*\s\*\s*)?<a name="([^"]+)"><\/a>\s*(#+)\s*([^\n\r]+)/g;
This anchorHeadingsPattern regex and the subsequent multi-capture-group sectionContent.replace calls are doing non-trivial transformations. Some inline comments showing the before/after for each transform would help future maintainers understand what these are doing without having to mentally execute the regex. For example:
// Before: <a name="balena.auth.authenticate"></a>
// ##### balena.auth.authenticate(credentials) ⇒ <code>Promise</code>
//
// After: ##### balena.auth.authenticate
// **balena.auth.authenticate(credentials) ⇒ <code>Promise</code>**

function splitDocs(inputDir, outputDir) {
  // Create base output directory if it doesn't exist
  if (fs.existsSync(outputDir)) {
    fs.rmSync(outputDir, { recursive: true, force: true });
The script rm -rfs the output directory and regenerates from scratch every time. The third commit commits the generated output, but there's nothing preventing someone from editing these files by hand and having their changes silently blown away on the next run.
Two suggestions:
- Add a header comment to each generated file, e.g.
  <!-- This file is auto-generated by tools/generate-node-sdk-pages.js. Do not edit manually. -->
- Consider adding a note in a README or .gitattributes marking these as generated.
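The first suggestion could be sketched as a small helper applied just before each page is written. `withGeneratedHeader` is a hypothetical name, and the sketch assumes the pages have no front matter that must stay on the first line.

```javascript
// Header text taken from the suggestion above.
const GENERATED_HEADER =
  '<!-- This file is auto-generated by tools/generate-node-sdk-pages.js. Do not edit manually. -->';

// Hypothetical helper: prepend the header unless it is already present,
// so calling it twice on the same content does not stack headers.
function withGeneratedHeader(content) {
  if (content.startsWith(GENERATED_HEADER)) {
    return content;
  }
  return `${GENERATED_HEADER}\n\n${content}`;
}

// Usage sketch with stand-in page content.
console.log(withGeneratedHeader('# balena.auth\n\nSome generated text.'));
```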
  return lines;
}

function flattenHeadings(content) {
flattenHeadings converts anything deeper than ### into bold text to fit GitBook's hierarchy constraints. Just want to confirm that's acceptable for the SDK docs, since any #### headings (e.g. for method parameters or return values) will lose their semantic heading status and won't be linkable.
Fibery: https://balena.fibery.io/Work/Project/Identify-the-next-steps-for-improving-growing-our-docs-2317
Improvement: https://balena.fibery.io/Work/Improvement/SDKs-(and-probably-other-generated-external-doc-pages)-Internal-linking-is-not-working-3947
Change-type: patch