
Add generate-node-sdk-pages script for making polished Node SDK pages #3348

Draft
myarmolinsky wants to merge 3 commits into main from split-sdk-docs-into-pages

Conversation

Member

@myarmolinsky myarmolinsky commented Mar 24, 2026

Fibery: https://balena.fibery.io/Work/Project/Identify-the-next-steps-for-improving-growing-our-docs-2317
Improvement: https://balena.fibery.io/Work/Improvement/SDKs-(and-probably-other-generated-external-doc-pages)-Internal-linking-is-not-working-3947
Change-type: patch


Please make sure to read the CONTRIBUTING document before opening the PR for relevant information on contributing to the documentation. Thanks!

@myarmolinsky myarmolinsky force-pushed the split-sdk-docs-into-pages branch 3 times, most recently from 0234eef to 81cd5ab Compare March 30, 2026 15:26
@myarmolinsky myarmolinsky force-pushed the split-sdk-docs-into-pages branch 15 times, most recently from 3ec4cd2 to 451d8af Compare April 7, 2026 19:05
@myarmolinsky myarmolinsky changed the title generate-node-sdk-pages script Add generate-node-sdk-pages script for making polished Node SDK pages Apr 7, 2026
@myarmolinsky myarmolinsky force-pushed the split-sdk-docs-into-pages branch from 451d8af to 12e34d2 Compare April 7, 2026 19:44
@myarmolinsky myarmolinsky force-pushed the split-sdk-docs-into-pages branch from 12e34d2 to c6ec2dc Compare April 7, 2026 20:33
const inputPath = path.resolve(nodeSDKDocsDir);
const outputDir = path.join(__dirname, '../pages/reference/sdk/node-sdk');
const outputPath = path.resolve(outputDir);
const semver = require('semver'); // If you have semver installed, otherwise use a regex split
Contributor

Leftover AI-generated comment (`// If you have semver installed, otherwise use a regex split`); please remove it.

Comment on lines +222 to +229
},
);

// Some auth model methods have a md links hardcoded as `#balena.models.auth.X`
// This should correct them to `#X`
sectionContent = sectionContent.replace(
  /\[([^\]]+)\]\(#balena\.(?:.*\.)?([^.)]+)\)/g,
  '[$1](#$2)',
Contributor

The content splitting relies on hardcoded string markers (\n## Modules, \n## balena-sdk\n, <a name="balena"></a>, <a name="balena.errors"></a>). If the upstream SDK docs change the order of these sections, rename them, or alter the anchor formatting, the script will silently produce garbage output or crash.

Consider adding guards after each split to make failures explicit, e.g.:

if (firstSplitParts.length < 2) {
    console.error(`Expected "## Modules" marker not found in ${file}`);
    process.exit(1);
}

Same for secondSplitParts, thirdSplitParts, and fourthSplitParts.
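
To avoid repeating the same check four times, the guard could live in one small helper. This is only a sketch: `splitOnMarker` is a hypothetical name, not something in the PR, and it splits on the first occurrence of the marker rather than on every occurrence. It throws instead of calling `process.exit(1)` so the caller decides how to fail.

```javascript
// Hypothetical helper: split `content` at the first occurrence of `marker`
// and fail loudly if the marker is missing, so later steps never run on
// undefined parts.
function splitOnMarker(content, marker, file) {
  const index = content.indexOf(marker);
  if (index === -1) {
    throw new Error(
      `Expected marker ${JSON.stringify(marker)} not found in ${file}`,
    );
  }
  return [content.slice(0, index), content.slice(index + marker.length)];
}

// Example with a made-up document:
const sample = '# Intro\n## Modules\nbody';
const [before, after] = splitOnMarker(sample, '\n## Modules', 'sample.md');
console.log(JSON.stringify(before)); // "# Intro"
console.log(JSON.stringify(after)); // "\nbody"
```

The top-level script could wrap its calls in a single `try`/`catch` that logs the message and exits with a non-zero code.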

const inputPath = path.resolve(nodeSDKDocsDir);
const outputDir = path.join(__dirname, '../pages/reference/sdk/node-sdk');
const outputPath = path.resolve(outputDir);
const semver = require('semver'); // If you have semver installed, otherwise use a regex split
Contributor

Beyond removing that, require('semver') should be moved to the top of the file with the other requires rather than being buried after variable declarations.

Comment on lines +45 to +47

for (const item of items) {
  if (item.name === 'README.md') {
Contributor

At level === 0, items.sort(sortVersions) runs over all directory entries. If a non-version entry (stray .DS_Store, etc.) appears at this level, semver.rcompare will throw and fall through to the localeCompare fallback. Harmless today, but worth either filtering to directories-only before sorting, or adding a comment noting the assumption.
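
A rough sketch of the filter-before-sort option, assuming `items` are `fs.Dirent`-like objects (as returned by `fs.readdirSync(dir, { withFileTypes: true })`) and that version directories are named like `v1.2.3`. Both are assumptions about the script. To keep the example dependency-free it uses a plain regex as a stand-in for `semver`, and `versionDirsDescending` is a hypothetical name.

```javascript
// Stand-in for a semver validity check; the script itself uses `semver`.
const VERSION_DIR = /^v?(\d+)\.(\d+)\.(\d+)$/;

// Keep only directories whose names look like versions, newest first.
// Stray files (.DS_Store, README.md) and non-version directories are dropped
// before the comparator ever sees them.
function versionDirsDescending(items) {
  return items
    .filter((item) => item.isDirectory() && VERSION_DIR.test(item.name))
    .sort((a, b) => {
      const [, ...partsA] = a.name.match(VERSION_DIR);
      const [, ...partsB] = b.name.match(VERSION_DIR);
      for (let i = 0; i < 3; i++) {
        const diff = Number(partsB[i]) - Number(partsA[i]);
        if (diff !== 0) return diff;
      }
      return 0;
    });
}
```

With the filter in place the comparator no longer needs a `localeCompare` fallback for this level.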

Comment on lines +97 to +102
let lines = content.split('\n');

return lines
  .map((line) => {
    const headingMatch = line.match(/^(#{2,6})\s(.*)/);
    if (headingMatch) {
Contributor

flattenHeadings converts anything deeper than ### into bold text. The comment says this is to stay within GitBook's hierarchy, but it means any #### (or deeper) headings in the SDK docs -- commonly used for method parameters and return values -- lose their semantic heading status and become unlinkable bold paragraphs.

Is that an acceptable tradeoff? If GitBook truly only supports 3 levels, it's fine, but worth confirming and documenting here.
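
To make the tradeoff concrete, here is a simplified reconstruction of the behaviour described above. It is not the PR's actual implementation: it only reuses the heading regex from the diff, and `MAX_HEADING_DEPTH` is a hypothetical constant that turns the assumed GitBook limit into something explicit and documentable.

```javascript
// Assumed limit; worth confirming against GitBook before relying on it.
const MAX_HEADING_DEPTH = 3;

function flattenHeadings(content, maxDepth = MAX_HEADING_DEPTH) {
  return content
    .split('\n')
    .map((line) => {
      const headingMatch = line.match(/^(#{2,6})\s(.*)/);
      if (headingMatch && headingMatch[1].length > maxDepth) {
        // Deeper headings become bold text and stop being link targets.
        return `**${headingMatch[2]}**`;
      }
      return line;
    })
    .join('\n');
}

console.log(flattenHeadings('### Kept\n#### Params'));
// ### Kept
// **Params**
```

Naming the limit as a constant with a comment would cover the "documenting here" part of the request.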

Comment thread package.json
   "scripts": {
     "test": "npm run sync-external -- --dry-run && npm run renovate:validate",
-    "sync-external": "node tools/sync-external.js",
+    "sync-external": "node tools/sync-external.js && npm run generate-node-sdk-pages",
Contributor

Chaining generate-node-sdk-pages into sync-external unconditionally means every sync-external invocation regenerates all SDK pages even if nothing in the SDK source docs changed.

Worth considering making it a separate CI step that only triggers if the target is the Node SDK.
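
If the chaining stays, another option (a different technique from the CI step suggested above) would be to skip regeneration when the source docs have not changed. A minimal sketch, assuming the script can afford to hash the input directory on every run. `hashDirectory`, `hasChanged`, and the stamp file are all hypothetical names, not part of the PR.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hash every file under `dir` (relative path plus contents) in a stable order.
function hashDirectory(dir) {
  const hash = crypto.createHash('sha256');
  const walk = (current) => {
    const entries = fs
      .readdirSync(current, { withFileTypes: true })
      .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const entry of entries) {
      const fullPath = path.join(current, entry.name);
      if (entry.isDirectory()) {
        walk(fullPath);
      } else if (entry.isFile()) {
        hash.update(path.relative(dir, fullPath));
        hash.update(fs.readFileSync(fullPath));
      }
    }
  };
  walk(dir);
  return hash.digest('hex');
}

// Compare the current hash with the one stored in `stampFile` (if any).
// The caller is expected to write `current` to the stamp file only after
// generation has succeeded.
function hasChanged(inputDir, stampFile) {
  const current = hashDirectory(inputDir);
  const previous = fs.existsSync(stampFile)
    ? fs.readFileSync(stampFile, 'utf8').trim()
    : null;
  return { changed: current !== previous, current };
}
```

The stamp file would need to live outside the hashed directory, otherwise writing it would change the hash.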

Comment on lines +190 to +198
// The regex captures the 3 specific pieces of data we need to build the new string
const anchorHeadingsPattern =
  /(?:\*\s\*\s\*\s*)?<a name="([^"]+)"><\/a>\s*(#+)\s*([^\n\r]+)/g;

// The replacement string references those captured groups using $1, $2, and $3
sectionContent = sectionContent.replace(
  anchorHeadingsPattern,
  '$2 $1\n**$3**',
);
Contributor

The content splitting here relies on hardcoded string markers (\n## Modules, \n## balena-sdk\n, <a name="balena"></a>, <a name="balena.errors"></a>). If the upstream SDK docs ever change the order, rename a section, or alter the anchor format, this will silently produce bad output or crash.

Would be good to add guards after each split, something like:

if (firstSplitParts.length < 2) {
    console.error(`Expected "## Modules" marker not found in ${file}`);
    process.exit(1);
}

At least then failures are obvious instead of generating broken pages.

    idx > startIndex &&
    line.trim().startsWith('* [') &&
    !line.startsWith(' '), // Not deeply indented
);
Contributor

The endIndex logic for finding where the Node SDK section ends is fragile. It looks for the next line starting with * [ that isn't indented by 4+ spaces, but if someone changes the indentation style in SUMMARY.md or adds a non-standard line, this could eat content from adjacent sections (like Python SDK).

Consider matching more specifically, e.g. looking for the next sibling-level entry at the exact same indent depth as the Node SDK line rather than relying on "not deeply indented."
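
A possible shape for the sibling-level match, as a sketch only. `findSectionEnd` and `indentOf` are hypothetical names, the sample paths are made up, and the code assumes SUMMARY.md indents consistently with either spaces or tabs (it counts leading whitespace characters, so mixed indentation would confuse it).

```javascript
// Number of leading whitespace characters on a line.
const indentOf = (line) => line.length - line.trimStart().length;

// Index of the first line after `startIndex` that is a list entry at the same
// or a shallower indent than the entry at `startIndex`, or a heading.
// Returns `lines.length` if the section runs to the end of the file.
function findSectionEnd(lines, startIndex) {
  const startIndent = indentOf(lines[startIndex]);
  for (let idx = startIndex + 1; idx < lines.length; idx++) {
    const line = lines[idx];
    const isListEntry = /^[*-]\s/.test(line.trimStart());
    const isHeading = /^#{1,6}\s/.test(line);
    if (isHeading || (isListEntry && indentOf(line) <= startIndent)) {
      return idx;
    }
  }
  return lines.length;
}

// Made-up SUMMARY.md excerpt:
const summary = [
  '* [Reference](reference/README.md)',
  '  * [Node SDK](reference/sdk/node-sdk/README.md)',
  '    * [v1](reference/sdk/node-sdk/v1.md)',
  '  * [Python SDK](reference/sdk/python-sdk.md)',
];
console.log(findSectionEnd(summary, 1)); // 3
```

Because the comparison is relative to the Node SDK line's own indent, the check keeps working if the whole list is re-indented.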

} else {
  // The regex captures the 3 specific pieces of data we need to build the new string
  const anchorHeadingsPattern =
    /(?:\*\s\*\s\*\s*)?<a name="([^"]+)"><\/a>\s*(#+)\s*([^\n\r]+)/g;
Contributor

This anchorHeadingsPattern regex and the subsequent multi-capture-group sectionContent.replace calls are doing non-trivial transformations. Some inline comments showing the before/after for each transform would help future maintainers understand what these are doing without having to mentally execute the regex. For example:

// Before: <a name="balena.auth.authenticate"></a>
// ##### balena.auth.authenticate(credentials) ⇒ <code>Promise</code>
//
// After: ##### balena.auth.authenticate
// **balena.auth.authenticate(credentials) ⇒ <code>Promise</code>**
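
The before/after above can also be checked by running the two patterns from the diff against sample input. In this sketch the regexes and replacement strings are copied from the PR; the wrapper function names and the `getToken` link are illustrative assumptions.

```javascript
// Patterns copied from the diff under review.
const anchorHeadingsPattern =
  /(?:\*\s\*\s\*\s*)?<a name="([^"]+)"><\/a>\s*(#+)\s*([^\n\r]+)/g;
const balenaLinkPattern = /\[([^\]]+)\]\(#balena\.(?:.*\.)?([^.)]+)\)/g;

// $1 = anchor name, $2 = heading hashes, $3 = original heading text.
function rewriteAnchoredHeadings(text) {
  return text.replace(anchorHeadingsPattern, '$2 $1\n**$3**');
}

// $1 = link text, $2 = last dot-separated segment of the anchor.
function rewriteBalenaLinks(text) {
  return text.replace(balenaLinkPattern, '[$1](#$2)');
}

const input = [
  '<a name="balena.auth.authenticate"></a>',
  '',
  '##### balena.auth.authenticate(credentials) ⇒ <code>Promise</code>',
].join('\n');

console.log(rewriteAnchoredHeadings(input));
// ##### balena.auth.authenticate
// **balena.auth.authenticate(credentials) ⇒ <code>Promise</code>**

console.log(rewriteBalenaLinks('[getToken](#balena.models.auth.getToken)'));
// [getToken](#getToken)
```

Samples like these could double as fixtures if the script ever gets unit tests.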

function splitDocs(inputDir, outputDir) {
  // Create base output directory if it doesn't exist
  if (fs.existsSync(outputDir)) {
    fs.rmSync(outputDir, { recursive: true, force: true });
Contributor

The script rm -rfs the output directory and regenerates from scratch every time. The third commit commits the generated output, but there's nothing preventing someone from editing these files by hand and having their changes silently blown away on the next run.

Two suggestions:

  1. Add a header comment to each generated file, e.g. <!-- This file is auto-generated by tools/generate-node-sdk-pages.js. Do not edit manually. -->
  2. Consider adding a note in a README or .gitattributes marking these as generated.
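
For the first suggestion, something along these lines might be enough. It is a sketch: `withGeneratedHeader` is a hypothetical helper, and it assumes the generated pages have no front matter that must stay on the first line and that GitBook tolerates a leading HTML comment, neither of which has been verified here.

```javascript
const GENERATED_HEADER =
  '<!-- This file is auto-generated by tools/generate-node-sdk-pages.js. Do not edit manually. -->';

// Prepend the notice once; calling it again on already-marked content is a no-op.
function withGeneratedHeader(content) {
  if (content.startsWith(GENERATED_HEADER)) {
    return content;
  }
  return `${GENERATED_HEADER}\n\n${content}`;
}
```

The script would call this on each page's content just before writing the file.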

return lines;
}

function flattenHeadings(content) {
Contributor

flattenHeadings converts anything deeper than ### into bold text to fit GitBook's hierarchy constraints. Just want to confirm that's acceptable for the SDK docs, since any #### headings (e.g. for method parameters or return values) will lose their semantic heading status and won't be linkable.


3 participants