Skip to content

update ppl doc examples to use otel data#5303

Draft
ritvibhatt wants to merge 4 commits intoopensearch-project:mainfrom
ritvibhatt:update-ppl-examples
Draft

update ppl doc examples to use otel data#5303
ritvibhatt wants to merge 4 commits intoopensearch-project:mainfrom
ritvibhatt:update-ppl-examples

Conversation

@ritvibhatt
Copy link
Copy Markdown
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

PR Reviewer Guide 🔍

(Review updated until commit c051fd0)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Update docs exporter to support otel playground buttons

Relevant files:

  • scripts/docs_exporter/export_to_docs_website.py

Sub-PR theme: Replace otellogs test data with new otel dataset

Relevant files:

  • doctest/test_data/otellogs.json

Sub-PR theme: Update PPL command doc examples to use otel data

Relevant files:

  • docs/user/ppl/cmd/search.md
  • docs/user/ppl/cmd/stats.md
  • docs/user/ppl/cmd/fields.md

⚡ Recommended focus areas for review

Regex Logic Order

The new PPL block regex (lines 259-269) converts ppl fences with otellogs to sql + playground button, then line 272 runs another regex to convert remaining ppl fences to sql. However, the first regex already replaces the opening fence with ```sql for eligible blocks, so the second regex should not re-match them. But for non-eligible blocks, the first regex leaves them as ```ppl, and the second regex then converts them. This ordering appears correct, but the first regex emits ```ppl\n for non-eligible blocks (line 265), which the second regex (line 272) will then convert to ```sql. Verify this double-pass behavior is intentional and does not cause issues with the copy-button regex on line 280 matching those converted blocks and adding duplicate copy buttons.

content = re.sub(
    r'^```ppl\b[^\n]*\n(.*?)^```$',
    lambda m: (
        '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
        if _is_playground_eligible(m.group(1), current_file_path)
        else (
            '```ppl\n' + m.group(1) + '```'
        )
    ),
    content, flags=re.MULTILINE | re.DOTALL,
)

# Convert remaining PPL code fences to SQL
content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
Playground Check

The _is_playground_eligible function checks for source=otellogs as a substring of the block text. This could produce false positives if a comment or string literal inside the PPL block contains source=otellogs without it being the actual source command. Consider using a more precise pattern match (e.g., a regex anchored to the start of a line).

def _is_playground_eligible(block_text, file_path):
    """Check if a PPL block should get a Try in Playground button."""
    if 'source=otellogs' not in block_text:
        return False
    if file_path:
        stem = file_path.stem if hasattr(file_path, 'stem') else str(file_path).rsplit('/', 1)[-1].replace('.md', '')
        if stem in PLAYGROUND_UNSUPPORTED_CMDS:
            return False
    return True
Misaligned Table

In the wildcard example result table at line 414, the third ERROR value appears to have inconsistent spacing/padding (| ERROR | vs | ERROR |), which may cause rendering issues in the documentation website.

| ERROR       |
Operator Precedence

The description at line 297 states the expression is evaluated as severityText="ERROR" OR (resource.attributes.service.name="frontend" AND severityText="INFO"), but the result table (lines 306-314) shows only INFO rows from frontend with no ERROR rows. This is inconsistent — if the OR includes all ERROR rows, there should be ERROR results too. Verify the query and expected output are correct.

The following query demonstrates operator precedence. Because `AND` binds tighter than `OR`, this is evaluated as `severityText="ERROR" OR (resource.attributes.service.name="frontend" AND severityText="INFO")`:

```ppl
search severityText="ERROR" OR `resource.attributes.service.name`="frontend" AND severityText="INFO" source=otellogs
| fields severityText, `resource.attributes.service.name`
| head 5
fetched rows / total rows = 4/4
+--------------+----------------------------------+
| severityText | resource.attributes.service.name |
|--------------+----------------------------------|
| INFO         | frontend                         |
| INFO         | frontend                         |
| INFO         | frontend                         |
| INFO         | frontend                         |
+--------------+----------------------------------+

</details>

</td></tr>
</table>

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

PR Code Suggestions ✨

Latest suggestions up to c051fd0

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Remove orphaned code lines outside code fence

The PPL code block for Example 3 is broken — there is a stray closing
triple-backtick followed by two orphaned pipeline stages (| where age > 30 and |
fields state, age) that appear outside the code fence. These leftover lines from the
old example were not removed and will render as raw text in the documentation.

docs/user/ppl/cmd/replace.md [83-91]

 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```

<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: The `| where age > 30` and `| fields state, age` lines are leftover from the old example and appear outside the code fence, which will render as raw text in the documentation and break the example's presentation.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Prevent duplicate copy buttons on playground-eligible blocks</summary>

___


**The non-playground branch reconstructs the block as <code> </code><code></code>ppl <code> but the subsequent regex </code><br><code></code>re.sub(r'^<code></code><code>ppl\b.*$', '</code><code></code>sql', ...)<code> will then convert it to SQL anyway, which is the </code><br><code>intended behavior. However, the non-playground branch is missing the copy button and </code><br><code>the try-in-playground button, so the final output for non-otellogs PPL blocks will </code><br><code>only get a copy button from the later generic SQL regex — this is fine, but the </code><br><code>playground-eligible branch adds the copy button inline AND the later regex will add </code><br><code>another copy button because the </code>(?!\n\{% include copy)<code> lookahead checks for a </code><br><code>newline before </code>{% include copy<code>, but the playground block ends with </code>{% include <br>try-in-playground.html %}<code>, not </code>{% include copy<code>. This means playground-eligible </code><br><code>blocks will get a duplicate </code>{% include copy.html %}<code> appended.</code>**

[scripts/docs_exporter/export_to_docs_website.py [259-282]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R282)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
             '```ppl\n' + m.group(1) + '```'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
 
+# Convert remaining PPL code fences to SQL
+content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
+
+# Add copy buttons after sql/bash/sh fences that don't already have one
+content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)(?!\n\{% include try-in-playground)',
+                 r'```\1\n\2```\n{% include copy.html %}',
+                 content, flags=re.MULTILINE | re.DOTALL)
+
Suggestion importance[1-10]: 7

__

Why: This is a valid bug: playground-eligible blocks already include {% include copy.html %}, but the later regex at line 280 checks only for (?!\n\{% include copy) — since those blocks end with {% include try-in-playground.html %}, the lookahead won't match and a second copy button will be appended. The fix to also exclude blocks followed by {% include try-in-playground is correct and important.

Medium
Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 now use identical PPL queries (both lack
keepempty=true) and return identical results, making them indistinguishable. The
first sub-example should use keepempty=true to demonstrate retaining null values, as
the preceding description states "by default, records with null values are dropped."

docs/user/ppl/cmd/dedup.md [108-128]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

-The query returns the following results:

-```text
-fetched rows / total rows = 3/3
-+-----------------------------------------------------------------------------+

- instrumentationScope.name
- @opentelemetry/instrumentation-http
- Microsoft.Extensions.Hosting
- go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
-+-----------------------------------------------------------------------------+
-```
<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: Both sub-examples in Example 3 use identical queries without `keepempty=true`, making them indistinguishable despite the description implying different behaviors. The first sub-example should include `keepempty=true` to match its stated purpose.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fix incorrect duplicate row in example output</summary>

___


**The result table for Example 3 shows <code>WARN</code> appearing twice with the same count (<code>4</code>), <br>which looks like a data error. With <code>override=true</code>, the subsearch values should <br>replace the main search values for matching rows, so <code>INFO</code> and <code>DEBUG</code> rows (which have <br>no match in the subsearch) should retain their original counts while <code>ERROR</code> and <code>WARN</code> <br>rows get replaced. The current output is misleading and inconsistent with the <br>described behavior.**

[docs/user/ppl/cmd/appendcol.md [96-103]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R96-R103)

```diff
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 6

__

Why: The result table shows WARN appearing twice with count 4, which is inconsistent with the described behavior of override=true. The query filters for ERROR and WARN only in the subsearch, so INFO and DEBUG rows from the main search should still appear with their original values, making the output misleading.

Low
General
Fill in missing example body content

Example 3 has a heading but no body — the PPL query, description, and expected
output are completely missing. This leaves a blank section in the documentation
between the heading and Example 4.

docs/user/ppl/cmd/convert.md [78-80]

 ## Example 3: Extract and convert memory sizes from error messages
+
+The following query extracts and converts memory size strings from error messages to kilobytes:
+
+```ppl
+source=otellogs
+| rex field=body "(?<memory>\d+[kmgKMG]?b?)"
+| convert memk(memory)
+| where NOT ISNULL(memory)
+| stats avg(memory) as avg_memory_kb by `resource.attributes.service.name`
+```
 
 ## Example 4: Multiple field conversions
Suggestion importance[1-10]: 5

__

Why: Example 3 has a heading but no content, leaving a blank section in the documentation. However, the suggested improved_code is speculative and may not reflect actual data available in the otellogs index used throughout the PR.

Low
Use precise pattern matching for source index check

The check 'source=otellogs' not in block_text is a plain substring match and could
produce false positives for strings like source=otellogssomething or #
source=otellogs in a comment. Consider using a more precise pattern such as a
word-boundary regex to ensure the source is exactly otellogs.

scripts/docs_exporter/export_to_docs_website.py [248-256]

 def _is_playground_eligible(block_text, file_path):
     """Check if a PPL block should get a Try in Playground button."""
-    if 'source=otellogs' not in block_text:
+    if not re.search(r'\bsource\s*=\s*otellogs\b', block_text):
         return False
     if file_path:
         stem = file_path.stem if hasattr(file_path, 'stem') else str(file_path).rsplit('/', 1)[-1].replace('.md', '')
         if stem in PLAYGROUND_UNSUPPORTED_CMDS:
             return False
     return True
Suggestion importance[1-10]: 4

__

Why: The suggestion to use a regex word-boundary check instead of a plain substring match for source=otellogs is a reasonable robustness improvement, but the risk of false positives in practice (e.g., source=otellogssomething) is low given the PPL syntax. The improvement is minor but valid.

Low
Remove duplicated note about coordinating node

The new introductory paragraph duplicates the information already present in the
existing > Note block directly below it. Having both the paragraph and the note
say the same thing about the coordinating node creates redundancy. Consider removing
the note block since the new paragraph is more descriptive, or merging them.

docs/user/ppl/cmd/eval.md [6-8]

 The `eval` command processes data after documents are retrieved from the shards. This means that `eval` cannot be used to filter documents before they are returned. Use a `where` clause for filtering. Additionally, because `eval` computations are performed on the coordinating node rather than distributed across data nodes, performance may be slower for large result sets.
 
-> **Note**: The `eval` command is not rewritten to [query domain-specific language (DSL)](https://docs.opensearch.org/latest/query-dsl/). It is only executed on the coordinating node.
-
Suggestion importance[1-10]: 4

__

Why: The new paragraph and the existing > **Note** block both describe the same coordinating node behavior, creating redundancy. Removing the note would improve clarity, but this is a minor documentation style issue.

Low
Restore accidentally removed limitations section

The entire "Limitations" section was removed from appendpipe.md in this PR. This
section contained important information about schema compatibility constraints that
users need to be aware of when using the command.

docs/user/ppl/cmd/appendpipe.md [82]

+## Limitations
 
+The `appendpipe` command has the following limitations:
 
+* **Schema compatibility**: When fields with the same name exist in both the main search and the subpipeline but have incompatible types, the query fails with an error. To avoid type conflicts, ensure that fields with the same name share the same data type. Alternatively, use different field names. You can rename the conflicting fields using `eval` or select non-conflicting columns using `fields`.
+
Suggestion importance[1-10]: 4

__

Why: The Limitations section was removed from appendpipe.md in this PR, but looking at the diff, the old hunk shows it was present and the new hunk does not include it. However, the existing_code and improved_code are identical, so this is more of a verification request than a concrete fix.

Low
Fix misaligned table column padding in output

The unknown values in the instrumentationScope.name column are not padded to match
the column width, causing misaligned table formatting in the rendered output. The
column width should be consistent across all rows to maintain proper table alignment
in the markdown text block.

docs/user/ppl/cmd/fillnull.md [53-63]

-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
 | ERROR        | payment                          | @opentelemetry/instrumentation-http                                         |
-| ERROR        | payment                          | unknown                                                            |
+| ERROR        | payment                          | unknown                                                                     |
 | WARN         | product-catalog                  | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
-| WARN         | product-catalog                  | unknown                                                            |
-| ERROR        | product-catalog                  | unknown                                                            |
-| ERROR        | recommendation                   | unknown                                                            |
+| WARN         | product-catalog                  | unknown                                                                     |
+| ERROR        | product-catalog                  | unknown                                                                     |
+| ERROR        | recommendation                   | unknown                                                                     |
Suggestion importance[1-10]: 3

__

Why: The unknown values in the instrumentationScope.name column are not padded to match the column width, causing visual misalignment in the text block. This is a minor formatting issue that affects readability but not functionality.

Low

Previous suggestions

Suggestions up to commit f0e3a6a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix corrupted markdown document structure

The stats.md file has been corrupted — raw query result output has been injected
directly into the ## Parameters section header and the parameter table header,
breaking the document structure. The query result blocks appear to have been
incorrectly merged into the markdown prose. These sections need to be restored to
their original clean markdown headings and table structure.

docs/user/ppl/cmd/stats.md [18-36]

-## fetched rows / total rows = 1/1
-+-----+
-| p90 |
-|-----|
-| 21  |
-+-----+arameters
+## Parameters
 
 The `stats` command supports the following parameters.
 
-| fetched rows / total rows = 5/5
-...
-+--------------------------------+--------------+arameter | Required/Optional | Description |
+| Parameter | Required/Optional | Description |
Suggestion importance[1-10]: 9

__

Why: The stats.md file has raw query result output injected directly into the ## Parameters section header and the parameter table header, completely breaking the document structure. This is a critical documentation corruption issue that needs to be fixed.

High
Fix corrupted aggregation function list entries

The aggregation function list in stats.md has raw query result tables injected
mid-word throughout the bullet points (e.g., VAR_SAM

instead of VAR_SAMP). This
corrupts all the function name entries and their descriptions. The list needs to be
restored to clean bullet points with proper function names.

docs/user/ppl/cmd/stats.md [52-187]

-* `VAR_SAMfetched rows / total rows = 5/5
-...
-` -- Sample variance
-* `VAR_fetched rows / total rows = 1/1
-...
-Ofetched rows / total rows = 3/3
-...
-` -- fetched rows / total rows = 1/1
-...
-opulation variance
-* `STDDEV_SAMfetched rows / total rows = 4/4
-...
-` -- Sample standard deviation
-* `STDDEV_fetched rows / total rows = 5/5
-...
-Ofetched rows / total rows = 1/1
-...
-` -- fetched rows / total rows = 5/5
-...
-opulation standard deviation
+* `VAR_SAMP` -- Sample variance
+* `VAR_POP` -- Population variance
+* `STDDEV_SAMP` -- Sample standard deviation
+* `STDDEV_POP` -- Population standard deviation
+* `DISTINCT_COUNT_APPROX` -- Approximate distinct count
+* `TAKE` -- List of original values
+* `PERCENTILE`/`PERCENTILE_APPROX` -- Percentile calculations
+* `PERC<percent>`/`P<percent>` -- Percentile shortcut functions
Suggestion importance[1-10]: 9

__

Why: The aggregation function list has raw query result tables injected mid-word throughout the bullet points (e.g., VAR_SAM<table> instead of VAR_SAMP), corrupting all function name entries and descriptions. This is a critical documentation corruption that makes the content unreadable.

High
Fix corrupted Parameters section header

The ## Parameters section header and the parameter table header have been
accidentally replaced with query result blocks. This appears to be a copy-paste
error that corrupts the documentation structure, making the parameters section
unreadable.

docs/user/ppl/cmd/head.md [16-30]

-## fetched rows / total rows = 1/1
-+------------+
-| total_logs |
-|------------|
-| 20         |
-+------------+arameters
+## Parameters
 
 The `head` command supports the following parameters.
 
-| fetched rows / total rows = 1/1
-+--------------+
-| avg_severity |
-|--------------|
-| 12.0         |
-+--------------+arameter | Required/Optional | Description |
+| Parameter | Required/Optional | Description |
Suggestion importance[1-10]: 9

__

Why: The ## Parameters section header and the parameter table header have been accidentally replaced with query result blocks, making the documentation structurally broken and unreadable. This is a clear copy-paste error that needs to be fixed.

High
Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 use identical PPL queries (both use dedup
instrumentationScope.name without keepempty=true) and return identical results. The
first sub-example is supposed to demonstrate keepempty=true behavior (keeping null
values), while the second demonstrates the default behavior (dropping nulls). The
first query is missing keepempty=true and the result should show a null row.

docs/user/ppl/cmd/dedup.md [107-127]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates while keeping documents with `null` values in the specified field:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

-fetched rows / total rows = 3/3
+fetched rows / total rows = 4/4
+-----------------------------------------------------------------------------+
| instrumentationScope.name                                                   |
|-----------------------------------------------------------------------------|
+| null                                                                        |
| @opentelemetry/instrumentation-http                                         |
| Microsoft.Extensions.Hosting                                                |
| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
+-----------------------------------------------------------------------------+
<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: Both sub-examples in Example 3 use identical PPL queries, but they are meant to demonstrate different behaviors (`keepempty=true` vs default). The first query is missing `keepempty=true` and the result should include a null row, making this a genuine documentation bug.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fix filter value to match actual dataset service name</summary>

___


**The example filters for <code>resource.attributes.service.name = 'payment-service'</code>, but <br>the dataset uses <code>payment</code> (not <code>payment-service</code>) as the service name, resulting in 0 <br>rows returned. An example that intentionally returns no results is confusing for <br>documentation purposes and does not demonstrate the <code>AND</code> operator effectively. The <br>service name in the filter should match an actual value present in the dataset, such <br>as <code>payment</code>.**

[docs/user/ppl/cmd/where.md [50-69]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R50-R69)

```diff
 ## Example 2: Filter using combined criteria
 
 The following query narrows down errors to a specific service during an incident investigation, combining severity and service name conditions with `AND`:
 
 ```ppl
 source=otellogs
-| where severityNumber >= 17 AND `resource.attributes.service.name` = 'payment-service'
+| where severityNumber >= 17 AND `resource.attributes.service.name` = 'payment'
 | sort severityNumber
 | fields severityText, severityNumber, `resource.attributes.service.name`

The query returns the following results:

-fetched rows / total rows = 0/0
+fetched rows / total rows = 2/2
+--------------+----------------+----------------------------------+
| severityText | severityNumber | resource.attributes.service.name |
|--------------+----------------+----------------------------------|
+| ERROR        | 17             | payment                          |
+| ERROR        | 17             | payment                          |
+--------------+----------------+----------------------------------+
<details><summary>Suggestion importance[1-10]: 6</summary>

__

Why: The filter uses `'payment-service'` but the dataset contains `'payment'`, causing 0 rows returned. An example returning empty results for a combined `AND` filter is misleading and doesn't demonstrate the feature effectively.


</details></details></td><td align=center>Low

</td></tr><tr><td rowspan=4>General</td>
<td>



<details><summary>Remove duplicate example with identical query and results</summary>

___


**Example 2 is a duplicate of Example 1 — both use the exact same PPL query and return <br>the exact same results. This provides no additional value to the reader. Replace <br>Example 2 with a distinct scenario, such as demonstrating the <code>replace</code> command with a <br>different field or showing how multiple replacements interact when both patterns <br>match.**

[docs/user/ppl/cmd/replace.md [52-78]](https://github.com/opensearch-project/sql/pull/5303/files#diff-8644862a762e96b24031ad83e31729768a40b3387fda49b9a3a4f0c6f4c3de39R52-R78)

```diff
-## Example 2: Replace service names for display
+## Example 2: Replace severity text labels for display
 
-The following query shortens service names for a compact dashboard view:
+The following query replaces verbose severity labels with shorter abbreviations for compact reporting:
 
 ```ppl
 source=otellogs
 | where severityText IN ('ERROR', 'WARN')
-| replace 'payment-service' with 'payment' in `resource.attributes.service.name`
-| replace 'inventory-service' with 'inventory' in `resource.attributes.service.name`
+| replace 'ERROR' with 'ERR' in severityText
+| replace 'WARN' with 'WRN' in severityText
 | sort severityNumber, `resource.attributes.service.name`
 | fields severityText, `resource.attributes.service.name`
 | head 4

-The query returns the following results:

-text -fetched rows / total rows = 4/4 -+--------------+----------------------------------+ -| severityText | resource.attributes.service.name | -|--------------+----------------------------------| -| WARN | frontend-proxy | -| WARN | frontend-proxy | -| WARN | product-catalog | -| WARN | product-catalog | -+--------------+----------------------------------+ -

<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: Example 2 in `replace.md` is an exact duplicate of Example 1 — same PPL query, same description, same results. This provides no value to readers and should be replaced with a distinct scenario.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fill in missing example content</summary>

___


**Example 3 has a heading but no content — the query, description, and expected output <br>are missing entirely. This leaves a broken section in the documentation that jumps <br>directly from an empty heading to Example 4.**

[docs/user/ppl/cmd/convert.md [78-80]](https://github.com/opensearch-project/sql/pull/5303/files#diff-76c639678d1e487015ecd16ab9353af28891a46b2d29760c877e41904ed29eedR78-R80)

```diff
 ## Example 3: Extract and convert memory sizes from error messages
+
+The following query extracts and converts memory size strings from error log messages to kilobytes:
+
+```ppl
+source=otellogs
+| where severityText = 'ERROR'
+| rex field=body "(?<mem_size>\d+[kmg]b?)"
+| convert memk(mem_size)
+| where NOT ISNULL(mem_size)
+| fields body, mem_size
+| head 3
+```
 
 ## Example 4: Multiple field conversions
Suggestion importance[1-10]: 6

__

Why: Example 3 has a heading but no content, jumping directly to Example 4. While the suggestion provides plausible content, the improved code is speculative and may not match the actual intended example for this PR.

Low
Ensure non-playground PPL blocks get copy buttons

The non-playground branch converts ppl blocks back to ppl fences without adding a
copy button or converting to sql. However, the subsequent re.sub call converts all
remaining ppl fences to sql without adding a copy button. Non-playground ppl blocks
should be converted to sql with a copy button in the first pass, or the else branch
should emit sql with a copy button directly, to avoid inconsistent output.

scripts/docs_exporter/export_to_docs_website.py [259-269]

 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```\n{% include copy.html %}'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
Suggestion importance[1-10]: 5

__

Why: The non-playground else branch keeps ppl fences without a copy button, relying on the subsequent re.sub to convert them to sql, but that subsequent call also doesn't add copy buttons. Converting to sql with a copy button directly in the else branch would be more consistent and correct.

Low
Fix duplicate row in override example result table

The result table for Example 3 shows WARN appearing twice with the same agg value of
4, which is inconsistent with the described behavior of override=true replacing main
search values with subsearch values. The subsearch only covers ERROR and WARN, so
INFO and DEBUG rows from the main search should retain their original counts while
ERROR and WARN are overridden. The duplicated WARN row and the explanation should be
reviewed for accuracy.

docs/user/ppl/cmd/appendcol.md [94-104]

 ```text
 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
<details><summary>Suggestion importance[1-10]: 5</summary>

__

Why: The result table shows `WARN` duplicated with the same `agg=4`, which is inconsistent with the described `override=true` behavior. The table should show all four severity levels (`DEBUG`, `ERROR`, `INFO`, `WARN`) with the subsearch overriding only `ERROR` and `WARN` values.


</details></details></td><td align=center>Low

</td></tr></tr></tbody></table>

</details>
<details><summary>Suggestions up to commit d0b2a08</summary>
<br>

</details>
<details><summary>Suggestions up to commit cc17d7e</summary>
<br><table><thead><tr><td><strong>Category</strong></td><td align=left><strong>Suggestion&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </strong></td><td align=center><strong>Impact</strong></td></tr><tbody><tr><td rowspan=3>Possible issue</td>
<td>



<details><summary>Fix duplicate queries with missing parameter</summary>

___


**Example 3 contains two identical PPL queries that are supposed to demonstrate <br>different behaviors (<code>keepempty=true</code> vs. default). The first query is missing <br><code>keepempty=true</code> and both queries produce the same result, making the distinction <br>between the two behaviors unclear. The first query should include <code>keepempty=true</code> to <br>demonstrate that null values are retained.**

[docs/user/ppl/cmd/dedup.md [86-126]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR86-R126)

```diff
 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

-fetched rows / total rows = 2/2
+fetched rows / total rows = 3/3
+---------------------------+
| instrumentationScope.name |
|---------------------------|
+| null                      |
| opentelemetry-java        |
| opentelemetry-python      |
+---------------------------+

The following query deduplicates while ignoring documents with empty values in the specified field:

source=otellogs
| dedup instrumentationScope.name
| fields instrumentationScope.name
| sort instrumentationScope.name
<details><summary>Suggestion importance[1-10]: 8</summary>

__

Why: The two PPL queries in Example 3 are identical, but they should demonstrate different behaviors. The first query is missing `keepempty=true`, which would show null values being retained, making the documentation misleading and incorrect.


</details></details></td><td align=center>Medium

</td></tr><tr><td>



<details><summary>Fix incorrect duplicate row in result table</summary>

___


**The result table for Example 3 (override parameter) is missing the <code>DEBUG</code> row and <br>shows <code>FATAL</code> duplicated, which appears inconsistent with the described behavior. The <br>main search produces 5 severity levels (DEBUG, ERROR, FATAL, INFO, WARN), but the <br>result only shows 5 rows with FATAL appearing twice and DEBUG missing. This is <br>likely a documentation error that could confuse readers about how <code>override=true</code> <br>works.**

[docs/user/ppl/cmd/appendcol.md [94-104]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R94-R104)

```diff
 ```text
 fetched rows / total rows = 5/5
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 5   | ERROR        |
-| 2   | FATAL        |
 | 2   | FATAL        |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
<details><summary>Suggestion importance[1-10]: 6</summary>

__

Why: The result table shows `FATAL` duplicated and `DEBUG` missing, which appears inconsistent with the described `override=true` behavior. However, this could be intentional based on how the subsearch aligns rows positionally, so the actual correctness depends on the implementation behavior.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Fix inaccurate total row count in result</summary>

___


**The new data contains two documents matching "Payment failed" (one ERROR and one <br>FATAL), but the query claims <code>fetched rows / total rows = 1/1</code>. The <code>head 1</code> limits <br>output to one row, but the total count should reflect all matching rows (2/2). <br>Verify the expected total count in the result block is accurate.**

[docs/user/ppl/cmd/search.md [178]](https://github.com/opensearch-project/sql/pull/5303/files#diff-dd984cb777515ecc5ea6aa39ef6df5a798c5eb668370ba32279b00f311be3d4cR178-R178)

```diff
 search "Payment failed" source=otellogs
 | sort `resource.attributes.service.name` | fields body | head 1
+```
 
+The query returns the following results:
+
+```text
+fetched rows / total rows = 1/2
++---------------------------------------------------------------------+
+| body                                                                |
+|---------------------------------------------------------------------|
+| Payment failed: connection timeout to payment gateway after 30000ms |
++---------------------------------------------------------------------+
+
Suggestion importance[1-10]: 4

__

Why: The suggestion raises a valid concern about the fetched rows / total rows = 1/1 count when head 1 is used, but the improved_code changes the format to 1/2 which may or may not be accurate depending on how the engine reports totals with head. The claim about two matching documents needs verification against the actual data.

Low
General
Remove duplicate documentation examples

Examples 1 and 2 are identical in both query and output, which provides no
additional value to the reader. Replace Example 2 with a distinct use case, such as
replacing a severity text value or normalizing a different field, to demonstrate the
command's versatility.

docs/user/ppl/cmd/replace.md [24-78]

-## Example 1: Normalize service names for a compact dashboard
+## Example 2: Normalize severity text labels
 
-The following query shortens long service names for a compact dashboard view:
-...
-## Example 2: Replace service names for display
+The following query replaces verbose severity labels with shorter codes for compact display:
 
-The following query shortens service names for a compact dashboard view:
-...
-| replace 'payment-service' with 'payment' in `resource.attributes.service.name`
-| replace 'inventory-service' with 'inventory' in `resource.attributes.service.name`
+```ppl
+source=otellogs
+| where severityText IN ('ERROR', 'FATAL')
+| replace 'ERROR' with 'ERR' in severityText
+| replace 'FATAL' with 'FTL' in severityText
 | sort severityNumber, `resource.attributes.service.name`
 | fields severityText, `resource.attributes.service.name`
 | head 4
+```
Suggestion importance[1-10]: 7

__

Why: Examples 1 and 2 in replace.md are completely identical in both query and output, which is a clear documentation error that provides no value. This is a valid and important observation that should be fixed to improve documentation quality.

Medium
Restore removed syntax reference sections

The PR removes the syntax reference sections ("Optional choices", "Repetition") that
explain PPL syntax conventions, replacing them entirely with examples. These
reference sections are important for users learning PPL syntax and should be
retained in a syntax reference document. The examples are a good addition but should
supplement, not replace, the syntax documentation.

docs/user/ppl/cmd/syntax.md [52-60]

+### Optional choices
+
+Optional choices between alternatives are shown in square brackets with pipe separators (`[option1 | option2]`). You can choose one of the options or omit them entirely.
+
+**Example**: `[asc | desc]` means you can specify `asc`, `desc`, or neither.
+
+### Repetition
+
+An ellipsis (`...`) indicates that the preceding element can be repeated multiple times.
+
+**Examples**:
+- `<field>...` means one or more fields without commas: `field1 field2 field3`
+- `<field>, ...` means comma-separated repetition: `field1, field2, field3`
+
 ## Example 1: Fetch data from an index with a filter
 
 The following query retrieves all error logs from the `otellogs` index. The filter is specified inline with the `source` command:
 
 ```ppl
 source=otellogs severityText = 'ERROR'
 | fields severityText, `resource.attributes.service.name`
 | sort `resource.attributes.service.name`
<details><summary>Suggestion importance[1-10]: 6</summary>

__

Why: The PR removes important syntax reference sections ("Optional choices" and "Repetition") from `syntax.md`, which are valuable for users learning PPL conventions. Restoring these sections alongside the new examples would make the documentation more complete and useful.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Add missing query results block</summary>

___


**The subsection "Matching with a prefix pattern" shows a query but is missing its <br>expected output block. Every other example in the file includes a results block, and <br>this omission leaves the documentation incomplete and inconsistent.**

[docs/user/ppl/cmd/where.md [104-113]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R104-R113)

```diff
 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services in the `cart` domain, useful when investigating issues across a team's microservices:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'cart-%')
 | sort severityNumber
 | fields severityText, `resource.attributes.service.name`, body

+The query returns the following results:
+
+text +fetched rows / total rows = 3/3 ++--------------+----------------------------------+----------------------------------------------------------------------------------------------+ +| severityText | resource.attributes.service.name | body | +|--------------+----------------------------------+----------------------------------------------------------------------------------------------| +| INFO | cart-service | Order #1234 placed successfully by user U100 | +| INFO | cart-service | Kafka consumer group rebalanced: 4 partitions assigned to consumer cart-consumer-01 | +| ERROR | cart-service | Kafka producer delivery failed: message too large for topic order-events (max 1048576 bytes) | ++--------------+----------------------------------+----------------------------------------------------------------------------------------------+ +
+

<details><summary>Suggestion importance[1-10]: 5</summary>

__

Why: The "Matching with a prefix pattern" subsection is missing its expected output block, making it inconsistent with all other examples in the file. However, the suggested output values cannot be fully verified from the PR diff alone, so the score is moderate.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Restore wildcard support documentation in parameters</summary>

___


**The PR removed the documentation about wildcard support in the <code>rename</code> command <br>parameters, but the syntax and behavior notes about wildcards were part of the <br>original documentation. If wildcard renaming is still a supported feature, removing <br>this information from the parameters table could mislead users into thinking it is <br>not supported.**

[docs/user/ppl/cmd/rename.md [10-22]](https://github.com/opensearch-project/sql/pull/5303/files#diff-12b43dbb9b9164dd4ac4501f6cf49f35d91cc51da8713b16d418bfa9659f0aacR10-R22)

```diff
 The `rename` command has the following syntax:
 
 ```syntax
 rename <source-field> AS <target-field> ["," <source-field> AS <target-field>]...

Parameters

Parameter Required/Optional Description
- <source-field> Required
- <target-field> Required
+ <source-field> Required
+ <target-field> Required
<details><summary>Suggestion importance[1-10]: 5</summary>

__

Why: The PR removed wildcard support documentation from the `rename` command parameters table. If wildcard renaming is still supported, this omission could mislead users. However, the PR may have intentionally removed this to simplify the documentation or because the feature is no longer supported.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Clarify transposed row count in description</summary>

___


**The result table for Example 1 shows 5 data columns (row 1 through row 5) but the <br>description says "fetched rows / total rows = 2/2", which refers to 2 rows being <br>transposed (log_count and severityText). However, the data has 5 severity levels <br>(DEBUG, ERROR, FATAL, INFO, WARN), so the transpose should produce 5 columns. The <br>header "fetched rows / total rows = 2/2" is misleading — it should reflect the <br>number of transposed output rows (2 field rows), but the column count (5) is <br>correct. The description text should clarify this is transposing 2 fields across 5 <br>rows of data, not 2 rows total.**

[docs/user/ppl/cmd/transpose.md [29-36]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ec1fe93388c8d21c3152db80ad223e173eeb0bfb9685a46b9d3ea389f31748b0R29-R36)

```diff
+```text
+fetched rows / total rows = 2/2
++--------------+-------+-------+-------+-------+-------+
+| column       | row 1 | row 2 | row 3 | row 4 | row 5 |
+|--------------+-------+-------+-------+-------+-------|
+| log_count    | 3     | 5     | 2     | 6     | 4     |
+| severityText | DEBUG | ERROR | FATAL | INFO  | WARN  |
++--------------+-------+-------+-------+-------+-------+
+```
 
-
Suggestion importance[1-10]: 2

__

Why: The existing_code and improved_code are identical, meaning no actual change is proposed. The suggestion only asks to clarify the description text but doesn't provide a concrete fix in the improved code.

Low

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0e3a6a.

PathLineSeverityDescription
docs/user/ppl/cmd/stats.md18lowQuery result output is embedded directly inside section headings and aggregation function bullet points (e.g., '## fetched rows / total rows = 1/1 ... +arameters' and 'VAR_SAMfetched rows...+--...+arameter'). This corrupted content is anomalous and inconsistent with normal editing behavior, though it appears to be an accidental copy-paste error rather than deliberate injection.
docs/user/ppl/cmd/head.md15lowQuery result tables are embedded inside the '## Parameters' section heading and the parameter table header, producing malformed markdown. The same pattern as stats.md — anomalous but likely an accidental copy-paste error during the documentation rewrite.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

Persistent review updated to latest commit d0b2a08

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 3, 2026

Persistent review updated to latest commit f0e3a6a

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Persistent review updated to latest commit c051fd0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant