Skip to content

MDEV-39412: parse error reading tabs in ranges#5049

Open
bsrikanth-mariadb wants to merge 1 commit intobb-12.3-MDEV-39368-test-replayfrom
bb-12.3-mdev-39412-parse-error-reading-tabs-in-ranges
Open

MDEV-39412: parse error reading tabs in ranges#5049
bsrikanth-mariadb wants to merge 1 commit intobb-12.3-MDEV-39368-test-replayfrom
bb-12.3-mdev-39412-parse-error-reading-tabs-in-ranges

Conversation

@bsrikanth-mariadb
Copy link
Copy Markdown
Contributor

Note:
while reading from information_schema.optimizer_context one level of unescaping is already done i.e. (\t becomes \t or \\t becomes \t)

w.r.t the MDEV, there are 2 problems: -

When reading from the sql script file, json parser is not able to parse the range value in json_read_value() from json_lib.c "ranges": [
"(b\t\t\t\t\t\t) <= (b) <= (b???????)"
],
mainly the \t\t stuff, and hence a warning.
It also stops loading the context into memory.
Since, a new table is created with empty data, and without context, we get Impossible WHERE noticed after reading const tables

There is unescaping call being made in read_string() from sql_json_lib.cc while parsing of the context. With this \t was becoming \t. However, print_range() from opt_range.cc already does escaping of the values. The value "b\t\t\t" was in fact produced as "\b\t\t\t". Later, we try to compare range values from the query and the context.

Here a mismatch a found because, in one case there was escaping, and in the other case escaping got removed.

Solutions

For Problem 1. have escaping for ranges.
This should be done while dumping range values into the context.

For Problem 2. Remove unscaping call in read_string().

Note:
while reading from information_schema.optimizer_context one level of unescaping
is already done i.e. (\\t becomes \t or \\\\t becomes \\t)

w.r.t the MDEV, there are 2 problems: -

1.
When reading from the sql script file, json parser is not able to parse
the range value in json_read_value() from json_lib.c
"ranges": [
            "(b\t\t\t\t\t\t) <= (b) <= (b???????)"
          ],
mainly the \t\t stuff, and hence a warning.
It also stops loading the context into memory.
Since, a new table is created with empty data, and without context,
we get Impossible WHERE noticed after reading const tables

2.
There is unescaping call being made in read_string() from sql_json_lib.cc
while parsing of the context. With this \\t was becoming \t.
However, print_range() from opt_range.cc already does escaping of the values.
The value "b\t\t\t" was in fact produced as "\b\\t\\t\\t".
Later, we try to compare range values from the query and the context.

Here a mismatch a found because, in one case there was escaping,
and in the other case escaping got removed.

Solutions
=========
For Problem 1. have escaping for ranges.
This should be done while dumping range values into the context.

For Problem 2. Remove unscaping call in read_string().
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses MDEV-39412 by implementing JSON escaping for range information in optimizer context dumps and adding relevant regression tests. Feedback highlights that removing the unescaping logic from the JSON reader breaks standard behavior and should be reverted. Additionally, an optimization was suggested to move a buffer allocation outside a loop in the context dumping logic to improve efficiency.

Comment thread sql/sql_json_lib.cc
}

value= strdup_root(mem_root, val_buf.c_ptr_safe());
value= strmake_root(mem_root, (const char *) je->value, je->value_len);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Removing the unescaping logic from read_string breaks the standard behavior of a JSON parser. JSON strings are expected to be unescaped to correctly handle sequences like \n, \t, \", and \\.

The issue described in the PR (where \\t becomes \t but should remain \\t) is actually addressed by the double-escaping introduced in dump_mrr_info_calls. With double-escaping, the JSON will contain \\\\t, which after SQL unescaping becomes \\t, and then the JSON reader's unescaping will turn it into \t (the literal backslash and 't').

By removing unescaping here, you are making the JSON reader return the raw, escaped content of the JSON string, which is non-standard and will break other potential users of this utility function. Please restore the unescaping logic.

  StringBuffer<128> val_buf;
  if (json_unescape_to_string((const char *) je->value, je->value_len,
                              &val_buf))
  {
    err_buf->append(STRING_WITH_LEN("un-escaping error of "));
    err_buf->append(read_elem_key, strlen(read_elem_key));
    err_buf->append(STRING_WITH_LEN(" element"));
    return true;
  }

  value= strdup_root(mem_root, val_buf.c_ptr_safe());

Comment on lines +259 to +270
List_iterator rc_li(irc->range_list);
Json_writer_array ranges_wrapper(ctx_writer, "ranges");
while (const char *range_str= rc_li++)
{
Json_writer_array ranges_wrapper(ctx_writer, "ranges");
List_iterator rc_li(irc->range_list);
while (const char *range_str= rc_li++)
ranges_wrapper.add(range_str, strlen(range_str));
const String range_info(range_str, strlen(range_str),
system_charset_info);
StringBuffer<128> escaped_range_info;
json_escape_to_string(&range_info, &escaped_range_info);
ranges_wrapper.add(escaped_range_info.c_ptr_safe(),
escaped_range_info.length());
}
ranges_wrapper.end();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation re-allocates the StringBuffer and its internal storage on every iteration of the loop. It is more efficient to move the buffer declaration outside the loop and reuse it by clearing its length in each iteration. Additionally, using a scoped block for the Json_writer_array is a cleaner way to ensure the array is closed correctly via its destructor, rather than calling end() explicitly.

    {
      Json_writer_array ranges_wrapper(ctx_writer, "ranges");
      List_iterator rc_li(irc->range_list);
      StringBuffer<128> escaped_range_info;
      while (const char *range_str= rc_li++)
      {
        const String range_info(range_str, strlen(range_str),
                                system_charset_info);
        escaped_range_info.length(0);
        json_escape_to_string(&range_info, &escaped_range_info);
        ranges_wrapper.add(escaped_range_info.c_ptr_safe(),
                           escaped_range_info.length());
      }
    }

Comment on lines +63 to +64
rc= json_read_object(&je, array, &err_buf) ||
strcmp(parsed_esc_str, esc_str_val);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This test change validates that the reader does NOT unescape, which is incorrect for a JSON library. For example, a JSON string "a\\bc" should be unescaped to a\bc (where \b is a backspace character if following standard JSON rules, or literal backslash and 'b' if escaped as \\b). Once read_string is restored to its correct behavior, this test should be updated to verify that unescaping is handled correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

1 participant