Merged
6 changes: 1 addition & 5 deletions .releaserc.cjs
@@ -30,11 +30,7 @@ module.exports = {
["@semantic-release/changelog", {}],
["@semantic-release/git", {
"assets": ["CHANGELOG.md", "pyproject.toml", "uv.lock"],
"message": "chore(release): ${nextRelease.version}\n\n${nextRelease.notes}\n\nSigned-off-by: " +
(process.env.GIT_AUTHOR_NAME || process.env.GIT_COMMITTER_NAME || "Release Bot") +
" <" +
(process.env.GIT_AUTHOR_EMAIL || process.env.GIT_COMMITTER_EMAIL || "noreply@github.com") +
">"
"message": "chore(release): ${nextRelease.version}\n\n${nextRelease.notes}\n\nSigned-off-by: semantic-release-bot <semantic-release-bot@martynus.net>"
}],
["@semantic-release/github", {}]
]
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,9 @@
## [0.2.0-rc.1](https://github.com/disafronov/python-logging-objects-with-schema/compare/v0.1.4...v0.2.0-rc.1) (2025-12-09)

### Features

* **schema_logger:** add support for custom forbidden keys in SchemaLogger ([1d678f5](https://github.com/disafronov/python-logging-objects-with-schema/commit/1d678f5ea8332e4afab2b96ed23bf61a928ad7cd))

## [0.1.4](https://github.com/disafronov/python-logging-objects-with-schema/compare/v0.1.3...v0.1.4) (2025-12-08)

### Bug Fixes
75 changes: 75 additions & 0 deletions README.md
@@ -250,3 +250,78 @@ Example:

With `extra={"request_id": "abc-123"}`, the value appears in both
`ServicePayload.RequestID` and `ServicePayload.Metadata.ID`.

## Inheritance and custom forbidden root keys

`SchemaLogger` supports inheritance, allowing subclasses to add forbidden root
keys for schema validation. This is useful when you need to forbid certain root
keys in your schema beyond the builtin `logging.LogRecord` attributes.

### Basic inheritance

Each subclass can pass the `forbidden_keys` parameter to the parent's
`__init__()` method. The builtin set of forbidden keys (standard `logging.LogRecord`
attributes) is always present and cannot be replaced; additional keys are
merged with it.

Example:

```python
from logging_objects_with_schema import SchemaLogger
import logging

class MyLogger(SchemaLogger):
def __init__(self, name: str, level: int = logging.NOTSET) -> None:
# Add custom forbidden keys
super().__init__(name, level, forbidden_keys={"custom_forbidden_key"})
```

### Multi-level inheritance

When creating a hierarchy of loggers, each subclass can pass `forbidden_keys`
from its own subclasses to the parent, merging them with its own set. The
library does not automatically propagate keys up the inheritance chain - each
subclass must implement this logic itself.

Example:

```python
from logging_objects_with_schema import SchemaLogger
import logging

class ParentLogger(SchemaLogger):
def __init__(
self, name: str, level: int = logging.NOTSET, forbidden_keys: set[str] | None = None
) -> None:
# Merge parent's keys with keys from child
parent_keys = {"parent_forbidden_key"}
if forbidden_keys:
parent_keys = parent_keys | forbidden_keys
super().__init__(name, level, forbidden_keys=parent_keys)

class ChildLogger(ParentLogger):
def __init__(self, name: str, level: int = logging.NOTSET) -> None:
# Pass child's keys to parent, which will merge them
super().__init__(name, level, forbidden_keys={"child_forbidden_key"})
```

In this example, the final set of forbidden keys will be:

- Builtin `logging.LogRecord` attributes (always present)
- `parent_forbidden_key` (from `ParentLogger`)
- `child_forbidden_key` (from `ChildLogger`)

All keys are merged together; none are replaced, only supplemented.

### Important notes

- The builtin set of forbidden keys (standard `logging.LogRecord` attributes)
  is always present and cannot be replaced or removed
- Additional forbidden keys are merged with the builtin set, not replaced
- Each subclass must implement the logic to pass `forbidden_keys` to its parent
  if it wants to propagate keys from its own subclasses
- The `forbidden_keys` parameter is optional; if not provided, only builtin
  keys are used, maintaining 100% backward compatibility
- `None` and an empty `set()` are semantically equivalent for `forbidden_keys`:
  both mean "no additional forbidden keys" and produce the same result
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "uv_build"

[project]
name = "logging-objects-with-schema"
version = "0.1.4"
version = "0.2.0rc1"
description = "Proxy logging wrapper that validates extra fields against a JSON schema."
readme = "README.md"
requires-python = ">=3.10"
96 changes: 69 additions & 27 deletions src/logging_objects_with_schema/schema_loader.py
@@ -109,10 +109,12 @@ def _create_empty_compiled_schema_with_problems(
# cache stores the result of compiling that file. Both are thread-safe and
# use double-checked locking to avoid race conditions.

# Compiled schema cache: Key is absolute schema_path, Value is tuple of
# (_CompiledSchema, list[_SchemaProblem]). This cache stores both successful
# compilations and failures (with problems list).
_SCHEMA_CACHE: dict[Path, tuple[_CompiledSchema, list[_SchemaProblem]]] = {}
# Compiled schema cache: Key is tuple of (absolute schema_path, frozenset of
# forbidden_keys), Value is tuple of (_CompiledSchema, list[_SchemaProblem]).
# This cache stores both successful compilations and failures (with problems list).
_SCHEMA_CACHE: dict[
tuple[Path, frozenset[str]], tuple[_CompiledSchema, list[_SchemaProblem]]
] = {}

_cache_lock = threading.RLock()

@@ -625,11 +627,30 @@ def _get_builtin_logrecord_attributes() -> set[str]:


def _check_root_conflicts(
schema_dict: Mapping[str, Any], problems: list[_SchemaProblem]
schema_dict: Mapping[str, Any],
problems: list[_SchemaProblem],
forbidden_keys: set[str] | None = None,
) -> None:
"""Check schema root keys for conflicts with reserved logging fields."""
"""Check schema root keys for conflicts with reserved logging fields.

Args:
schema_dict: The schema dictionary to check.
problems: List to collect validation problems.
forbidden_keys: Additional forbidden root keys to check against.
These keys are merged with builtin LogRecord attributes.
Builtin keys cannot be replaced, only supplemented.
Note: None and empty set() are semantically equivalent - both
mean "no additional forbidden keys" and produce the same result.
"""
# Builtin keys always present, cannot be replaced
forbidden_root_keys = _get_builtin_logrecord_attributes()
# Merge with additional forbidden keys if provided.
# We use `if forbidden_keys:` instead of `if forbidden_keys is not None:` because
# both None and empty set() semantically mean "no additional forbidden keys".
# This check treats them equivalently: None is falsy, and empty set() is also
# falsy, so both cases skip the merge operation, which is the correct behavior.
if forbidden_keys:
forbidden_root_keys = forbidden_root_keys | forbidden_keys

for key in schema_dict.keys():
if key in forbidden_root_keys:
@@ -640,7 +661,9 @@ def _check_root_conflicts(
)


def _compile_schema_internal() -> tuple[_CompiledSchema, list[_SchemaProblem]]:
def _compile_schema_internal(
forbidden_keys: set[str] | None = None,
) -> tuple[_CompiledSchema, list[_SchemaProblem]]:
"""Compile JSON schema into ``_CompiledSchema`` and collect all problems.

The function loads the raw JSON schema, validates its structure, checks
Expand All @@ -649,13 +672,22 @@ def _compile_schema_internal() -> tuple[_CompiledSchema, list[_SchemaProblem]]:
discovered during this process are reported as :class:`_SchemaProblem`
instances.

Results are cached process-wide: the cache key is the absolute schema
file path and the value is a tuple ``(_CompiledSchema, list[_SchemaProblem])``.
Once a schema for a given path has been observed (including the cases when
it is missing or invalid), subsequent calls always return the cached result
without re-reading or re-compiling the schema. To pick up on-disk changes
to the schema, the application must restart the process. See the README
section \"Schema caching and thread safety\" for more details.
Results are cached process-wide: the cache key is a tuple of the absolute
schema file path and the set of additional forbidden keys. The value is a
tuple ``(_CompiledSchema, list[_SchemaProblem])``. Once a schema for a
given path and forbidden keys set has been observed (including the cases
when it is missing or invalid), subsequent calls always return the cached
result without re-reading or re-compiling the schema. To pick up on-disk
changes to the schema, the application must restart the process. See the
README section \"Schema caching and thread safety\" for more details.

Args:
forbidden_keys: Additional forbidden root keys to check against.
These keys are merged with builtin LogRecord attributes.
Builtin keys cannot be replaced, only supplemented.
Note: None and empty set() are semantically equivalent - both
mean "no additional forbidden keys" and produce the same result.
They also produce the same cache key, so cached results are shared.

This function never raises exceptions. It always returns the best-effort
compiled schema together with a list of problems detected during processing
@@ -664,17 +696,18 @@ def _compile_schema_internal() -> tuple[_CompiledSchema, list[_SchemaProblem]]:
Performance considerations:
First compilation of a schema involves file I/O, JSON parsing, and tree
traversal. For typical schemas (< 1000 nodes), this takes < 10ms. All
subsequent calls for the same schema path return immediately from cache
(< 0.1ms). The cache is process-wide and persists for the application
lifetime.
subsequent calls for the same schema path and forbidden keys return
immediately from cache (< 0.1ms). The cache is process-wide and persists
for the application lifetime.

Limitations:
- Schema changes on disk are not automatically reloaded; the application
must be restarted to pick up changes
- Very large schemas (> 10,000 nodes) may cause noticeable compilation
overhead on first load, but this is uncommon in practice
- The cache uses absolute file paths as keys, so schema files must be
accessible via the same path throughout the application lifetime
- The cache uses absolute file paths and forbidden keys sets as keys, so
schema files must be accessible via the same path throughout the
application lifetime

Note:
This function is used internally by :class:`SchemaLogger` and by the
@@ -685,12 +718,21 @@ def _compile_schema_internal() -> tuple[_CompiledSchema, list[_SchemaProblem]]:
Tuple of (_CompiledSchema, list[_SchemaProblem]).
"""
schema_path = _get_schema_path()
# Create cache key that includes forbidden keys.
# We use `frozenset(forbidden_keys or ())` instead of checking for None explicitly
# because both None and empty set() should produce the same cache key (frozenset()).
# This is semantically correct: both None and set() mean "no additional forbidden
# keys", so they should share the same cached compilation result. The `or ()`
# operator converts None to empty tuple, which frozenset() then converts to empty
# frozenset, and empty set() also becomes empty frozenset, ensuring cache key
# consistency.
cache_key = (schema_path, frozenset(forbidden_keys or ()))

# Fast-path: First check (with lock for thread-safety) if we have already attempted
# to compile schema for this path. This provides thread-safe cache access
# in the common case when the schema is already cached.
# to compile schema for this path and forbidden keys set. This provides thread-safe
# cache access in the common case when the schema is already cached.
with _cache_lock:
cached = _SCHEMA_CACHE.get(schema_path)
cached = _SCHEMA_CACHE.get(cache_key)
if cached is not None:
return cached

@@ -712,16 +754,16 @@ def _compile_schema_internal() -> tuple[_CompiledSchema, list[_SchemaProblem]]:
# thread might have compiled the schema (or handled the same error) while
# we were processing the exception. If so, use the cached result.
with _cache_lock:
cached = _SCHEMA_CACHE.get(schema_path)
cached = _SCHEMA_CACHE.get(cache_key)
if cached is not None:
return cached
_SCHEMA_CACHE[schema_path] = result
_SCHEMA_CACHE[cache_key] = result
return result

# Check root key conflicts before compiling the tree. This allows us to
# catch conflicts early and report them as schema problems. We do this
# before tree compilation to avoid unnecessary work if there are conflicts.
_check_root_conflicts(raw_schema, problems)
_check_root_conflicts(raw_schema, problems, forbidden_keys)

# Compile the schema tree into leaves. Each root key becomes a separate
# tree that we compile recursively. Problems are collected as we go.
@@ -748,9 +790,9 @@ def _compile_schema_internal() -> tuple[_CompiledSchema, list[_SchemaProblem]]:
# instead of overwriting it with our own (which might be different if the file
# was modified between reads).
with _cache_lock:
cached = _SCHEMA_CACHE.get(schema_path)
cached = _SCHEMA_CACHE.get(cache_key)
if cached is not None:
return cached
_SCHEMA_CACHE[schema_path] = result
_SCHEMA_CACHE[cache_key] = result

return result
17 changes: 15 additions & 2 deletions src/logging_objects_with_schema/schema_logger.py
@@ -69,7 +69,12 @@ class SchemaLogger(logging.Logger):
>>> logger.info("request processed", extra={"request_id": "abc-123"})
"""

def __init__(self, name: str, level: int = logging.NOTSET) -> None:
def __init__(
self,
name: str,
level: int = logging.NOTSET,
forbidden_keys: set[str] | None = None,
) -> None:
"""Initialise the schema-aware logger.

The schema is compiled once during construction. If any
@@ -80,12 +85,20 @@ def __init__(self, name: str, level: int = logging.NOTSET) -> None:
Args:
name: Logger name (same as :class:`logging.Logger`).
level: Logger level (same as :class:`logging.Logger`).
forbidden_keys: Additional forbidden root keys to check against.
These keys are merged with builtin LogRecord attributes.
Builtin keys cannot be replaced, only supplemented.
Subclasses can override this method and pass their own
forbidden keys to the parent, merging them with keys from
their own subclasses if needed.
Note: None and empty set() are semantically equivalent - both
mean "no additional forbidden keys" and produce the same result.
"""
# Validate schema before creating the logger instance to avoid
# registering a broken logger in the logging manager cache.
# Schema is compiled and cached first, then problems are checked.
try:
compiled, problems = _compile_schema_internal()
compiled, problems = _compile_schema_internal(forbidden_keys)
except (OSError, ValueError, RuntimeError) as exc:
# Convert system-level exceptions to _SchemaProblem so they can be
# handled the same way as schema validation problems.