Merged
19 commits
9f5be2e
build: Update Makefile to pin Python version
disafronov May 5, 2026
345eff8
build: Refactor test command to include coverage by default
disafronov May 5, 2026
ab7c5f9
build: Update Makefile to remove formatting step from all target
disafronov May 5, 2026
7fe404b
chore: Update pre-commit configuration
disafronov May 5, 2026
70bda31
build: Update coverage configuration
disafronov May 5, 2026
7931024
test: Add test for cache hit on missing file in schema loader
disafronov May 5, 2026
455d25f
refactor: Optimize schema applier and loader
disafronov May 5, 2026
cfa325c
refactor: schema applier to improve dict cleaning
disafronov May 5, 2026
8dd8b8d
refactor: error handling to use dictionaries instead of JSON strings
disafronov May 5, 2026
6b7351d
refactor: Remove unused _is_leaf_node function and tests
disafronov May 5, 2026
ea14a0e
ci: Update python versions in lint and test workflow
disafronov May 5, 2026
e1a11d6
build: Bump python version to 3.11
disafronov May 5, 2026
5d62f7c
refactor: Remove inspect fallback for Python < 3.11
disafronov May 5, 2026
790eda0
ci: Update dependencies to latest versions
disafronov May 5, 2026
345c602
chore: Update pre-commit config to exclude helpers.py
disafronov May 5, 2026
b640d4b
build: Update pre-commit config to fix test exclusion
disafronov May 5, 2026
5be76d7
build: Add markdownlint ignore file for changelog
disafronov May 5, 2026
5878cc0
test: Remove obsolete tests for Python 3.11 and earlier
disafronov May 5, 2026
382baff
refactor: Remove future annotations from modules
disafronov May 5, 2026
2 changes: 1 addition & 1 deletion .github/workflows/lint_and_test.yaml
@@ -15,7 +15,7 @@ jobs:
contents: read
strategy:
matrix:
-python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]
+python-version: ["3.11", "3.12", "3.13", "3.14"]
steps:
- name: Checkout
uses: actions/checkout@v6
1 change: 1 addition & 0 deletions .markdownlintignore
@@ -0,0 +1 @@
+CHANGELOG.md
13 changes: 6 additions & 7 deletions .pre-commit-config.yaml
@@ -30,6 +30,12 @@ repos:
args: [--allow-multiple-documents]
- id: check-toml
stages: [pre-commit]
+- id: check-case-conflict
+stages: [pre-commit]
+- id: name-tests-test
+stages: [pre-commit]
+args: [--pytest-test-first]
+exclude: ^(.*/)?tests/helpers\.py$

- repo: https://github.com/igorshubovych/markdownlint-cli
rev: v0.46.0
@@ -46,13 +52,6 @@ repos:

- repo: local
hooks:
-- id: make-format
-name: make format (pre-commit)
-stages: [pre-commit]
-language: system
-entry: make format
-pass_filenames: false
-
- id: make-lint
name: make lint (pre-commit)
stages: [pre-commit]
14 changes: 7 additions & 7 deletions Makefile
@@ -1,3 +1,6 @@
+# Python version is pinned via `.python-version` (used by uv and CI).
+PYTHON_VERSION := $(shell tr -d '[:space:]' < .python-version)
+
# Variables
PYTEST_CMD = uv run python -m pytest -v
COVERAGE_OPTS = --cov --cov-report=term-missing --cov-report=html
@@ -15,7 +18,8 @@ help: ## Show this help message
# Development
install: ## Install dependencies
@echo "Installing dependencies..."
-uv sync
+uv python install $(PYTHON_VERSION)
+uv sync --python $(PYTHON_VERSION)
@echo "Installing pre-commit hooks..."
uv run pre-commit install

@@ -33,16 +37,12 @@ dead-code: ## Check for dead code using vulture
uv run vulture

# Testing
-test: ## Run tests
-@echo "Running tests..."
-$(PYTEST_CMD)
-
-test-coverage: ## Run tests with coverage
+test: ## Run tests with coverage
@echo "Running tests with coverage..."
$(PYTEST_CMD) $(COVERAGE_OPTS)

# Combined operations
-all: format lint test dead-code ## Run format, lint, test, and dead-code check
+all: lint test dead-code ## Run lint, test, and dead-code check
@echo "All checks completed successfully!"

# Maintenance
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -7,7 +7,7 @@ name = "logging-objects-with-schema"
version = "0.4.1"
description = "Proxy logging wrapper that validates extra fields against a JSON schema."
readme = "README.md"
-requires-python = ">=3.10"
+requires-python = ">=3.11"
dependencies = [ ]
license = "Apache-2.0"
license-files = [ "LICENSE" ]
@@ -20,7 +20,6 @@ keywords = [
]
classifiers = [
"Programming Language :: Python :: 3",
-"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
@@ -102,6 +101,7 @@ source = [ "src" ]
branch = true

[tool.coverage.report]
+fail_under = 100
exclude_lines = [
"def __repr__",
"if self\\.debug",
2 changes: 0 additions & 2 deletions src/logging_objects_with_schema/__init__.py
@@ -5,8 +5,6 @@
according to an application-defined JSON schema.
"""

-from __future__ import annotations
-
from .schema_logger import SchemaLogger

__all__ = [
21 changes: 9 additions & 12 deletions src/logging_objects_with_schema/errors.py
@@ -1,8 +1,8 @@
"""Custom exception types used by logging_objects_with_schema."""

-from __future__ import annotations
-
+import json
from dataclasses import dataclass
+from typing import Any


@dataclass
@@ -60,15 +60,12 @@ class _DataProblem:
- Redundant fields (fields not defined in the schema)

Attributes:
-message: JSON string containing structured error information. The message
-is always a valid JSON object with the following structure:
-``{"field": "...", "error": "...", "value": "..."}``
-All values are serialized via ``repr()`` for safety and consistency.
-Examples:
-- ``{"field": "'user_id'", "error": "'has type str, expected int'", "value": "'abc-123'"}``
-- ``{"field": "'request_id'", "error": "'is None'", "value": "None"}``
-- ``{"field": "'tags'", "error": "'is a list but contains elements with types dict; expected all elements to be of type str'", "value": "[{'key': 'color'}]"}`` # noqa: E501
-- ``{"field": "'unknown_field'", "error": "'is not defined in schema'", "value": "'some_value'"}``
+data: Dict containing structured error information with keys
+``field``, ``error``, and ``value`` (all via ``repr()``).
"""

-message: str
+data: dict[str, Any]
+
+@property
+def message(self) -> str:
+return json.dumps(self.data)
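
The change above moves JSON serialization from construction time to the `message` property. A minimal sketch of that pattern follows; `DataProblemSketch` and `make_error_dict` are hypothetical stand-ins written for illustration, not the library's own `_DataProblem` or `_create_validation_error_dict`.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class DataProblemSketch:
    """Hypothetical stand-in for _DataProblem (assumption, not the real class)."""

    data: dict[str, Any]

    @property
    def message(self) -> str:
        # Serialize only when a caller asks for the string form.
        return json.dumps(self.data)


def make_error_dict(field: str, error: str, value: Any) -> dict[str, Any]:
    # Hypothetical helper: every value goes through repr(), so objects that
    # json.dumps could not encode directly are still representable.
    return {"field": repr(field), "error": repr(error), "value": repr(value)}


problem = DataProblemSketch(
    make_error_dict("user_id", "has type str, expected int", "abc-123")
)
print(problem.data["field"])
print(problem.message)
```

Callers that only inspect `data` never pay for `json.dumps`, while code that still reads `message` should keep receiving a JSON object string.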
85 changes: 24 additions & 61 deletions src/logging_objects_with_schema/schema_applier.py
@@ -4,41 +4,32 @@
to user-provided extra fields, used by SchemaLogger.
"""

-from __future__ import annotations
-
-import json
-from collections import defaultdict
from collections.abc import Mapping, MutableMapping
from typing import Any

from .errors import _DataProblem
from .schema_loader import _CompiledSchema, _SchemaLeaf


-def _create_validation_error_json(field: str, error: str, value: Any) -> str:
-"""Create JSON string for a single validation error.
+def _create_validation_error_dict(field: str, error: str, value: Any) -> dict[str, Any]:
+"""Create a dict for a single validation error.

-All values are wrapped in repr() before JSON serialization. This ensures:
-- Any value type can be safely serialized (even non-JSON-serializable types)
-- The error message always contains a valid Python representation of the value
-- Security: prevents issues with special characters or control sequences
-- Consistency: all error messages have the same format regardless of value type
+All values are wrapped in repr() to ensure any type is representable
+and special characters cannot cause issues downstream.

Args:
field: Field name that caused the validation error.
error: Error description.
value: Invalid value that caused the error.

Returns:
-JSON string with field, error, and value (all via repr() for safety).
+Dict with field, error, and value (all via repr() for safety).
"""
-return json.dumps(
-{
-"field": repr(field),
-"error": repr(error),
-"value": repr(value),
-}
-)
+return {
+"field": repr(field),
+"error": repr(error),
+"value": repr(value),
+}


def _validate_list_value(
@@ -62,7 +53,7 @@ def _validate_list_value(
"""
if item_expected_type is None:
error_msg = "is a list but has no item type configured"
-return _DataProblem(_create_validation_error_json(source, error_msg, value))
+return _DataProblem(_create_validation_error_dict(source, error_msg, value))

if len(value) == 0:
# Empty lists are always valid
@@ -83,7 +74,7 @@ def _validate_list_value(
f"expected all elements to be of type "
f"{item_expected_type.__name__}"
)
-return _DataProblem(_create_validation_error_json(source, error_msg, value))
+return _DataProblem(_create_validation_error_dict(source, error_msg, value))

return None

@@ -152,7 +143,7 @@ def _validate_and_apply_leaf(
f"expected {leaf.expected_type.__name__}"
)
problems.append(
-_DataProblem(_create_validation_error_json(source, error_msg, value))
+_DataProblem(_create_validation_error_dict(source, error_msg, value))
)
return

@@ -197,8 +188,12 @@ def _strip_empty(node: Any) -> Any:
The cleaned structure with empty dicts and None values removed.
"""
if isinstance(node, dict):
-cleaned = {k: _strip_empty(v) for k, v in node.items()}
-return {k: v for k, v in cleaned.items() if v != {} and v is not None}
+result: dict[str, Any] = {}
+for k, v in node.items():
+v = _strip_empty(v)
+if v != {} and v is not None:
+result[k] = v
+return result
return node
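
The single-pass cleaning in this hunk can be tried on a small nested dict. The sketch below copies the new loop under the hypothetical name `strip_empty` so it runs standalone; the sample input is invented for illustration.

```python
from typing import Any


def strip_empty(node: Any) -> Any:
    """Standalone copy of the new loop, renamed for illustration."""
    if isinstance(node, dict):
        result: dict[str, Any] = {}
        for key, value in node.items():
            # Clean the child first, then decide whether to keep it.
            value = strip_empty(value)
            if value != {} and value is not None:
                result[key] = value
        return result
    return node


# "a" collapses to {} once its children are stripped, so it is dropped too.
cleaned = strip_empty({"a": {"b": None, "c": {}}, "d": 1, "e": {"f": "x"}})
print(cleaned)
```

Only `None` and empty dicts are removed: falsy values such as `0` or `[]` compare unequal to `{}` and are kept.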


@@ -274,63 +269,31 @@ def _apply_schema_internal(
extra: dict[str, Any] = {}
problems: list[_DataProblem] = []

-# Group leaves by source field name. This is necessary because a single source
-# can be referenced by multiple leaves (allowing the same value to appear in
-# different locations in the output structure). Grouping allows us to process
-# all leaves for a given source together, which is more efficient and allows
-# us to validate the value once per source (e.g., checking for None) rather
-# than once per leaf.
-source_to_leaves: dict[str, list[_SchemaLeaf]] = defaultdict(list)
-for leaf in compiled.leaves:
-source_to_leaves[leaf.source].append(leaf)
-
-used_sources = set(source_to_leaves.keys())
-
-# Process each source that appears in the schema. If a source is missing from
-# extra_values, we silently skip it (this is normal - not all sources need to
-# be present in every log call). We only validate and apply sources that are
-# actually provided.
-for source, leaves in source_to_leaves.items():
+for source, leaves in compiled.source_to_leaves.items():
if source not in extra_values:
-# Source not provided - this is normal, not an error. Skip it.
continue

value = extra_values[source]

-# Check for None values explicitly. None is never allowed for any type,
-# so we check it once per source (not once per leaf) before attempting
-# type-specific validation. This avoids redundant checks when a source
-# is used by multiple leaves.
if value is None:
-error_msg = "is None"
problems.append(
-_DataProblem(_create_validation_error_json(source, error_msg, None))
+_DataProblem(_create_validation_error_dict(source, "is None", None))
)
continue

-# Validate the value against each leaf that references this source.
-# Each leaf validates independently, so a value might pass validation
-# for some leaves (where type matches) but fail for others (where type
-# doesn't match). The value is written only to locations where validation
-# succeeds.
for leaf in leaves:
_validate_and_apply_leaf(leaf, value, source, extra, problems)

-# Report redundant fields: any keys in extra_values that are not referenced
-# by any schema leaf. These are fields that the user provided but which are
-# not defined in the schema, so they cannot be included in the log output.
-# Optimization: if schema is empty (no used_sources), all fields are redundant,
-# so we can skip the membership check for each key.
redundant_keys = (
extra_values.keys()
-if not used_sources
-else (key for key in extra_values.keys() if key not in used_sources)
+if not compiled.known_sources
+else (key for key in extra_values.keys() if key not in compiled.known_sources)
)
for key in redundant_keys:
error_msg = "is not defined in schema"
problems.append(
_DataProblem(
-_create_validation_error_json(key, error_msg, extra_values[key])
+_create_validation_error_dict(key, error_msg, extra_values[key])
)
)
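
The hunk above leaves three outcomes for a provided key: it is a known source with a value, it is a known source set to `None`, or it is not in the schema at all. A simplified sketch of that classification follows. `classify_extra` is a hypothetical helper that iterates over the provided keys rather than over the schema's sources, and it leaves out the per-leaf type validation that `_validate_and_apply_leaf` performs.

```python
from typing import Any


def classify_extra(
    extra_values: dict[str, Any], known_sources: frozenset[str]
) -> dict[str, list[str]]:
    """Hypothetical helper: bucket provided keys by outcome."""
    report: dict[str, list[str]] = {"accepted": [], "none": [], "redundant": []}
    for key, value in extra_values.items():
        if key not in known_sources:
            # Provided by the caller but absent from the schema.
            report["redundant"].append(key)
        elif value is None:
            # None is rejected once per source, before any type checks.
            report["none"].append(key)
        else:
            report["accepted"].append(key)
    return report


# "tenant" is a schema source that is simply not provided, which is not an error.
result = classify_extra(
    {"user_id": 1, "request_id": None, "color": "red"},
    frozenset({"user_id", "request_id", "tenant"}),
)
print(result)
```

Because `known_sources` is a `frozenset`, each membership test is a hash lookup and the set cannot be mutated by accident between log calls.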

53 changes: 14 additions & 39 deletions src/logging_objects_with_schema/schema_loader.py
@@ -6,15 +6,13 @@
emitted and which Python types they must have.
"""

-from __future__ import annotations
-
import functools
import json
import logging
import os
import threading
from collections.abc import Iterable, Mapping, MutableMapping
-from dataclasses import dataclass
+from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Literal

@@ -58,6 +56,19 @@ class _CompiledSchema:
"""

leaves: list[_SchemaLeaf]
+source_to_leaves: dict[str, list[_SchemaLeaf]] = field(
+default_factory=dict, init=False, repr=False, compare=False
+)
+known_sources: frozenset[str] = field(
+default_factory=frozenset, init=False, repr=False, compare=False
+)
+
+def __post_init__(self) -> None:
+source_map: dict[str, list[_SchemaLeaf]] = {}
+for leaf in self.leaves:
+source_map.setdefault(leaf.source, []).append(leaf)
+self.source_to_leaves = source_map
+self.known_sources = frozenset(source_map)

@property
def is_empty(self) -> bool:
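
The derived fields added in this hunk are built once in `__post_init__` instead of on every log call. A minimal sketch of the same pattern follows; `LeafSketch` and `CompiledSchemaSketch` are hypothetical classes with only the fields needed for illustration, and the sketch assumes a non-frozen dataclass, since `__post_init__` assigns attributes directly.

```python
from dataclasses import dataclass, field


@dataclass
class LeafSketch:
    """Hypothetical leaf with only the fields this sketch needs."""

    source: str
    path: tuple[str, ...]


@dataclass
class CompiledSchemaSketch:
    leaves: list[LeafSketch]
    # Derived fields: not accepted by __init__, hidden from repr and equality.
    source_to_leaves: dict[str, list[LeafSketch]] = field(
        default_factory=dict, init=False, repr=False, compare=False
    )
    known_sources: frozenset[str] = field(
        default_factory=frozenset, init=False, repr=False, compare=False
    )

    def __post_init__(self) -> None:
        # Group leaves by source once, when the schema object is created.
        source_map: dict[str, list[LeafSketch]] = {}
        for leaf in self.leaves:
            source_map.setdefault(leaf.source, []).append(leaf)
        self.source_to_leaves = source_map
        self.known_sources = frozenset(source_map)


schema = CompiledSchemaSketch(
    [
        LeafSketch("user_id", ("user", "id")),
        LeafSketch("user_id", ("audit", "actor")),
        LeafSketch("request_id", ("request", "id")),
    ]
)
print(sorted(schema.known_sources))
```

With `init=False`, callers cannot pass a stale index to the constructor, and with `compare=False`, two schemas with equal `leaves` still compare equal.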
@@ -475,42 +486,6 @@ def _determine_node_type_and_validate(
return ("inner", True)


-def _is_leaf_node(value_dict: dict[str, Any]) -> bool:
-"""Check if a schema node is a leaf node.
-
-A leaf node is identified by having at least one of 'type' or 'source' fields
-that is a string. If 'type' or 'source' are themselves objects (not strings),
-they are child nodes, not leaf properties.
-
-Inner nodes have either no 'type'/'source' fields, or have these fields as
-objects (child nodes) rather than strings.
-
-We use `.get()` with `is not None` check instead of `in` operator because:
-- A field might be present but have a None value (which indicates an error)
-- We need to distinguish between "field not present" (inner node) and
-"field present but None" (invalid leaf node that will be caught later)
-
-Args:
-value_dict: Dictionary containing node data.
-
-Returns:
-True if the node is a leaf, False if it's an inner node.
-"""
-type_value = value_dict.get("type")
-source_value = value_dict.get("source")
-
-# If either field is an object (Mapping), this is an inner node, not a leaf
-if isinstance(type_value, Mapping) or isinstance(source_value, Mapping):
-return False
-
-# Check if at least one field is present and is a string
-if isinstance(type_value, str) or isinstance(source_value, str):
-return True
-
-# Neither field is present as a string - this is an inner node
-return False
-
-
def _validate_and_create_leaf(
value_dict: dict[str, Any],
path: tuple[str, ...],