Skip to content

[4.x] Add charset support to PostgreSQLDatabaseManager#1429

Open
p1xel007 wants to merge 6 commits intoarchtechx:masterfrom
p1xel007:postgress-db-creation-update
Open

[4.x] Add charset support to PostgreSQLDatabaseManager#1429
p1xel007 wants to merge 6 commits intoarchtechx:masterfrom
p1xel007:postgress-db-creation-update

Conversation

@p1xel007
Copy link
Copy Markdown

@p1xel007 p1xel007 commented Jan 4, 2026

Problem

The PostgreSQLDatabaseManager creates tenant databases without specifying encoding, unlike MySQLDatabaseManager which respects the connection's charset configuration.

When encoding is not explicitly defined, PostgreSQL uses the server's default encoding (inherited from template0). This causes issues on systems where the default encoding differs from the application's expected encoding:

  • Windows systems often default to WIN1252 encoding
  • Some Linux configurations may use LATIN1 or other regional encodings
  • Docker containers may have different defaults depending on the base image

This mismatch leads to errors like:
SQLSTATE[22P05]: Untranslatable character: 7 ERROR: character with byte sequence 0xe2 0x86 0x92 in encoding "UTF8" has no equivalent in encoding "WIN1252"

Solution

Follow the same pattern as MySQLDatabaseManager by reading the charset value from the database connection configuration and applying them when creating tenant databases.

Changes

  • Read charset value from the connection config
  • Apply ENCODING parameter to the CREATE DATABASE statement

Configuration

Users can now control the encoding via their config/database.php:

'pgsql' => [
    // ...
    'charset' => 'UTF8',
],

Notes

  • Maintains consistency with
  • MySQLDatabaseManager.php
  • implementation
  • template0 is required (already in use) to allow encoding override
  • Backward compatible for users who already have charset configured

Summary by CodeRabbit

  • Bug Fixes
    • Database creation now respects configured character encoding: tenant databases include the configured encoding when set, otherwise they fall back to the server default.
  • Tests
    • Added tests validating tenant databases are created with the expected encoding for both configured charset and unset (null) scenarios.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.86%. Comparing base (60dd522) to head (6b0ff48).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1429      +/-   ##
============================================
- Coverage     86.08%   85.86%   -0.22%     
- Complexity     1154     1164      +10     
============================================
  Files           184      185       +1     
  Lines          3377     3404      +27     
============================================
+ Hits           2907     2923      +16     
- Misses          470      481      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@stancl
Copy link
Copy Markdown
Member

stancl commented Feb 9, 2026

The title says charset and collation — are we handling collation here? The diff only shows charset being used.

@p1xel007 p1xel007 changed the title [4.x] Add charset and collation support to PostgreSQLDatabaseManager [4.x] Add charset support to PostgreSQLDatabaseManager Feb 21, 2026
@p1xel007
Copy link
Copy Markdown
Author

p1xel007 commented Feb 21, 2026

The title says charset and collation — are we handling collation here? The diff only shows charset being used.

The first commit did include collation changes, but these were removed in the second commit. The commit title wasn’t updated at the time, but I’ve now corrected it to reflect the changes.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f38534e-a32f-47e0-b7ea-95dcf40cacd7

📥 Commits

Reviewing files that changed from the base of the PR and between d41ac46 and 6b0ff48.

📒 Files selected for processing (1)
  • tests/TenantDatabaseManagerTest.php

📝 Walkthrough

Walkthrough

The PostgreSQL manager's createDatabase now captures the tenant DB name locally, reads charset from the connection config, and conditionally appends an ENCODING='<UPPERCASED_CHARSET>' clause to the CREATE DATABASE SQL when charset is present; tests added to validate this behavior across null and 'utf8' cases.

Changes

Cohort / File(s) Summary
PostgreSQL Database Creation
src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php
createDatabase now uses a local $database, reads charset from the connection config, and appends ENCODING='... (uppercased) to CREATE DATABASE when non-null (previously always used WITH TEMPLATE=template0 without encoding).
Tests — Tenant DB encoding
tests/TenantDatabaseManagerTest.php
Adds parameterized test that configures database.connections.pgsql.charset, creates a tenant DB, queries pg_database/pg_encoding_to_char(encoding), and asserts the new DB's encoding equals the configured uppercased charset or the server default when null.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hop to the SQL with a bright little grin,
I tuck in the charset, an uppercase spin.
New databases wake with the right little tune,
Tenants all snug beneath charset moon. 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding charset support to PostgreSQLDatabaseManager, which is the primary focus of both the implementation and test changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php`:
- Line 14: The code currently forces UTF8 by default when reading the charset
via $this->connection()->getConfig('charset') in PostgreSQLDatabaseManager (the
$charset assignment); remove that hardcoded fallback so the code reads the
configured charset only (letting PostgreSQL use the cluster default when no
charset is provided), and also add explicit guidance in the shipped
assets/config.php explaining the optional "charset" setting and how to set it
when a non-cluster default is required so deployments can override it if needed.
- Around line 14-16: The CREATE/DELETE SQL builds interpolate database and
charset values directly, enabling SQL injection if names/charset are customized;
update PostgreSQLDatabaseManager::createDatabase() to escape identifiers by
doubling any embedded double quotes in the $database name and wrap it in double
quotes, and validate/normalize $charset against an allow-list (or restrict to
safe alphanumerics) before including it; apply the same escaping/validation
pattern to MySQLDatabaseManager::createDatabase(),
MicrosoftSQLDatabaseManager::createDatabase(),
PostgreSQLSchemaManager::createDatabase(),
PostgreSQLSchemaManager::deleteDatabase(), and
PermissionControlledMicrosoftSQLServerDatabaseManager::deleteDatabase() so all
interpolated identifiers are safely escaped and charsets are validated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 83fcb8ad-941b-43aa-9bfc-c893640d0c49

📥 Commits

Reviewing files that changed from the base of the PR and between 60dd522 and 1aa3e25.

📒 Files selected for processing (1)
  • src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php

Comment thread src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php Outdated
Comment on lines +14 to +16
$charset = strtoupper($this->connection()->getConfig('charset') ?? 'UTF8');

return $this->connection()->statement("CREATE DATABASE \"{$database}\" WITH TEMPLATE=template0 ENCODING='{$charset}'");
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify where tenant DB names/charsets are composed into SQL
rg -nP --type=php -C3 "CREATE DATABASE|DROP DATABASE|database\(\)->getName\(\)|getConfig\('charset'\)"

# Verify whether validation/escaping exists upstream for database names/charset
rg -nP --type=php -C3 "preg_match\(|str_replace\(|quote|escape|validate.*database|validate.*charset|InvalidArgumentException"

Repository: archtechx/tenancy

Length of output: 19652


🏁 Script executed:

# Find TenantWithDatabase interface and implementations
fd -t f -e php | xargs rg -l "TenantWithDatabase|interface.*Database|getName.*function"

# Look at database name setter/validation
rg -nP --type=php "database\(\)" -A5 -B2

# Check how tenant database names are created/validated
rg -nP --type=php "tenant.*database|setDatabase|database.*=|getTenantDatabase" -A3

# Look for any input validation on database names
rg -nP --type=php "validate.*Name|preg_match.*database|database.*preg" -A3 -B2

Repository: archtechx/tenancy

Length of output: 50373


🏁 Script executed:

# Specifically search for TenantWithDatabase implementation
fd -type f -name "*.php" -path "*/src/*" | xargs rg -l "TenantWithDatabase" | head -5

# Look at the database contract
fd -type f -name "*Database*.php" -path "*/src/*" | head -10

Repository: archtechx/tenancy

Length of output: 593


🏁 Script executed:

# Check if there's validation in database model
rg -nP --type=php "class.*Database|function getName|protected \$database|private \$database" -A3 | head -100

Repository: archtechx/tenancy

Length of output: 6870


🏁 Script executed:

# Read DatabaseConfig to see how getName() works
cat src/Database/DatabaseConfig.php

# Read HasDatabase to see the database() method
cat src/Database/Concerns/HasDatabase.php

Repository: archtechx/tenancy

Length of output: 9261


🏁 Script executed:

# Find getTenantKey implementation and Tenant model
rg -n "getTenantKey|function getTenantKey" | head -20

# Look at test tenant model to understand key format
cat tests/Etc/Tenant.php | head -60

Repository: archtechx/tenancy

Length of output: 3843


🏁 Script executed:

# Find the base Tenant model and getTenantKey implementation
find . -path ./vendor -prune -o -name "*.php" -type f | xargs rg -l "class Tenant" | grep -E "src|models" | head -5

# Look at the actual Tenant model in src
cat src/Database/Models/Tenant.php | head -150

Repository: archtechx/tenancy

Length of output: 2638


🏁 Script executed:

# Check how tenant IDs are generated/validated
rg -n "GeneratesIds|generateId" -A 10 src/Database/Concerns/ | head -80

# Check the default prefix/suffix configuration
fd -type f -name "*.php" | xargs rg -l "database.prefix|database.suffix" | grep config

# Look for configuration files
fd -name "config.php" -o -name "*config*.php" | grep tenancy

Repository: archtechx/tenancy

Length of output: 1415


🏁 Script executed:

# Check GeneratesIds implementation
cat src/Database/Concerns/GeneratesIds.php

# Find the UniqueIdentifierGenerator
rg "class UniqueIdentifierGenerator" -A 20

# Look for config files with tenancy database config
find . -path ./vendor -prune -o -name "*.php" -type f -print | xargs grep -l "database.prefix\|database.suffix" 2>/dev/null | head -5

Repository: archtechx/tenancy

Length of output: 940


🏁 Script executed:

# Find UniqueIdentifierGenerator implementation
rg "class UniqueIdentifierGenerator" -A 15

# Find where database.prefix and suffix are used
rg "database.prefix|database.suffix" -B 3 -A 3

# Look at actual config examples
fd -name "*.php" | xargs grep -l "prefix.*suffix" | head -10

Repository: archtechx/tenancy

Length of output: 1824


🏁 Script executed:

# Check what the actual default prefix/suffix values are
rg "tenancy.database" -B 2 -A 2 | grep -E "prefix|suffix|tenancy.database" | head -20

# See if there are integration/acceptance tests with problematic database names
rg "CREATE DATABASE|DROP DATABASE" tests/ -B 3 -A 3

# Check PostgreSQL schema manager which also has the same issue
cat src/Database/TenantDatabaseManagers/PostgreSQLSchemaManager.php

Repository: archtechx/tenancy

Length of output: 4094


🏁 Script executed:

# Check if there's any validation in the tenant model or migrations
fd -name "*migrations*" -type d | head -5

# Look for any validation rules on tenant attributes
rg "validate|rules\(\)|fillable|guarded" src/Database/Models/Tenant.php -A 5

# Check if charset is validated anywhere
rg "charset" src/ --type php | grep -v "CREATE DATABASE\|DROP SCHEMA"

# Look for any input validation on database name
rg "db_name|getName|setInternal" src/Database/Concerns/HasInternalKeys.php -A 10

Repository: archtechx/tenancy

Length of output: 1055


Escape interpolated SQL values in CREATE DATABASE statements across all database managers.

While database names are currently auto-generated from tenant IDs and config values in the default setup, the code allows customization via DatabaseConfig::generateDatabaseNamesUsing() and lacks input validation/escaping. If a developer uses a custom name generator with user-controlled input, SQL injection becomes possible. The charset value, though sourced from config, is also interpolated without validation.

This same unescaped interpolation pattern exists in:

  • MySQLDatabaseManager::createDatabase() (line 17)
  • MicrosoftSQLDatabaseManager::createDatabase() (line 15)
  • PostgreSQLSchemaManager::createDatabase() & deleteDatabase() (lines 13, 18)
  • PermissionControlledMicrosoftSQLServerDatabaseManager::deleteDatabase() (line 61)

Hardening these statements with proper escaping (PostgreSQL double-quote escaping for names, charset validation) adds defense-in-depth and prevents potential injection if customizations are used.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php` around
lines 14 - 16, The CREATE/DELETE SQL builds interpolate database and charset
values directly, enabling SQL injection if names/charset are customized; update
PostgreSQLDatabaseManager::createDatabase() to escape identifiers by doubling
any embedded double quotes in the $database name and wrap it in double quotes,
and validate/normalize $charset against an allow-list (or restrict to safe
alphanumerics) before including it; apply the same escaping/validation pattern
to MySQLDatabaseManager::createDatabase(),
MicrosoftSQLDatabaseManager::createDatabase(),
PostgreSQLSchemaManager::createDatabase(),
PostgreSQLSchemaManager::deleteDatabase(), and
PermissionControlledMicrosoftSQLServerDatabaseManager::deleteDatabase() so all
interpolated identifiers are safely escaped and charsets are validated.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot use parameters here since they're very low level CREATE ... statements, but perhaps escaping might be possible.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stancl — agreed, escaping is definitely feasible here. There are two things to harden:

1. Database name — identifier escaping

PostgreSQL quoted identifiers can contain any character except the null byte; to include a literal double quote inside a quoted identifier, write two consecutive double quotes. The database name is already wrapped in double quotes, so the fix is simply to double any embedded " before interpolating:

$database = str_replace('"', '""', $tenant->database()->getName());

2. Charset — allowlist validation

PostgreSQL supports storing text in a variety of character sets including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC, UTF-8, and Mule internal code. All valid PostgreSQL encoding names (e.g. UTF8, EUC_JP, SQL_ASCII, LATIN1, WIN1252) consist only of alphanumerics, underscores, and hyphens — so a simple regex guards against injection through the ENCODING='...' string literal:

if (!preg_match('/^[A-Z0-9_\-]+$/i', $charset)) {
    throw new \InvalidArgumentException("Invalid charset: {$charset}");
}

Putting it together:

public function createDatabase(TenantWithDatabase $tenant): bool
{
    $database = str_replace('"', '""', $tenant->database()->getName());
    $charset = strtoupper($this->connection()->getConfig('charset') ?? 'UTF8');

    if (!preg_match('/^[A-Z0-9_\-]+$/i', $charset)) {
        throw new \InvalidArgumentException("Invalid PostgreSQL charset: {$charset}");
    }

    return $this->connection()->statement("CREATE DATABASE \"{$database}\" WITH TEMPLATE=template0 ENCODING='{$charset}'");
}

The same identifier-escaping pattern (str_replace('"', '""', ...)) should be applied to deleteDatabase() and databaseExists() in PostgreSQLDatabaseManager and PostgreSQLSchemaManager as well. For MySQL/MSSQL managers, the equivalent is backtick-doubling / bracket-escaping for their respective identifier quoting styles.

@stancl
Copy link
Copy Markdown
Member

stancl commented Apr 12, 2026

Will resolve these reviews on our end 👍

lukinovec and others added 2 commits April 24, 2026 11:52
If the charset is null, Postgres creates the DB with the server's default charset

Co-authored-by: Copilot <copilot@github.com>
If DB charset is configured to null, new databases will use the Postgres server's default charset (which is utf8 by default)

Co-authored-by: Copilot <copilot@github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php (1)

13-21: ⚠️ Potential issue | 🔴 Critical

Escape the database identifier and validate charset before interpolating SQL.

Line 17 and Line 20 interpolate raw values into CREATE DATABASE. A quoted identifier with embedded " (or crafted charset) can break the statement and opens an injection path with custom DB-name generators.

Suggested hardening
-        $database = $tenant->database()->getName();
+        $database = str_replace('"', '""', $tenant->database()->getName());
         // If null, Postgres creates the DB with the server's default charset
         $charset = $this->connection()->getConfig('charset');

         $query = "CREATE DATABASE \"{$database}\" WITH TEMPLATE=template0";

         if ($charset !== null) {
-            $query .= " ENCODING='" . strtoupper($charset) . "'";
+            $charset = strtoupper((string) $charset);
+            if (! preg_match('/^[A-Z0-9_-]+$/', $charset)) {
+                throw new \Exception("Invalid PostgreSQL charset: {$charset}");
+            }
+            $query .= " ENCODING='{$charset}'";
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php` around
lines 13 - 21, The CREATE DATABASE construction currently interpolates $database
and $charset directly; replace this with a safe-quoted identifier for $database
and a validated/whitelisted charset before building the SQL. Locate where
$database and $charset are used in PostgreSQLDatabaseManager (the variables
$database, $charset and the method connection()), call your DBAL/driver
identifier-quoting helper (e.g.,
connection()->getDatabasePlatform()->quoteIdentifier() or pg_escape_identifier /
appropriate driver API) to escape $database, and validate $charset against a
strict uppercase whitelist of allowed encodings (only append ENCODING if it
matches the whitelist) rather than inserting raw user values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/TenantDatabaseManagerTest.php`:
- Around line 561-564: Update the misleading inline comment above the DB query
that sets $serverCharset: replace the hardcoded “Postgres server's default
charset (utf8)” with a generic description reflecting that the test supports
non-UTF8 server defaults (e.g., “Postgres server's default charset”) so the
comment matches the behavior around the DB::connection('pgsql')->selectOne(...)
call that assigns $serverCharset.

---

Duplicate comments:
In `@src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php`:
- Around line 13-21: The CREATE DATABASE construction currently interpolates
$database and $charset directly; replace this with a safe-quoted identifier for
$database and a validated/whitelisted charset before building the SQL. Locate
where $database and $charset are used in PostgreSQLDatabaseManager (the
variables $database, $charset and the method connection()), call your
DBAL/driver identifier-quoting helper (e.g.,
connection()->getDatabasePlatform()->quoteIdentifier() or pg_escape_identifier /
appropriate driver API) to escape $database, and validate $charset against a
strict uppercase whitelist of allowed encodings (only append ENCODING if it
matches the whitelist) rather than inserting raw user values.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8aec52cb-8b49-4dd0-9ae2-81a76edf1d91

📥 Commits

Reviewing files that changed from the base of the PR and between 1aa3e25 and d41ac46.

📒 Files selected for processing (2)
  • src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php
  • tests/TenantDatabaseManagerTest.php

Comment thread tests/TenantDatabaseManagerTest.php Outdated
@stancl stancl self-assigned this Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants