[4.x] Add charset support to PostgreSQLDatabaseManager#1429
[4.x] Add charset support to PostgreSQLDatabaseManager#1429p1xel007 wants to merge 6 commits intoarchtechx:masterfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1429 +/- ##
============================================
- Coverage 86.08% 85.86% -0.22%
- Complexity 1154 1164 +10
============================================
Files 184 185 +1
Lines 3377 3404 +27
============================================
+ Hits 2907 2923 +16
- Misses 470 481 +11 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
The title says charset and collation — are we handling collation here? The diff only shows charset being used. |
The first commit did include collation changes, but these were removed in the second commit. The commit title wasn’t updated at the time, but I’ve now corrected it to reflect the changes. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe PostgreSQL manager's Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php`:
- Line 14: The code currently forces UTF8 by default when reading the charset
via $this->connection()->getConfig('charset') in PostgreSQLDatabaseManager (the
$charset assignment); remove that hardcoded fallback so the code reads the
configured charset only (letting PostgreSQL use the cluster default when no
charset is provided), and also add explicit guidance in the shipped
assets/config.php explaining the optional "charset" setting and how to set it
when a non-cluster default is required so deployments can override it if needed.
- Around line 14-16: The CREATE/DELETE SQL builds interpolate database and
charset values directly, enabling SQL injection if names/charset are customized;
update PostgreSQLDatabaseManager::createDatabase() to escape identifiers by
doubling any embedded double quotes in the $database name and wrap it in double
quotes, and validate/normalize $charset against an allow-list (or restrict to
safe alphanumerics) before including it; apply the same escaping/validation
pattern to MySQLDatabaseManager::createDatabase(),
MicrosoftSQLDatabaseManager::createDatabase(),
PostgreSQLSchemaManager::createDatabase(),
PostgreSQLSchemaManager::deleteDatabase(), and
PermissionControlledMicrosoftSQLServerDatabaseManager::deleteDatabase() so all
interpolated identifiers are safely escaped and charsets are validated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 83fcb8ad-941b-43aa-9bfc-c893640d0c49
📒 Files selected for processing (1)
src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php
| $charset = strtoupper($this->connection()->getConfig('charset') ?? 'UTF8'); | ||
|
|
||
| return $this->connection()->statement("CREATE DATABASE \"{$database}\" WITH TEMPLATE=template0 ENCODING='{$charset}'"); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify where tenant DB names/charsets are composed into SQL
rg -nP --type=php -C3 "CREATE DATABASE|DROP DATABASE|database\(\)->getName\(\)|getConfig\('charset'\)"
# Verify whether validation/escaping exists upstream for database names/charset
rg -nP --type=php -C3 "preg_match\(|str_replace\(|quote|escape|validate.*database|validate.*charset|InvalidArgumentException"Repository: archtechx/tenancy
Length of output: 19652
🏁 Script executed:
# Find TenantWithDatabase interface and implementations
fd -t f -e php | xargs rg -l "TenantWithDatabase|interface.*Database|getName.*function"
# Look at database name setter/validation
rg -nP --type=php "database\(\)" -A5 -B2
# Check how tenant database names are created/validated
rg -nP --type=php "tenant.*database|setDatabase|database.*=|getTenantDatabase" -A3
# Look for any input validation on database names
rg -nP --type=php "validate.*Name|preg_match.*database|database.*preg" -A3 -B2Repository: archtechx/tenancy
Length of output: 50373
🏁 Script executed:
# Specifically search for TenantWithDatabase implementation
fd -type f -name "*.php" -path "*/src/*" | xargs rg -l "TenantWithDatabase" | head -5
# Look at the database contract
fd -type f -name "*Database*.php" -path "*/src/*" | head -10Repository: archtechx/tenancy
Length of output: 593
🏁 Script executed:
# Check if there's validation in database model
rg -nP --type=php "class.*Database|function getName|protected \$database|private \$database" -A3 | head -100Repository: archtechx/tenancy
Length of output: 6870
🏁 Script executed:
# Read DatabaseConfig to see how getName() works
cat src/Database/DatabaseConfig.php
# Read HasDatabase to see the database() method
cat src/Database/Concerns/HasDatabase.phpRepository: archtechx/tenancy
Length of output: 9261
🏁 Script executed:
# Find getTenantKey implementation and Tenant model
rg -n "getTenantKey|function getTenantKey" | head -20
# Look at test tenant model to understand key format
cat tests/Etc/Tenant.php | head -60Repository: archtechx/tenancy
Length of output: 3843
🏁 Script executed:
# Find the base Tenant model and getTenantKey implementation
find . -path ./vendor -prune -o -name "*.php" -type f | xargs rg -l "class Tenant" | grep -E "src|models" | head -5
# Look at the actual Tenant model in src
cat src/Database/Models/Tenant.php | head -150Repository: archtechx/tenancy
Length of output: 2638
🏁 Script executed:
# Check how tenant IDs are generated/validated
rg -n "GeneratesIds|generateId" -A 10 src/Database/Concerns/ | head -80
# Check the default prefix/suffix configuration
fd -type f -name "*.php" | xargs rg -l "database.prefix|database.suffix" | grep config
# Look for configuration files
fd -name "config.php" -o -name "*config*.php" | grep tenancyRepository: archtechx/tenancy
Length of output: 1415
🏁 Script executed:
# Check GeneratesIds implementation
cat src/Database/Concerns/GeneratesIds.php
# Find the UniqueIdentifierGenerator
rg "class UniqueIdentifierGenerator" -A 20
# Look for config files with tenancy database config
find . -path ./vendor -prune -o -name "*.php" -type f -print | xargs grep -l "database.prefix\|database.suffix" 2>/dev/null | head -5Repository: archtechx/tenancy
Length of output: 940
🏁 Script executed:
# Find UniqueIdentifierGenerator implementation
rg "class UniqueIdentifierGenerator" -A 15
# Find where database.prefix and suffix are used
rg "database.prefix|database.suffix" -B 3 -A 3
# Look at actual config examples
fd -name "*.php" | xargs grep -l "prefix.*suffix" | head -10Repository: archtechx/tenancy
Length of output: 1824
🏁 Script executed:
# Check what the actual default prefix/suffix values are
rg "tenancy.database" -B 2 -A 2 | grep -E "prefix|suffix|tenancy.database" | head -20
# See if there are integration/acceptance tests with problematic database names
rg "CREATE DATABASE|DROP DATABASE" tests/ -B 3 -A 3
# Check PostgreSQL schema manager which also has the same issue
cat src/Database/TenantDatabaseManagers/PostgreSQLSchemaManager.phpRepository: archtechx/tenancy
Length of output: 4094
🏁 Script executed:
# Check if there's any validation in the tenant model or migrations
fd -name "*migrations*" -type d | head -5
# Look for any validation rules on tenant attributes
rg "validate|rules\(\)|fillable|guarded" src/Database/Models/Tenant.php -A 5
# Check if charset is validated anywhere
rg "charset" src/ --type php | grep -v "CREATE DATABASE\|DROP SCHEMA"
# Look for any input validation on database name
rg "db_name|getName|setInternal" src/Database/Concerns/HasInternalKeys.php -A 10Repository: archtechx/tenancy
Length of output: 1055
Escape interpolated SQL values in CREATE DATABASE statements across all database managers.
While database names are currently auto-generated from tenant IDs and config values in the default setup, the code allows customization via DatabaseConfig::generateDatabaseNamesUsing() and lacks input validation/escaping. If a developer uses a custom name generator with user-controlled input, SQL injection becomes possible. The charset value, though sourced from config, is also interpolated without validation.
This same unescaped interpolation pattern exists in:
MySQLDatabaseManager::createDatabase()(line 17)MicrosoftSQLDatabaseManager::createDatabase()(line 15)PostgreSQLSchemaManager::createDatabase()&deleteDatabase()(lines 13, 18)PermissionControlledMicrosoftSQLServerDatabaseManager::deleteDatabase()(line 61)
Hardening these statements with proper escaping (PostgreSQL double-quote escaping for names, charset validation) adds defense-in-depth and prevents potential injection if customizations are used.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php` around
lines 14 - 16, The CREATE/DELETE SQL builds interpolate database and charset
values directly, enabling SQL injection if names/charset are customized; update
PostgreSQLDatabaseManager::createDatabase() to escape identifiers by doubling
any embedded double quotes in the $database name and wrap it in double quotes,
and validate/normalize $charset against an allow-list (or restrict to safe
alphanumerics) before including it; apply the same escaping/validation pattern
to MySQLDatabaseManager::createDatabase(),
MicrosoftSQLDatabaseManager::createDatabase(),
PostgreSQLSchemaManager::createDatabase(),
PostgreSQLSchemaManager::deleteDatabase(), and
PermissionControlledMicrosoftSQLServerDatabaseManager::deleteDatabase() so all
interpolated identifiers are safely escaped and charsets are validated.
There was a problem hiding this comment.
We cannot use parameters here since they're very low level CREATE ... statements, but perhaps escaping might be possible.
There was a problem hiding this comment.
@stancl — agreed, escaping is definitely feasible here. There are two things to harden:
1. Database name — identifier escaping
PostgreSQL quoted identifiers can contain any character except the null byte; to include a literal double quote inside a quoted identifier, write two consecutive double quotes. The database name is already wrapped in double quotes, so the fix is simply to double any embedded " before interpolating:
$database = str_replace('"', '""', $tenant->database()->getName());2. Charset — allowlist validation
PostgreSQL supports storing text in a variety of character sets including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC, UTF-8, and Mule internal code. All valid PostgreSQL encoding names (e.g. UTF8, EUC_JP, SQL_ASCII, LATIN1, WIN1252) consist only of alphanumerics, underscores, and hyphens — so a simple regex guards against injection through the ENCODING='...' string literal:
if (!preg_match('/^[A-Z0-9_\-]+$/i', $charset)) {
throw new \InvalidArgumentException("Invalid charset: {$charset}");
}Putting it together:
public function createDatabase(TenantWithDatabase $tenant): bool
{
$database = str_replace('"', '""', $tenant->database()->getName());
$charset = strtoupper($this->connection()->getConfig('charset') ?? 'UTF8');
if (!preg_match('/^[A-Z0-9_\-]+$/i', $charset)) {
throw new \InvalidArgumentException("Invalid PostgreSQL charset: {$charset}");
}
return $this->connection()->statement("CREATE DATABASE \"{$database}\" WITH TEMPLATE=template0 ENCODING='{$charset}'");
}The same identifier-escaping pattern (str_replace('"', '""', ...)) should be applied to deleteDatabase() and databaseExists() in PostgreSQLDatabaseManager and PostgreSQLSchemaManager as well. For MySQL/MSSQL managers, the equivalent is backtick-doubling / bracket-escaping for their respective identifier quoting styles.
|
Will resolve these reviews on our end 👍 |
If the charset is null, Postgres creates the DB with the server's default charset Co-authored-by: Copilot <copilot@github.com>
If DB charset is configured to null, new databases will use the Postgres server's default charset (which is utf8 by default) Co-authored-by: Copilot <copilot@github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php (1)
13-21:⚠️ Potential issue | 🔴 CriticalEscape the database identifier and validate charset before interpolating SQL.
Line 17 and Line 20 interpolate raw values into
CREATE DATABASE. A quoted identifier with embedded"(or crafted charset) can break the statement and opens an injection path with custom DB-name generators.Suggested hardening
- $database = $tenant->database()->getName(); + $database = str_replace('"', '""', $tenant->database()->getName()); // If null, Postgres creates the DB with the server's default charset $charset = $this->connection()->getConfig('charset'); $query = "CREATE DATABASE \"{$database}\" WITH TEMPLATE=template0"; if ($charset !== null) { - $query .= " ENCODING='" . strtoupper($charset) . "'"; + $charset = strtoupper((string) $charset); + if (! preg_match('/^[A-Z0-9_-]+$/', $charset)) { + throw new \Exception("Invalid PostgreSQL charset: {$charset}"); + } + $query .= " ENCODING='{$charset}'"; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php` around lines 13 - 21, The CREATE DATABASE construction currently interpolates $database and $charset directly; replace this with a safe-quoted identifier for $database and a validated/whitelisted charset before building the SQL. Locate where $database and $charset are used in PostgreSQLDatabaseManager (the variables $database, $charset and the method connection()), call your DBAL/driver identifier-quoting helper (e.g., connection()->getDatabasePlatform()->quoteIdentifier() or pg_escape_identifier / appropriate driver API) to escape $database, and validate $charset against a strict uppercase whitelist of allowed encodings (only append ENCODING if it matches the whitelist) rather than inserting raw user values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/TenantDatabaseManagerTest.php`:
- Around line 561-564: Update the misleading inline comment above the DB query
that sets $serverCharset: replace the hardcoded “Postgres server's default
charset (utf8)” with a generic description reflecting that the test supports
non-UTF8 server defaults (e.g., “Postgres server's default charset”) so the
comment matches the behavior around the DB::connection('pgsql')->selectOne(...)
call that assigns $serverCharset.
---
Duplicate comments:
In `@src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.php`:
- Around line 13-21: The CREATE DATABASE construction currently interpolates
$database and $charset directly; replace this with a safe-quoted identifier for
$database and a validated/whitelisted charset before building the SQL. Locate
where $database and $charset are used in PostgreSQLDatabaseManager (the
variables $database, $charset and the method connection()), call your
DBAL/driver identifier-quoting helper (e.g.,
connection()->getDatabasePlatform()->quoteIdentifier() or pg_escape_identifier /
appropriate driver API) to escape $database, and validate $charset against a
strict uppercase whitelist of allowed encodings (only append ENCODING if it
matches the whitelist) rather than inserting raw user values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8aec52cb-8b49-4dd0-9ae2-81a76edf1d91
📒 Files selected for processing (2)
src/Database/TenantDatabaseManagers/PostgreSQLDatabaseManager.phptests/TenantDatabaseManagerTest.php
Problem
The
PostgreSQLDatabaseManagercreates tenant databases without specifying encoding, unlikeMySQLDatabaseManagerwhich respects the connection's charset configuration.When encoding is not explicitly defined, PostgreSQL uses the server's default encoding (inherited from
template0). This causes issues on systems where the default encoding differs from the application's expected encoding:WIN1252encodingLATIN1or other regional encodingsThis mismatch leads to errors like:
SQLSTATE[22P05]: Untranslatable character: 7 ERROR: character with byte sequence 0xe2 0x86 0x92 in encoding "UTF8" has no equivalent in encoding "WIN1252"Solution
Follow the same pattern as
MySQLDatabaseManagerby reading thecharsetvalue from the database connection configuration and applying them when creating tenant databases.Changes
charsetvalue from the connection configENCODINGparameter to the CREATE DATABASE statementConfiguration
Users can now control the encoding via their
config/database.php:Notes
Summary by CodeRabbit