Bug Description
Page.InsertImage() can produce invalid PDF content streams that cause blank pages. The issue is that the image transformation matrix values are formatted using C#'s default float ToString(), which produces scientific notation (e.g. 3.0517578E-05) for very small values. Scientific notation is not valid in PDF syntax.
Reproduction
var pix = new Pixmap("image.jpg"); // 629x1000 image
var scaled = new Pixmap(pix, 943, 1500, null); // scale up
byte[] jpeg = scaled.ToBytes("jpg", 65);
var doc = new Document();
doc.NewPage(0, 943, 1500);
doc[0].InsertImage(doc[0].Rect, stream: jpeg);
// Content stream produced:
// q
// 942.99994 0 0 1500 3.0517578E-05 0 cm
// /fzImg0 Do
// Q
The 3.0517578E-05 is invalid PDF syntax. PDF readers cannot parse it, resulting in the image not rendering (blank page). The 942.99994 (should be 943) is also caused by float imprecision in the matrix calculation.
Root Cause
In Page.cs around line 2244, the matrix values are formatted using string.Format with no format specifier:
nres.fz_append_string(
string.Format(System.Globalization.CultureInfo.InvariantCulture, template,
mat.a, mat.b, mat.c, mat.d, mat.e, mat.f, imgName)
);
C#'s default float formatting ("G" format) switches to scientific notation for very small (or very large) magnitudes. PDF number syntax (ISO 32000-1, section 7.3.3) only allows [+-]?(\d+\.?\d*|\.\d+) -- no exponent notation.
The matrix values come from Utils.CalcImageMatrix() (Utils.cs line 5574), where multiple fz_concat calls accumulate floating-point imprecision, turning 0.0 into ~0.00003 and 943.0 into 942.99994.
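To make the syntax rule above concrete, here is a minimal sketch of a validator for numeric tokens in a content stream. The class and method names (PdfNumberCheck, IsValidPdfNumber) are hypothetical and not part of MuPDF.NET; the pattern is the one quoted above, written with [0-9] so that only ASCII digits match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of MuPDF.NET: checks a token against the
// PDF numeric syntax quoted above (optional sign, digits, optional decimal
// point, no exponent).
public static class PdfNumberCheck
{
    // \A and \z anchor the whole token; [0-9] restricts matching to ASCII digits.
    private static readonly Regex PdfNumber = new Regex(
        @"\A[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)\z",
        RegexOptions.CultureInvariant);

    public static bool IsValidPdfNumber(string token)
    {
        return token != null && PdfNumber.IsMatch(token);
    }
}
```

With this check, the token 3.0517578E-05 from the content stream above is rejected, while 942.99994 and 1500 are accepted. A check like this could be used in a unit test over generated content streams.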
Suggested Fix
Format matrix values explicitly to prevent scientific notation:
nres.fz_append_string(
string.Format(System.Globalization.CultureInfo.InvariantCulture, template,
mat.a.ToString("0.######", CultureInfo.InvariantCulture),
mat.b.ToString("0.######", CultureInfo.InvariantCulture),
mat.c.ToString("0.######", CultureInfo.InvariantCulture),
mat.d.ToString("0.######", CultureInfo.InvariantCulture),
mat.e.ToString("0.######", CultureInfo.InvariantCulture),
mat.f.ToString("0.######", CultureInfo.InvariantCulture),
imgName)
);
Note: this same pattern (formatting float matrix values into content streams via string.Format) may exist in other methods as well (e.g. InsertText, shape drawing). Those would have the same vulnerability to scientific notation if the matrix values are small enough.
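Since the same formatting may be needed in several places, the fix could be centralized in one helper. The following is only a sketch: PdfFormat.Number is a hypothetical name, not an existing MuPDF.NET API, and six decimal places is an assumption carried over from the "0.######" format suggested above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of MuPDF.NET: formats a float as a
// fixed-point token suitable for a PDF content stream.
public static class PdfFormat
{
    public static string Number(float value)
    {
        // NaN and infinity have no PDF representation, so fail early.
        if (float.IsNaN(value) || float.IsInfinity(value))
            throw new ArgumentOutOfRangeException(nameof(value), "PDF numbers must be finite.");

        // A custom format string with no 'E' section never emits an exponent;
        // InvariantCulture guarantees '.' as the decimal separator.
        string s = value.ToString("0.######", CultureInfo.InvariantCulture);

        // Tiny negative values can round to "-0" on some runtimes; normalize it.
        return s == "-0" ? "0" : s;
    }
}
```

Each matrix argument in the call above would then become PdfFormat.Number(mat.a), PdfFormat.Number(mat.b), and so on, and other content-stream writers could reuse the same helper.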
Environment
- MuPDF.NET 3.2.13
- .NET 8.0, Windows 11