InsertImage produces invalid PDF: scientific notation in content stream matrix #234

@weaverant

Bug Description

Page.InsertImage() can produce invalid PDF content streams that cause blank pages. The issue is that the image transformation matrix values are formatted using C#'s default float ToString(), which produces scientific notation (e.g. 3.0517578E-05) for very small values. Scientific notation is not valid in PDF syntax.

Reproduction

var pix = new Pixmap("image.jpg"); // 629x1000 image
var scaled = new Pixmap(pix, 943, 1500, null); // scale up
byte[] jpeg = scaled.ToBytes("jpg", 65);

var doc = new Document();
doc.NewPage(0, 943, 1500);
doc[0].InsertImage(doc[0].Rect, stream: jpeg);

// Content stream produced:
// q
// 942.99994 0 0 1500 3.0517578E-05 0 cm
// /fzImg0 Do
// Q

The 3.0517578E-05 is invalid PDF syntax. PDF readers cannot parse it, resulting in the image not rendering (blank page). The 942.99994 (should be 943) is also caused by float imprecision in the matrix calculation.

Root Cause

In Page.cs around line 2244, the matrix values are formatted using string.Format with no format specifier:

nres.fz_append_string(
    string.Format(System.Globalization.CultureInfo.InvariantCulture, template,
        mat.a, mat.b, mat.c, mat.d, mat.e, mat.f, imgName)
);

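When no format specifier is given, C# formats a float with the general ("G") format, which switches to scientific notation once the magnitude drops below 1e-4 (0.0001 prints as 0.0001, but 0.00001 prints as 1E-05). PDF number syntax (ISO 32000-1, section 7.3.3) only allows [+-]?(\d+\.?\d*|\.\d+), with no exponent notation.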
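To make the mismatch concrete, here is a minimal sketch that checks a formatted token against the number grammar quoted above. PdfNumberCheck, IsValidPdfNumber and DefaultFormat are hypothetical names for illustration, not part of MuPDF.NET. The regex is the one from this report, anchored and restricted to ASCII digits.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of MuPDF.NET.
public static class PdfNumberCheck
{
    // The grammar quoted above, anchored so the whole token must match.
    // [0-9] is used instead of \d because \d also matches non-ASCII digits in .NET.
    private static readonly Regex PdfNumber =
        new Regex(@"^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)\z", RegexOptions.CultureInvariant);

    public static bool IsValidPdfNumber(string token) => PdfNumber.IsMatch(token);

    // Mirrors what a "{0}" placeholder does for a float under InvariantCulture:
    // no format specifier, so the general format is used.
    public static string DefaultFormat(float value) =>
        value.ToString(CultureInfo.InvariantCulture);
}
```

With this in place, IsValidPdfNumber(DefaultFormat(3.0517578E-05f)) is false because the token contains an exponent, while IsValidPdfNumber("942.99994") is true.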

The matrix values come from Utils.CalcImageMatrix() (Utils.cs line 5574), where multiple fz_concat calls accumulate floating point imprecision, turning 0.0 into ~0.00003 and 943.0 into 942.99994.
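The two odd values are consistent with ordinary single-precision rounding: 942.99994 is the float immediately below 943, and 3.0517578E-05 is exactly 2^-15, half the spacing between neighbouring floats at that magnitude. The sketch below only demonstrates that arithmetic fact. FloatStepDemo is a hypothetical name, and nothing here traces what CalcImageMatrix actually computes.

```csharp
using System;

// Hypothetical demo for illustration; not part of MuPDF.NET.
public static class FloatStepDemo
{
    // Largest float strictly below 943. Neighbouring floats in [512, 1024)
    // are 2^-14 apart, so this is 943 - 2^-14.
    public static float JustBelow943() => MathF.BitDecrement(943f);

    // 2^-15, i.e. half of that spacing.
    public static float HalfStep() => 1f / 32768f;
}
```

JustBelow943() compares equal to the literal 942.99994f, and HalfStep() compares equal to 3.0517578E-05f, so an error of a single rounding step is enough to produce the output shown in the reproduction.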

Suggested Fix

Format matrix values explicitly to prevent scientific notation:

// Assumes `using System.Globalization;` is in scope.
// "0.######" is a custom fixed-point format: up to six decimals, never an exponent.
nres.fz_append_string(
    string.Format(CultureInfo.InvariantCulture, template,
        mat.a.ToString("0.######", CultureInfo.InvariantCulture),
        mat.b.ToString("0.######", CultureInfo.InvariantCulture),
        mat.c.ToString("0.######", CultureInfo.InvariantCulture),
        mat.d.ToString("0.######", CultureInfo.InvariantCulture),
        mat.e.ToString("0.######", CultureInfo.InvariantCulture),
        mat.f.ToString("0.######", CultureInfo.InvariantCulture),
        imgName)
);
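Since the same formatting is needed for all six components (and possibly in other methods), it could be pulled into one small helper. This is a sketch only: PdfFormat, Number and Matrix are hypothetical names, not existing MuPDF.NET API, and the NaN/infinity guard is my own assumption about how non-finite values should be handled.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of MuPDF.NET.
public static class PdfFormat
{
    // Formats a value as a PDF-safe number token: fixed-point, at most six
    // decimals, '.' as the decimal separator, never an exponent.
    public static string Number(float value)
    {
        // NaN and infinities have no PDF representation, so fail loudly
        // instead of writing "NaN" into the content stream.
        if (float.IsNaN(value) || float.IsInfinity(value))
            throw new ArgumentOutOfRangeException(nameof(value), "not a finite number");

        return value.ToString("0.######", CultureInfo.InvariantCulture);
    }

    // Builds the six space-separated operands of a `cm` operator.
    public static string Matrix(float a, float b, float c, float d, float e, float f) =>
        string.Join(" ", Number(a), Number(b), Number(c), Number(d), Number(e), Number(f));
}
```

A call such as PdfFormat.Matrix(mat.a, mat.b, mat.c, mat.d, mat.e, mat.f) would then yield the operands in fixed-point form. Six decimals means magnitudes below roughly 0.0000005 collapse to zero, which is harmless for a translation offset on a page measured in points.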

Note: this same pattern (formatting float matrix values into content streams via string.Format) may exist in other methods as well (e.g. InsertText, shape drawing). Those would have the same vulnerability to scientific notation if the matrix values are small enough.

Environment

  • MuPDF.NET 3.2.13
  • .NET 8.0, Windows 11
