Skip to content

refactor!: use Point struct across the native vision stack#1130

Open
chmjkb wants to merge 3 commits intomainfrom
@chmjkb/native-point-refactor
Open

refactor!: use Point struct across the native vision stack#1130
chmjkb wants to merge 3 commits intomainfrom
@chmjkb/native-point-refactor

Conversation

@chmjkb
Copy link
Copy Markdown
Collaborator

@chmjkb chmjkb commented May 8, 2026

Description

Refactors the native code so it uses unified Point instead of random structures and pair arrays that were scattered all over the codebase.

Introduces a breaking change?

  • Yes
  • No

Changes the bbox member of OCRDetection to Bbox type instead of Point[]

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  • Run demo apps for object detection and ocr.

Screenshots

Related issues

#760
#1110

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@chmjkb chmjkb marked this pull request as ready for review May 8, 2026 06:17
@msluszniak
Copy link
Copy Markdown
Member

msluszniak commented May 8, 2026

First question, does this handle the scope of this pr: #1126, if so I will close it and you can mark another issue to close ;)

EDIT: Ok, I see it's not handling this one, so maybe you can incorporate these changes into your PR?

@msluszniak msluszniak self-requested a review May 8, 2026 07:53
@chmjkb chmjkb linked an issue May 8, 2026 that may be closed by this pull request
@chmjkb
Copy link
Copy Markdown
Collaborator Author

chmjkb commented May 8, 2026

ok, so I'll cherry-pick commits from #1126 PR and then we'll close it? does it sound fine? @msluszniak

@msluszniak
Copy link
Copy Markdown
Member

msluszniak commented May 8, 2026

ok, so I'll cherry-pick commits from #1126 PR and then we'll close it? does it sound fine? @msluszniak

Yes, probably this would need some tweaking since I hadn't used Point struct.

Resolves #760. The OCR and VerticalOCR pipelines previously exposed all four
rotated-rectangle corners in OCRDetection.bbox. Two points (top-left and
bottom-right of the axis-aligned bounding box) are sufficient for downstream
rendering and are simpler to consume.

Changes:
- Types.h: shrink OCRDetection.bbox from std::array<Point,4> to std::array<Point,2>
- RecognitionHandler.cpp: compute AABB (min/max x,y) over the four detector
  corners instead of forwarding them verbatim
- VerticalOCR.cpp: same AABB reduction in _processSingleTextBox
- OCR.cpp / VerticalOCR.cpp generateFromFrame: re-normalize the two bbox
  corners after inverseRotatePoints to guarantee bbox[0] <= bbox[1]
- JsiConversions.h: serialize 2 points instead of 4 to JavaScript
- OCRTest.cpp / VerticalOCRTest.cpp: assert size==2 and that bbox[1] >= bbox[0]
- ocr.ts: narrow TypeScript type from Point[] to [Point,Point] and update docs
Comment thread docs/docs/03-hooks/02-computer-vision/useOCR.md
@msluszniak
Copy link
Copy Markdown
Member

Also could you write some basic testing instructions, namely which models are affected, what I specifically need to check?

@chmjkb chmjkb changed the title refactor: use Point struct across the native vision stack refactor!: use Point struct across the native vision stack May 11, 2026
@chmjkb
Copy link
Copy Markdown
Collaborator Author

chmjkb commented May 11, 2026

Also could you write some basic testing instructions, namely which models are affected, what I specifically need to check?

i think other than running demo apps for object detection and ocr we're good

@chmjkb chmjkb requested a review from msluszniak May 11, 2026 08:16
Copy link
Copy Markdown
Member

@msluszniak msluszniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a problem when there is no detection on the image. Instead of silent return of empty structure, I got an error:

Image

EDIT: Actually, on some images where there is no text, I get correct message that there was no vertical text. So this must be something else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor native code to use Point instead of x, y values Refactor code of OCR and VerticalOCR to utilise only two points in bounding boxes

3 participants