Document Search Solution with OCR

Paperless seems like the ideal candidate:
* Append-Only
* Can be used as a secondary store alongside the main Git-based document store
* Has an inbox directory it observes, where new documents can be moved to (set [PAPERLESS_CONSUMER_DELETE_DUPLICATES](https://docs.paperless-ngx.com/configuration/#PAPERLESS_CONSUMER_DELETE_DUPLICATES) so that duplicates are deleted)
* Is flat and compares documents only by hash, so moving documents within the main store has no effect on the index
* Caveat: file deletions in the main store are not propagated with the secondary-store approach. They are rare, though, and if the index drifts too far, the Paperless index can be rebuilt at any time

As a result, a script can clone the main store, move the documents into the inbox, and let Paperless delete the duplicates and index the new documents. Paperless effectively becomes a web-based search index for personal documents that can be destroyed and rebuilt at any time, just like a read model in CQRS.
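
The sync script described here could look roughly like the following. This is a minimal sketch, not a tested integration: the function names (`flat_name`, `copy_into_inbox`, `sync_store_to_inbox`) are hypothetical, and it assumes the inbox should stay flat, so nested paths are encoded into the file name to avoid collisions. Files are copied from a temporary clone rather than moved, which has the same effect because the clone is discarded afterwards.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def flat_name(relative_path: Path) -> str:
    """Encode a nested path as one file name so a flat inbox has no collisions."""
    return "__".join(relative_path.parts)


def copy_into_inbox(store_root: Path, inbox: Path) -> list[Path]:
    """Copy every file outside .git from the store checkout into the inbox."""
    inbox.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(store_root.rglob("*")):
        relative = path.relative_to(store_root)
        # Skip directories and anything that belongs to Git's own metadata.
        if not path.is_file() or ".git" in relative.parts:
            continue
        target = inbox / flat_name(relative)
        shutil.copy2(path, target)
        copied.append(target)
    return copied


def sync_store_to_inbox(repo_url: str, inbox: Path) -> list[Path]:
    """Shallow-clone the main store into a temp dir and feed it to the inbox."""
    with tempfile.TemporaryDirectory() as tmp:
        checkout = Path(tmp) / "store"
        subprocess.run(
            ["git", "clone", "--depth", "1", repo_url, str(checkout)],
            check=True,
        )
        return copy_into_inbox(checkout, inbox)
```

Calling `sync_store_to_inbox(repo_url, inbox)` with the store's clone URL and the Paperless inbox directory would run one full pass. Deduplication is left entirely to Paperless, so the script needs no state of its own and can be run repeatedly.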

Points to consider:
* Maybe use rsync to only copy over documents of a type that paperless can understand.
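
One way to express the rsync idea is to build the filter rules from a list of file extensions. The sketch below only assembles the command. `rsync_command` and `ASSUMED_EXTENSIONS` are hypothetical names, and the extension list is an assumption that should be checked against what the Paperless installation can actually parse.

```python
from pathlib import Path

# Assumed extensions; adjust to the file types the Paperless install understands.
ASSUMED_EXTENSIONS = ("pdf", "png", "jpg", "jpeg", "tif", "tiff", "txt")


def rsync_command(source: Path, target: Path, extensions=ASSUMED_EXTENSIONS) -> list[str]:
    """Build an rsync call that copies only files with the given extensions."""
    command = [
        "rsync",
        "--archive",
        "--prune-empty-dirs",  # drop directories left empty by the filters
        "--exclude=.git/",     # never descend into Git metadata
        "--include=*/",        # descend into all other directories
    ]
    command += [f"--include=*.{ext}" for ext in extensions]
    # Everything not matched by an include rule above is skipped.
    command += ["--exclude=*", f"{source}/", f"{target}/"]
    return command
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Note that rsync patterns are case-sensitive, so a file named `SCAN.PDF` would be skipped unless `PDF` is added to the list, and that rsync preserves the directory layout rather than flattening it.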
