Tesseract's WASM file too big to process #4748

Submitted by Style Pass, 2023-03-14 19:30:08

Hello, I'm writing the what-to-click extension. I've bundled the tesseract.js library locally for OCR functionality. It works fine, but it causes the Firefox Add-ons linter to fail.

Tesseract uses WebAssembly to speed up image analysis. This comes with a file-size overhead so large that I can no longer upload my extension to the Developer Hub:

The linter's suggestion is valid, and I would very much like to split the file into smaller ones (the enormous file size comes from a blob embedded in it). However, I don't see a way to do this because of how the file is handled: it is loaded automatically by tesseract, not by the extension code, so import/export directives don't work, and I don't have the browser object available either. The blob also has to stay inside the file because, due to this issue, importScripts is not available.
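To see how much of the file is the embedded blob, a rough check can help. The sketch below assumes the blob is stored as a base64 string (typical of Emscripten single-file builds, but an assumption here; the issue only says "a blob"). Base64 inflates data by about 4/3, so, for example, 3.6 MB of binary data becomes about 4.8 MB of text. The helper names and the 1024-character minimum run length are hypothetical choices for illustration.

```python
import re

# Assumption: the blob is a long base64 run; 1024 chars is an arbitrary
# threshold chosen to skip ordinary identifiers and short strings.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{1024,}={0,2}")


def base64_encoded_size(n_bytes: int) -> int:
    """Characters needed to base64-encode n_bytes (standard alphabet, padded)."""
    return 4 * ((n_bytes + 2) // 3)


def largest_base64_run(source: str) -> tuple[int, int]:
    """Return (encoded_length, decoded_length) of the longest base64-looking run.

    The decoded length is computed arithmetically rather than by decoding,
    so a slightly over- or under-captured run still gives a close estimate.
    """
    best = max((m.group() for m in BASE64_RUN.finditer(source)), key=len, default="")
    if not best:
        return (0, 0)
    # Every 4 base64 characters carry 3 bytes; padding characters carry none.
    decoded = len(best.rstrip("=")) * 3 // 4
    return (len(best), decoded)
```

Running largest_base64_run on the contents of the oversized file would show whether the blob accounts for most of its size, and base64_encoded_size shows how much of that is pure encoding overhead.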

The simplest solution would be to raise the single-file size limit to 5 MB: the problematic file is 4.8 MB, and such an increase shouldn't overload the linter servers. However, if you see any way to reduce the file size, I'm certainly open to it.
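Before uploading, oversized files can be caught locally. This is a minimal sketch, not part of addons-linter: the 4 MiB default is a hypothetical value (the issue only implies that the current limit is below 4.8 MB), and the function name and suffix filter are illustrative choices.

```python
from pathlib import Path

# Hypothetical default: the real per-file limit should be taken from the
# addons-linter source or documentation, not from this sketch.
DEFAULT_LIMIT_BYTES = 4 * 1024 * 1024


def oversized_files(
    root: str,
    limit_bytes: int = DEFAULT_LIMIT_BYTES,
    suffixes: tuple[str, ...] = (".js", ".mjs"),
) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for script files above limit_bytes."""
    base = Path(root)
    hits = []
    for path in sorted(base.rglob("*")):
        # Only script files are checked; binary assets such as .wasm are skipped.
        if path.is_file() and path.suffix in suffixes:
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((path.relative_to(base).as_posix(), size))
    return hits
```

Calling oversized_files("path/to/extension") before packaging lists every script that would exceed the assumed limit, so a bundled library such as the tesseract worker can be spotted before submission.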