Degree Names
Bachelor's in Software Engineering
Bachelor's in Computer Science
Bachelor's in Information Technology
Description
🛠 Project Overview:
We are building a browser-based or hybrid SaaS tool for Optical Character Recognition (OCR) using Tesseract.js or a similar OCR solution in JavaScript. The tool should offer:
- Maximum possible OCR accuracy
- Support for handwritten and printed text
- OCR support for the following languages: English, Spanish, Russian, Dutch, Italian, Portuguese, Indonesian, German, French, Korean, Danish, Czech, Swedish, Polish, Romanian, Thai, Vietnamese, Turkish, Japanese, Chinese, Georgian, Finnish, Arabic
Note: The solution must not use heavy machine learning models, and should rely on Tesseract and image preprocessing optimizations.
💡 Nice to Have:
- Experience working with Tesseract.js (https://tesseract.projectnaptha.com/)
- Understanding of handwriting OCR challenges
📩 Deliverables:
- High-performance, accurate OCR module (React or plain JS)
- Tuning and configurations for all listed languages
- Well-documented, clean codebase and integration guide
❓ Questions to Answer in Your Application:
- Have you previously worked with Tesseract.js? Please provide code samples or GitHub links.
- What strategies would you use to optimize OCR accuracy in the browser?
- What are Tesseract’s limitations with handwritten text, and how would you mitigate them without ML?
- Have you ever worked on image preprocessing (e.g., binarization, noise reduction)? If yes, how?
- What challenges do you foresee in implementing this OCR tool purely in the frontend?
Responsibilities
✅ Responsibilities:
- Implement Tesseract.js-based OCR in a React or Next.js frontend
- Configure and optimize Tesseract for multi-language and handwriting scenarios
- Efficiently load large traineddata language files in-browser
- Build a clean, modular OCR interface for integration into our SaaS product
- Apply image pre-processing techniques to improve OCR accuracy (grayscale, thresholding, etc.)
- Deliver performance optimizations for speed and accuracy
- Work collaboratively and communicate technical decisions clearly
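To give applicants a sense of the preprocessing work mentioned above (grayscale, thresholding), here is a minimal sketch in TypeScript. It assumes the image is available as RGBA pixel data, for example from a canvas `getImageData` call. The function names are hypothetical, and Otsu's method is only one possible thresholding choice, not a stated requirement of this project.

```typescript
// Convert RGBA pixel data to 8-bit grayscale using Rec. 601 luma weights.
function toGrayscale(rgba: Uint8ClampedArray): Uint8ClampedArray {
  const gray = new Uint8ClampedArray(rgba.length / 4);
  for (let i = 0, j = 0; i < rgba.length; i += 4, j++) {
    gray[j] = 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
  }
  return gray;
}

// Pick a global threshold with Otsu's method: the gray level that
// maximizes the between-class variance of dark and light pixels.
function otsuThreshold(gray: Uint8ClampedArray): number {
  const hist: number[] = new Array(256).fill(0);
  for (let i = 0; i < gray.length; i++) hist[gray[i]]++;

  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumDark = 0;
  let countDark = 0;
  let bestVariance = -1;
  let bestThreshold = 0;
  for (let t = 0; t < 256; t++) {
    countDark += hist[t];
    if (countDark === 0) continue;
    const countLight = gray.length - countDark;
    if (countLight === 0) break;
    sumDark += t * hist[t];
    const meanDark = sumDark / countDark;
    const meanLight = (sumAll - sumDark) / countLight;
    const variance = countDark * countLight * (meanDark - meanLight) ** 2;
    if (variance > bestVariance) {
      bestVariance = variance;
      bestThreshold = t;
    }
  }
  return bestThreshold;
}

// Binarize: pixels above the threshold become white (255), the rest black (0).
function binarize(gray: Uint8ClampedArray, threshold: number): Uint8ClampedArray {
  const out = new Uint8ClampedArray(gray.length);
  for (let i = 0; i < gray.length; i++) out[i] = gray[i] > threshold ? 255 : 0;
  return out;
}

// Hypothetical helper chaining the steps before handing the image to OCR.
function preprocessForOcr(rgba: Uint8ClampedArray): Uint8ClampedArray {
  const gray = toGrayscale(rgba);
  return binarize(gray, otsuThreshold(gray));
}
```

A global threshold like this tends to struggle with uneven lighting, so in practice an adaptive (local) threshold or noise reduction step may be needed as well; evaluating such trade-offs is part of the role.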
Requirements
🧠 Required Skills:
- Strong experience with JavaScript/TypeScript
- Proficiency with Next.js or React.js
- Hands-on experience with Tesseract.js
- Familiarity with Tesseract configuration (psm, oem, etc.)
- Knowledge of image processing basics (e.g., sharp.js, canvas)
- Performance tuning and efficient frontend architecture
Benefits
- Long-term Project
- Competitive Salary
- Learning Environment