dArchiva
dArchiva handles the full lifecycle of document digitization — from physical scanning through multi-engine OCR, semantic indexing, access-controlled retrieval, and legal admissibility — scaled to millions of documents across government and enterprise archives.
Key Features
From scanner to searchable — every step of the digitization pipeline engineered for accuracy, compliance, and scale.
Multi-Engine OCR
PaddleOCR (fastest, multilingual), Tesseract 5 (open baseline), and Qwen-VL (for handwritten, degraded, or complex layouts) run in ensemble. A voting mechanism selects the highest-confidence output per document region. 97%+ accuracy on typed Swahili, English, and Arabic text.
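The per-region selection described above can be illustrated with a minimal sketch. It is not dArchiva's implementation: the `RegionResult` type, the `region_id` field, and the assumption that each engine's confidence is already normalized to a comparable 0 to 1 scale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionResult:
    engine: str        # e.g. "paddleocr", "tesseract", "qwen-vl"
    region_id: int     # hypothetical identifier for a layout region
    text: str
    confidence: float  # assumed comparable across engines, in [0, 1]


def select_best_per_region(results):
    """Keep the highest-confidence result for each document region."""
    best = {}
    for result in results:
        current = best.get(result.region_id)
        if current is None or result.confidence > current.confidence:
            best[result.region_id] = result
    return best


# Illustrative data only, not real engine output.
results = [
    RegionResult("paddleocr", 0, "Hati ya ardhi", 0.93),
    RegionResult("tesseract", 0, "Hati va ardhi", 0.81),
    RegionResult("tesseract", 1, "s1gned", 0.40),
    RegionResult("qwen-vl", 1, "signed", 0.88),
]
best = select_best_per_region(results)
```

In practice, raw confidences from different engines are rarely directly comparable, so a real ensemble would likely calibrate them before comparing.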
Hybrid Semantic Search
BM25 sparse retrieval combined with dense vector embeddings — sentence-transformers fine-tuned on legal and government Swahili/English corpora — in a weighted ensemble. Sub-second full-text search across 4M+ documents. Filters by date, type, department, and classification.
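As a rough illustration of a weighted ensemble of sparse and dense scores, the sketch below min-max normalizes each score set and blends them with a weight `alpha`. The function names, the normalization choice, and the default weight are assumptions made for this example; the source does not state how the two signals are fused.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25_scores, dense_scores, alpha=0.5):
    """Blend normalized BM25 and dense scores; alpha weights BM25."""
    sparse = min_max(bm25_scores)
    dense = min_max(dense_scores)
    fused = {
        doc: alpha * sparse.get(doc, 0.0) + (1 - alpha) * dense.get(doc, 0.0)
        for doc in set(sparse) | set(dense)
    }
    # Highest fused score first; ties broken by document ID for stability.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative scores only.
bm25 = {"a": 10.0, "b": 5.0, "c": 0.0}
dense = {"a": 0.2, "b": 0.9, "c": 0.1}
ranked = hybrid_rank(bm25, dense, alpha=0.5)
```

With these sample scores, document "b" ranks first because its strong dense score outweighs "a"'s BM25 lead; setting `alpha=1.0` falls back to pure BM25 ordering.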
Layered Access Control
Role-based (RBAC), attribute-based (ABAC), and relationship-based (ReBAC) access control are unified in a single policy engine. Documents are tagged with classification levels (PUBLIC, INTERNAL, CONFIDENTIAL, SECRET). Attribute conditions such as department, tenure, and project are enforced at query time.
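One way the three models could combine in a single check is sketched below. Everything here is hypothetical: the role-to-clearance table, the `custodians` relationship, the ordering of classification levels, and the rule that a custodian relationship can stand in for the department condition are assumptions for illustration, not dArchiva's policy language. Tenure is omitted for brevity.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    SECRET = 3


# Hypothetical role-to-clearance mapping (the RBAC layer).
ROLE_CLEARANCE = {
    "clerk": Classification.INTERNAL,
    "archivist": Classification.CONFIDENTIAL,
    "director": Classification.SECRET,
}


@dataclass(frozen=True)
class User:
    user_id: str
    role: str
    department: str
    projects: frozenset = frozenset()


@dataclass(frozen=True)
class Document:
    doc_id: str
    classification: Classification
    department: str
    project: Optional[str] = None
    custodians: frozenset = frozenset()  # hypothetical ReBAC relation


def can_read(user, doc):
    if doc.classification is Classification.PUBLIC:
        return True
    # RBAC: the role must carry sufficient clearance.
    clearance = ROLE_CLEARANCE.get(user.role, Classification.PUBLIC)
    if clearance < doc.classification:
        return False
    # ReBAC: a custodian relationship satisfies the attribute conditions.
    if user.user_id in doc.custodians:
        return True
    # ABAC: department and, if set, project must match.
    if user.department != doc.department:
        return False
    return doc.project is None or doc.project in user.projects
```

A check like this would run at query time, so that documents the caller cannot read never enter the result set.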
Physical Inventory Tracking
QR-coded physical boxes linked to digital manifests. Scan-in / scan-out tracking for physical files across a full location hierarchy: building → floor → room → shelf → box. Overdue-return alerts pushed to department heads automatically.
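The location hierarchy and overdue check could be modeled roughly as follows. The `Location` and `Checkout` types, their field names, and the explicit `due` date are assumptions for this sketch; the source does not describe the underlying data model or how due dates are set.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Location:
    building: str
    floor: str
    room: str
    shelf: str
    box: str

    def path(self):
        """Render the hierarchy from building down to box."""
        return " > ".join(
            [self.building, self.floor, self.room, self.shelf, self.box]
        )


@dataclass
class Checkout:
    box_qr: str                            # value encoded in the box's QR code
    borrower: str
    scanned_out: datetime
    due: datetime                          # hypothetical return deadline
    scanned_in: Optional[datetime] = None  # None while the box is still out


def overdue(checkouts, now):
    """Return checkouts that are still out past their due time."""
    return [c for c in checkouts if c.scanned_in is None and now > c.due]
```

A scheduled job could call `overdue` periodically and route each result to the relevant department head.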
Auto Classification & Tagging
An ML classifier assigns each document a type (contract, invoice, deed, report, or memo) with 94% accuracy. Entity extraction indexes named persons, organizations, dates, and amounts for faceted search. Custom taxonomies support department-specific classification schemes.
Legal Admissibility
The digitization log captures the scanner serial number, operator ID, timestamp, and SHA-256 hash at the point of capture. Compliant with Kenya Evidence Act Cap 80 provisions for electronic records. Export packages for court submission include a full chain-of-custody affidavit.
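A capture-time log entry with the four fields named above might look like the sketch below, which uses Python's standard `hashlib`. The function names, the dictionary layout, and the ISO 8601 UTC timestamp format are assumptions for illustration, not the product's actual log schema.

```python
import hashlib
from datetime import datetime, timezone


def capture_record(image_bytes, scanner_serial, operator_id, captured_at=None):
    """Build a log entry binding the scanned bytes to their capture context."""
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    return {
        "scanner_serial": scanner_serial,
        "operator_id": operator_id,
        "timestamp": captured_at.isoformat(),
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
    }


def verify(image_bytes, record):
    """Check that the bytes still match the hash recorded at capture."""
    return hashlib.sha256(image_bytes).hexdigest() == record["sha256"]


# Illustrative values only.
scan = b"example scanned page bytes"
record = capture_record(scan, "SCN-0001", "op-42")
```

Recomputing the hash at export time and comparing it against the logged value shows whether the file has changed since capture; any single-byte modification produces a different digest.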
Technical Specifications
Scale
- 4M+ documents in production
- < 500 ms search response at scale
- Batch processing: 10,000 pages/hour
- Multi-tenant with data isolation
OCR Engines
- PaddleOCR v4 (multilingual)
- Tesseract 5 (open source)
- Qwen-VL 7B (handwriting / complex)
- Arabic + Swahili + English primary
Security
- AES-256 encryption at rest
- TLS 1.3 in transit
- GDPR Article 17 (right to erasure)
- Kenya Data Protection Act 2019
Compliance
- Kenya National Archives Act
- Evidence Act Cap 80 (digital admissibility)
- ISO 15489 records management
- NIST SP 800-53 Rev 5
Digitize your archive with dArchiva
Contact our team to discuss your requirements. We respond within 24 hours.