AVIS: Why Industrial AI Inspection is Harder in Africa (And How We Solved It)

In early 2024 we brought an AVIS prototype to a factory manager in Nairobi Industrial Area. The demo was genuinely good: defect detection was running at 92% accuracy on our lab dataset, the UI was clean, the numbers looked solid. He watched it for about three minutes. Then he asked one question: "What happens when the power goes out?"

We didn't have a good answer. We had a cloud-first architecture, a dependency on continuous power to the inference node, and no plan for the 2–4 daily outages that are simply a fact of life in most Kenyan industrial estates. He was polite about it. We went back to the drawing board.

That question shaped the next six months of AVIS development more than anything else we learned from model training, benchmarks, or academic papers. This post is an honest account of what those months produced, and what it cost us to get there.

What the Factory Visits Actually Taught Us

Before writing a production version of AVIS, we did three extended factory visits. A pharmaceutical packager in Nairobi Industrial Area. A plastics extrusion facility in Athi River. A cosmetics manufacturer on the outskirts of Lagos. We went to observe, not to sell. What we found was consistent enough across all three sites to treat as a pattern rather than a coincidence.

Power outages are not edge cases. They are scheduled, anticipated, and managed: factories run generators, keep manual logs for continuity, and have developed workarounds over years of adaptation. An AI system that treats continuous power as a given will be overridden and ignored within a week of installation, not because operators don't trust AI, but because they have learned, correctly, not to depend on systems that go dark when the grid does.

Connectivity in industrial areas is 4G at best, 3G intermittently, occasionally nothing. We measured round-trip latency to a cloud inference endpoint from each site. The median was 340ms. The worst case, on an afternoon when the cell tower was loaded, was 2.8 seconds. For a production line running at 120 units per minute, a 2.8-second inference loop is not a degraded experience. It is a broken one.
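The arithmetic behind that claim is worth spelling out. The sketch below uses only the line rate and latencies quoted above; the function names are illustrative and not part of AVIS.

```python
def per_unit_budget_s(units_per_minute: float) -> float:
    """Seconds available per unit before the next one arrives."""
    return 60.0 / units_per_minute


def units_passed_during(latency_s: float, units_per_minute: float) -> float:
    """How many units move past the camera while one inference is in flight."""
    return latency_s / per_unit_budget_s(units_per_minute)


LINE_RATE = 120        # units per minute, from the example above
MEDIAN_RTT_S = 0.340   # measured median cloud round trip
WORST_RTT_S = 2.8      # measured worst-case cloud round trip

print(f"budget per unit: {per_unit_budget_s(LINE_RATE):.2f} s")
print(f"units passed at median RTT: {units_passed_during(MEDIAN_RTT_S, LINE_RATE):.2f}")
print(f"units passed at worst RTT: {units_passed_during(WORST_RTT_S, LINE_RATE):.1f}")
```

At 120 units per minute each unit gets half a second. Even the median round trip consumes most of that budget, and in the worst case more than five units have passed before a verdict comes back.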

Older machinery is everywhere. The factory in Athi River was running a blister packer from 1991. No OPC-UA. No MQTT. No digital interface of any kind. Every AI inspection vendor we had researched assumed you could get structured process data from the machine to correlate with visual findings. On roughly half the production lines we visited, that assumption fails completely.

The response to all three of these was design decisions, not workarounds. A 4-hour UPS ships standard with every AVIS installation, not optional, not "recommended," standard. It covers the inference node, camera array, and lighting rig. Edge inference is the primary path; cloud is backup and sync only. A vibration and acoustic sensor on the machine frame detects production cycles independent of any digital machine interface, and it works on a 1991 blister packer as well as a 2024 one.
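As a rough illustration of how a frame-mounted sensor can count production cycles with no machine interface, here is a minimal sketch using two thresholds with hysteresis on an amplitude envelope. The thresholds, the sample values, and the function name are all assumptions made for the example; this post does not describe the actual AVIS signal processing.

```python
from typing import Iterable, List


def detect_cycle_starts(envelope: Iterable[float],
                        high: float = 0.6,
                        low: float = 0.3) -> List[int]:
    """Return sample indices where a new machine cycle begins.

    A cycle starts when the envelope rises to `high` or above; the
    detector re-arms only after the signal falls to `low` or below.
    The gap between the two thresholds (hysteresis) stops jitter
    around a single threshold from being counted as extra cycles.
    """
    starts: List[int] = []
    armed = True
    for i, value in enumerate(envelope):
        if armed and value >= high:
            starts.append(i)
            armed = False
        elif not armed and value <= low:
            armed = True
    return starts


# Hypothetical amplitude envelope: three strokes, with jitter after the second.
signal = [0.1, 0.2, 0.7, 0.9, 0.5, 0.55, 0.2, 0.1,
          0.8, 0.45, 0.62, 0.1, 0.05, 0.75, 0.2]
print(detect_cycle_starts(signal))  # [2, 8, 13]
```

The bump to 0.62 at index 10 is not counted as a new cycle, because the signal never dropped to the lower threshold after the stroke at index 8.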

The Problem We Spent Six Weeks Solving Wrong

Before the factory visits, we had spent months training and evaluating models. Our mAP on the lab dataset was 0.78, respectable for a first version, and we thought the gap to production accuracy was primarily a data problem. More images, better labels, more defect classes.

We went on-site for the first time with that model. The mAP dropped to 0.51. Same model, same product, different location.

We spent six weeks tuning hyperparameters, augmenting training data, and adjusting anchor boxes before we found the real issue: factory lighting. Fluorescent tubes of different ages, some of them flickering, and natural light through corrugated roof panels that moved over the course of a shift made the camera's input fundamentally different from what the model had been trained on. We standardized the lighting and added a calibration procedure. Accuracy went to 0.91 without touching the model.

The embarrassing truth is that we had been optimizing the wrong variable for six weeks. Model accuracy in a controlled environment is a necessary condition, not a sufficient one. The lighting in our lab was consistent, color-corrected, and stable. Factory lighting in Africa is frequently a mixture of aging fluorescent tubes at different color temperatures, natural light from skylights that varies hour by hour, and auxiliary work lights that operators move around.

The fix was not a better model. It was a calibration procedure and a lighting assessment that now run at shift start and after any significant lighting change. AVIS ships with a calibration target, a reference card with known colors and geometries, and uses it to compute a per-session correction matrix. When calibration confidence falls below threshold, the system flags results with a reduced-confidence marker rather than suppressing alerts. The operator sees both the detection and the caveat. That decision, to show uncertainty rather than hide it, turned out to matter a great deal for operator trust.
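To show the shape of the idea, here is a deliberately simplified sketch: it fits one gain per color channel (a diagonal correction matrix) from reference-card patches, scores how well the corrected patches match, and flags detections instead of dropping them when the score is low. The patch values, the 0.95 cutoff, the confidence formula, and all function names are assumptions for illustration; the real AVIS correction matrix and confidence measure are not described in this post.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[float, float, float]


def fit_channel_gains(measured: Sequence[RGB], reference: Sequence[RGB]) -> List[float]:
    """Least-squares gain per channel so that gain * measured ~= reference."""
    gains = []
    for c in range(3):
        num = sum(m[c] * r[c] for m, r in zip(measured, reference))
        den = sum(m[c] * m[c] for m in measured)
        gains.append(num / den if den else 1.0)
    return gains


def calibration_confidence(measured: Sequence[RGB], reference: Sequence[RGB],
                           gains: Sequence[float]) -> float:
    """1.0 minus the mean absolute error after correction (fraction of 255)."""
    total = 0.0
    for m, r in zip(measured, reference):
        total += sum(abs(gains[c] * m[c] - r[c]) for c in range(3))
    error = total / (3 * len(measured) * 255.0)
    return max(0.0, 1.0 - error)


def annotate(detection: Dict, confidence: float, min_confidence: float = 0.95) -> Dict:
    """Keep the detection either way; add a caveat when calibration is poor."""
    return {**detection, "reduced_confidence": confidence < min_confidence}


# Hypothetical gray patches: known card values vs. what a warm-tinted camera saw.
reference = [(200.0, 200.0, 200.0), (100.0, 100.0, 100.0), (50.0, 50.0, 50.0)]
measured = [(220.0, 200.0, 160.0), (110.0, 100.0, 80.0), (55.0, 50.0, 40.0)]

gains = fit_channel_gains(measured, reference)
confidence = calibration_confidence(measured, reference, gains)
result = annotate({"label": "cracked_blister", "score": 0.88}, confidence)
print(gains, confidence, result)
```

The design point is in `annotate`: the detection is always returned, and poor calibration only adds a marker to it.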

Every AVIS installation now includes a lighting assessment as a standard step during commissioning. We find lighting problems at approximately 70% of sites. We used to find them in month two, after a client complaint. We now find them on day one, before the system goes live.

Why We Combined YOLOv11 with a Vision Language Model

YOLO-family models are excellent at fast, accurate detection and classification within a known taxonomy. Train one on 2,000 labeled images of your blister pack defects and it will tell you, at 40ms per frame, whether a cavity is empty, cracked, or contaminated. What it cannot do is tell you what kind of defect it is seeing when the defect is something it never saw in training.

This matters more in practice than it sounds in theory. Manufacturing defects are not static. A new supplier batch changes the raw material's surface texture. A worn die introduces a crack morphology that doesn't match anything in the training set. In these cases, YOLO will either force-classify the anomaly into the nearest known bucket, which is wrong, or reject it as background noise, which is silently wrong. Neither is acceptable in a pharmaceutical quality context.

We added Qwen-VL as a second-pass analyzer for low-confidence detections. The handoff logic is straightforward: when YOLO's top prediction falls below 0.7 confidence, the frame routes to the vision language model for open-vocabulary characterization. Qwen-VL produces a natural-language description of the anomaly, which is stored alongside the detection record and surfaced to the quality engineer. Novel defects get characterized rather than silently misclassified.
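The handoff can be sketched in a few lines. The `Detection` type, the `route` function, and the stand-in describer below are hypothetical names for illustration; only the 0.7 threshold and the "below threshold goes to the second pass" rule come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Detection:
    label: str
    confidence: float
    description: Optional[str] = None  # filled in only by the second pass


def route(det: Detection,
          describe: Callable[[Detection], str],
          threshold: float = 0.7) -> Detection:
    """Send detections strictly below the threshold to the second-pass describer."""
    if det.confidence < threshold:
        det.description = describe(det)
    return det


def fake_vlm(det: Detection) -> str:
    """Stand-in for the vision language model call (hypothetical)."""
    return f"unrecognized anomaly near predicted class '{det.label}'"


confident = route(Detection("empty_cavity", 0.93), fake_vlm)
uncertain = route(Detection("cracked_blister", 0.55), fake_vlm)
print(confident.description)  # None
print(uncertain.description)
```

Passing the describer in as a function keeps the routing rule testable without loading a model.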

The YOLOv11 + Qwen-VL combination is good. But after our first several deployments, we have a clearer view of what actually determines whether an AVIS installation succeeds, and it is not the model architecture. It is the installation procedure, the lighting calibration, the operator training, and whether the people running the line trust the system enough to act on what it tells them. If operators don't trust the output, they override it. When they override it, you have neither the AI inspection nor the human inspection. Industrial AI inspection in emerging markets is 30% model and 70% everything else.

The 0.7 threshold was calibrated empirically across held-out defect types. At that threshold, roughly 8% of detections route to Qwen-VL in a typical pharmaceutical packaging run. Qwen-VL runs quantized to INT4 on the NVIDIA Jetson AGX Orin's GPU; inference time is 380–620ms, too slow for every frame, acceptable for the 8% that need it. The YOLO path runs at 40ms. The system is fast for the common case and careful for the edge case.

The threshold is configurable. Most clients run at 0.7. One pharmaceutical client, under specific regulatory pressure, runs at 0.82, a more conservative posture that routes roughly 18% of detections to VLM review, at the cost of higher average latency. That is their call to make, not ours.
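A back-of-the-envelope estimate shows what the more conservative threshold costs. This sketch assumes the second pass adds its full inference time on top of the 40ms YOLO pass for the routed fraction of detections; if the second pass runs asynchronously, the effect on the line would differ. The function name is illustrative.

```python
YOLO_MS = 40.0
VLM_MS_RANGE = (380.0, 620.0)  # quoted second-pass inference time


def mean_latency_ms(vlm_fraction: float, vlm_ms: float, yolo_ms: float = YOLO_MS) -> float:
    """Expected latency per detection when a fraction also takes the second pass."""
    return yolo_ms + vlm_fraction * vlm_ms


# Routed fractions quoted above for each threshold setting.
for threshold, fraction in (("0.70", 0.08), ("0.82", 0.18)):
    low = mean_latency_ms(fraction, VLM_MS_RANGE[0])
    high = mean_latency_ms(fraction, VLM_MS_RANGE[1])
    print(f"threshold {threshold}: {low:.1f}-{high:.1f} ms per detection on average")
```

Under these assumptions the average rises from roughly 70-90ms at the default threshold to roughly 108-152ms at 0.82.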

The ISO 9001 Audit We Weren't Ready For

Our first ISO 9001 audit at a client site was instructive in a way that "instructive" is a polite word for. We had done the validation work: the model was tested, the hardware was qualified, the detection performance was documented internally. What we had not done was produce that documentation in the format an auditor needs, a Software Validation Report covering Installation Qualification (IQ), Operational Qualification (OQ), and Performance Qualification (PQ).

The auditor arrived. He asked for the Software Validation Report. We spent three days writing it retroactively while he waited. He was gracious about it. We were not proud of it. The system was correctly validated; we had simply not produced the paper trail in the required format.

That situation does not happen anymore. AVIS now generates the IQ/OQ/PQ documentation automatically as part of the installation and qualification protocol. IQ runs automatically at system initialization and checks hardware configuration against a known-good spec. OQ runs a set of golden samples, physical defect standards we supply with every installation, and verifies detection accuracy against ground truth labels. PQ runs over the first production week and generates a statistical summary. The quality engineer signs off on each stage in the AVIS interface; signatures are cryptographically timestamped.
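The first two stages reduce to simple checks, sketched below. The configuration keys, the pass threshold, and the function names are hypothetical; the real IQ spec and OQ acceptance criteria are not given in this post.

```python
from typing import Dict, List, Tuple


def run_iq(actual: Dict[str, str], spec: Dict[str, str]) -> List[str]:
    """Compare installed configuration to the spec; an empty list means IQ passes."""
    deviations = []
    for key, expected in spec.items():
        found = actual.get(key, "<missing>")
        if found != expected:
            deviations.append(f"{key}: expected {expected}, found {found}")
    return deviations


def run_oq(results: List[Tuple[str, str]], min_accuracy: float = 0.95) -> Tuple[float, bool]:
    """results holds (ground_truth_label, predicted_label) per golden sample."""
    correct = sum(1 for truth, predicted in results if truth == predicted)
    accuracy = correct / len(results)
    return accuracy, accuracy >= min_accuracy


# Hypothetical spec and installation (keys are made up for the example).
spec = {"inference_node": "Jetson AGX Orin", "ups_runtime_h": "4", "camera_count": "4"}
installed = {"inference_node": "Jetson AGX Orin", "ups_runtime_h": "4", "camera_count": "3"}
print(run_iq(installed, spec))

# Hypothetical golden-sample run: one cracked blister misread as empty.
golden = [("cracked", "cracked"), ("empty", "empty"),
          ("cracked", "empty"), ("foreign_matter", "foreign_matter")]
print(run_oq(golden))  # (0.75, False)
```

Returning the list of deviations, and not only a pass/fail flag, is what makes the output usable as audit evidence: the report can state exactly what was checked and what differed.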

At the recertification audit following one of our pharmaceutical deployments, the auditor's note on the validation documentation was: "This is better than what I see from most European companies." We saved that email.

An Honest Customer Result

One of our early deployments was at a Nairobi pharmaceutical manufacturer producing oral solid dosage forms (tablets and capsules) for the East African market. Their quality challenge was batch rejections at the blister packaging stage: missing tablets, cracked blisters, foil seal defects, foreign matter.

Before AVIS, their batch rejection rate at final QC was 4.2%. That figure included both genuine defects and false rejections from inconsistent manual inspection; inspectors at the end of a 12-hour shift are measurably more likely to reject borderline product than inspectors at the start of one. Their quality manager described it as paying twice: once when the defect happens, and once when the inspector is tired.

After six months, their batch rejection rate was 0.8%.

That number is real, and we're proud of it. We want to be equally clear about what the six months looked like: 14 model retraining iterations with their specific product line, two hardware adjustments (one to the lighting rig, one to the camera mounting angle), and one full recalibration when they switched to a new excipient supplier in month four. The "plug and play" framing that shows up in AI vendor marketing would not describe this deployment accurately. The six-month result is honest. The path to it was not simple.

The SPC integration also revealed something their quality team hadn't been able to identify manually: a systematic blister cavity underfill pattern that correlated with a worn filling nozzle on their second production line. The nozzle had been degrading gradually over months. It showed up in the AVIS SPC charts as a drift before it showed up as a defect rate spike. The CAPA closed the root cause within a week of detection.
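A gradual drift like this is the kind of signal a classic SPC run rule catches before any single point looks alarming. The sketch below flags a run of consecutive points on one side of the center line; the run length of 8, the fill values, and the function name are assumptions for illustration, and this post does not say which rules the AVIS charts apply.

```python
from typing import Optional, Sequence


def first_run_violation(values: Sequence[float], center: float,
                        run_length: int = 8) -> Optional[int]:
    """Index at which `run_length` consecutive points have fallen on the
    same side of the center line, or None if that never happens."""
    run = 0
    side = 0  # +1 above, -1 below, 0 exactly on the line
    for i, v in enumerate(values):
        current = (v > center) - (v < center)
        if current != 0 and current == side:
            run += 1
        else:
            side = current
            run = 1 if current != 0 else 0
        if run >= run_length:
            return i
    return None


# Hypothetical cavity fill measurements around a target of 100.0.
fill = [100.2, 99.9, 100.1, 99.8, 100.3,
        99.9, 99.8, 99.7, 99.8, 99.6, 99.7, 99.5, 99.6]
print(first_run_violation(fill, center=100.0))  # 12
```

No point in this series is far from target, yet the rule fires at index 12 because eight measurements in a row sit below it.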

Where This Leaves Us

AVIS is not finished, in the way that no software product with active customers and ongoing lessons is ever finished. The next version adds multi-camera stereo depth for 3D defect measurement. We are building integrations with Kenyan and Nigerian pharmaceutical regulatory systems so that detection events flow directly into electronic batch records rather than requiring manual transcription. A pilot with three manufacturers is testing shared defect model libraries (the model weights, not the production data) to improve accuracy on novel defects across non-competing manufacturers in the same product category.

The broader point is the one the factory manager in Nairobi Industrial Area made in 2024, without knowing he was making it: the question is never just whether the technology works. The question is whether it works in this building, with this power supply, on this machinery, with these operators, under these audit requirements. That is a different question, and it requires different work to answer.

African manufacturing doesn't need a cut-down version of what German factories use. It needs something designed for its actual conditions. That is what AVIS is built to be. The factory visits, the six wasted weeks on model tuning, the three days writing audit documentation retroactively, the 14 retraining iterations: all of it is what the actual version of that looks like.

If you're running a manufacturing operation in East or West Africa and managing quality costs above 10% of revenue, we'd like to talk.