Architecting a Realtime Factory Audit and Resolution Engine
Karan Hejmadi
November 30, 2025
When we started building the audit module at Custiv, I didn't think of it as just another feature. It felt more like fixing something that everyone in manufacturing had quietly accepted as broken.
Factory audits were happening every day, but the way they were done hadn't really evolved.
A lot of it was still spreadsheet driven, reactive, and dependent on how experienced or thorough the auditor was on a given day. The process was slow, error-prone, and often led to a backlog of issues that took weeks to resolve.
It was clear that the process needed fixing from the ground up.
What I saw on the ground
The biggest shift for me came from spending time on the shop floor. Watching how audits actually happen is very different from reading about them.
You'd see auditors juggling conversations, remarks, photos, and checklists. A lot of context gets lost in that process.
- Remarks are written later, not in the moment
- Observations depend heavily on memory and handwritten notes
- Reports take several days to compile
- Suppliers often hear about issues much later
And most importantly, there's no real system ensuring that issues are actually closed.
The core problem
At its core, the audit process was fragmented. Data lived everywhere - paper, spreadsheets, emails, WhatsApp conversations. Nothing was connected, and nothing was truly real-time and actionable.
That leads to a few systemic issues:
- The checklists are generic, with options like "good", "bad", and "needs improvement" that are open to interpretation and lead to inconsistent reporting
- Auditors take a lot of time to compile the findings and share them with suppliers
- Feedback loops are slow and inefficient
- Corrective actions are not assigned a clear owner and timeline
- There's no single source of truth
How I approached the solution
Instead of just digitizing forms, I wanted to rethink how audits should work if we started from scratch today.
The idea was simple:
Make the system do the heavy lifting, so the auditor can focus on observation and decision-making.
This led to a few key design decisions.
- Capture information as it appears
One thing I noticed is that a lot of valuable context is spoken, not written.
So we introduced voice-based capture during audits.
Auditors can just talk through what they're seeing, and the system processes that into structured data.

- Eliminate subjective evaluation
- Users create predefined templates with the exact options to choose for each question
- We use AI to suggest the options, which the user can then edit
- The auditor has a clear set of options to choose from, which reduces ambiguity and ensures consistency across audits

This reduced a lot of manual effort without taking control away from the auditor.
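To make the idea of predefined options concrete, here is a minimal sketch of a template question with a closed answer set. The class names, the example question, and its options are all hypothetical and are not taken from the actual Custiv data model; the point is only that an answer outside the template's options is rejected at entry time.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Question:
    """One checklist question with a closed set of allowed answers (hypothetical model)."""
    prompt: str
    options: tuple[str, ...]


@dataclass
class AuditResponse:
    """Answers recorded against a template, validated as they are entered."""
    questions: list[Question]
    answers: dict[int, str] = field(default_factory=dict)

    def record(self, index: int, choice: str) -> None:
        question = self.questions[index]
        # Free-text verdicts such as "good" or "bad" are rejected unless
        # the template explicitly lists them as options.
        if choice not in question.options:
            raise ValueError(
                f"{choice!r} is not an allowed option for {question.prompt!r}"
            )
        self.answers[index] = choice


# Example usage with an illustrative question.
template = [
    Question(
        "Are fire extinguishers within their inspection date?",
        ("All in date", "Some expired", "None in date"),
    )
]
response = AuditResponse(template)
response.record(0, "Some expired")
```

Because every auditor picks from the same options, two audits of the same facility can be compared answer by answer.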
- Reduce onboarding friction
We created a custom audit checklist parser that automatically analyzes Excel checklists and digitizes them with well-defined options.
The auditor can take notes on the tablet like they originally did on paper, and the system processes those notes to extract the findings.
- Make everything realtime and actionable
The audio captured during an audit is translated as soon as the audit is complete. The audit report is ready to be shared the moment the auditor leaves the facility. An AI-generated audit summary lists the supplier's strengths and weaknesses.
The supplier is notified about all the non-conformances and can start working on corrective actions immediately. The system also assigns clear ownership and timelines for each action item, and tracks progress.
This cuts the turnaround from several days to just minutes.
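As an illustration of what "clear ownership and timelines" can look like in data, here is a small sketch of a corrective action record. The field names, the `open_actions` helper, and the 14-day default deadline are assumptions made for this example, not details of the production system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CorrectiveAction:
    """One action item raised from a non-conformance (hypothetical model)."""
    finding: str
    owner: str
    due: date
    closed: bool = False

    def is_overdue(self, today: date) -> bool:
        # An action is overdue only while it is still open.
        return not self.closed and today > self.due


def open_actions(findings, owner, raised_on, days_to_fix=14):
    """Create one action per finding, all due days_to_fix days after raised_on."""
    due = raised_on + timedelta(days=days_to_fix)
    return [CorrectiveAction(finding, owner, due) for finding in findings]


# Example usage with illustrative values.
actions = open_actions(
    ["Calibration sticker missing on gauge"],
    owner="supplier-quality-lead",
    raised_on=date(2025, 11, 30),
)
overdue = [a for a in actions if a.is_overdue(date(2025, 12, 20))]
```

Tracking progress then comes down to querying for open or overdue actions per supplier.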

Key Challenge
Our biggest challenge was making the whole system work in an ambient manner. The user should be able to focus on what they observe and note it down, and anything they miss should still get captured by the system. This is what led us to build the audio capture system.
What we had to tackle here is that audits happen in noisy environments and people speak in different languages. Audits are also long, most often running the whole day.
Capturing and processing 6-7 hours of audio was a bad idea. We knew that the only audio we cared about was the part where the auditors are talking, not the machine noise or the long periods of silence. This was a perfect use case for a voice activity detection system.
Voice activity detection reduced the total audio length by more than 80%, which also saved us processing bandwidth.
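The trimming step can be sketched as follows. The post does not say which voice activity detection method was used, so this example swaps in the simplest possible stand-in: a per-frame energy threshold with a short hangover. All names and parameter values are assumptions. A plain energy threshold would also keep loud machine noise, so a real factory deployment would need a trained speech detector in place of `frame_rms`; the sketch only shows how non-speech frames are dropped and speech spans are kept.

```python
import math


def frame_rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(samples, sample_rate, frame_ms=30,
                    threshold=0.02, hangover_frames=10):
    """Return (start_sec, end_sec) spans whose frame energy reaches threshold.

    A segment stays open through up to hangover_frames quiet frames so that
    short pauses between words do not split one remark into many pieces.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    segments = []
    start = None  # frame index where the current segment began
    quiet = 0     # consecutive quiet frames seen inside the segment
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        if frame_rms(frame) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet > hangover_frames:
                # Close the segment just after the last loud frame.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        segments.append((start, n_frames - quiet))
    seconds_per_frame = frame_len / sample_rate
    return [(a * seconds_per_frame, b * seconds_per_frame) for a, b in segments]


# Example usage: 1 s of silence, 0.5 s of signal, 2 s of silence at 1 kHz.
audio = [0.0] * 1000 + [0.5] * 500 + [0.0] * 2000
kept = speech_segments(audio, sample_rate=1000, frame_ms=10, hangover_frames=2)
```

Only the returned spans need to be sent on for transcription and translation, which is where the savings in processing come from.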
What changed after this
The biggest difference wasn't just speed. It was behavior.
- Audits became more consistent
- Issues were addressed faster
- Teams had better visibility
- There was accountability in closing the loop
It shifted audits from being a periodic activity to something that continuously improves operations.
What I took away from this
This project reinforced something I keep coming back to.
Good software isn't about adding more features. It's about removing friction from real-world workflows.
Spending time on the shop floor, understanding how people actually work, and then building around that made all the difference here.
More on this can be found here.