Core ML & On-Device ML Consulting
Vision, Core ML, Natural Language, on-device voice. Models that run locally without the latency, privacy cost, or cloud bill of a server round-trip. I've built three of my own apps whose entire value is on-device inference.
- on-device Vision and Core ML pipelines with sub-100ms latency budgets
- PyTorch/TensorFlow → Core ML conversion, quantization, model size reduction
- on-device voice AI with ElevenLabs fallback for real-time conversations
What clients say
"Vadim was instrumental to the success Epsy enjoyed on iOS, taking it from an idea on a Miro board to the highest rated and most downloaded app of its kind on the store."
James C. · Mobile Engineering Lead, Epsy
"We had a strict deadline, and Vadim managed to complete the job in time. He gave us meaningful feedback and suggested better approaches, not trying to blindly stick to our specification."
Founder · Pre-seed streaming service
"I can say with confidence that it will be difficult to find a better developer. Vadim is achievement-oriented, highly organized, with very good communication skills."
Alex Z. · Co-Founder, eda.so
Common engagements
Integrate an existing model
2–4 weeks end-to-end. I write the glue code, run device-class testing from iPhone 12 through the current generation, design the update strategy (bundled vs fetched), and plan the fallback for when inference fails. The fallback matters more than the happy path.
Ship a PyTorch/TF model to Core ML
I convert the model via coremltools, debug the inevitable 'it converts but behaves differently' step, and add version-gate hygiene so the next iOS doesn't silently break inference.
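As a rough illustration of what checking for "it converts but behaves differently" can look like, the sketch below compares the output vector of the original model with that of the converted model on the same input. The function names and the tolerance are hypothetical, chosen for this example; how the two vectors are obtained depends on your pipeline and is not shown.

```python
from typing import Sequence


def max_abs_diff(reference: Sequence[float], converted: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two output vectors."""
    if len(reference) != len(converted):
        raise ValueError("output shapes differ, so the conversion changed the model's interface")
    return max((abs(r - c) for r, c in zip(reference, converted)), default=0.0)


def outputs_match(reference: Sequence[float], converted: Sequence[float],
                  atol: float = 1e-3) -> bool:
    """True when every output value agrees within the absolute tolerance (assumed 1e-3)."""
    return max_abs_diff(reference, converted) <= atol


def same_top1(reference: Sequence[float], converted: Sequence[float]) -> bool:
    """True when both models pick the same highest-scoring class."""
    return list(reference).index(max(reference)) == list(converted).index(max(converted))
```

A quantized model will often fail the strict numeric check while still agreeing on the top class, so it can be useful to track both over a fixed set of test inputs rather than a single sample.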
Architect a new ML-backed feature
Product brief in, architecture out. I tell you whether on-device is the right call and what the fallback looks like.
Pricing
Architecture reviews, hiring help, second opinions on that thing that's been bugging you.
Available now · Features, MVPs, migrations, firefighting. Minimum 5 days.
Available now · Priority support: review agency code, join architecture calls, catch problems before they ship.
Questions
How do I decide between on-device and server-side?
On-device wins when any of these apply: the feature must work offline, latency needs to be under ~100ms, per-inference cost matters, or you can't send the data off-device. If none apply, server-side is usually cheaper and more maintainable.
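The rule of thumb above can be written down as a small checklist. This is only a sketch: the `FeatureRequirements` type, its field names, and the hard 100 ms cut-off are assumptions made for the example, and a real decision would weigh more than four booleans.

```python
from dataclasses import dataclass
from typing import List, Optional

LATENCY_THRESHOLD_MS = 100.0  # assumed cut-off, taken from the "~100ms" rule of thumb


@dataclass
class FeatureRequirements:
    must_work_offline: bool = False
    latency_budget_ms: Optional[float] = None  # None means no hard latency budget
    per_inference_cost_matters: bool = False
    data_must_stay_on_device: bool = False


def reasons_for_on_device(req: FeatureRequirements) -> List[str]:
    """Collect every criterion that pushes the feature toward on-device inference."""
    reasons = []
    if req.must_work_offline:
        reasons.append("offline")
    if req.latency_budget_ms is not None and req.latency_budget_ms < LATENCY_THRESHOLD_MS:
        reasons.append("latency")
    if req.per_inference_cost_matters:
        reasons.append("cost")
    if req.data_must_stay_on_device:
        reasons.append("privacy")
    return reasons


def recommend(req: FeatureRequirements) -> str:
    """Any single reason is enough for on-device; with none, default to server-side."""
    return "on-device" if reasons_for_on_device(req) else "server-side"
```

For example, `recommend(FeatureRequirements(latency_budget_ms=50))` returns `"on-device"`, while a feature with no constraints falls through to `"server-side"`.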
Will the model fit in our app?
Almost always yes. The sharper question is whether it still fits after the next three features on the roadmap. I review your binary budget and model options before you commit.
How do you handle model updates?
Two clean options: bake the model into the binary (every update is an App Store cycle) or fetch at runtime with integrity checks, caching, and a fallback path. Teams that commit to neither and mix the two ad hoc end up with bugs neither approach would produce on its own.
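A minimal sketch of the fetch-at-runtime option, assuming the expected SHA-256 digest is known ahead of time and a bundled copy of the model exists as the fallback. The function names, the `fetch` callable, and the flat-file cache are all hypothetical; an iOS implementation would do the same steps in Swift against the app's own storage and networking.

```python
import hashlib
from pathlib import Path
from typing import Callable, Tuple


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def load_model_bytes(fetch: Callable[[], bytes], expected_sha256: str,
                     cache_path: Path, bundled: bytes) -> Tuple[bytes, str]:
    """Return (model_bytes, source), where source is 'cache', 'fetched', or 'bundled'."""
    # 1. Caching: reuse a previously downloaded model, but only if it still verifies.
    if cache_path.exists():
        cached = cache_path.read_bytes()
        if sha256_hex(cached) == expected_sha256:
            return cached, "cache"

    # 2. Fetch: any download error is treated the same as a failed integrity check.
    try:
        data = fetch()
    except Exception:
        data = None

    # 3. Integrity check: cache and use the download only when the digest matches.
    if data is not None and sha256_hex(data) == expected_sha256:
        cache_path.write_bytes(data)
        return data, "fetched"

    # 4. Fallback path: the bundled model is always available.
    return bundled, "bundled"
```

Returning the source alongside the bytes makes it easy to log how often devices actually end up on the fallback.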
How do I get a quote?
Two paths. If you need speed, send me a detailed brief and I'll quote from it (usually within 48 hours). If you'd rather talk first, book a free 30-minute scoping call and I'll quote after. Most clients who pick the brief path land on the call anyway once we get into the specifics, but the door is open either way.
How quickly can you start?
Advisory calls can happen within days. For project work, I typically need 1–2 weeks' notice to clear the calendar, though I keep some buffer for urgent firefighting. Check the availability badges above for current openings.
Do you work with early-stage startups?
Yes, from pre-seed to Series C and beyond. For very early teams, the advisory tier often makes more sense than project work: you get architecture guidance without committing to a large engagement before you've validated the product.
What's included in the day rate?
Everything: code, architecture decisions, code review, documentation, async Slack availability during working hours. No surprise add-ons. I bill for time spent working on your project, not for "thinking about it in the shower."
How do you handle timezone differences?
I'm currently in Vancouver (Pacific Time), with full overlap for North American teams. For the UK and Europe, I'm online by their afternoon. For Gulf or APAC, we'd agree on overlap hours and handle the rest async. I've worked with teams from San Francisco to Dubai.
Shipping an on-device ML feature?
Describe what you're working on, or book a free 30-min scoping call. I reply within 48 hours.
work@drobinin.com Book a free call →