On-Device ML on Android: TensorFlow Lite / LiteRT
What is TensorFlow Lite (LiteRT)?
TensorFlow Lite, recently renamed LiteRT, is Google’s runtime for on-device ML inference on Android (as well as iOS and microcontrollers). You ship a small .tflite model file inside your app and run inference locally, with no server and no internet connection.
Why it’s great for mobile
- Optimised to be small and fast on phones.
- Can use hardware acceleration (GPU, and the NPU via delegates) for speed and battery savings.
- Works fully offline and keeps data private.
The typical flow
- Add the dependency and bundle a .tflite model in assets/.
- Load the model into an Interpreter.
- Pre-process your input into the expected tensor.
- Run inference.
- Read and interpret the output.
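// build.gradle:
//   implementation("org.tensorflow:tensorflow-lite:2.x")
//   implementation("org.tensorflow:tensorflow-lite-support:0.x") // provides FileUtil
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

val model = FileUtil.loadMappedFile(context, "model.tflite") // memory-maps the file from assets/
val interpreter = Interpreter(model)
// input/output buffers sized to the model
val input = preprocessImage(bitmap) // e.g. 1x224x224x3 floats
val output = Array(1) { FloatArray(NUM_CLASSES) }
interpreter.run(input, output) // inference
// labels: class names in the model's output order, assumed to be loaded elsewhere
val scores = output[0]
val topLabel = labels[scores.indices.maxByOrNull { scores[it] }!!] // label with the highest score
interpreter.close() // release native resources when done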
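The preprocessImage call above is left undefined. As a minimal sketch of what such a step might look like, the function below packs ARGB pixels into a float buffer in 1 x height x width x 3 order. The name pixelsToFloatBuffer and the [0, 1] scaling are assumptions for illustration; some models expect [-1, 1] or raw bytes instead, so check your model's spec.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/**
 * Packs ARGB pixels into a direct float buffer laid out as
 * 1 x height x width x 3 (RGB), scaling each channel to [0, 1].
 * The [0, 1] range is an assumption; use whatever your model was trained with.
 */
fun pixelsToFloatBuffer(pixels: IntArray, width: Int, height: Int): ByteBuffer {
    require(pixels.size == width * height) {
        "expected ${width * height} pixels, got ${pixels.size}"
    }
    // 4 bytes per float, 3 channels per pixel, in the platform's native byte order
    val buffer = ByteBuffer.allocateDirect(4 * width * height * 3)
        .order(ByteOrder.nativeOrder())
    for (pixel in pixels) {
        buffer.putFloat(((pixel shr 16) and 0xFF) / 255f) // red
        buffer.putFloat(((pixel shr 8) and 0xFF) / 255f)  // green
        buffer.putFloat((pixel and 0xFF) / 255f)          // blue
    }
    buffer.rewind() // reset the position so the reader starts at the first float
    return buffer
}
```

On Android, the pixel array would typically come from Bitmap.getPixels after scaling the bitmap to the model's input size.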
The easier path: Task Library & Model Maker
Writing raw tensor code is fiddly. Google’s TensorFlow Lite Task Library wraps common jobs (image classification, object detection, text) in simple, high-level classes — far less code and fewer pre-processing bugs.
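// build.gradle: implementation("org.tensorflow:tensorflow-lite-task-vision:0.x")
val classifier = ImageClassifier.createFromFile(context, "model.tflite")
val results = classifier.classify(TensorImage.fromBitmap(bitmap))
// results -> List<Classifications>; each entry holds categories (label + score)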
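To show how the results can be narrowed and read, here is a hedged sketch using the Task Library's options builder. The function name classifyTopThree, the limit of 3 results, and the 0.3 score threshold are illustrative assumptions, and the Task Library generally expects a model that carries TFLite metadata.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.classifier.ImageClassifier

// Hypothetical helper: returns up to three (label, score) pairs for a bitmap.
fun classifyTopThree(context: Context, bitmap: Bitmap): List<Pair<String, Float>> {
    val options = ImageClassifier.ImageClassifierOptions.builder()
        .setMaxResults(3)        // assumed limit, tune for your UI
        .setScoreThreshold(0.3f) // assumed cut-off, drops low-confidence labels
        .build()
    val classifier =
        ImageClassifier.createFromFileAndOptions(context, "model.tflite", options)
    try {
        val results = classifier.classify(TensorImage.fromBitmap(bitmap))
        // Flatten every Classifications entry into (label, score) pairs
        return results.flatMap { it.categories }.map { it.label to it.score }
    } finally {
        classifier.close() // release native resources
    }
}
```

Creating the classifier on every call keeps the sketch short; in a real app you would likely create it once and reuse it.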
Hardware acceleration with delegates
A delegate offloads work to faster hardware. The GPU delegate (and NNAPI/NPU on supported devices) can make inference noticeably faster and gentler on the battery, though the gain depends on the model and the device, and not every device or operation is supported.
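// Requires the org.tensorflow:tensorflow-lite-gpu dependency
import org.tensorflow.lite.gpu.GpuDelegate
val options = Interpreter.Options().apply { addDelegate(GpuDelegate()) }
val interpreter = Interpreter(model, options)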
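Because not every device supports the GPU delegate, it can help to check first and fall back to the CPU. The sketch below follows the CompatibilityList pattern from the GPU delegate library. The function name createInterpreter and the choice of 4 CPU threads are assumptions for illustration.

```kotlin
import java.nio.MappedByteBuffer
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate

// Hypothetical helper: prefers the GPU, otherwise runs multi-threaded on the CPU.
fun createInterpreter(model: MappedByteBuffer): Interpreter {
    val options = Interpreter.Options()
    if (CompatibilityList().isDelegateSupportedOnThisDevice) {
        options.addDelegate(GpuDelegate()) // GPU is supported on this device
    } else {
        options.setNumThreads(4) // assumed thread count for the CPU fallback
    }
    return Interpreter(model, options)
}
```

Measure both paths on real devices before committing to one; for small models the CPU can be just as fast.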
Keeping it smooth
Run inference off the main thread (a coroutine or background executor) so the UI never freezes — especially for video/camera frames where you process many images per second.
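One way to sketch this for camera frames is a single background thread that skips frames while the previous one is still being processed. The class name FrameAnalyzer and its parameters are hypothetical; infer stands in for a call such as interpreter.run, and only plain JVM concurrency classes are used.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.RejectedExecutionException
import java.util.concurrent.atomic.AtomicBoolean

/**
 * Runs `infer` on one background thread and drops frames that arrive
 * while an earlier frame is still being processed.
 */
class FrameAnalyzer<I, O>(
    private val infer: (I) -> O,
    private val onResult: (O) -> Unit,
) : AutoCloseable {
    private val executor: ExecutorService = Executors.newSingleThreadExecutor()
    private val busy = AtomicBoolean(false)

    /** Returns true if the frame was accepted, false if it was dropped. */
    fun submit(frame: I): Boolean {
        // Only one frame may be in flight at a time
        if (!busy.compareAndSet(false, true)) return false
        return try {
            executor.execute {
                try {
                    onResult(infer(frame)) // runs on the background thread
                } finally {
                    busy.set(false)
                }
            }
            true
        } catch (e: RejectedExecutionException) {
            busy.set(false) // executor already shut down
            false
        }
    }

    override fun close() {
        executor.shutdown()
    }
}
```

Note that onResult is invoked on the background thread, so on Android any UI update inside it would still need to be posted back to the main thread.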
Common mistakes
- Wrong input size/normalization (garbage output) — match the model spec.
- Running inference on the main thread (janky UI).
- Bundling a huge model that bloats the app — optimise it (later lesson).
Summary: TensorFlow Lite/LiteRT runs ML models on-device on Android. Bundle a .tflite file, use the Task Library for common jobs, accelerate with GPU/NPU delegates, and always run inference off the main thread.