On-Device ML on Android: TensorFlow Lite / LiteRT

What is TensorFlow Lite (LiteRT)?

TensorFlow Lite, renamed LiteRT in 2024, is Google’s runtime for running ML models on-device: on Android, iOS, and microcontrollers. You ship a compact .tflite model file inside your app and run inference locally, with no server round-trip and no network connection required.

Why it’s great for mobile

Optimised to be small and fast on phones.
Can use hardware acceleration through delegates (the GPU, and an NPU where available) for speed and battery savings.
Works fully offline and keeps data private.

The typical flow

Add the dependency and bundle a .tflite model in assets/.
Load the model into an Interpreter.
Pre-process your input into the expected tensor.
Run inference.
Read and interpret the output.

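// build.gradle.kts (replace the .x placeholders with current releases):
//   implementation("org.tensorflow:tensorflow-lite:2.x")
//   implementation("org.tensorflow:tensorflow-lite-support:0.x")   // provides FileUtil

import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

// Memory-map the model from assets/ and create the interpreter once; reuse it across calls
val model = FileUtil.loadMappedFile(context, "model.tflite")
val interpreter = Interpreter(model)

// Input/output buffers sized to the model (preprocessImage, labels and NUM_CLASSES are your own)
val input = preprocessImage(bitmap)          // e.g. 1x224x224x3 floats
val output = Array(1) { FloatArray(NUM_CLASSES) }

interpreter.run(input, output)               // inference
val scores = output[0]
val topLabel = labels[scores.indices.maxByOrNull { scores[it] }!!]

interpreter.close()                          // release native resources when finished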
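
The preprocessImage helper and the label lookup in the snippet above are left undefined. As a rough sketch of what those two steps might involve, the plain-Kotlin code below assumes a float32 model that expects RGB values scaled to [0, 1]; many models instead expect [-1, 1] or raw bytes, so check the model spec before copying the normalisation. The names pixelsToInputBuffer and topK are hypothetical helpers, not part of TensorFlow Lite. On Android the pixel array would typically come from Bitmap.getPixels after resizing the bitmap to the model's input size.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Assumed model spec (hypothetical): 224x224 RGB input, float32, values in [0, 1].
const val INPUT_SIZE = 224
const val CHANNELS = 3
const val BYTES_PER_FLOAT = 4

/**
 * Converts packed ARGB pixels into a direct float32 buffer laid out as
 * 1 x size x size x 3, dropping the alpha channel.
 */
fun pixelsToInputBuffer(pixels: IntArray, size: Int = INPUT_SIZE): ByteBuffer {
    // Fail fast on a wrong input size instead of feeding the model garbage.
    require(pixels.size == size * size) {
        "Expected ${size * size} pixels but got ${pixels.size}"
    }
    val buffer = ByteBuffer
        .allocateDirect(size * size * CHANNELS * BYTES_PER_FLOAT)
        .order(ByteOrder.nativeOrder())
    for (pixel in pixels) {
        buffer.putFloat(((pixel shr 16) and 0xFF) / 255f) // red
        buffer.putFloat(((pixel shr 8) and 0xFF) / 255f)  // green
        buffer.putFloat((pixel and 0xFF) / 255f)          // blue
    }
    buffer.rewind() // reset the position so the reader starts at the first value
    return buffer
}

/** Returns the k highest-scoring (label, score) pairs, best first. */
fun topK(scores: FloatArray, labels: List<String>, k: Int = 3): List<Pair<String, Float>> {
    require(scores.size == labels.size) { "Label count must match output size" }
    return scores.indices
        .sortedByDescending { scores[it] }
        .take(k)
        .map { labels[it] to scores[it] }
}

fun main() {
    // A tiny 2x2 "image": red, green, blue, white.
    val pixels = intArrayOf(
        0xFFFF0000.toInt(), 0xFF00FF00.toInt(),
        0xFF0000FF.toInt(), 0xFFFFFFFF.toInt()
    )
    val input = pixelsToInputBuffer(pixels, size = 2)
    println("bytes = ${input.capacity()}, first value = ${input.getFloat(0)}")
    // prints: bytes = 48, first value = 1.0

    val ranked = topK(floatArrayOf(0.1f, 0.7f, 0.2f), listOf("cat", "dog", "bird"), k = 2)
    println(ranked) // prints: [(dog, 0.7), (bird, 0.2)]
}
```

A direct, native-order ByteBuffer like this one can be passed to interpreter.run as the input, and showing the top few labels with their scores is often more useful in a UI than a single top label.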

The easier path: Task Library & Model Maker

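Writing raw tensor code is fiddly. Google’s TensorFlow Lite Task Library wraps common jobs (image classification, object detection, text) in simple, high-level classes, which means far less code and fewer pre-processing bugs. Model Maker is its companion on the training side: a Python library that retrains a model on your own data with transfer learning and exports a .tflite file the Task Library can load.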

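// build.gradle.kts: implementation("org.tensorflow:tensorflow-lite-task-vision:0.x")
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.classifier.ImageClassifier

val classifier = ImageClassifier.createFromFile(context, "model.tflite")
val results = classifier.classify(TensorImage.fromBitmap(bitmap))
// results: List<Classifications>; each entry holds categories with a label and a score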

Hardware acceleration with delegates

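A delegate hands parts of the model to specialised hardware. On supported devices the GPU delegate (or an NPU, for example through NNAPI) can make inference several times faster and easier on the battery, but the gain depends on the model and the device. Operations the delegate cannot handle fall back to the CPU, and some devices cannot use a given delegate at all, so check support before adding one and keep a CPU-only path.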

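// Needs implementation("org.tensorflow:tensorflow-lite-gpu:2.x") and import org.tensorflow.lite.gpu.GpuDelegate
val gpuDelegate = GpuDelegate()              // keep a reference: close() it together with the interpreter
val options = Interpreter.Options().apply { addDelegate(gpuDelegate) }
val interpreter = Interpreter(model, options)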

Keeping it smooth

Run inference off the main thread (in a coroutine or on a background executor) so the UI never freezes. This matters most for video or camera frames, where many images arrive per second.
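
One way to apply this to a stream of frames is sketched below using only java.util.concurrent, so it runs on any JVM. It makes a design choice the text does not prescribe: frames that arrive while the previous one is still being processed are dropped rather than queued, so results never lag behind the camera. FrameAnalyzer, infer and onResult are hypothetical names; in a real app infer would wrap interpreter.run, and onResult would post back to the main thread before touching any views.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

/**
 * Runs an inference function on a single background thread and skips
 * frames that arrive while the previous frame is still in flight.
 */
class FrameAnalyzer(
    private val infer: (FloatArray) -> Int,   // stand-in for a call to interpreter.run(...)
    private val onResult: (Int) -> Unit       // called on the background thread
) {
    private val executor = Executors.newSingleThreadExecutor()
    private val busy = AtomicBoolean(false)

    /** Returns true if the frame was accepted, false if it was skipped. */
    fun submit(frame: FloatArray): Boolean {
        // Atomically claim the worker; if it is already claimed, drop this frame.
        if (!busy.compareAndSet(false, true)) return false
        executor.execute {
            try {
                onResult(infer(frame))
            } finally {
                busy.set(false) // free the worker even if inference throws
            }
        }
        return true
    }

    /** Stops accepting work and waits briefly for the in-flight frame to finish. */
    fun close() {
        executor.shutdown()
        executor.awaitTermination(1, TimeUnit.SECONDS)
    }
}

fun main() {
    // Fake "model": sleeps to simulate latency, then returns the index of the largest value.
    val analyzer = FrameAnalyzer(
        infer = { frame -> Thread.sleep(50); frame.indices.maxByOrNull { frame[it] } ?: -1 },
        onResult = { println("top index = $it") }
    )
    val first = analyzer.submit(floatArrayOf(0.1f, 0.9f))   // accepted
    val second = analyzer.submit(floatArrayOf(0.8f, 0.2f))  // normally skipped: the first frame is still in flight
    println("accepted: $first, $second")
    analyzer.close()
}
```

A single worker thread also keeps all calls to one interpreter instance on the same thread, which avoids sharing it across threads. A coroutine-based version would follow the same shape with a background dispatcher in place of the executor.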

Common mistakes

Wrong input size or normalisation, which produces garbage output: match the model spec (shape, data type, value range).
Running inference on the main thread, which makes the UI janky.
Bundling a huge model that bloats the app: optimise it first (covered in a later lesson).

Summary: TensorFlow Lite/LiteRT runs ML models on-device on Android. Bundle a .tflite file, use the Task Library for common jobs, accelerate with GPU/NPU delegates, and always run inference off the main thread.