One Photo to a 3D Splat — Apple's LiTo into WebXR
12:00 JST
LiTo (Surface Light Field Tokenization) is an Apple Machine Learning Research project, accepted to ICLR 2026, that generates a 3D object from a single input image. Unlike most image-to-3D methods, it captures view-dependent appearance (specular highlights, reflections, and Fresnel effects), with geometry and lighting cleanly separated. Its native output is a 3D Gaussian splat, which makes it a natural fit for the splat-to-XR pipelines we have been covering.
What it actually is
Be precise about the tool before building on it. LiTo is a PyTorch research codebase with two models — a point-cloud tokenizer and an image-to-3D diffusion transformer — plus pretrained checkpoints and a runnable FastAPI demo (repo, paper). It is not a Swift library, a Core ML model, or a visionOS component; the repository contains no Apple-platform code despite being published by Apple.
It is also GPU-heavy: roughly 5 seconds per image on an NVIDIA H100 and about 160 seconds on an M4 Max. It runs on Apple Silicon, but it is a workstation or cloud model, not an on-device one.
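For rough compute planning, the per-image figures above convert directly into batch wall-clock time. Here is a minimal sketch. It assumes images are generated one after another at the quoted rates and ignores model loading, queueing, and setting changes. The dictionary and function names are illustrative and are not part of LiTo.

```python
# Per-image generation times quoted above, in seconds. This sketch ignores
# model-load time and any per-run overhead; real numbers will vary.
SECONDS_PER_IMAGE = {
    "H100": 5.0,
    "M4 Max": 160.0,
}


def batch_minutes(num_images: int, device: str) -> float:
    """Estimate wall-clock minutes to generate num_images splats sequentially."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * SECONDS_PER_IMAGE[device] / 60.0


if __name__ == "__main__":
    for name in SECONDS_PER_IMAGE:
        print(f"{name}: {batch_minutes(50, name):.1f} min for 50 images")
```

At these rates, a 50-image batch finishes in a few minutes on an H100 but takes more than two hours on an M4 Max.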
Licensing: LiTo ships under Apple’s research sample-code license — not MIT/BSD/Apache — with separate terms for the model weights (LICENSE_MODEL) and generated samples (LICENSE_generated_samples). Treat output as research-use unless you have cleared the terms; do not assume it is free to ship.
The realistic pipeline: image → splat → WebXR
The real connection to XR runs through the output format, not through any Apple integration:
- Run the LiTo demo locally (Apple Silicon works, slowly) or on a rented NVIDIA GPU, feeding it a single image.
- Export the result as a Gaussian-splat .ply. The reconstruction notebook saves PLY directly, and a community ComfyUI wrapper adds an explicit “export PLY” node for a single-image → splat flow.
- Load that PLY in a WebXR splat renderer: SuperSplat to publish a headset-ready viewer, or Spark / Babylon.js for a code-level build. These run in the Quest 3 browser and in Vision Pro Safari (with WebXR enabled in feature flags).
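Before you hand a file to SuperSplat, Spark, or Babylon.js, it helps to confirm what the export actually contains. Here is a minimal sketch that reads only the PLY header. It assumes the file follows the common layout from the reference 3D Gaussian Splatting code: one vertex element per Gaussian, with f_dc_*, f_rest_*, opacity, scale_*, and rot_* properties. Whether LiTo's exporter matches that layout exactly is an assumption, so check it against a real file. The helper read_splat_header is hypothetical. The Gaussian count feeds the headset budget. A nonzero f_rest_* count indicates that higher-order spherical-harmonic (view-dependent) color survived the export.

```python
"""Summarize a Gaussian-splat PLY header (assumes the common 3DGS layout)."""
import io
from typing import BinaryIO


def read_splat_header(stream: BinaryIO) -> dict:
    """Parse the ASCII header of a PLY file and summarize the splat it holds."""
    if stream.readline().strip() != b"ply":
        raise ValueError("not a PLY file")
    fmt = ""
    count = 0
    props = []
    in_vertex = False
    while True:
        line = stream.readline()
        if not line:
            raise ValueError("PLY header has no end_header line")
        parts = line.decode("ascii", errors="replace").split()
        if not parts:
            continue
        if parts[0] == "end_header":
            break
        if parts[0] == "format":
            fmt = parts[1]
        elif parts[0] == "element":
            # Only properties of the "vertex" element describe Gaussians.
            in_vertex = parts[1] == "vertex"
            if in_vertex:
                count = int(parts[2])
        elif parts[0] == "property" and in_vertex:
            props.append(parts[-1])  # the last token is the property name
    # Higher-order SH terms: 3 color channels x ((degree + 1)^2 - 1) coefficients.
    n_rest = sum(p.startswith("f_rest_") for p in props)
    per_channel = n_rest // 3
    degree = 0
    while (degree + 1) ** 2 - 1 < per_channel:
        degree += 1
    return {
        "format": fmt,
        "gaussians": count,
        "sh_degree": degree,
        "view_dependent": n_rest > 0,
    }


if __name__ == "__main__":
    # Synthetic header in the assumed layout: degree-3 SH, 120,000 Gaussians.
    names = ["x", "y", "z", "f_dc_0", "f_dc_1", "f_dc_2"]
    names += [f"f_rest_{i}" for i in range(45)]
    names += ["opacity", "scale_0", "scale_1", "scale_2",
              "rot_0", "rot_1", "rot_2", "rot_3"]
    header = "ply\nformat binary_little_endian 1.0\nelement vertex 120000\n"
    header += "".join(f"property float {n}\n" for n in names) + "end_header\n"
    print(read_splat_header(io.BytesIO(header.encode("ascii"))))
```

To check a real file, open it in binary mode and pass the handle in, for example with open("lito_output.ply", "rb") as f, where the filename is hypothetical. The helper reads only the header, so the check stays cheap even for large splats.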
That is a concrete, buildable demo: a photograph becomes a splat you can walk around in a headset, with LiTo’s view-dependent lighting intact.
The Vision Pro caveat
Getting LiTo content natively into Apple Vision Pro is the hard, lossy part. RealityKit and PolySpatial have no native Gaussian-splat renderer, and LiTo provides no conversion, so both available routes carry a cost:
- Third-party Metal splat renderer (e.g. MetalSplatter) — preserves the splat, but means Swift/Metal work outside Apple’s stock frameworks.
- Convert the splat to a textured mesh → USDZ — drops into RealityKit cleanly, but throws away the view-dependent lighting that is LiTo’s entire point.
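The mesh-to-USDZ loss can be made concrete. In the reference 3DGS format, each Gaussian's color is a low-order spherical-harmonic function of viewing direction. This sketch compares that with a simple bake that keeps only the view-independent (DC) term. It works on one color channel at SH degree 1 and rests on two assumptions: that LiTo's exported splat stores view dependence as standard 3DGS SH coefficients, and that the bake keeps only the DC term. The constants and sign pattern follow the reference 3DGS implementation. The names sh_color and baked_color are illustrative.

```python
import math

# Real SH basis constants used by the reference 3DGS renderer.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199


def sh_color(dc, rest1, view_dir):
    """One color channel of a degree-1 SH Gaussian seen along view_dir.

    dc is the f_dc coefficient, rest1 holds the three degree-1 coefficients,
    and view_dir points from the camera toward the Gaussian (normalized here).
    """
    x, y, z = view_dir
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    value = (SH_C0 * dc
             - SH_C1 * y * rest1[0]
             + SH_C1 * z * rest1[1]
             - SH_C1 * x * rest1[2])
    return max(value + 0.5, 0.0)  # same +0.5 offset and clamp as 3DGS


def baked_color(dc):
    """What a view-independent bake keeps: only the DC term."""
    return max(SH_C0 * dc + 0.5, 0.0)


if __name__ == "__main__":
    rest = (0.0, 0.0, 0.8)  # hypothetical coefficients with an x-facing lobe
    print("from +x:", round(sh_color(0.0, rest, (1.0, 0.0, 0.0)), 3))
    print("from -x:", round(sh_color(0.0, rest, (-1.0, 0.0, 0.0)), 3))
    print("baked:  ", round(baked_color(0.0), 3))
```

Seen from +x and from -x, the same Gaussian reads darker from one side and brighter from the other. The baked value stays fixed at the midpoint. In this simplified form, that difference is the view-dependent behavior the mesh route discards.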
So treat Vision Pro as a caveated stretch goal. The solid, demonstrable result is single image → LiTo splat → WebXR, which works in a headset browser today.
Caveats
- Not an Apple Vision Pro / Core ML tool. Being an Apple Research repo doesn’t make it AVP tooling — it is server-class PyTorch with no on-device path.
- Research license, not open-source. Check LICENSE, LICENSE_MODEL, and LICENSE_generated_samples before reusing output.
- Heavy. Seconds on a datacenter GPU, minutes on a laptop; plan compute accordingly.
- The usual splat-delivery rules apply: budget the Gaussian count for headset frame rate, and expect to fix orientation on import.
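One crude way to meet a Gaussian budget is to prune by opacity and keep the most opaque Gaussians. This sketch assumes opacities are stored as logits (pre-sigmoid values), as in the reference 3DGS PLY layout. Because the sigmoid is monotonic, ranking and thresholding can happen directly in logit space, which also avoids overflow in exp(). The function name, the budget, and the threshold values are hypothetical. Opacity-only pruning can thin out fine detail, and an interactive editor such as SuperSplat allows more deliberate cleanup.

```python
import math
from typing import Sequence


def budget_indices(
    opacity_logits: Sequence[float],
    max_gaussians: int,
    min_opacity: float = 0.0,
) -> list[int]:
    """Choose which Gaussians to keep under a count budget.

    Gaussians whose opacity (after sigmoid) is below min_opacity are dropped
    first; the rest are ranked by opacity and the top max_gaussians are kept.
    Returned indices are in ascending, original-file order.
    """
    if max_gaussians < 0:
        raise ValueError("max_gaussians must be non-negative")
    if not 0.0 <= min_opacity < 1.0:
        raise ValueError("min_opacity must be in [0, 1)")
    # sigmoid(v) >= p  is equivalent to  v >= logit(p), so no exp() is needed.
    if min_opacity == 0.0:
        cutoff = -math.inf
    else:
        cutoff = math.log(min_opacity / (1.0 - min_opacity))
    candidates = [i for i, v in enumerate(opacity_logits) if v >= cutoff]
    candidates.sort(key=lambda i: opacity_logits[i], reverse=True)
    return sorted(candidates[:max_gaussians])


if __name__ == "__main__":
    # Hypothetical opacity logits for five Gaussians.
    logits = [2.0, -3.0, 0.5, 4.0, -0.1]
    print(budget_indices(logits, max_gaussians=3))
    print(budget_indices(logits, max_gaussians=2, min_opacity=0.5))
```

The returned indices are in original order, so you can write the kept rows of the vertex data back out without reordering them.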
Useful links
- LiTo project page (interactive 3DGS demos) · Apple ML Research
- apple/ml-lito repository · paper (arXiv)
- ComfyUI-LiTo (single image → splat → PLY)
- SuperSplat · Spark · Babylon.js Gaussian Splatting
- MetalSplatter (native visionOS)
- Phone-captured Gaussian splats — Scaniverse into WebXR and Godot — the capture-based counterpart
- Image-blaster → engine → headset — another AI-asset-to-XR route
- Hackathon details — eligibility, team formation, AI policy
- Register on Luma
Questions? Reach the team via the Contact page.