Sensat News

I touched grass and found the future of digital twins: An AI experiment with 3D models

Author: George Madges

August 14, 2025

Who’s this guy?

Introducing the author: An R&D Engineer at Sensat Labs

My official title has words like 'Labs' and 'Engineer' in it, which is a fancy way of saying I get to play with interesting data problems and prototype new ideas for Sensat’s platform. A key part of this work is exploring how emerging AI technologies can be applied to our products. This kind of exploratory research has already yielded tools like Orion, our AI-powered visual search engine, currently in research preview.

But the work of a Labs engineer is never done, and you’re only as cool as your latest project. So I have to find the next technology that can deliver a step-change in value for our users and the industries we work with.

The 'Aha!' Moment

Discovering Meta's VGGT for 3D reconstruction

While reviewing recent publications, a paper from Meta called "VGGT: Visual Geometry Grounded Transformer" caught my eye. The results presented were impressive. The model generated highly detailed 3D scenes from a small number of images, a significant leap in quality and efficiency compared to many existing techniques.
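Better still, running it is refreshingly simple. As a rough sketch, following the usage pattern in the VGGT GitHub repository (the module paths and helper names below come from its README and may evolve):

```python
import torch
from vggt.models.vggt import VGGT
from vggt.utils.load_fn import load_and_preprocess_images

device = "cuda"  # the model is only really practical on a GPU
# bfloat16 on Ampere or newer cards, float16 on older ones
dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16

model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)

# A handful of frames pulled from the footage
image_paths = ["frames/frame_000.jpg", "frames/frame_001.jpg", "frames/frame_002.jpg"]
images = load_and_preprocess_images(image_paths).to(device)

with torch.no_grad(), torch.cuda.amp.autocast(dtype=dtype):
    # One forward pass predicts camera poses, depth maps and a
    # dense point map for every input view
    predictions = model(images)
```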

I ran a few tests with some drone footage I could quickly get my hands on (thank you, Dronestock), and the potential became immediately clear. It was possible to generate a point cloud with a level of detail not a million miles from a dedicated scan, but from simple video or photos.

Videos above courtesy of Dronestock
The processed point cloud, demonstrating how AI can achieve a level of detail comparable to that of dedicated scanning equipment, from simple video

Where could I use this?

Merging high-fidelity models with Digital Landscapes

At Sensat, our Digital Landscapes are built from aerial imagery and cover very large areas at high resolution, but they lack fidelity below roughly 25cm. This new approach suggested a powerful complementary workflow: the large-scale Digital Landscape provides a complete, site-wide model at a resolution perfect for planning and oversight, while a user on the ground could pull out their phone, capture a 30-second video of a specific asset (an entrance, a new piece of machinery, critical signage) and instantly stitch a high-resolution 3D model of it directly into the master plan.

This would effectively combine the macro and micro view into a single, dynamic environment. The idea was compelling enough to warrant a proper field test.

An epic expedition

My field test: Capturing a 3D model with a smartphone

To test my theory, I needed a target. Luckily, I already had a Digital Landscape of an area near my home in beautiful Hitchin. This provided the perfect baseline to test against.

So, I packed my bags, grabbed some snacks, and applied a thick coat of suncream for the treacherous five-minute drive to a local church I thought would make a nice landmark subject to capture.

I spent about ten minutes on-site, taking a few basic photos and a couple of short videos with my phone, nothing longer than a minute. The goal was to simulate a realistic user capture session: quick, simple, and with non-specialist hardware. I probably looked like a very confused tourist with a particular fascination with the church entrance…

Footage acquired. Mission accomplished. Time to retreat.

Simple images I took on my phone

From raw footage to *chef-kiss* model 😙🤌

Back in the safety of my office, I could get to the real work: turning my bad, Blair Witch-esque shaky-cam footage into a beautiful 3D point cloud.

My initial tests showed that simply feeding all available video frames into the model was not optimal. The key to a quality result is a curated input: less is more.

Borrowing techniques from Gaussian Splatting and NeRF workflows, I developed a pre-processing pipeline to select the best frames. This involved two main criteria:

  1. Sharpness: I used a Laplacian variance check to automatically filter out and discard blurry or low-quality frames.
  2. Distribution: I then selected a spatially diverse set of frames. The goal is a well-distributed set of images with enough parallax (apparent shift between viewpoints) for the model to understand depth and geometry.

I found that a set of 10-30 high-quality, well-distributed images provided the ideal balance of detail and processing efficiency. The result was a high-fidelity point cloud of the target area, generated entirely from a few moments of video.
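To make that concrete, here's a minimal sketch of the selection logic, assuming OpenCV and a single input clip. The blur threshold is a placeholder you'd tune per device, and even temporal spacing stands in for true spatially diverse selection, which would need camera pose estimates:

```python
import os
import cv2
import numpy as np

def sharpness(frame):
    """Variance of the Laplacian: higher means sharper."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def select_frames(video_path, n_frames=20, blur_threshold=100.0):
    """Discard blurry frames, then sample the survivors evenly."""
    cap = cv2.VideoCapture(video_path)
    sharp = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if sharpness(frame) >= blur_threshold:
            sharp.append(frame)
    cap.release()

    if len(sharp) <= n_frames:
        return sharp
    # Even spacing across the clip approximates spatial diversity,
    # since the camera keeps moving during capture
    picks = np.linspace(0, len(sharp) - 1, n_frames).round().astype(int)
    return [sharp[i] for i in picks]

os.makedirs("frames", exist_ok=True)
for i, frame in enumerate(select_frames("church_entrance.mp4")):
    cv2.imwrite(f"frames/frame_{i:03d}.jpg", frame)
```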

Video I took on my phone of the church
The final result: a dense and accurate point cloud of the church entrance, generated entirely from a handful of video frames
More video I took on my phone of the entrance
Transforming the raw, blurry video frames from the previous clips into a structured, high-resolution 3D reality model

Now what? Georeferencing and integration

Creating a lovely point cloud is one thing, but making it genuinely useful is another. This little experiment opened up a whole new set of fascinating questions for us to solve.

How do we get this new high-detail model to sit perfectly alongside our existing Digital Landscapes? A key detail of Sensat’s platform is that our data is always accurately georeferenced.

Do we brute-force it with point cloud registration algorithms? Do we bake extra metadata into the capture process from the phone's GPS and other sensors? Or use something like a Visual Positioning System (VPS) to get a precise location in real-time?
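To give a flavour of the first option, here's a minimal sketch of point-to-point ICP registration with Open3D. The file names are hypothetical, the correspondence distance needs tuning to your data, and on a real site you'd want a coarse initial alignment (from GPS, say) before ICP can converge:

```python
import numpy as np
import open3d as o3d

# Hypothetical inputs: the new VGGT point cloud and a tile of the
# georeferenced Digital Landscape it should land inside
source = o3d.io.read_point_cloud("church_entrance_vggt.ply")
target = o3d.io.read_point_cloud("digital_landscape_tile.ply")

# ICP only refines an alignment; start from a rough guess
# (identity here, but phone GPS/compass would do far better)
init = np.eye(4)

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.5,  # metres, tune to point density
    init=init,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)  # 4x4 transform mapping source into target coordinates
```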

Ooooh, I'll never tell. You'll have to wait for the next blog post on the topic.

Frequently Asked Questions

  • Q: What is a digital twin?
    • A: A geospatial digital twin is a dynamic, spatially accurate digital representation of a physical site, system or environment, enriched with real-world data and continuously updated to reflect changes over time. At Sensat, we create these twins for large-scale infrastructure projects, enabling teams to visualise, plan, and collaborate on their projects with a shared, data-driven view of reality.
  • Q: Can you really create a 3D model from a phone video?
    • A: Yes. Using emerging AI models like VGGT and processing techniques like those used in Gaussian Splatting, it's possible to generate high-fidelity 3D point clouds from a short video captured on a standard smartphone.
  • Q: What is georeferencing?
    • A: Georeferencing is the process of assigning real-world coordinates to a map or model. It ensures that data, like a new 3D scan, can be accurately placed within a larger model, like one of Sensat's Digital Landscapes.

Conclusion: The future of on-demand reality capture

This little adventure shows that when you let your engineers play outside, it’s not the worst thing in the world. They might just come back with more than a sunburn; they might find a whole new way to look at reality.

Just don't forget the shades. The world, it turns out, doesn't have a dark mode.