Coral reefs face accelerating decline, demanding scalable monitoring tools that can operate with minimal labelled data. This report presents a pipeline that fuses dense semantic embeddings from the DINOv3 self-supervised foundation model with 3D Gaussian Splatting (3DGS) to produce a semantically queryable digital twin of a coral reef without any labelled training data. Two-dimensional validation on the Moorea Labelled Corals dataset confirms that DINOv3 embeddings capture meaningful semantic structure in underwater imagery. Of the two DINOv3 backbone architectures compared, ViT-7B shows consistently higher discriminability than ConvNeXt-L across all evaluated metrics. Projecting these features into 3D via an occlusion-aware, confidence-weighted aggregation pipeline yields a semantically coherent 3D feature field, achieving a mean split-half cosine similarity of 0.899 across 2.8 million Gaussian primitives. Few-shot segmentation in 3D against manually labelled reference points reaches 88% of the human performance ceiling for binary coral-versus-substrate segmentation and 83% at fine class granularity, using only ten reference points per class. Unsupervised clustering produces ecologically interpretable partitions at multiple resolutions, and a semantic search tool enables interactive zero-shot 3D segmentation from a single pixel selection. Generalisability beyond a single reef site remains to be validated, but these results establish DINOv3 and 3DGS as a viable foundation for label-efficient, large-scale coral reef monitoring.
Dense semantic features are extracted from survey imagery using DINOv3, compressed via PCA, and projected into an existing 3D Gaussian Splatting model through an occlusion-aware, confidence-weighted aggregation pipeline.
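The compression and aggregation steps described above can be illustrated with a minimal NumPy sketch. It is not the project's implementation: the function names (`fit_pca`, `compress`, `aggregate_features`) and the input layout are hypothetical, PCA is done with a plain SVD, and the occlusion test is assumed to have already been applied upstream, so that each pixel sample arrives with the index of the Gaussian it actually sees and a non-negative confidence weight.

```python
import numpy as np


def fit_pca(features, n_components):
    """Fit PCA with an SVD. Returns (mean, components), components: (n_components, D)."""
    mean = features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def compress(features, mean, components):
    """Project (N, D) features onto the principal components, giving (N, n_components)."""
    return (features - mean) @ components.T


def aggregate_features(gaussian_ids, pixel_features, weights, n_gaussians):
    """Confidence-weighted mean of pixel features per Gaussian.

    gaussian_ids   : (N,) int, Gaussian hit by each pixel sample
                     (assumption: occlusion already resolved upstream)
    pixel_features : (N, C) compressed features
    weights        : (N,) non-negative confidence per sample
    Returns (features, seen): (n_gaussians, C) array and a boolean mask of
    Gaussians that received at least one positively weighted sample.
    """
    n_channels = pixel_features.shape[1]
    feat_sum = np.zeros((n_gaussians, n_channels))
    weight_sum = np.zeros(n_gaussians)
    # np.add.at accumulates correctly when the same index appears many times.
    np.add.at(feat_sum, gaussian_ids, pixel_features * weights[:, None])
    np.add.at(weight_sum, gaussian_ids, weights)
    seen = weight_sum > 0
    feat_sum[seen] = feat_sum[seen] / weight_sum[seen][:, None]
    return feat_sum, seen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(size=(1000, 64))          # stand-in for dense per-pixel embeddings
    mean, comps = fit_pca(raw, n_components=8)
    small = compress(raw, mean, comps)
    ids = rng.integers(0, 50, size=1000)       # stand-in pixel-to-Gaussian assignment
    conf = rng.uniform(0.1, 1.0, size=1000)    # stand-in confidence weights
    field, seen = aggregate_features(ids, small, conf, n_gaussians=50)
    print(field.shape, int(seen.sum()))
```

Gaussians that are never observed keep a zero feature vector and are flagged by the `seen` mask, so downstream steps such as clustering or similarity search can skip them rather than treat the zeros as real features.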
Special thanks to the creators of the wildflow/sweet-corals dataset for open-sourcing the high-quality underwater photogrammetry data used in this project. The visualisations above specifically use the Tabuhan P1 dataset. A big thank you also goes to my supervisors, Jeff Clark and Jess Jones, who provided invaluable support throughout this project.
If you find this work useful for your research, please consider citing: