A large-scale 4D human experience dataset spanning video, audio, depth, pose, motion capture, inertial sensing, and language annotation for physical AI.
Introduction
Xperience-10M captures multimodal human experience for learning world-grounded behavior.
Each sample is synchronized across vision, motion, spatial sensing, and language metadata. The dataset supports model pretraining, evaluation, and simulation transfer workflows.
Physical AI will not scale on synthetic trajectories alone. Xperience-10M captures how humans act, adapt, and interact in the real world, creating a foundation of experiential data for systems that must operate beyond simulation.
Key Stats
Interaction Episodes: 0M+
Video with Audio: 0 h
RGB Frames: 0.00B
Depth Frames: 0M
Camera Poses: 0M
MoCap Frames: 0M
IMU Frames: 0.0B
Caption Sentences: 0M
Caption Words: 0M
Total Storage: ~1PB
Model Parameters: 0B
Research Nodes: 0
Dataset Overview
Xperience-10M is designed to model how intelligence unfolds in the physical world: perception coupled with action, intention, and consequence. Instead of isolated frames or synthetic trajectories, each record preserves temporally consistent, embodied human experience in real environments.
The release focuses on data quality and alignment as much as scale, so teams can use it for both foundation model pretraining and task-specific downstream adaptation.
Collaboration
Share your model stage, target tasks, and data requirements. We can provide suitable dataset slices and discuss integration plans.
Contact Us