HardwareHomie Toolkit

Xperience-10M: The Largest Human Xperience Dataset

A large-scale 4D human experience dataset spanning video, audio, depth, pose, motion capture, inertial sensing, and language annotation for physical AI.

Introduction

Xperience-10M captures multimodal human experience for learning world-grounded behavior.

Each sample is synchronized across vision, motion, spatial sensing, and language metadata. The dataset supports model pretraining, evaluation, and simulation transfer workflows.
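As a rough illustration of what per-sample synchronization can mean in practice, the sketch below aligns streams recorded at different rates by looking up the nearest timestamp in each one. The `Stream` and `Episode` classes, the stream names, and the sample rates are all hypothetical, chosen only for illustration; they are not the dataset's actual schema or loader API.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One modality (hypothetical): sorted timestamps in seconds, one payload each."""
    timestamps: list[float]
    payloads: list[object]

    def nearest(self, t: float) -> object:
        """Return the payload whose timestamp is closest to t."""
        i = bisect_left(self.timestamps, t)
        if i == 0:
            return self.payloads[0]
        if i == len(self.timestamps):
            return self.payloads[-1]
        before, after = self.timestamps[i - 1], self.timestamps[i]
        # Ties go to the earlier sample.
        return self.payloads[i] if after - t < t - before else self.payloads[i - 1]


@dataclass
class Episode:
    """Hypothetical episode record: named streams plus a language caption."""
    streams: dict[str, Stream] = field(default_factory=dict)
    caption: str = ""

    def at(self, t: float) -> dict[str, object]:
        """Collect the nearest payload from every stream at time t."""
        return {name: stream.nearest(t) for name, stream in self.streams.items()}


# Assumed rates for illustration only: ~30 Hz video, 100 Hz IMU.
episode = Episode(
    streams={
        "rgb": Stream([0.0, 0.033, 0.066], ["rgb_0", "rgb_1", "rgb_2"]),
        "imu": Stream([0.0, 0.01, 0.02, 0.03, 0.04],
                      ["imu_0", "imu_1", "imu_2", "imu_3", "imu_4"]),
    },
    caption="picks up a cup",
)
aligned = episode.at(0.031)
print(aligned)
```

Nearest-timestamp lookup is only one possible alignment strategy; interpolation or hardware-triggered capture are alternatives, and the source does not say which approach the dataset uses.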

Physical AI will not scale on synthetic trajectories alone. Xperience-10M captures how humans act, adapt, and interact in the real world, creating a foundation of experiential data for systems that must operate beyond simulation.

Key Stats

Signal Stack and Scale

01  Interaction Episodes: 0M+
02  Video w/ Audio: 0 h
03  RGB Frame Number: 0.00B
04  Depth Frame Number: 0M
05  Camera Pose Number: 0M
06  MoCap Frame: 0M
07  IMU Frame: 0.0B
08  Caption Sentences: 0M
09  Caption Words: 0M
10  Total Storage: ~1PB
11  Model Parameters: 0B
12  Research Nodes: 0

Dataset Overview

Why Xperience-10M

Xperience-10M is designed to model how intelligence unfolds in the physical world: perception coupled with action, intention, and consequence. Instead of isolated frames or synthetic trajectories, each record preserves temporally consistent, embodied human experience in real environments.


The release focuses on data quality and alignment as much as scale, so teams can use it for both foundation model pretraining and task-specific downstream adaptation.

Collaboration

If you are building world models, robotics systems, or embodied agents and need large-scale, grounded human experience data, we would love to collaborate.

Share your model stage, target tasks, and data requirements. We can provide suitable dataset slices and discuss integration plans.
