Healthcare AI has primarily focused on image analysis and disease diagnosis. Existing datasets contain only static image information and fail to capture the 'actions' required in real-world medical settings, such as robot motion, force control, and real-time feedback. To overcome these limitations and advance healthcare robotics, it was essential to build open datasets spanning diverse data such as robot movements, sensor readings, and actual surgical footage.
Open-H-Embodiment, born from collaboration between institutions including NVIDIA, Johns Hopkins University, and Technical University of Munich, is expected to bring revolutionary changes to the healthcare robotics field. This dataset goes beyond simple data provision; it lays the foundation for Physical AI research and contributes to providing safer and more efficient healthcare services.
Open-H-Embodiment is a community-based dataset project built with participants from various institutions. Prominent experts such as Professor Axel Krieger (Johns Hopkins), Professor Nassir Navab (Technical University of Munich), and Dr. Mahdi Azizian (NVIDIA) are leading this project, with currently over 35 institutions participating. This multi-institutional collaboration has secured data for a variety of robot platforms and surgical environments, enhancing the dataset’s diversity and utility.
Open-H-Embodiment is a result co-created by researchers in the healthcare robotics field. It publicly releases 778 hours of data under the CC-BY-4.0 license, along with two new models, GR00T-H and Cosmos-H-Surgical-Simulator, to assist researchers.
GR00T-H is a Vision-Language-Action (VLA) model based on NVIDIA’s Isaac GR00T N series. It was trained for approximately 600 hours using the Open-H-Embodiment dataset and is the first policy model specifically tailored for surgical robot tasks. GR00T-H leverages NVIDIA’s open ecosystem, using Cosmos Reason 2B as the VLM (Vision-Language Model) backbone. This model will play a critical role in improving the accuracy and efficiency of robots in the healthcare robotics field.
GR00T-H incorporates design elements such as dedicated Embodiment Projectors, State Dropout, Relative EEF (end-effector) Actions, and Metadata in Task Prompts to overcome the limitations of traditional imitation learning and improve performance in real-world environments. Notably, it demonstrated outstanding performance by successfully completing full suturing tasks in the SutureBot benchmark.
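No reference code for these components is given here, but two of the data-centric ideas, relative EEF actions and state dropout, can be sketched in a few lines. The function names, array shapes, and dropout scheme below are illustrative assumptions, not the released implementation:

```python
import numpy as np

def to_relative_eef_actions(eef_positions: np.ndarray) -> np.ndarray:
    """Convert a trajectory of absolute end-effector (EEF) positions
    into per-step relative actions (deltas), so the policy predicts
    motion relative to the current pose rather than absolute targets.

    eef_positions: (T, 3) array of absolute xyz positions.
    Returns a (T-1, 3) array of step-to-step displacements.
    """
    return np.diff(eef_positions, axis=0)

def state_dropout(state: np.ndarray, p: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Randomly zero out proprioceptive state entries during training,
    so the policy does not over-rely on state inputs that may be noisy
    or unavailable at deployment time."""
    mask = rng.random(state.shape) >= p
    return state * mask

# Example: three absolute poses become two relative actions.
poses = np.array([[0.0, 0.0, 0.0],
                  [0.1, 0.0, 0.0],
                  [0.1, 0.05, 0.0]])
actions = to_relative_eef_actions(poses)
print(actions)  # two xyz deltas: [0.1, 0, 0] and [0, 0.05, 0]
```

Predicting deltas rather than absolute poses is a common trick in imitation learning, since it makes demonstrations transferable across different starting configurations.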
Cosmos-H-Surgical-Simulator is a World Foundation Model (WFM) for action-conditioned surgical robot simulation. Existing simulators struggle to reproduce the complexity of real surgical environments; for example, they fail to account for factors such as tissue movement, light reflection, blood, and smoke. Cosmos-H-Surgical-Simulator addresses these issues: fine-tuned from NVIDIA Cosmos Predict 2.5 2B, it generates physically realistic surgical video and provides a simulation environment that closely matches the real world.
Cosmos-H-Surgical-Simulator evaluates 600 rollouts in just 40 minutes, versus the roughly two days a benchtop evaluation typically takes in a real-world environment. This will significantly improve research and development efficiency in healthcare robotics. The simulator also implicitly learns tissue deformation and tool interaction, enabling it to generate data that more closely resembles real surgical environments.
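The figures quoted above imply a large speedup; the arithmetic is simple:

```python
# Throughput comparison from the numbers above:
# 600 rollouts in 40 minutes of simulation vs. ~2 days of benchtop evaluation.
sim_minutes = 40
benchtop_minutes = 2 * 24 * 60  # two days expressed in minutes

speedup = benchtop_minutes / sim_minutes
rollouts_per_hour = 600 / (sim_minutes / 60)

print(f"~{speedup:.0f}x faster, ~{rollouts_per_hour:.0f} rollouts/hour")
# ~72x faster, ~900 rollouts/hour
```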
The next step for the Open-H-Embodiment project is to move beyond perception-level control toward reasoning-based autonomy, aiming to build systems that play a role in healthcare similar to ChatGPT's. To achieve this, Open-H-Embodiment needs to expand the data required for reasoning by including annotated task traces that capture intention, outcomes, and failure modes.
Active participation from the community is needed for these efforts, and we can co-create the future of healthcare robotics through the GitHub repository. Through these advancements, AI-based robots will be able to explain surgical procedures, plan them, and adapt to changing environments, providing safer and more efficient healthcare services. This dataset will serve as a significant milestone in illuminating the future of healthcare robotics.
Various resources are provided so that researchers can begin working with the Open-H-Embodiment dataset and models. They are easy to access through the GitHub repository, the Hugging Face models, and the Cosmos Cookbook, and you can explore and use the models on Hugging Face and build.nvidia.com. Join us in advancing healthcare robotics and leading innovation in the future of medical services.
Original Source: The First Healthcare Robotics Dataset and Foundational Physical AI Models for Healthcare Robotics