Covering Scientific & Technical AI | Sunday, December 22, 2024

NVIDIA Showcases Advancements in Visual GenAI at CVPR 2024

NVIDIA, a global leader in GPU and AI technology, is making rapid advancements in the field of visual generative AI. The company’s researchers are exploring new technologies to create and interpret visual content, such as images, videos, and 3D models. 

Using machine learning models and advanced image processing techniques, GenAI can generate new visual data that can be difficult to distinguish from content created by humans. NVIDIA is showcasing more than 50 of its visual GenAI projects at the 2024 Computer Vision and Pattern Recognition (CVPR) conference, taking place in Seattle, WA, from June 17 to 21.

CVPR, co-sponsored by the IEEE (Institute of Electrical and Electronics Engineers) Computer Society and the Computer Vision Foundation (CVF), is regarded as one of the most significant and prestigious conferences in the fields of computer vision and pattern recognition.

NVIDIA’s visual GenAI research covers a wide range of applications, including domain-specific innovations for industries such as healthcare, autonomous vehicles, and robotics. Two of NVIDIA’s papers, one focusing on the training dynamics of diffusion models and the other on high-definition mapping for autonomous vehicles, have been chosen as finalists for CVPR’s Best Paper Awards.

“Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement,” said Jan Kautz, vice president of learning and perception research at NVIDIA. “At CVPR, NVIDIA Research is sharing how we’re pushing the boundaries of what’s possible — from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars.”

Building on last year’s win in 3D Occupancy Prediction, NVIDIA won this year’s CVPR Autonomous Grand Challenge for End-to-End Driving, outperforming more than 450 entries from around the globe. This milestone demonstrates NVIDIA’s pioneering work in using AI to develop self-driving vehicle models. The entry also earned NVIDIA a CVPR Innovation Award.

At CVPR, NVIDIA also introduced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.

One of NVIDIA’s standout papers, JeDI, was also showcased at the event. The paper proposes a new technique that lets users personalize the output of diffusion models within a few seconds using reference images. Researchers from Johns Hopkins University, Toyota Technological Institute, and NVIDIA collaborated on the paper to develop a model that significantly outperforms existing fine-tuning-based methods. This breakthrough can help users create specific character depictions or product visuals.

NVIDIA researchers also presented FoundationPose, a unified foundation model for object pose estimation and tracking. The model can use a small set of reference images or a 3D representation of an object to understand its shape and to predict how the object moves and rotates in 3D, without the need for fine-tuning. The findings of this research could play a key role in further advancements in autonomous robots and augmented reality applications.

Developed by researchers from the University of Illinois Urbana-Champaign and NVIDIA, NeRFDeformer was also showcased at CVPR. NeRFDeformer uses a novel method to edit the 3D scene captured by a Neural Radiance Field (NeRF) using a single 2D snapshot, rather than requiring users to manually redefine how the scene has transformed or recreate the NeRF from scratch. This advancement holds significant potential for applications that rely on dynamic 3D modeling.

In collaboration with the Massachusetts Institute of Technology (MIT), NVIDIA also introduced VILA, a state-of-the-art visual language model (VLM) that can understand and process both images and text. VILA significantly improves upon existing VLMs by addressing several of their limitations, including slow inference speeds, a lack of in-context learning, and support for only a single image at a time.

As many as a dozen of NVIDIA’s papers at CVPR focused on autonomous vehicle research. Among NVIDIA’s other prominent contributions at CVPR 2024 was the largest-ever indoor synthetic dataset for the AI City Challenge, which is intended to support the development of smart city solutions and industrial automation.
