Fujitsu and Carnegie Mellon University Develop AI-powered Social Digital Twin Tech with Traffic Data from Pittsburgh
TOKYO, March 7, 2024 -- Fujitsu Limited and Carnegie Mellon University today announced the development of a new technology to visualize traffic situations, including people and vehicles, as part of joint research on social digital twins that began in 2022. The technology transforms a 2D scene image captured by a monocular RGB camera into a digitalized 3D format using AI that estimates the 3D shape and position of people and objects, enabling high-precision visualization of dynamic 3D scenes.
Starting February 22, 2024, Fujitsu and Carnegie Mellon University began conducting field trials leveraging data from intersections in Pittsburgh, Pennsylvania, to verify the applicability of this technology.
This technology relies on AI trained through deep learning to detect the shape of people and objects. The system is composed of two core technologies: 1) 3D Occupancy Estimation Technology, which estimates the 3D occupancy of each object using only a monocular RGB camera, and 2) 3D Projection Technology, which accurately locates each object within 3D scene models.
By utilizing these technologies, images taken in situations in which people and cars are densely situated, such as at intersections, can be dynamically reconstructed in 3D virtual space, providing a crucial tool for advanced traffic analysis and potential accident prevention that conventional surveillance cameras alone cannot deliver. Faces and license plates are anonymized to help preserve privacy.
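The release does not describe how the anonymization is implemented. As a rough illustration of the idea only, the sketch below pixelates face and license-plate regions in a frame; the box coordinates are assumed to come from an upstream detector (not shown), and the block size is a hypothetical parameter:

```python
import numpy as np

def anonymize_regions(frame: np.ndarray, boxes, block: int = 16) -> np.ndarray:
    """Pixelate the given (x0, y0, x1, y1) boxes in an HxWx3 frame.

    A stand-in for face / license-plate anonymization: each box is
    divided into block x block tiles and every tile is replaced by
    its mean color, destroying identifying detail.
    """
    out = frame.copy()
    for x0, y0, x1, y1 in boxes:
        region = out[y0:y1, x0:x1]            # view into the copy
        h, w = region.shape[:2]
        for by in range(0, h, block):
            for bx in range(0, w, block):
                tile = region[by:by + block, bx:bx + block]
                tile[:] = tile.mean(axis=(0, 1)).astype(frame.dtype)
    return out

# Synthetic frame with a varied pixel pattern (placeholder for camera input).
frame = (np.arange(64 * 64 * 3) % 256).astype(np.uint8).reshape(64, 64, 3)
blurred = anonymize_regions(frame, [(0, 0, 32, 32)])
print(blurred.shape)  # (64, 64, 3)
```

A production pipeline would pair this with a trained detector and may prefer irreversible masking over pixelation, since coarse pixelation can sometimes be partially inverted.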
Going forward, Fujitsu and Carnegie Mellon University aim to commercialize this technology by FY 2025, verifying its usefulness not only in transportation but also in smart cities and traffic safety and expanding its scope of application.
In February 2022, Fujitsu and Carnegie Mellon University’s School of Computer Science and College of Engineering began their joint research on Social Digital Twin technology, which dynamically replicates complex interplays between people, goods, economies, and societies in 3D. These technologies enable the high-precision 3D reconstruction of objects from multiple images extracted from videos shot from different angles.
However, as the joint research proceeded, it was found that existing video analysis methods were technically insufficient to dynamically reconstruct captured images in 3D. Multiple cameras were required, and the resulting issues with privacy, workload, and cost became a barrier to social implementation.
To address these issues, Fujitsu and Carnegie Mellon University have developed a technology that reconstructs a dynamic 3D scene model even when an object is photographed from a stationary monocular RGB camera, without combining images shot simultaneously by multiple cameras.
This system consists of the following two core technologies.
- 3D Occupancy Estimation Technology: This technology leverages deep learning networks that take multiple images of a city captured from various angles and distinguish the types of objects, such as buildings and people, appearing in the images. Using this model, even a single image of a city from a monocular RGB camera can be expressed as a collection of voxels in 3D space, labeled with categories such as buildings and people. Such a voxel representation of the real world, combined with object semantics, provides a detailed understanding of the scene for analyzing events. In addition, the method enables accurate 3D shape estimation of areas that are not visible in the input image.
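The trained network itself is not published, but the output format described above, a semantic voxel grid inferred from a single RGB image, can be sketched as follows. The class taxonomy, grid resolution, and the stub that fills the ground plane are all illustrative assumptions, not the actual Fujitsu/CMU model:

```python
import numpy as np

# Hypothetical semantic classes; the real taxonomy is not disclosed.
CLASSES = {0: "empty", 1: "building", 2: "road", 3: "person", 4: "vehicle"}

def estimate_occupancy(image: np.ndarray, grid_shape=(32, 32, 16)) -> np.ndarray:
    """Map a single RGB image to a semantic voxel grid (class ID per voxel).

    A real system would run a trained deep network here; this stub only
    fills the bottom voxel layer with 'road' to show the output format.
    """
    assert image.ndim == 3 and image.shape[2] == 3, "expected HxWx3 RGB input"
    grid = np.zeros(grid_shape, dtype=np.uint8)  # all voxels start 'empty'
    grid[:, :, 0] = 2                            # ground layer labeled 'road'
    return grid

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder camera frame
voxels = estimate_occupancy(frame)
print(voxels.shape)                      # (32, 32, 16)
print(CLASSES[int(voxels[0, 0, 0])])     # road
```

The key property the release emphasizes is that each voxel carries both occupancy and an object category, which is what makes downstream event analysis possible from a single camera.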
- 3D Projection Technology: This technology creates a 3D digital twin based on the output of the 3D Occupancy Estimation Technology. By incorporating know-how in human behavior analysis, it excludes human movements that cannot occur in the real world, such as a person passing through an object, and maps objects with high precision into 3D virtual space. This not only reconstructs the movements of people and vehicles in a manner more consistent with the real world, but also enables accurate position estimation even when parts of objects are hidden by obstructions.
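The release describes rejecting physically impossible movements, such as a person passing through an object. A minimal sketch of that idea, assuming the voxel grid from the occupancy step and a candidate trajectory of voxel coordinates (the function name and solid-class set are hypothetical):

```python
import numpy as np

def is_plausible_path(path, occupancy: np.ndarray, solid_classes=(1,)) -> bool:
    """Reject a trajectory that passes through solid voxels.

    A simplified stand-in for the behavior-analysis constraints: a tracked
    person or vehicle must not move through occupied space (e.g. a building,
    class 1 in this illustrative taxonomy).
    """
    for x, y, z in path:
        if occupancy[x, y, z] in solid_classes:
            return False
    return True

grid = np.zeros((8, 8, 4), dtype=np.uint8)
grid[4, 4, :] = 1  # a solid 'building' column in the scene

clear_path = [(0, 0, 0), (1, 1, 0), (2, 2, 0)]
blocked_path = [(3, 3, 0), (4, 4, 0), (5, 5, 0)]  # crosses the building
print(is_plausible_path(clear_path, grid))    # True
print(is_plausible_path(blocked_path, grid))  # False
```

In the actual system such constraints would be one filter among several used to place occluded or partially visible objects consistently in the 3D virtual space.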
"This achievement is the result of collaborative research between Fujitsu's team, Prof. Sean Qian, Prof. Srinivasa Narasimhan, and my team at CMU," said László A. Jeni, assistant research professor at Carnegie Mellon University. "I am delighted to announce it. CMU will continue to advance research on cutting-edge technologies through this collaboration in the future."
About Fujitsu
Fujitsu’s purpose is to make the world more sustainable by building trust in society through innovation. As the digital transformation partner of choice for customers in over 100 countries, our 124,000 employees work to resolve some of the greatest challenges facing humanity. Our range of services and solutions draw on five key technologies: Computing, Networks, AI, Data & Security, and Converging Technologies, which we bring together to deliver sustainability transformation. Fujitsu Limited (TSE:6702) reported consolidated revenues of 3.7 trillion yen (US$28 billion) for the fiscal year ended March 31, 2023 and remains the top digital services company in Japan by market share. Find out more: www.fujitsu.com.
About Carnegie Mellon University
Carnegie Mellon is a private, internationally ranked research university with acclaimed programs spanning the sciences, engineering, technology, business, public policy, humanities and the arts. Our diverse community of scholars, researchers, creators and innovators is driven to make real-world impacts that benefit people across the globe. With an unconventional, interdisciplinary and entrepreneurial approach, we do the work that matters.