A Community for Synthetic Data is Here and This is Why We Need It
The first open-source platform for synthetic data is here to help educate the broader machine learning and computer vision communities on the emerging technology.
Synthetic data is a promising technology in its early adoption phase. To reach mainstream adoption, the research community needs a place to learn about it, discuss the latest innovations, and experiment.
I’m happy to announce OpenSynthetics.com, an open community for creating and using synthetic data in computer vision and machine learning (ML).
Synthetic data is computer-generated image data that models the real world. In the visual domain, synthetic data has shown promise in creating more capable and ethical AI models. By creating a centralized hub for datasets, papers, code, and resources, we aim to bring together researchers from industry and academia to advance the state of the art in synthetic data.
The next generation of computer vision will be powered by synthetic data. Over the last few years, synthetic data has emerged as a disruptive new paradigm for training AI models. Using visual effects (VFX), neural rendering, and generative AI technologies, researchers have demonstrated the ability to build vast amounts of photorealistic, diverse, and perfectly labeled datasets faster and at lower cost. This will enable more capable models for autonomous vehicles, robotics, drones, AR/VR/metaverse, generated media, and many more applications, from consumer to medical use cases.
Current computer vision models require vast amounts of human-annotated data to help cameras identify what they’re seeing. Annotation is time- and labor-intensive, which makes it prohibitively expensive, and it has significant shortcomings. It’s difficult for humans to interpret key data attributes, such as the 3D position of an object or its interactions with its environment.
Additionally, the inability to capture sufficiently diverse and balanced datasets often leads to bias, which has significant ethical implications in human-centered systems. Furthermore, increasing regulatory scrutiny and consumer privacy concerns make collecting and leveraging images of people complicated.
With synthetic data approaches, information about every pixel in the scene is explicitly defined. Pixel-accurate labels for 3D landmarks, depth, material properties, surface normals, sub-segmentation, and more, which were not previously available, can now be generated. Furthermore, the data and labels can be provided on demand, enabling a truly data-centric paradigm in which ML practitioners experiment and iterate orders of magnitude faster than was previously possible. Synthetic data also addresses critical ethical issues by reducing bias, preserving privacy, and democratizing access to data.
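To make the idea of "explicitly defined" pixels concrete, here is a minimal sketch in plain Python. It is a toy illustration, not how any production synthetic data pipeline works: the `Sphere` class, the `render_labels` function, and the orthographic camera are all assumptions made up for this example. Because the scene geometry is known, depth, segmentation, and surface normals for every pixel fall out of the same computation that would produce the image.

```python
import math
from dataclasses import dataclass


@dataclass
class Sphere:
    """Hypothetical scene object; coordinates are in pixel units."""
    cx: float
    cy: float
    cz: float      # distance of the sphere centre from the image plane
    radius: float
    label: int     # class id written into the segmentation map


def render_labels(spheres, width, height):
    """Toy orthographic ray cast: one ray per pixel, travelling along +z.

    Returns per-pixel depth, segmentation, and surface normals.
    Background pixels get depth=inf, label 0, and normal None.
    """
    depth = [[math.inf] * width for _ in range(height)]
    segmentation = [[0] * width for _ in range(height)]
    normals = [[None] * width for _ in range(height)]

    for row in range(height):
        for col in range(width):
            x, y = col + 0.5, row + 0.5          # pixel centre
            for s in spheres:
                dx, dy = x - s.cx, y - s.cy
                d2 = dx * dx + dy * dy
                if d2 > s.radius ** 2:
                    continue                      # ray misses this sphere
                dz = math.sqrt(s.radius ** 2 - d2)
                z = s.cz - dz                     # front surface of the sphere
                if z < depth[row][col]:           # keep the nearest hit
                    depth[row][col] = z
                    segmentation[row][col] = s.label
                    # Unit normal pointing back toward the camera
                    normals[row][col] = (dx / s.radius, dy / s.radius, -dz / s.radius)
    return depth, segmentation, normals


# A small sphere (label 2) in front of a larger one (label 1)
scene = [
    Sphere(cx=4.0, cy=4.0, cz=10.0, radius=3.0, label=1),
    Sphere(cx=6.0, cy=6.0, cz=5.0, radius=1.0, label=2),
]
depth, segmentation, normals = render_labels(scene, width=8, height=8)

for row in segmentation:
    print("".join(str(v) for v in row))
```

Every label here is exact by construction, including the occlusion boundary between the two spheres. A human annotator would have to estimate each of these values by eye.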
The timing is perfect, and demand is here. We are at an inflection point for synthetic data:
- The first book on Synthetic Data for Deep Learning was released in 2021;
- Gartner predicts that synthetic data will be 10x the volume of real data in the coming years;
- MIT Technology Review named synthetic data one of the top 10 breakthrough technologies of 2022.
As more and more researchers become interested in synthetic data, OpenSynthetics will serve as a powerful reference to help educate the broader community.
Why Contribute And Participate?
Synthetic data represents a paradigm shift for training computer vision models, but it is also the gateway technology for building more generalized intelligence. Moving forward, researchers will increasingly leverage these digital worlds to build AI models that deeply understand the world around them and are capable of interacting with and manipulating it.
OpenSynthetics will bring together researchers and practitioners across academia and industry in an open and collaborative community to help propel the space forward. We believe synthetic data will come to power the next generation of computer vision and that together we can help catalyze innovation. By contributing to and participating in the site, the community will actively build a knowledge base that grows understanding and drives adoption of this emerging technology. We hope you will join us to create a thriving OpenSynthetics community.
Yashar Behzadi, Ph.D. is the CEO and Founder of Synthesis AI. He is an experienced entrepreneur who has built transformative businesses in AI, medical technology, and IoT markets.