
OpenAI Comes Up With Revolutionary Model That Generates Multimedia Using AI In A Fraction Of A Second

Researchers at OpenAI have published a new paper describing a revolutionary AI model designed to produce multimedia, including images, video, and audio, in roughly a tenth of a second.

The new model, dubbed sCM, produces multimedia roughly 50 times faster than regular diffusion models, which need around 5 seconds for the same job, so the contrast in speed is stark.

Rolling out sCM has been on OpenAI's agenda for some time. The makers of ChatGPT now say the model's results are comparable to the industry's quality standards while using only two sampling steps, so the faster generative process does not come at the cost of quality.

The work is laid out in a research paper and in the firm's own blog post, both of which highlight how the model produces top-quality samples at lightning speed. Where previous models required hundreds of steps, sCM gets by with just two.

OpenAI researchers first introduced consistency models in 2023 as an alternative family of generative models aimed at the same goal as the diffusion models in use today.

Diffusion models deliver outstanding results for realistic images, 3D models, audio, and video. However, their sampling is inefficient, typically requiring hundreds of steps, which makes them impractical for real-time applications.

Older models run many denoising steps to produce a sample, which keeps speeds low. sCM, by contrast, transforms noise into a high-quality sample in just a couple of steps, cutting both time and compute cost along the way. OpenAI's largest sCM model today has 1.5 billion parameters and can generate a sample in a mere 0.11 seconds on a single A100 GPU.
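To make that step-count contrast concrete, here is a minimal, illustrative Python sketch of the two sampling styles. The function names (denoise_step, consistency_fn), the noise schedule, and the step counts are hypothetical placeholders chosen for illustration; this is not OpenAI's sCM implementation.

```python
import torch

def diffusion_sample(denoise_step, steps=100, shape=(1, 3, 512, 512)):
    # Hypothetical many-step sampler: noise is refined a little at a time,
    # so generation cost grows with the number of steps (often hundreds).
    x = torch.randn(shape)
    for t in reversed(range(steps)):
        x = denoise_step(x, t)  # one small denoising update per step
    return x

def two_step_consistency_sample(consistency_fn, shape=(1, 3, 512, 512)):
    # Hypothetical two-step sampler in the spirit of consistency models:
    # map noise almost directly to a clean sample, then refine once.
    x = torch.randn(shape)
    sample = consistency_fn(x, t=1.0)             # step 1: noise -> rough sample
    renoised = sample + 0.5 * torch.randn(shape)  # re-inject a little noise
    return consistency_fn(renoised, t=0.5)        # step 2: refine
```

Because the network is evaluated only twice instead of hundreds of times, per-sample latency drops roughly in proportion to the step count, which is the intuition behind the speedup figures quoted below.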

That 0.11-second figure amounts to roughly a 50x speedup over regular diffusion models, making real-time AI applications far more feasible. According to the researchers behind the project, the continuous-time consistency models were trained on ImageNet 512×512 and scaled up to 1.5 billion parameters.
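As a rough cross-check of those figures (an inference from the numbers quoted in this article, not a calculation from the paper itself): 0.11 seconds multiplied by the claimed 50x speedup works out to about 5.5 seconds, which lines up with the roughly 5-second generation time cited earlier for regular diffusion models.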

Even at 1.5 billion parameters, the model maintains sample quality that rivals the best diffusion models, so similar results can be had with far less computational effort. The benchmarks reported so far show strong performance.

