Venture capitalist Tom Loverro predicted an impending mass extinction event for early- and mid-stage companies last year, forecasting that 2023 and 2024 would hit startups harder than the 2008 financial crisis did. According to PitchBook, over 3,200 startups worldwide closed last year, resulting in more than $27 billion in lost investment funds. Despite this global crisis, South Korean startups are garnering attention for their technological prowess, acknowledged by major domestic IT giants and honored with the CES Innovation Award. ChosunBiz conducted interviews with some of these startups to explore their distinguishing factors. [Editor’s Note]
On Feb. 16, Microsoft-backed OpenAI made waves in the global tech industry with the introduction of its new video-generation model, ‘Sora’. Amid the excitement surrounding this unveiling, Gaudio Lab, a South Korean startup founded in 2015 by former LG Electronics audio engineer Henney Oh, is actively pursuing business opportunities. Gaudio Lab has gained prominence domestically for its AI-driven audio source separation and spatial audio technology. In 2022, they successfully extracted actor Choi Min-sik’s voice from a 1990s drama to replicate his younger voice in the Netflix series ‘Big Bet’.
However, Gaudio Lab’s ambitions extend beyond its current achievements. The company sees an opportunity to collaborate with OpenAI by integrating its AI sound generator, ‘FALL-E’, with OpenAI’s models. Like AI image generators built on diffusion models, FALL-E extracts meaningful signals from white noise. Gaudio Lab aims to advance related technologies so that AI can analyze videos and create foley sound that fits each scene seamlessly. Foley refers to everyday sounds that were not captured, or not captured well, on set and are therefore added in post-production to enhance audio quality.
Gaudio Lab is already in discussions with Microsoft. The prospect of collaboration arose when Microsoft CEO Satya Nadella visited the company’s booth at CES 2024. After the presentation, Nadella reportedly noted the similarities between FALL-E and DALL-E, OpenAI’s text-to-image model.
Gaudio Lab has been researching AI sound generation since 2021, backed by a Series B investment of 16.9 billion won from investors including Naver D2SF, Samsung Venture Investment, SoftBank Ventures, and LB Investment. This initiative, known as the ‘Sound Studio Gaudio (SSG)’ project, produced the company’s first AI-generated sound in June 2022. The company can now generate sound across roughly 100 categories from input prompts. In 2021, Gaudio Lab also acquired WaveLab, a prominent film sound studio in South Korea, gaining access to more than 20 years’ worth of pristine, natural, high-quality sound data.
The newly launched ‘Just Voice’ app will be the next integral part of the SSG project, integrated with both FALL-E and ‘Gaudio Spatial Audio (GSA),’ a spatial audio technology recognized at CES 2023. The app uses AI to extract voices from noisy environments in real time by removing background noise, and it earned Gaudio Lab its second CES Innovation Award. Gaudio Lab expects B2C products like Just Voice to attract more B2B customers and broaden its business portfolio.
In an interview with ChosunBiz on Feb. 19, Oh elaborated, “As our company approaches its 10th year, it’s evident that our current revenue isn’t enough to further develop FALL-E. Our strategy is to leverage our existing technologies to generate revenue, with the aim of achieving sustainable profitability. Once we reach this milestone, we plan to smoothly proceed with our planned IPO.” Gaudio Lab targets a revenue of 7.7 billion won this year and plans to go public in May next year, aligning with its founding anniversary.
It’s known that in the early days of your startup you aimed to tap into the metaverse market, but its slower-than-expected growth posed challenges. As you weighed various new business ideas, why did you choose generative AI, a concept that was relatively unfamiliar to many at the time?
“Around July 2021, while we were raising our Series B round, a VC posed the question: ‘How can Gaudio Lab implement the “missing piece” of the metaverse?’ In fact, most metaverse platforms lack their own distinct ‘soundscape.’ While background music may play as users move characters, and sound effects may occur with specific character actions, there’s still much to be filled. To enhance realism, I thought a process similar to how foley artists create sound for each scene in film production would be necessary. At that time, we were already implementing audio source separation and spatial audio technology using AI. Naturally, as we contemplated this new challenge, it became clear that leveraging AI was the solution. Considering the maturity of AI technology at the time, we planned the SSG project on the assumption that ‘with enough data, sound generation could be possible.’ While several other business ideas were considered, none matched the scale and potential for diversification of this project.”
What’s the current level of completion for FALL-E?
“FALL-E’s ultimate goal is to generate sound that perfectly aligns with scenes in videos. But we haven’t quite achieved that yet, so let’s say it’s at 0% [laughs]. Nonetheless, we’re steadily progressing through the necessary steps. Over the past two years, we’ve refined the technology to generate sound based on input text, which is now about 80-90% complete. However, when considering if the generated sound meets our standards, we’d have to reduce that figure to around 50%.
“Since last year, we’ve integrated a feature into FALL-E to generate sound corresponding to input images, which is currently about 50% complete. This year, we plan to add the capability for FALL-E to generate sound based on input videos, and we anticipate it reaching a reasonably usable level. We believe that with the current level of technology, FALL-E is sufficiently poised to enter the market. As the technology continues to advance, we’re confident it will achieve 100% completion.”
How large is the dataset that FALL-E has been trained on?
“There are approximately 11 million files, totaling about 12,000 hours of audio and around 10 terabytes of storage. However, it’s still not enough. So we are expanding our dataset by extracting data with our audio source separation technology, and by entering into a contract with a related company whose name I can’t disclose.”
Are there any use cases of FALL-E?
“Last year, we partnered with an audio content company called Audien to integrate sounds generated by FALL-E into audiobooks. We’re also utilizing FALL-E to produce sounds for scenes in movies and other video content through our subsidiary studio. Currently, we are in discussions with one of our investment partners regarding the commercialization of a service using FALL-E. However, since we are still in the very early stages, I’m afraid I can’t share specific details about which service will be launched and when.”
Do you plan to develop services similar to ChatGPT that are accessible to the general public?
“That’s the direction we’re heading in. But instead of creating a standalone service, we expect it’ll be integrated into existing ones. For sound generation, AI requires a good understanding of language, which is where LLMs come in. From a user perspective, integrating FALL-E into established services like ChatGPT is also much more convenient. Switching between multiple services could be cumbersome.”
As of now, Gaudio Lab stands unrivaled worldwide in the field of sound generation technology. What, in your opinion, is essential for driving the company’s growth further?
“It seems we’ll need more GPUs and experts to train AI on large datasets. However, due to the constraints of startups, there are limits to this approach. Thus, collaborating with Big Tech seems like the best solution. Most Big Tech companies currently lack a significant workforce in sound-related fields. Gaudio Lab is one of the few places where talented individuals with doctoral degrees in sound engineering come together. Hopefully, by maintaining our current leading position, we can establish valuable partnerships when the timing is right.”
Do you intend to seek additional investment?
“While attracting investment is important, we prioritize strategic partnerships over purely financial ones. Expanding on this, I personally believe that going public is the best option for funding. Our plan is to focus on achieving profitability through audio source separation and spatial audio technologies before considering an IPO.”