Varco Vision, a multimodal artificial intelligence (AI) model developed by the South Korean game company NCSoft, analyzed an image of a traditional Korean hanok house in winter. The model described the image as showing "distinctive eaves and tiled roofs of a hanok," correctly identifying the building as a hanok and explaining its characteristic features. Varco Vision is a Korean-language-focused AI model that processes both text and images, and it offers a better understanding of Korean culture than comparable foreign models. Similarly, Motif, a large language model (LLM) launched by the Korean AI startup Moreh, demonstrates advanced Korean-specific capabilities. Because it learns not only from online content but also from specialized texts, Motif generates more natural Korean responses than OpenAI's GPT-4, which is regarded as one of the best models in the world. Meanwhile, the AI startup Upstage is working with Law&Company, the operator of LawTalk, to develop Solar Legal, a legal AI model tailored to Korean law. Competition among Korean companies to build Korean-specialized LLMs is intensifying as they seek to strengthen their models' command of the Korean language and culture.
LLMs are the foundation that allows AI to understand human language and respond naturally. The problem is that most open-source LLMs are built primarily on English and Chinese. An industry expert noted, "Global AI technology is advancing rapidly in English and Chinese, creating a widening gap," adding, "If Korean-based AI falls behind, it could hurt national competitiveness and lead to linguistic dependency."
This is why Korean companies such as NCSoft and Upstage have recently been releasing Korean-focused LLMs. The more users a Korean-specialized LLM attracts, the more data can be gathered, and that data is key to improving the model. Industry experts argue that rather than competing with one another in the limited domestic market, Korean-language AI developers should first focus on surviving competition with global AI players. Some companies are pursuing open-source strategies, offering free access to their LLMs to attract more users and grow the ecosystem.
Korean companies are investing heavily in securing high-quality Korean-language data to differentiate themselves from global tech giants such as OpenAI. This data is especially important in specialized fields such as law and finance, where AI must understand complex questions in context and analyze related documents and images to respond naturally. The key challenge is acquiring such data. The global AI research institute Epoch AI reported that high-quality data sources, such as news articles and academic papers, had already been depleted as of January this year. To address this, several Korean companies have formed strategic alliances. Last year, Upstage launched the "1T (1 Trillion Tokens) Club," a coalition of organizations, each committed to providing more than 100 million Korean words in formats such as books, articles, reports, and academic papers, to combat the data shortage.