NC has also publicly released the ‘FoCus Dataset (For Customized conversation dataset)’, an AI conversation dataset designed to be interpretable and explainable. Users can see where the AI’s training data came from and how the data were collected and processed. This lets users confirm the sources on which the AI bases its decisions and understand AI models more fully.
The strength of this dataset is that it makes it possible to build conversational technology that performs on par with systems based on ultra-large language models, without actually using such models. Ultra-large models require an enormous number of parameters, massive computing power for training, and extensive training data. Collecting that data takes considerable effort, and only large companies can afford these models, since a single training run can cost up to several tens of billions of KRW. As a result, the research gap between large and small companies is likely to widen even further, because smaller companies cannot afford the cost of data collection or training. If, however, conversational technology that matches the performance of ultra-large language models could be built without using such models, the cost and effort required for data collection could be reduced.
The FoCus Dataset was introduced through a joint research project between NC and Korea University. The joint research team published and presented its paper at **AAAI 2022, one of the world’s most prestigious conferences on artificial intelligence, and is running workshops and shared tasks to share its research results. The team has also been invited to ***COLING 2022, to be held in Gyeongju in October, where it will give a lecture on related topics and present its paper. Although the dataset has not yet been applied directly to commercial services, it is regarded as pioneering because it can be used safely: unethical expressions and personal information have been removed from it. NC released the data because the NLP field has been calling for new conversational technologies in response to concerns about cost and environmental impact. NC plans to continue actively participating in academic discussions and technology development.
**AAAI: Association for the Advancement of Artificial Intelligence
***COLING 2022: The International Conference on Computational Linguistics