TII Intros Arabic NLP Model NOOR
Artificial intelligence (AI) is quickly becoming a key technology in the healthcare, finance, manufacturing, and even advertising industries. Advances in deep learning have driven major progress in image recognition, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and medical image analysis.
Now, the Technology Innovation Institute (TII), a research pillar of Abu Dhabi’s Advanced Technology Research Council, said it has created the largest Arabic natural language processing (NLP) model to date. The new model, NOOR, can perform varied, cross-domain tasks from natural language instructions.
NOOR was built by TII’s team of advanced researchers and AI specialists together with LightOn, a company that aims to unlock extreme-scale machine intelligence for businesses.
NOOR is the Arabic word for "light"; the name was chosen to represent how the model enlightens the mind.
“With this development, we are on track to boost our research capabilities and credentials in AI, as well as elevating the status of Abu Dhabi and the UAE as a serious research ecosystem. Our expert teams have demonstrated yet again that this region can achieve breakthrough R&D outcomes that impact the world,” said Dr. Ray O. Johnson, CEO, TII and ASPIRE.
To collect the training data for the model, which is based on the popular Transformer architecture, TII first built an end-to-end pipeline for crawling, filtering, and curating text at scale. Automated filtering helps identify usable text data and keeps spam out of the model's training data.
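The article does not describe TII's actual filtering rules. As a rough illustration only, the sketch below shows the kind of heuristic checks a web-text pipeline might apply before training: a minimum length, a minimum share of Arabic characters, a cap on repeated lines (a common spam signal), and exact-duplicate removal. The function names and every threshold are hypothetical assumptions, not NOOR's pipeline.

```python
import hashlib

def arabic_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the basic Arabic block (U+0600 to U+06FF)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    arabic = sum(1 for c in chars if "\u0600" <= c <= "\u06ff")
    return arabic / len(chars)

def repeated_line_ratio(text: str) -> float:
    """Fraction of non-empty lines that repeat an earlier line, a common spam signal."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return 1 - len(set(lines)) / len(lines)

def filter_documents(docs, min_chars=50, min_arabic=0.7, max_repeat=0.3):
    """Yield stripped documents that pass every check; exact duplicates are dropped.

    All thresholds are illustrative assumptions, not values used by TII.
    """
    seen = set()
    for doc in docs:
        text = doc.strip()
        if len(text) < min_chars:
            continue  # too short to be useful training text
        if arabic_ratio(text) < min_arabic:
            continue  # mostly non-Arabic content
        if repeated_line_ratio(text) > max_repeat:
            continue  # boilerplate or spam-like repetition
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a document already kept
        seen.add(digest)
        yield text

if __name__ == "__main__":
    good = "مرحبا بكم في " * 10
    docs = [good, good, "buy now\n" * 10, "نور", "اشتر الآن\n" * 10]
    print(list(filter_documents(docs)))  # keeps a single copy of `good`
```

Production pipelines typically add fuzzy deduplication and learned quality classifiers on top of simple rules like these; the sketch keeps only the cheapest checks so each one is easy to follow.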
TII also built optimized services for extreme-scale distributed training and for serving applications, with efficient inference and model specialization. The training data is a high-quality, cross-domain Arabic dataset drawn from web data, books, poetry, news articles, and technical material, which significantly widens the model's applicability.
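A back-of-the-envelope estimate shows why a model of this size calls for distributed training. Assuming mixed-precision training with the Adam optimizer (the article does not say which optimizer or precision TII used), a common rule of thumb is about 16 bytes of training state per parameter. For the 10 billion parameters TII cites, that is 160 GB before counting activations, more than a single 80 GB accelerator can hold. The function names below are illustrative.

```python
import math

# Approximate bytes per parameter for mixed-precision training with Adam.
# This is a widely used rule of thumb, not TII's published configuration.
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}

def training_state_gb(num_params: float) -> float:
    """Weights, gradients and optimizer state (activations excluded), in GB of 1e9 bytes."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9

def min_devices(num_params: float, device_memory_gb: float) -> int:
    """Lower bound on devices needed just to hold that state if it is sharded evenly."""
    return math.ceil(training_state_gb(num_params) / device_memory_gb)

if __name__ == "__main__":
    n = 10e9  # the 10 billion parameters cited for NOOR
    print(training_state_gb(n))  # 160.0
    print(min_devices(n, 80.0))  # 2
```

This is only a floor: activations, temporary buffers, and communication overhead add more, which is one reason large models spread both their state and their computation across many devices.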
The model is decoder-only, similar to the autoregressive language model GPT-3, and uses 3D parallelism (a combination of data, tensor, and pipeline parallelism) while making efficient use of the available hardware resources.
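3D parallelism combines three ways of splitting training work: data parallelism (model replicas process different batches), pipeline parallelism (consecutive groups of layers live on different devices), and tensor parallelism (individual weight matrices are sharded across devices). The article does not describe how TII arranged its hardware, so the sketch below only shows, under an assumed rank layout, how each process in a dp x pp x tp grid could find its position and the peers it communicates with. All names and the layout order are hypothetical.

```python
from typing import List, NamedTuple

class Coords(NamedTuple):
    data: int      # which data-parallel replica
    pipeline: int  # which pipeline stage (a contiguous block of layers)
    tensor: int    # which shard within a tensor-parallel group

def rank_to_coords(rank: int, dp: int, pp: int, tp: int) -> Coords:
    """Map a global rank to its (data, pipeline, tensor) position in a dp x pp x tp grid.

    Assumed layout: tensor index varies fastest, then pipeline, then data.
    """
    if not 0 <= rank < dp * pp * tp:
        raise ValueError(f"rank {rank} outside world of size {dp * pp * tp}")
    return Coords(data=rank // (pp * tp), pipeline=(rank // tp) % pp, tensor=rank % tp)

def coords_to_rank(c: Coords, pp: int, tp: int) -> int:
    """Inverse of rank_to_coords under the same layout."""
    return c.data * pp * tp + c.pipeline * tp + c.tensor

def tensor_group(rank: int, dp: int, pp: int, tp: int) -> List[int]:
    """Ranks that jointly hold the sharded weight matrices of the same layers."""
    c = rank_to_coords(rank, dp, pp, tp)
    return [coords_to_rank(c._replace(tensor=t), pp, tp) for t in range(tp)]

def data_group(rank: int, dp: int, pp: int, tp: int) -> List[int]:
    """Ranks holding identical shards on different batches; they average gradients together."""
    c = rank_to_coords(rank, dp, pp, tp)
    return [coords_to_rank(c._replace(data=d), pp, tp) for d in range(dp)]

if __name__ == "__main__":
    # 8 processes: 2 data replicas x 2 pipeline stages x 2 tensor shards.
    for r in range(8):
        print(r, rank_to_coords(r, 2, 2, 2), tensor_group(r, 2, 2, 2), data_group(r, 2, 2, 2))
```

Putting tensor-parallel ranks next to each other is a common choice because tensor parallelism communicates within every layer, so keeping those ranks on the same node lets that traffic use the fastest links; data-parallel gradient averaging happens less often and tolerates slower inter-node links.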
Commenting on the launch, Dr. Ebtesam Almazrouei, Director, AI Cross-Center Unit, TII, said, “Large language models have taken the world of natural language processing by storm, and we are proud to introduce this cutting-edge model with 10 billion parameters – the world’s largest Arabic NLP model.”
Edited by Erik Linask