Before the emergence of large models, artificial intelligence typically required specialized algorithms and models designed for specific tasks and scenarios, and each model could only perform a single task within the scope of its training data. The key breakthrough of large models is that they exhibit "emergent" abilities resembling human-like general intelligence: they can learn knowledge across multiple domains and handle multiple tasks, which is why they are also called general-purpose large models. Specifically, large models have the following characteristics:
1. Large parameter scale
2. Strong generalization ability
3. Multimodal support
1. Large parameter scale
The parameter counts of large models are far larger than those of traditional deep learning models. The development of large models follows a "scaling law": there is a power-law relationship between a model's performance and its parameter count, the size of its training dataset, and the amount of compute used for training. Roughly speaking, performance improves steadily (linearly on a log-log scale) as these three factors grow exponentially. In layman's terms, "great effort brings miracles."
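To make the power-law relationship concrete, here is a minimal sketch of the model-size term in the spirit of Kaplan et al. (2020), where loss falls as a power of parameter count. The constants are the values reported in that paper and are used purely for illustration, not as a statement about any specific production model.

```python
# Illustrative scaling law: L(N) = (N_c / N) ** alpha_N,
# with constants taken from Kaplan et al. (2020) for illustration only.
N_C = 8.8e13      # reference parameter count
ALPHA_N = 0.076   # power-law exponent for model size

def loss_from_params(n_params: float) -> float:
    """Predicted cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

# Doubling the exponent of N only shaves the loss by a fixed factor:
for n in [1.1e8, 1.75e11, 1.8e12]:  # GPT-1-scale, GPT-3-scale, trillion-scale
    print(f"{n:.2e} params -> predicted loss {loss_from_params(n):.2f}")
```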
However, "large" is not an absolute standard but a relative concept. Traditional models usually have between tens of thousands and hundreds of millions of parameters, while large models start at hundreds of millions of parameters and have now grown to the trillion scale and beyond.
For example, from OpenAI's GPT-1 to GPT-3, the parameter count grew from 110 million to 175 billion, and unofficial estimates put GPT-4 at roughly 1.8 trillion parameters.
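A rough back-of-envelope calculation shows where such counts come from. The sketch below uses the publicly reported GPT-3 configuration (96 layers, hidden size 12288, ~50k vocabulary, 2048-token context) and ignores biases and layer-norm parameters; treat it as an approximation, not an exact accounting.

```python
# Approximate parameter count of a decoder-only Transformer from its config.
def transformer_param_count(n_layers: int, d_model: int,
                            vocab_size: int, n_ctx: int) -> int:
    per_layer = 12 * d_model ** 2                # attention (4*d^2) + MLP (8*d^2)
    embeddings = (vocab_size + n_ctx) * d_model  # token + position embeddings
    return n_layers * per_layer + embeddings

total = transformer_param_count(n_layers=96, d_model=12288,
                                vocab_size=50257, n_ctx=2048)
print(f"~{total / 1e9:.0f}B parameters")  # comes out close to 175B
```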
2. Strong generalization ability
Large models can effectively handle a wide variety of unseen data and new tasks. Built on the attention mechanism and pre-trained on large, diverse, unlabeled datasets, large models acquire rich general knowledge and skills that can be applied across a wide range of scenarios and tasks, such as text generation, natural language understanding, translation, mathematical deduction, logical reasoning, and multi-turn dialogue.
Large models need no, or only a few, task-specific data samples to perform well on new tasks. For example, OpenAI has had GPT-4 take a variety of human benchmark exams, and in many of them it scored above most human test-takers (better than 80% of them), covering law, economics, history, mathematics, reading, and writing.
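The "few samples" idea is usually realized through few-shot prompting: a handful of in-context examples steer the pre-trained model to a new task without any fine-tuning. The sketch below only builds the prompt; `call_llm` is a hypothetical placeholder for whatever chat or completion API you actually use.

```python
# Minimal few-shot prompting sketch: adapt a pre-trained model to sentiment
# classification with two in-context examples instead of fine-tuning.
FEW_SHOT_EXAMPLES = [
    ("The movie was a complete waste of time.", "negative"),
    ("An absolute masterpiece from start to finish.", "positive"),
]

def build_prompt(new_input: str) -> str:
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {new_input}\nSentiment:")
    return "\n".join(lines)

def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a real model/API call")  # hypothetical

prompt = build_prompt("The plot dragged, but the acting was superb.")
print(prompt)  # the model completes the final "Sentiment:" label
```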
3. Multimodal support
Large models can process multimodal data efficiently. Most traditional deep learning models handle only a single data type (text, speech, or images), whereas large models can understand, retrieve, and generate data across modalities through techniques such as extended encoders/decoders, cross-attention, and transfer learning.
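As one way to picture the cross-attention technique mentioned above, here is a minimal PyTorch sketch in which text token embeddings attend over image patch features. The module, dimensions, and wiring are illustrative assumptions, not the architecture of any particular multimodal model.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Text queries attend over image keys/values to fuse the two modalities."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_tokens: torch.Tensor, image_patches: torch.Tensor) -> torch.Tensor:
        # Queries come from text, keys/values from image features, so each
        # text token gathers the visual information relevant to it.
        fused, _ = self.attn(query=text_tokens, key=image_patches, value=image_patches)
        return self.norm(text_tokens + fused)  # residual connection + norm

text = torch.randn(1, 16, 512)    # 16 text token embeddings
image = torch.randn(1, 196, 512)  # 14x14 grid of image patch features
print(CrossModalAttention()(text, image).shape)  # torch.Size([1, 16, 512])
```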
Large Multimodal Models (LMMs) provide more comprehensive cognitive capabilities and richer interactive experiences, broaden the range of AI applications for complex tasks, and have become one of the industry's important paths toward general artificial intelligence. A typical example is OpenAI's Sora model, whose launch set off a new wave of multimodal large model development around the world.