
A Brief Overview of GPT-3 by OpenAI

 

 

You have probably already seen some articles like "A robot wrote an entire article. Aren't you scared yet, human?"

So, who is the robot here? 

 

It's the GPT-3 model, a transformer-based language model developed by OpenAI. GPT stands for Generative Pre-trained Transformer.

OpenAI had previously released GPT-2 and other models. GPT-3 was released in May 2020. It is far larger and more capable than its predecessors, though architecturally it is not much different.

GPT-3 can write articles, poems, and even working code for you*, given some context. There are some limitations, which I am going to explain later in this article.

It's a language model, which means that given some text, it probabilistically predicts which tokens from a known vocabulary will come next. So it's a sort of autocomplete, like the one on a phone keyboard: we type a word, and the keyboard suggests a word that could come next. What sets GPT-3 apart from earlier models is not really its architecture but its size, that is, the number of its trainable parameters.
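To make the "autocomplete" idea concrete, here is a toy sketch of next-word prediction. It is not how GPT-3 works internally: it is a simple bigram counter over a made-up corpus, and the function names (`train_bigram`, `next_word_probs`, `suggest`) are invented for this illustration. It only shows what "probabilistically predicting the next token" means.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probs(counts, word):
    """Turn the follow-up counts for `word` into probabilities."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

def suggest(counts, word):
    """Return the most probable next word, like a keyboard suggestion."""
    probs = next_word_probs(counts, word)
    return max(probs, key=probs.get) if probs else None

# Made-up training text, purely for illustration.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)

print(next_word_probs(model, "the"))  # probability of each word seen after "the"
print(suggest(model, "the"))          # the single most likely next word
```

A model like GPT-3 does the same kind of job, but instead of a lookup table of counts it uses a transformer with billions of learned parameters, and it looks at the whole preceding text rather than only the last word.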

 
[Image: trainable-parameter counts of GPT-3 and other language models. Image source: sigmoid.com]

As you can see in the above image, GPT-3 has around 175 billion trainable parameters, far more than similar models like GPT-2 or BERT.

In general, the more trainable parameters a model has, the more data it needs for training, and GPT-3 was trained on a very large dataset. Architecturally, it closely follows the original transformer model (more precisely, its decoder half), except that it is much larger. It does differ a little from BERT, another well-known model, developed by Google.

BERT uses the encoder half of the transformer: it was designed to take in raw text and produce embeddings that can be used in other machine learning applications down the line.

In comparison, GPT-1, GPT-2, and GPT-3 use the decoder half: they take in embeddings and produce text.
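One practical consequence of the encoder/decoder split is which tokens each position is allowed to look at. The sketch below is a simplification that goes slightly beyond what this article covers: it assumes the usual setup where an encoder-style model such as BERT sees the whole input at once, while a decoder-style model such as GPT only sees earlier tokens (a "causal" mask). The helper name `attention_mask` is made up for this example.

```python
def attention_mask(num_tokens, causal):
    """Row i lists which positions token i may look at (1 = visible, 0 = hidden)."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(num_tokens)]
        for i in range(num_tokens)
    ]

tokens = ["GPT", "writes", "text"]

# Encoder style (BERT-like): every token sees every other token.
encoder_style = attention_mask(len(tokens), causal=False)

# Decoder style (GPT-like): each token sees only itself and earlier tokens.
decoder_style = attention_mask(len(tokens), causal=True)

for name, mask in [("encoder", encoder_style), ("decoder", decoder_style)]:
    print(name)
    for token, row in zip(tokens, mask):
        print(f"  {token:>6}: {row}")
```

Hiding the future tokens is what lets a decoder be trained to predict the next word: it cannot simply peek at the answer.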

So, a question that comes to mind a lot: does GPT-3 have some sort of intelligence? The simple answer is 'NO'.

Nothing in GPT-3's training lets it build a structured system of knowledge about the world. The task it has been trained to do is to predict the next word, so it can produce both factually correct and factually incorrect sentences.

The benefit of the GPT-3 model is that the text it produces sounds fluent and, because it is grammatically correct, reads as if a human might have written it.

GPT-3 takes a few-shot learning approach. To produce a specific type of text, you just give the model a few samples of that type as context, without any retraining, and GPT-3 will produce more text of a similar kind.
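In practice, "giving a few examples" just means writing them into the text the model is asked to continue. Here is a minimal sketch of building such a prompt. The function name `build_few_shot_prompt`, the labels, and the translation examples are all assumptions made for illustration, and the code deliberately does not call any API: it only constructs the string you would send to the model.

```python
def build_few_shot_prompt(examples, query, input_label="English", output_label="French"):
    """Lay out a few solved examples, then leave the last answer blank for the model."""
    lines = []
    for source, target in examples:
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
        lines.append("")  # blank line between examples
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")  # the model is expected to continue from here
    return "\n".join(lines)

# Hypothetical examples of the pattern we want the model to pick up.
examples = [("cheese", "fromage"), ("good morning", "bonjour")]
prompt = build_few_shot_prompt(examples, "thank you")
print(prompt)
```

Because the model only predicts what comes next, the most likely continuation of this prompt is a French translation of the last line, which is the whole trick behind few-shot prompting.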

This model has a few drawbacks. It is predominantly an English model: 93% of its training data was English. It is also very expensive; as I mentioned earlier, it has 175 billion trainable parameters, and the reported cost of training it was 12 million USD. Right now, it is only available through a closed API. And because the model is so large, the chances that you will be able to train it yourself are very low.
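To get a feel for why the size is a problem, here is a back-of-the-envelope calculation of how much memory the weights alone would take. The 175 billion figure is from this article; the bytes-per-parameter values (4 for 32-bit floats, 2 for 16-bit floats) are standard sizes, but which precision GPT-3 is actually stored in is an assumption here. Training needs considerably more than this, since gradients and optimizer state also have to be kept in memory.

```python
PARAMETERS = 175_000_000_000  # trainable parameters, as quoted in this article

def weights_size_gb(num_parameters, bytes_per_parameter):
    """Size of the raw weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

fp32_gb = weights_size_gb(PARAMETERS, 4)  # assuming 32-bit floats
fp16_gb = weights_size_gb(PARAMETERS, 2)  # assuming 16-bit floats

print(f"32-bit weights: {fp32_gb:.0f} GB")
print(f"16-bit weights: {fp16_gb:.0f} GB")
```

Either way, the weights run to hundreds of gigabytes, far more than the memory of a single consumer GPU.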
