Pragmatic Data Scientists
Make Data Useful
Old AI is just ML, ChatGPT is the real AI
Today, I'm going to give you a friendly but technical discussion of why ChatGPT is so important. Hopefully, by the end of this video, you will have a first-principles understanding of what generative AI is and why it is different from past AI. Building this understanding is similar to building an understanding of what the iPhone is. I remember when the iPhone first came out, a lot of people viewed it as just a phone with a bigger screen produced by Apple, and plenty of consumer electronics articles compared the iPhone to other phones and concluded that the other phones were superior. Now, more than a decade later, there is no debate: the iPhone was a new paradigm, and it mattered. That paradigm shift unlocked industries worth trillions of dollars. I firmly believe generative AI is a paradigm shift that is going to be bigger than the mobile Internet, and I'm going to share that reasoning today. Feel free to reference or critique it. Because I want to keep this video friendly, I'm not going to include all the technical details here, but I will include them in the description: my slides, my Substack article, and my hour-long technical presentation about ChatGPT. Feel free to check them out if you are interested.

First, a one-minute map of all the new terms. We have AGI, which is a concept. We have generative AI, which is a family of new AI models. Under generative AI, we have large language models (LLMs) and other generative models, such as Stable Diffusion, which generates pictures; we won't talk about those here. There used to be several different ways of building large language models, such as BERT and GPT. BERT is bidirectional, GPT is unidirectional; you don't need to know the difference, because after ChatGPT everyone quickly converged on the GPT way of doing things, as with LLaMA. ChatGPT is one "alignment application" built on top of GPT: GPT can do a lot of different things, and ChatGPT specializes in chatting with you. To understand the implications of ChatGPT, we cannot limit ourselves to ChatGPT or even GPT, because many other companies are going to produce foundation models. So we need to understand what's new about these large language models.

To understand what's new, we need to understand what's old, so let me give you a three-minute overview of what past AI is and its limitations. In the past decade, when we talked about artificial intelligence, we were mostly talking about machine learning, and specifically its subfield, deep learning, because deep learning benefited the most from the huge advances in processing power and data. The paradigm of deep learning, and of machine learning in general, is finding correspondences in data. Or, to use an analogy, it is a "smarter if condition". Think about what an if condition does: if this, then that. It is finding a correspondence. You give the if condition some inputs, and it produces some outputs. Machine learning does the same thing, but more smartly. You don't need to specify all the if conditions, write out the case-whens, or enumerate all the scenarios. You just feed the model some tagged data, that is, some inputs and their desired outputs, and the model finds the pattern for you. For example, we all know Siri. If I coded Siri with if conditions, I would tell Siri: if you hear this word, then respond that way. If I coded Siri with a machine learning model, I would instead feed it a lot of tagged data, where each example says "this is a sentence, and this is the desired outcome for that sentence", and fit the model. Siri would learn the pattern, and whenever it heard something similar, it would map it to the desired outcome.
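To make the contrast concrete, here is a toy sketch in Python, assuming scikit-learn is available. The utterances, intents, and the `rule_based_intent` helper are all made up for illustration; this is not how Siri or Alexa are actually built.

```python
# Toy contrast between hand-written rules and a learned classifier.
# Illustrative sketch only -- not how Siri or Alexa actually work.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Approach 1: hand-coded "if conditions" -- every case is spelled out.
def rule_based_intent(utterance: str) -> str:
    text = utterance.lower()
    if "weather" in text:
        return "get_weather"
    if "alarm" in text:
        return "set_alarm"
    return "unknown"

# Approach 2: machine learning -- feed tagged data (input, desired output)
# and let the model find the correspondence itself.
training_utterances = [
    "what's the weather like today",
    "will it rain tomorrow",
    "wake me up at 7 am",
    "set an alarm for six thirty",
]
training_intents = ["get_weather", "get_weather", "set_alarm", "set_alarm"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(training_utterances, training_intents)

# The learned model can handle phrasings we never wrote a rule for.
print(rule_based_intent("is it sunny outside"))   # -> "unknown"
print(model.predict(["is it sunny outside"])[0])  # likely "get_weather"
```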
In fact, if you have worked at Amazon or Apple, you would know that even today there are many programmers writing if conditions in Siri or Alexa, simply because it's more efficient and more accurate. And that points to the limitation of these models: they cannot produce anything new. They have to learn an existing pattern and apply that existing pattern. These models are really overfitting machines. If you generate a series of random numbers, a deep learning model will find you a "pattern" in that seemingly random series, because the model does not care and does not understand in the human sense. It just finds correspondences; it just optimizes a mathematical function that matches the correspondences in the existing data as closely as possible.

If we use such a model smartly, we can give the impression that it has learned something new, but that's not true. For example, in recommendations, a lot of us have had a serendipity moment: Spotify or Netflix recommends some song, movie, post, or advertisement that we didn't know we were interested in, but actually are. It gives me the sense that the model knows my preferences better than I do. But that is just smart design: collaborative filtering. The model actually knows a lot of people's preferences, and it has scores for who is similar to you based on viewing history or connections. It then recommends to you the song or movie liked by someone similar to you, and chances are you are going to like it as well.

So that is the paradigm: finding correspondences in existing data. And the limitation is that the pattern has to already be there. To increase their performance, these models have to be specialized; they cannot be general. That is a big limitation, and in the past few years the growth of big tech firms globally has been slowing down partly because of the limitations of these existing technologies.

Third, we get to large language models, and almost magically, they overcame this limitation. The inflection point was actually GPT-3. I call it magical because the capacity of these large language models to do something new came out of "emergence". Even today, the people who build these models do not know exactly why they can do something new. They just know that by feeding these large language models more data and better-structured data, giving them more parameters, and training them well, the models suddenly learned to do new things. This is also a big, controversial debate in machine learning today, because some machine learning scientists do not believe that large language models can do anything new, since we actually know how they work. They are "generative autoregressive large language models", which means the foundation model's task is just to generate the next token. If you don't know what a token is, just imagine the model is generating the next word. It takes that next word, appends it to the input, and tries to generate the one after that. The models just try to do this one task well, and suddenly they are able to hold conversations with you, over multiple rounds; I'll show a toy sketch of this loop in a moment. The truly magical part is that they can also do "in-context learning" and "in-context correction": without changing the model's weights at all, they can learn from your responses within the conversation and correct their answers.
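Here is that autoregressive loop as a toy Python sketch. The word-level bigram table stands in for the real model; actual LLMs use a transformer over subword tokens, but the generate-append-repeat loop has the same shape. The corpus and the 12-step cutoff are arbitrary choices for illustration.

```python
# Toy autoregressive generation: predict the next token, append it to the
# context, and feed the longer context back in to predict the next one.
import random
from collections import defaultdict

corpus = (
    "the model predicts the next word and the next word becomes part "
    "of the input so the model predicts the next word once more"
).split()

# "Training": count which word tends to follow each word.
bigram_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

# The autoregressive loop itself: generate, append, repeat.
tokens = ["the"]
for _ in range(12):
    candidates = bigram_counts[tokens[-1]]
    if not candidates:  # dead end; real models emit an end-of-text token
        break
    words, weights = zip(*candidates.items())
    tokens.append(random.choices(words, weights=weights)[0])

print(" ".join(tokens))
```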
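And here is roughly what in-context learning and in-context correction look like from the outside, again as a toy Python sketch. The prompt text, the arithmetic example, and the chat-history format are all made up for illustration; the point is that the "learning" lives entirely in the context window, with no weight updates.

```python
# In-context learning: the "training examples" live entirely in the prompt.
# Nothing about the model is retrained; it infers the task from context.
few_shot_prompt = """Translate English to French.

English: cheese
French: fromage

English: bread
French: pain

English: water
French:"""
# Sent to a large language model, this typically completes with "eau":
# the model picked up the translation pattern from two examples alone.

# In-context correction works the same way: the correction is just more
# context. A made-up chat history might look like this:
chat_history = [
    {"role": "user", "content": "What is 17 * 24?"},
    {"role": "assistant", "content": "17 * 24 = 398."},
    {"role": "user", "content": "That's wrong. Check your work."},
    # The model can now re-derive 17 * 24 = 408 from the conversation
    # itself, even though its weights never changed.
]

print(few_shot_prompt)
print(chat_history)
```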
This topic is very deep and very technical, so I will stop here; if you use ChatGPT, you can come to this conclusion by yourself. The new large language models have overcome the old limitation. They can give you something new; they can understand you, or at least appear to understand you, about as well as an ordinary person would. For the tech industry, that is already enough, because it gives you Moore's Law for human capital. Historically, Moore's Law applied to chips, so the price of anything driven by Moore's Law has gone down over the past decade. But the price of goods and services tied to human capital, such as lawyers, education, and medical services, has actually gone up. With generative AI, we finally have a way to distribute human capital at near-zero marginal cost, just as software allows us to distribute tools at almost zero marginal cost. That is the opportunity, and that is why, shortly after the launch of ChatGPT, everyone, including startups, VCs, and big tech, quickly came to the consensus that this is the next big thing, and everyone started racing here.

All right, so just to summarize. When we talk about the opportunity of ChatGPT, we should really talk about the opportunity of large language models and generative AI in general. Large language models are fundamentally different from past AI. Although AI has been the hot topic for the past decade, generative AI and large language models are something fundamentally different: they can enable trillion-dollar industries by bringing Moore's Law to human capital and reducing the marginal cost of distributing human capital to almost zero. I hope you enjoyed this video and maybe have a better understanding of the future. This is Pragmatic Data Scientists. We are dedicated to making data science useful. If you like our content, please subscribe. Thank you and see you next time. Bye.