Awakened Intelligence (An R U Coding Me Podcast)

Join doctoral researcher Jacob as he catches the audience up on the last few years of AI development and breaks down the latest AI trends in an easy-to-digest format. You don't need a fancy degree or years of development experience to participate in one of the largest technological movements from this century. The goal of this podcast is to make AI accessible to as many people as possible.

(Brief) History of AI and How We Can Compare Models Today

March 24, 2024 • R U Coding Me LLC • Season 1 • Episode 3

From theories to TPUs, we've come a very long way in Artificial Intelligence. In this episode, we'll highlight some of the notable Artificial Intelligence contributions that have led to where we're at in AI and give a quick method for identifying key components in models. Why? Because having the ability to identify models and what they're good at will help you participate in the AI conversation! And knowing where AI started from will hopefully give you a better appreciation for the technology.

Hello everyone, my name’s Jacob and this is the Awakened Intelligence podcast.

It’s easy to get lost in the AI hype train and buried under technical papers.

My goals for this podcast are to make AI easier to understand and keep you up to date on the latest developments.

Today, I wanted to explain what artificial intelligence is, where it started and how we can accurately describe models today.

Artificial Intelligence is a term used to describe a class of learning algorithms for computers. Underneath the hood of an AI program is a complex function and a bunch of vector math. An AI model will read in a sequence of data, convert the data into a vector (or list) of numbers and generate a response based on that input. This concept was introduced way back in the early to mid 1900s, but making such a mathematical model run on a stack of punch cards is no easy feat. In order to make a reasonable model, we needed a path to go digital.
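As a rough illustration of the "data in, vector of numbers, response out" idea, here is a minimal sketch in Python. The vocabulary, the weights and the function names are all made up for this example; a real model learns its weights from data and uses far larger vectors.

```python
def text_to_vector(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    words = text.lower().split()
    return [words.count(term) for term in vocabulary]


def score(vector, weights, bias=0.0):
    """A toy 'model': a weighted sum of the numbers in the input vector."""
    return sum(x * w for x, w in zip(vector, weights)) + bias


# Hypothetical vocabulary and hand-picked weights (assumptions, not learned).
vocabulary = ["good", "bad", "movie"]
weights = [1.0, -1.0, 0.0]

vector = text_to_vector("Good movie with a good ending", vocabulary)
print(vector)                  # [2, 0, 1]
print(score(vector, weights))  # 2.0
```

The sentence becomes the list `[2, 0, 1]`, and the "response" here is just a single number. Everything more sophisticated is built on the same pattern of turning input into numbers and pushing those numbers through a function.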

If you’re a big WW2 history buff, you’ll know the name Alan Turing, the codebreaker widely regarded as a father of theoretical computer science. He, alongside Emil Post and several other mathematicians, developed a model of computation that would later become known as the Turing machine. These theoretical machines would have the capacity to run any program, including AI.

The concept of AI was popularized in the 1950s through media and research, but running even a simple application could cost over $200k a month. With a price tag this large, very few people could experiment with this technology. Many historians and researchers credit the Logic Theorist program by Allen Newell, Cliff Shaw and Herbert Simon for popularizing the concept. This problem-solving AI program was demonstrated at a conference in 1956, which would solidify AI as a subfield of Computer Science.

From the 1970s up until 2017, AI saw a series of ups and downs. The latter half of the 1900s and the early 2000s brought massive improvements in computing power, but very limited algorithm improvements. In the last few years, though, we have not seen much improvement from hardware, which significantly slowed AI development.

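Moore’s Law: the number of transistors on a chip doubles roughly every two years, which in practice meant about double the output at half the cost for hardware.

That trend has bottomed out.

What is a transistor? A semiconductor device that amplifies or switches electrical signals.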

Transistor sizes

We would have to wait for quantum computers to mature, or develop better algorithms.

In 2017, a revolutionary architecture was unveiled in one of the first papers I got to read in my undergraduate research: “Attention Is All You Need”. The transformer model, like a Long Short-Term Memory (LSTM) model, works on sequences of data, but it reduced the amount of computation needed to process large amounts of it. In many experiments, it outperforms previous architectures and can handle massive amounts of data. And when people talk about large language models (aka LLMs) like ChatGPT, or about building niche-focused applications on top of LLMs (for example, with Retrieval-Augmented Generation or RAG), they are most likely talking about a Transformer model.

I could spend hours talking about the different architectures and models. In my opinion, knowing what AI programs do is far more important than remembering names, because you can compare models with each other based on what they were built to do.

For example, comparing ChatGPT with IBM’s Deep Blue chess program from 1997 isn’t fair.

Deep Blue was built to play chess, and watching the first version of ChatGPT play chess was hysterical to say the least.

However, Deep Blue can’t do your math homework or have a conversation with you.

We can describe a model by a few notable features:

1.) What kind of data it works with

2.) Purpose of the model

3.) How the data is presented

For example, models that work with human languages are referred to as Natural Language Processing or NLP models. NLP models have a wide variety of applications like generating responses (aka an NLP generator) or detecting plagiarism (an NLP classifier). Even these two models cannot be directly compared against each other.

These models may share similar code for creating features (the numbers we will feed into the mathematical model), but the mathematical models themselves will be entirely different. The purpose of a model often defines the output layer, or the end result after we send in a sequence of data. The plagiarism detector might return a score between 0 and 1 that represents how likely the text is to be plagiarized. However, the generator will need to create a few sentences of text.
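To make the difference in output layers a little more concrete, here is a small sketch, assuming a sigmoid output for the classifier and a softmax over a tiny vocabulary for the generator. Those are common choices, but the numbers and the three-word vocabulary below are invented for illustration.

```python
import math


def sigmoid(x):
    """Squash any number into a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Classifier-style output: one number between 0 and 1.
plagiarism_score = sigmoid(0.0)
print(plagiarism_score)  # 0.5

# Generator-style output: one probability per word in a (hypothetical) vocabulary.
vocabulary = ["cat", "dog", "sat"]
probabilities = softmax([2.0, 0.5, 1.0])
next_word = vocabulary[probabilities.index(max(probabilities))]
print(next_word)  # cat
```

The classifier stops at a single score, while a generator would repeat the "pick the next word" step over and over to build up its sentences.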

Another way to distinguish models is how the data is presented to the model. The first way provides the model with a selection of labels for it to learn from. For example, we could take a dataset of handwritten digits (such as the MNIST dataset) in which every image is assigned a label that matches the digit it shows. This approach is called supervised learning. A lot of classification and prediction tasks rely on supervised data, and there are quite a few rudimentary machine learning algorithms that excel at these tasks.
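Here is a minimal sketch of supervised learning using one of those rudimentary algorithms, a nearest-centroid classifier. The two-number "features" are a made-up stand-in for the pixels of a handwritten digit, and the function names are just for this example.

```python
def train(examples):
    """Average the feature vectors for each label (a nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose average example is closest to the input."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


# Hypothetical labeled data: (features, label) pairs.
examples = [
    ([0.9, 0.1], "one"),
    ([0.8, 0.2], "one"),
    ([0.1, 0.9], "zero"),
    ([0.2, 0.8], "zero"),
]

centroids = train(examples)
print(predict(centroids, [0.7, 0.3]))  # one
```

The labels do all the teaching here: the model only knows what a "one" looks like because someone tagged the examples ahead of time.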

NOTE: Rudimentary algorithms are not bad. Super small, but efficient solutions can run faster and be deployed on smaller pieces of hardware.

Sometimes labeling data is just not possible. Unsupervised learning uses different algorithms that identify patterns in the data without labels. From my research, when some of the data is already labeled, semi-supervised techniques can be used to label most of the rest. Models can also use other parts of the training data as labels (an approach often called self-supervised learning) if you are associating a sequence of events together.
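One simple way to picture the semi-supervised idea is to copy labels from the closest labeled example onto the unlabeled data. This is only a sketch under that assumption; the numbers and labels are invented, and real semi-supervised methods are more careful about how confident each guess is.

```python
def nearest_label(point, labeled):
    """Return the label of the closest labeled example."""
    closest = min(labeled, key=lambda item: abs(item[0] - point))
    return closest[1]


# Hypothetical data: a couple of labeled examples and a pile of unlabeled ones.
labeled = [(1.0, "short"), (9.0, "long")]
unlabeled = [1.5, 2.0, 8.0, 8.5]

pseudo_labeled = [(x, nearest_label(x, labeled)) for x in unlabeled]
print(pseudo_labeled)
# [(1.5, 'short'), (2.0, 'short'), (8.0, 'long'), (8.5, 'long')]
```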

When I was introduced to AI, I first learned about reinforcement learning through YouTube. I saw a video about beating a retro Mario game using reinforcement learning, and after hours of computation, Mario could finally clear a level without the player. Reinforcement learning rewards a model on its ability to complete tasks in a sequential order. In order to make it through the level, the AI model needs to learn that Goombas will kill Mario if he comes into contact with one. However, coins will increase our score, so the model should also grab as many coins as possible. Reinforcement models require positive and negative reinforcements to encourage the desired behavior. If it sounds like Pavlovian conditioning (or, more precisely, operant conditioning), you wouldn’t be too far off.
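The Mario video used a far bigger setup, but the reward idea can be sketched with tabular Q-learning on a tiny, made-up level: a five-tile corridor with a Goomba on the left end and the flag on the right. The layout, reward values and parameter names below are all assumptions for this example.

```python
import random

GOOMBA, START, FLAG = 0, 2, 4  # positions in a hypothetical 5-tile corridor
ACTIONS = {"left": -1, "right": +1}


def step(position, action):
    """Move one tile and return (new_position, reward, done)."""
    new_position = position + ACTIONS[action]
    if new_position == GOOMBA:
        return new_position, -1.0, True  # negative reinforcement
    if new_position == FLAG:
        return new_position, +1.0, True  # positive reinforcement
    return new_position, 0.0, False


def train(episodes=500, learning_rate=0.5, discount=0.9, seed=0):
    """Learn a value for every (tile, action) pair by trial and error."""
    rng = random.Random(seed)
    q = {(p, a): 0.0 for p in range(1, 4) for a in ACTIONS}
    for _ in range(episodes):
        position, done = START, False
        while not done:
            action = rng.choice(list(ACTIONS))  # explore by moving at random
            new_position, reward, done = step(position, action)
            future = 0.0 if done else max(q[(new_position, a)] for a in ACTIONS)
            target = reward + discount * future
            q[(position, action)] += learning_rate * (target - q[(position, action)])
            position = new_position
    return q


q = train()
policy = {p: max(ACTIONS, key=lambda a: q[(p, a)]) for p in range(1, 4)}
print(policy)
```

After training, the learned policy should be to move right on every tile: stepping toward the Goomba has been punished, and stepping toward the flag has been rewarded.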

We also have pre-trained models, like GPT (Generative Pre-trained Transformer), that can be molded into a variety of applications with additional training. These newer models don’t take nearly as much time to adapt, since the bulk of the training has already been done.

To recap, we briefly covered the history of AI and talked about some of the ways we can describe models. We know that not all models are created equal. When comparing models, we need to consider their purpose, the data they work with and how this data is presented to the model.

We’ll pick this discussion up next time as we go through more real-world models.

Stay classy

Jacob Galajda

Host