How Large Language Models Work
The video opens with the title ‘HOW LARGE LANGUAGE MODELS WORK EXPLAINING ALGORITHMS AND AI’ over a collage of pictures. It then explains that AI algorithms are trained rather than programmed step by step: they are given a goal and trained to reach it. A graphic of a machine with gears, pipes, and pressure gauges accompanies the question ‘How is AI different from simple algorithms?’. The video then introduces the Large Language Model, the example the rest of the video explains, and notes that the letters L, L, and M stand for ‘Large Language Model.’
To illustrate how language can be represented, the video charts words on a 2D plane whose axes are ‘Roundness’ and ‘Redness’. Words such as ‘apple’, ‘fire truck’, and ‘baseball’ are plotted on it according to how round and how red the thing each word names is.
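The chart can be sketched as a table of coordinates plus a distance function. The numeric scores below, and the extra words ‘cherry’ and ‘banana’, are hypothetical values invented for illustration; the video plots words but gives no coordinates. Euclidean distance is one simple, assumed way to measure how close two words sit on the plane.

```python
import math

# Hypothetical (roundness, redness) scores on a 0-1 scale, made up for
# illustration; the video does not give numeric coordinates.
WORDS = {
    "apple": (0.9, 0.8),
    "fire truck": (0.1, 0.95),
    "baseball": (0.95, 0.1),
    "cherry": (0.9, 0.9),   # hypothetical extra word
    "banana": (0.2, 0.05),  # hypothetical extra word
}


def distance(a, b):
    """Euclidean distance between two points on the roundness/redness plane."""
    return math.dist(a, b)


def nearest(word):
    """Return the other word whose point lies closest to `word`."""
    target = WORDS[word]
    others = (w for w in WORDS if w != word)
    return min(others, key=lambda w: distance(WORDS[w], target))


print(nearest("apple"))  # prints: cherry
```

With only two dimensions, ‘apple’ ends up next to another round, red thing; adding more dimensions lets a model separate words that look alike on these two axes but differ in other ways.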
The video explains that AI models describe words along dimensions like these and can then find relationships between words based on where they fall. For example, given the sentence ‘Frida had a drink of chocolate…’, a model judges from its training data that “milk” is a highly probable next word because of the earlier word “drink.” The video then points out that while the diagrams shown earlier had only two or three dimensions, large language models frequently work with 96 dimensions of similarity, carry out 9,000 operations each time they guess a new word, and are trained on datasets of around 500 billion words.
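The “drink” example can be imitated with simple counting. This is a deliberately crude stand-in, not how large language models actually compute (the video describes dimensions of similarity and thousands of operations per word): the tiny corpus, the `next_word_probs` helper, and its `cue` parameter are all hypothetical, made up for this sketch.

```python
from collections import Counter

# A tiny made-up training set (hypothetical); the video cites real
# training sets of around 500 billion words.
CORPUS = [
    "she had a drink of chocolate milk",
    "he had a drink of chocolate milk after school",
    "we had a drink of hot chocolate",
    "they ate a bar of chocolate cake",
    "i ate a slice of chocolate cake",
]


def next_word_probs(prefix, cue=None):
    """Estimate the probability of each word that follows the prefix's last word.

    If `cue` is given, only training sentences containing that word are
    counted: a crude stand-in for how an earlier word like "drink"
    shifts the guess.
    """
    last = prefix.lower().split()[-1]
    counts = Counter()
    for sentence in CORPUS:
        tokens = sentence.split()
        if cue is not None and cue not in tokens:
            continue  # skip sentences that lack the context word
        for prev, nxt in zip(tokens, tokens[1:]):
            if prev == last:
                counts[nxt] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


prefix = "Frida had a drink of chocolate"
print(next_word_probs(prefix))               # prints: {'milk': 0.5, 'cake': 0.5}
print(next_word_probs(prefix, cue="drink"))  # prints: {'milk': 1.0}
```

Looking only at “chocolate”, the sketch splits evenly between “milk” and “cake”; taking “drink” into account pushes the guess to “milk”, which is the effect the video describes.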
The video sums up that the AI works by making guesses based on “probability” and “similarity.” It then shows a Bigfoot silhouette with a UFO appearing above it to make the point that, although large language models can mimic fluent language, they are also prone to “hallucinations” when they guess incorrectly or when their training data includes things that are wrong.
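The video does not say how a model turns probabilities into a single guess. One common approach, assumed here, is weighted random sampling. The probability values and the `guess_next_word` helper below are hypothetical, chosen only to show that a less likely, and possibly wrong, word still gets picked some of the time.

```python
import random
from collections import Counter

# Hypothetical next-word probabilities after "Frida had a drink of chocolate";
# the numbers are invented for illustration.
NEXT_WORD_PROBS = {"milk": 0.7, "cake": 0.2, "sauce": 0.1}


def guess_next_word(probs, rng):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so runs are repeatable
draws = Counter(guess_next_word(NEXT_WORD_PROBS, rng) for _ in range(10_000))
print(draws.most_common())  # roughly 70% milk, 20% cake, 10% sauce
```

Because each guess is a weighted draw, a fluent but wrong continuation such as “cake” still appears in a minority of runs; and if the training data had given a wrong word a high probability, sampling would repeat that error just as fluently.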