
Setting realistic expectations

Quick overview of the things large language models do well, and the things they don't do so well—yet.

What should I expect?

Large language models, like the ones that power OpenAI's ChatGPT, are computer programs that have been trained on huge datasets to extract meaning from text and produce language. These models can do some things very well, but they also have some limitations!
You probably wouldn't try to use a screwdriver to hammer a nail, right? When we use LLM-based chatbot tools for tasks they were designed for, we get much better results!

What do LLMs do well?

Good: Language processor

Large language models are great at extracting meaning from language. They don't "understand" text in a human sense, but they can ingest it and make sense of it, even if it's written in a way that's not perfectly clear. They've been trained on so much data that they've learned to recognize patterns and, from those patterns, to infer the meaning of words in context.
Simplifying and summarizing long, complex text is one of their superpowers. They are good at extracting key takeaways, but it is always a good idea to double-check that they haven't missed any critical points!

Good: Text generator

Large language models are also good at generating text. They can take a prompt and write a paragraph or even an entire article that sounds like it may have been written by a human—a human with really good grammar skills!

Good: Brainstorming partner

Teachers are already using LLMs to help them come up with ideas for their classrooms! Given a clear request and a couple of examples, LLMs can generate multiple variations on ideas, like possible class activities, interesting thesis statements, or drafts of quiz questions.

What are LLMs not as good at?

Not so good: They make things up!

While large language models can process language well on common topics, they sometimes give wrong information and present it as confidently as if it were true. People in the AI and LLM business call these made-up responses “hallucinations.” This can happen for a number of reasons:
  • Faulty training data: The huge datasets that LLMs are trained on can contain billions of words, and they often come from a variety of sources, including articles, books, the Internet, and even social media posts. If the training data contain inaccuracies, the model will inherit those mistakes. If the training data are messy or inconsistent, the AI can infer patterns that don't actually exist, or misinterpret information.
  • Old training data: It takes a long time to assemble the data a model is trained on, and the training itself takes more time still. LLMs can't just be "updated" with "whatever is new on the internet." As a result, the model won't know about events that have occurred since its training data were collected, a cutoff that can be as much as two years in the past. When an LLM's training data don't give it a basis for a fact-based response, the LLM will hallucinate. Some search engines are working around this by connecting models to the internet, but you shouldn't assume that every model you interact with has this capability.

Not so good: Math!

Large language models do not, on their own, perform calculations. When LLMs are asked to do math, they generate answers the same way they generate text: probabilistically, by predicting what comes next. Because of this, they can make mistakes with simple arithmetic as well as with more advanced mathematical concepts.
Mistakes can also happen when the model is asked to generate text that includes numbers or calculations. If the training data contain incorrect calculations, the model may replicate those errors. For example, the model might say that 3(4+2)=14.
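For reference, evaluating that expression by the standard order of operations (parentheses first, then multiplication) gives:
3(4 + 2) = 3 × 6 = 18
A model that gets this wrong isn't miscalculating; it is predicting an answer that looks plausible based on patterns in its training data.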

Not so good: Fake websites and other “hallucinations”

As mentioned above, if an LLM doesn’t have the data it needs to generate a correct response, it may make up a convincing one. These “hallucinations” happen frequently and can take several forms:
  • Fake websites: It may refer you to a URL, but the webpage doesn’t actually exist.
  • Wrong websites: It might provide a link to a site that is completely unrelated to the topic.
  • False citations: It might cite a work that never existed, claim that two real authors who have never collaborated co-wrote a study, or invent fake author names and fake titles for articles, research studies, or books!

Not so good: Doesn't have deep understanding of specialized concepts

While large language models can process language well on common topics, they're not always as effective when discussing the details of highly specialized concepts. For example, they might struggle to accurately identify and explain the nuances of a complex medical procedure. When pushed, they will start to make things up (see the hallucinations discussed above).

Not so good: Doesn't have your context

This may sound obvious, but the models don't have all the information about you and your environment. If you are a student or teacher in a school, the model doesn't know about the sequence of lessons this week, who is having a bad day, or that you never really understood that one idea in science. So, it may suggest ideas or generate writing that does not make sense for you or your class.

Summary: Don’t trust! Verify!

LLMs are specialized productivity tools: Learn how to use them to help you be more productive. Don't ask them for answers to things they can't know, or they will make things up. Collaborate with them, but don't depend on them to create.
Forewarned is forearmed: The only way to guard against the hallucinations of an LLM is to make a habit of fact-checking everything it tells you.
The takeaway: Overall, large language models are good at processing language, generating text, and answering questions. However, they struggle with complex concepts, reasoning, and calculation. They also don’t have “judgment.”
As these models continue to develop, though, they may improve in these areas!
