If you come from a statistical background or a machine learning one then probably you don’t need any reasons for why it’s useful to build language models. If not, here’s what language models are and why they are useful.
What is a model?
Generally speaking, a model (in the statistical sense of course) is a mathematical representation of a process. Almost always models are an approximation of the process. There are several reasons for this but the 2 most important are:
1. We usually only observe the process a limited amount of times
2. The model can be exceptionally complex so we simplify it
A statistician guy once said: All models are wrong, but some are useful.
Here’s what a model usually does: it describes how the modelled process creates data. In our case, the modelled phenomenon is the human language. A language model provides us with a way of generating human language. These models are usually made of probability distributions.
A model is built by observing some samples generated by the phenomenon to be modelled. In the same way, a language model is built by observing some text.
Let’s start building some models.