Overview
What is this?
This is a collection of notes on formal language theory, mathematical logic, model theory, automata theory, and formal learning theory as they can be applied to linguistics. They are for graduate students of linguistics and others interested in getting their hands dirty with this kind of work.
My immediate plan for this is to write up my lecture notes from our local Math Ling group at the Rutgers University Department of Linguistics. I would also love it if others would be willing to contribute chapters on areas of their expertise. The idea for this is to be an easy-to-access collection of knowledge related to formal-language-theoretic linguistics.
Please let me know if you have any thoughts, suggestions, or something to contribute at adam.jardine@rutgers.edu.
What lies ahead is a work in progress. Proceed at your own risk.
Technical details
This was written using mdbook (hats off to Nick Danis for introducing it to me) and is hosted on GitHub.
© 2022 Adam Jardine

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
First-order Logic and the Star-Free Languages
[DRAFT]
Introduction
Last time we learned about the difference between computable and uncomputable languages. This time, we're going to show how we can compute languages using first-order logic (FO).
Let's start with an actual natural language pattern. In Sarcee [cook78], [+anterior] sibilants can follow [-anterior] sibilants, but the reverse results in ungrammaticality. (Data from [cook78], cited in [heinz10]. Sibilants are highlighted in bold.)
| **ʃ**í**tʃ**í**dz**àʔ | 'my duck' |
| nā**ʃ**ɣá**tʃ** | 'I killed them again' |
| \***s**í**tʃ**í**dz**àʔ | |
In the above examples, the [+anterior] [dz] follows the [-anterior] [ʃ], but it is impossible for the [-anterior] [tʃ] to follow [s].
Let's study this abstractly as a formal language. Let \(L\) be the formal language over \(\Sigma=\{a,b,c\}\) of all strings such that no \(a\) ever precedes a \(b\), anywhere in the string. I.e.,
\[ L = \{accca,cccc,bcccb,bbccca,...\}\]
and
\[ \overline{L} = \{ab, acccb, ccaccccbc, aacccccccccb, ... \} \]
Thus, \(L\) above distills the precedence-based constraint in Sarcee, replacing [+anterior] sibilants with \(a\) and [-anterior] sibilants with \(b\) (and everything else with \(c\)).
We can describe \(L\) with the below FO statement that forbids a \(b\) from following an \(a\).
\[(\forall x)[a(x)\rightarrow\neg(\exists y)[b(y)\land x<y]]\]
For now, we'll forgo the formal details and interpret the statement informally. The variables \(x\) and \(y\) here range over positions (or indices) in a string. Thus, \(a(x)\) is interpreted to mean 'position \(x\) is occupied by the symbol \(a\).'
The entire statement, then, reads as follows:
For every position \(x\), if \(x\) is an \(a\), then there is no \(y\) such that \(y\) is a \(b\) and \(x\) precedes \(y\).
This logical statement thus specifies a constraint on strings: for every \(a\) in a string, it cannot precede a \(b\). As shown in detail below, this statement is either true or false for every string in \(\Sigma^*\), and there is a procedure for computing this. Thus, this logical statement specifies exactly the strings in \(L\): the strings in \(L\) are those for which the statement is true, and the strings not in \(L\) are those for which it is false.
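This procedure can be sketched directly as a brute-force check over pairs of positions. (The function name `in_L` and the Python rendering are ours, not the text's.)

```python
# A string w is in L iff no position holding an 'a'
# precedes a position holding a 'b'.
def in_L(w):
    # check every pair of positions (x, y) with x preceding y
    for x in range(len(w)):
        for y in range(x + 1, len(w)):
            if w[x] == 'a' and w[y] == 'b':
                return False
    return True

print(in_L("accca"))   # True:  no a precedes a b
print(in_L("acccb"))   # False: the initial a precedes the final b
```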
The advantages of using logical statements as grammars are numerous. Among them are the following.
- We can draw from a rich literature connecting logic and computation [buchi60],[mcnaughtonpapert71],[immerman80],[rogerspullum11].
- We can study patterns over structures besides strings.
- They allow us to directly connect formal language classes to classes of functions, through the technique of logical interpretations [engelfriethoogeboom01],[filiot15].
There are more, which we will learn about later. For now, we will use FO as a primer for using logical statements as grammars, and discuss how FO is connected to the star-free (SF) class of formal languages [mcnaughtonpapert71].
String models
Before we define our logic, we have to define strings in a bit more detail. That is, we need to encode all the information required to identify a particular string. To do this we use model theory, the mathematical study of structures.
Continuing to fix our alphabet to \(\Sigma=\{a,b,c\}\), what information do we need to identify a particular string in \(\Sigma^*\) from any other? The model will be a representation of this information.
First, think about the string \(aaaa\) as opposed to \(aaaaaa\). What distinguishes the former from the latter, clearly, is that there are four symbols in \(aaaa\) whereas there are six in \(aaaaaa\). So we need to distinguish the number of elements in a string.
We do that by establishing a domain (also referred to as a universe) \(D\) of elements in the string. This is a finite set of objects that represent the elements in our structure. By convention, we will use an initial sequence of natural numbers, i.e. \[D=\{1,2,3,4,...,n\}\] for some finite \(n\). Thus, for example, the domain of the model for \(aaaa\) will be \(\{1,2,3,4\}\) and the domain for the model for \(aaaaaa\) will be \(\{1,2,3,4,5,6\}\).
Note: While we use numbers, there is no inherent order on these objects. We could instead be using symbols like \(\{\square,\triangle,\bigcirc,\diamondsuit\}\). But numbers are just easier.
Of course, \(\Sigma^*\) does not only contain strings of all \(a\)s. The string \(aaaa\) must also be distinguished from, say, \(abaa\). So we have to specify the symbols that are associated with elements in the string. We do this using relations on the domain in the model. A unary relation (or property) \(R\) is simply a subset of the domain: \[R\subseteq D\]
So we can add a unary relation \(R_b\) that indicates which elements are \(b\)s. Thus the model for \(aaaa\) would be \[M(aaaa)=\langle D=\{1,2,3,4\}; R_b=\{\}\rangle\] and the model for \(abaa\) would be \[M(abaa)=\langle D=\{1,2,3,4\}; R_b=\{2\}\rangle.\]
We are not yet done though. Consider \(abaa\) and \(aaba\). These are distinct strings, but their models will be equivalent: \[M(abaa)=\langle D=\{1,2,3,4\}; R_b=\{2\}\rangle\] \[M(aaba)=\langle D=\{1,2,3,4\}; R_b=\{3\}\rangle\] Note that the element in \(R_b\) is different in each of these models, given our numbering scheme, but these elements do not have any inherent order. So \(\langle D=\{1,2,3,4\}; R_b=\{3\}\rangle\) is isomorphic to \(\langle D=\{1,2,3,4\}; R_b=\{2\}\rangle\). That is, given the information available in the model, there is no way to tell the two apart.
So, we add a binary relation \(R_<\) that orders the elements in the model. A binary relation \(R\) consists of pairs of elements in the domain; i.e. \[R\subseteq D\times D\]
For example, for \(abaa\) we have \[M(abaa)=\langle D=\{1,2,3,4\}; R_b=\{2\}, R_<=\{(1,2),(2,3),(1,3),(1,4),(2,4),(3,4)\}\rangle\] and for \(aaba\) we have \[M(aaba)=\langle D=\{1,2,3,4\}; R_b=\{3\}, R_<=\{(1,2),(2,3),(1,3),(1,4),(2,4),(3,4)\}\rangle.\]
Now these are different (i.e., non-equivalent) models! In \(M(abaa)\), the element in \(R_b\) is only preceded by one other element; in \(M(aaba)\), the element in \(R_b\) is preceded by two. So now we have a way to distinguish strings in \(a\)s and \(b\)s, with one relation marking the \(b\)s and an ordering relation on the elements of the domain.
To also capture \(c\)s, we need to add another relation, \(R_c\). In fact, for strings over any alphabet, it's most straightforward to have a unary relation for each symbol in the alphabet. Thus, in this particular case, for any string model we'll need the unary relations \(R_a,R_b,R_c\) and the binary relation \(R_<\).
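As a concrete sketch, a string model in this signature can be assembled as a domain plus these relations. (The representation below, plain Python sets, is our own choice, not the text's.)

```python
# Build the string model M(w) over Sigma = {a, b, c}:
# domain D = {1, ..., n}, one unary relation per symbol,
# and the precedence relation R_< as a set of ordered pairs.
def string_model(w, sigma=("a", "b", "c")):
    D = set(range(1, len(w) + 1))
    # R[s] collects the positions occupied by symbol s
    R = {s: {i for i in D if w[i - 1] == s} for s in sigma}
    # prec holds (i, j) whenever position i precedes position j
    prec = {(i, j) for i in D for j in D if i < j}
    return D, R, prec

D, R, prec = string_model("abaa")
print(sorted(D))    # [1, 2, 3, 4]
print(R["b"])       # {2}
```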
This set of relations \(R_1,R_2,...,R_n\) is called a signature. We'll use calligraphic capital letters, e.g. \(\mathcal{S}\), to indicate signatures.
The signature we will use to represent strings in an arbitrary finite alphabet \(\Sigma=\{\sigma_1,\sigma_2,...,\sigma_n\}\) is as follows.
\[\mathcal{S}^<=\{R_{\sigma_1},R_{\sigma_2},...,R_{\sigma_n},R_<\}\]
To recap, a (finite) model is a mathematical object that lets us describe other objects. Formally, it is a tuple consisting of a set of elements or domain (or universe) \(D\) and a series of relations \(R_1,R_2,..., R_n\) on the domain.
\[\langle D; R_1, R_2, ..., R_n\rangle\]
A particular set of relations that we use to model some set of structures is called a signature.
The syntax and semantics of first order logic
Given a signature, we can define a FO logic over that signature. To build such a logic and define how it can be interpreted over models in the signature, we need to define the syntax and semantics of FO. Defining this precisely tells us not only the range of grammars we can write, but also how these grammars compute languages.
Syntax
The syntax defines the range of statements we can write.
First, statements in FO choose from a set of variables \(\mathcal{X}=\{x,y,z,...,x_1,x_2,...\}\).
Additionally, for each relation in the signature, we have a set of atomic formulas that correspond to the relations in our model (plus equality).
Definition. (Atomic formulas)
- For any two variables \(x\) and \(y\), \(x=y\) is an atomic formula.
- For any two variables \(x\) and \(y\), \(x<y\) is an atomic formula.
- For any variable \(x\) and any unary relation \(R_\sigma\in\mathcal{S}\), \(\sigma(x)\) is an atomic formula.
Atomic formulas, as their name suggests, define the basic properties in the model to which the logic refers. Given our running example for \(\Sigma=\{a,b,c\}\), some atomic formulas are \(a(x)\), \(b(z)\), \(x=y\), \(y<z\), etc.
The set of well-formed formulas (WFFs) in FO is then built out of atomic formulas connected by the logical connectives \(\neg\) and \(\lor\) and the quantifier \(\forall\).
Definition. (Well-formed formulas)
- If \(\varphi\) is an atomic formula, then \(\varphi\) is a WFF
- If \(\varphi\) is a WFF, then \((\neg\varphi)\) is a WFF
- If \(\varphi\) and \(\psi\) are WFFs, then \((\varphi\lor\psi)\) is a WFF
- If \(\varphi\) is a WFF for which \(x\) is a free variable, then \((\forall x)[\varphi]\) is a WFF
A free variable in an formula \(\varphi\) is a variable \(x\) such that \(...x...\) appears in \(\varphi\) without being preceded by \((\forall x)\) or \((\exists x)\) (i.e., it's not bound by a quantifier).
Note that this definition is recursive: the definition of WFF itself refers to WFFs. Thus, we can write an infinite number of WFFs by recursively applying the rules above. For example, \((\neg a(x))\) is built by starting with the atomic formula \(a(x)\) and plugging it in as \(\varphi\) for rule (2.) in the definition of WFFs.
Take a moment to convince yourself of how this works by deriving the WFFs \(((\neg a(x))\lor b(x))\) and \((\forall x)[(b(x)\lor(\neg c(x)))]\).
Some notational sugar. The parentheses in the above definition make the composition of a WFF explicit, but they are often removed for readability when there is no danger of ambiguity. So, for example, \(((\neg a(x))\lor b(x))\) can be rewritten as \(\neg a(x)\lor b(x)\).
Finally, the other common logical connectives \(\rightarrow\) and \(\land\) and the quantifier \(\exists\) can be derived from the syntax we've already defined. Namely,
- \(\varphi\rightarrow\psi := \neg\varphi\lor\psi\)
- \(\varphi\land\psi := \neg(\neg\varphi\lor\neg\psi)\)
- \((\exists x)[\varphi] := \neg(\forall x)[\neg\varphi]\)
...where the notation '\(:=\)' means something like 'can be re-written as'.
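The third equivalence can be spot-checked numerically. Here is a small sketch (the string and the choice of \(\varphi\) are ours, purely for illustration):

```python
# A numeric check of (exists x)[phi] := not (forall x)[not phi],
# with phi(x) = "position x holds a b", over the string "abac"
w = "abac"
D = range(1, len(w) + 1)

def phi(i):    # b(x): position i holds the symbol b
    return w[i - 1] == "b"

exists_direct = any(phi(i) for i in D)            # (exists x)[phi]
exists_derived = not all(not phi(i) for i in D)   # not (forall x)[not phi]
print(exists_direct, exists_derived)   # True True
```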
Take a minute apply the above to construct our original example of \((\forall x)[a(x)\rightarrow\neg(\exists y)[b(y)\land x<y]]\).
Semantics
So far our FO WFFs are just strings of meaningless symbols. We have to define the semantics of our WFFs; that is, when a WFF is true in a model. We will do this recursively, following closely the definition above of the syntax of FO logic. In this way, for every sentence we can construct using our syntax, our semantics guarantees an interpretation for it.
Because this is a first-order logic, the variables \(x,y,z,\) etc. range over elements of the domain (i.e., positions in the string). Satisfaction then depends on how variables are assigned to positions in the string. As such, our notation for satisfaction must take this dependency into account.
Thus, satisfaction will be defined by specifying the model \(M\) and some sequence \(i_1,i_2,...,i_k\) of \(k\) elements in \(D\). Accordingly, below \(\varphi(x_1,x_2,...,x_k)\) refers to a WFF with exactly \(k\) free variables \(x_1,x_2,...,x_k\). Essentially, we are going to plug each \(i_j\) into each variable \(x_j\).
Definition. (Satisfaction of well-formed formulas)
- a. \(M,i\models \sigma(x)\) iff \(i\in R_\sigma\)
b. \(M,i,j\models (x<y)\) iff \((i,j)\in R_<\)
c. \(M,i,j\models (x=y)\) iff \(i=j\) (that is, \(i\) and \(j\) are the same element in \(D\).)
- \(M,i_1,i_2,...,i_k\models \neg\varphi(x_1,x_2,...,x_k)\) iff \(M,i_1,i_2,...,i_k\not\models \varphi(x_1,x_2,...,x_k)\) (that is, \(M,i_1,i_2,...,i_k\) does not satisfy \(\varphi(x_1,x_2,...,x_k)\)).
- \(M,i_1,i_2,...,i_k,j_1,j_2,...,j_\ell\models \varphi(x_1,x_2,...,x_k)\land\psi(y_1,y_2,...,y_\ell)\) iff \(M,i_1,...,i_k\models \varphi(x_1,x_2,...,x_k)\) and \(M,j_1,j_2,...,j_\ell\models \psi(y_1,y_2,...,y_\ell).\)
- \(M,i_1,...,i_k\models (\forall x)[\varphi(x,y_1,y_2,...,y_k)]\) iff for all \(i\in D\), \(M,i,i_1,...,i_k\models\varphi(x,y_1,...,y_k).\)
This is a lot, so let us go through each. To illustrate, we'll use \(M=M(abac)\), depicted graphically below.
\[ \begin{array}{c} \begin{array}{|c|c|c|c|}\hline a & b & a & c \\ \hline \end{array}\\ \begin{array}{cccc} 1 & 2 & 3 & 4 \\ \end{array} \end{array} \]
First, notice that each part of the definition matches with the definition of the syntax of well-formed formulas above. Thus, for example, part (1) of the definition of the semantics defines the semantics for atomic formulas (i.e., from part (1) of the definition of the syntax).
So if our WFF is simply \(a(x)\), then \(M,1\models a(x)\) because \(1\in R_a\) in \(M\); conversely, \(M,2\not\models a(x)\) because \(2\not\in R_a\) in \(M\). Similarly, \(M,1,2,3\models(\neg b(x))\land (y<z)\) because \(M,1\models\neg b(x)\) (because \(1\not\in R_b\) and thus \(M,1\not\models b(x)\)) and \(M,2,3\models y<z\) (because \((2,3)\in R_<\)).
Part (4) of the definition, with the quantifier, is a bit different. Say we have a WFF \((\forall x)[a(y)\land \neg(x<y)]\). Satisfaction here only requires specifying one element in the domain, specifically that which will correspond to \(y\), as \(x\) is bound by a quantifier. So let us check whether \(M,1\models(\forall x)[a(y)\land \neg( x<y )]\).
By the definition, \(M,1\models(\forall x)[a(y)\land \neg (x<y)]\) only in the case that for all \(i\in D\), \(M,i,1\models a(y)\land \neg(x<y)\). In other words, no matter what element we assign \(x\) to, satisfaction goes through. One sure way to calculate this is via brute force, i.e. checking each element in \(D\) to make sure satisfaction holds in every case.1 So first we set \(i=1\). It holds that \(M,1,1\models a(y)\land \neg (x<y)\): \(1\in R_a\) and it is not the case that \((1,1)\in R_<\). Next we set \(i=2\). It then holds that \(M,2,1\models a(y)\land \neg(x<y)\): \(1\in R_a\) and it is not the case that \((2,1)\in R_<\). As is perhaps clear at this point, we really only have to check whether \(M,i,1\models\neg(x<y)\), as we always know that \(M,1\models a(y)\), the first conjunct. Knowing this, it is clear that when \(i=3\) and \(i=4\), it is also the case that \(M,i,1\models\neg(x<y)\), as neither \((3,1)\) nor \((4,1)\) is in \(R_<\).
As we have now checked each of \(1,2,3,4\) in \(D\), we are finished: for all \(i\in D\), \(M,i,1\models a(y)\land \neg(x<y)\), and so \(M,1\models(\forall x)[a(y)\land \neg (x<y)]\).
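This brute-force procedure can be written out as a recursive evaluator that mirrors the satisfaction definition clause by clause. The sketch below uses our own encoding of WFFs as nested tuples; the encoding and names are not from the text.

```python
# A sketch of the satisfaction relation. WFFs are nested tuples,
# e.g. ("not", ("sym", "b", "x")) for (not b(x)). The assignment
# g maps free variables to domain elements.
def sat(model, g, phi):
    D, R, prec = model
    op = phi[0]
    if op == "sym":      # sigma(x): the position g[x] holds symbol sigma
        return g[phi[2]] in R[phi[1]]
    if op == "prec":     # x < y
        return (g[phi[1]], g[phi[2]]) in prec
    if op == "eq":       # x = y
        return g[phi[1]] == g[phi[2]]
    if op == "not":
        return not sat(model, g, phi[1])
    if op == "and":
        return sat(model, g, phi[1]) and sat(model, g, phi[2])
    if op == "forall":   # brute force: check every element of D
        return all(sat(model, {**g, phi[1]: i}, phi[2]) for i in D)
    raise ValueError("unknown operator: %s" % op)

# M(abac), as in the text
w = "abac"
D = set(range(1, len(w) + 1))
R = {s: {i for i in D if w[i - 1] == s} for s in "abc"}
prec = {(i, j) for i in D for j in D if i < j}
M = (D, R, prec)

# (forall x)[a(y) and not(x < y)], with y assigned to position 1
phi = ("forall", "x", ("and", ("sym", "a", "y"),
                       ("not", ("prec", "x", "y"))))
print(sat(M, {"y": 1}, phi))   # True
```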
Sentences
An important special case is when a WFF has no free variables---that is, for every variable, there is a quantifier that binds it. Such a WFF is called a sentence. This case is important because we can define satisfaction independent of any assignment of variables to elements in \(D\). That is, we can talk about whether a model satisfies a sentence directly.
Definition. (Satisfaction of a sentence). For a sentence \(\varphi=(\forall x)[\psi(x)]\), \(M\models\varphi\) iff for all \(i\in D,M,i\models\psi(x)\).
As an example, consider the WFF \[\varphi=(\forall x)\big[(\forall y)[\neg(c(x)\land a(y)\land x<y)]\big]\]
First, \(\varphi\) is a sentence because both of its variables \(x\) and \(y\) are bound by a quantifier. To see how a model can be evaluated for satisfaction of \(\varphi\), take \(M=M(abac)\) from above. For this \(M\), \(M\models\varphi\). To see why, take any two elements \(i,j\in D\) from the domain of \(M\) and substitute \(i\) for \(x\) and \(j\) for \(y\). If you evaluate the formula inside the quantifiers (i.e., \(\neg(c(x)\land a(y)\land x<y)\)), it will return true. (I leave it to the reader to work out the details.)
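One way to carry out this check is to test the inner formula on every pair of positions. The sketch below hard-codes \(\varphi\); the function name is ours.

```python
def satisfies_phi(w):
    # phi = (forall x)(forall y)[ not( c(x) and a(y) and x < y ) ],
    # checked by brute force over all pairs of positions in w
    n = len(w)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if w[i - 1] == "c" and w[j - 1] == "a" and i < j:
                return False
    return True

print(satisfies_phi("abac"))   # True
print(satisfies_phi("abca"))   # False
```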
To illustrate a model that does not satisfy \(\varphi\), consider \(M=M(abca)\).
\[ \begin{array}{c} \begin{array}{|c|c|c|c|}\hline a & b & c & a \\ \hline \end{array}\\ \begin{array}{cccc} 1 & 2 & 3 & 4 \\ \end{array} \end{array} \]
Try evaluating each pair \(i,j\in D\) against \(\varphi\). One pair will return false for the internal formula. Which is it?2
Summary
We have now learned a syntax that generates an infinite number of FO WFFs, and a semantics that can interpret any one of these WFFs. In particular, given a sentence with no free variables, we can directly check whether a model satisfies that sentence. Next, we will show how this allows a sentence to serve the function of a grammar that describes a formal language.
Exercises
- Convince yourself that \((\varphi\lor\psi)\equiv\neg(\neg\varphi\land\neg\psi)\); that is, that the former is logically equivalent to the latter. To do this, use the definition of the semantics to show that the former is true exactly in the cases that the latter is true. (If you have done a proof before, you can prove this via induction on the two definitions.)
- Convince yourself that \((\exists x)[\varphi(x)]\equiv\neg(\forall x)[\neg\varphi(x)]\).
- What are some more strings whose models satisfy the sentence \(\varphi\) above? Some that do not?
1 There can be more efficient ways to do this, if we know the formula ahead of time. But this exact procedure is guaranteed to work for any formula and any model.
2 It fails for \(i=3\) and \(j=4\), as \(i\) is a \(c\) and \(j\) is an \(a\) and \((i,j)\in R_<\).
FO sentences as grammars
Recall the following FO sentence from the introduction:
\[\varphi=(\forall x)[a(x)\rightarrow\neg(\exists y)[b(y)\land x<y]]\]
...and the set of strings \(L\) over \(\Sigma=\{a,b,c\}\) such that no \(a\) ever precedes a \(b\), anywhere in the string:
\[ L = \{accca,cccc,bcccb,bbccca,...\}.\]
Take a moment to convince yourself that \(L\) is exactly the set of strings that satisfies \(\varphi\). To do this, show yourself that (the models of) a few strings in \(L\) satisfy \(\varphi\), and also that (the models of) a few strings in \(\overline{L}\) do not satisfy \(\varphi\).
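A mechanical way to convince yourself is to evaluate \(\varphi\) directly over the positions of a few strings. This is a sketch with our own helper name, not part of the text:

```python
def models_phi(w):
    # phi = (forall x)[ a(x) -> not (exists y)[ b(y) and x < y ] ],
    # evaluated directly over the positions 1..n of w
    n = len(w)
    for x in range(1, n + 1):
        if w[x - 1] == "a":
            # (exists y)[ b(y) and x < y ]?
            if any(w[y - 1] == "b" and x < y for y in range(1, n + 1)):
                return False
    return True

# strings from L satisfy phi; strings from its complement do not
print([models_phi(w) for w in ["accca", "cccc", "bcccb", "bbccca"]])
print([models_phi(w) for w in ["ab", "acccb", "ccaccccbc"]])
```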
Let's look at some more.
- Write an FO sentence that describes the set of strings that do not end in a \(b\).
- Write an FO sentence that describes the set of strings in which \(a\) cannot be followed by another \(a\), unless a \(b\) intervenes. That is, \(caccccac\) is not in the language but \(caccbccca\) is (think of an obligatory contour principle-type constraint with blocking).
- What about an FO sentence that describes \((ab)^n\)? Hint: we can define a successor relation \(\triangleleft\) from \(<\) as follows: \[x\triangleleft y:= x<y\land\neg(\exists z)[x<z\land z<y]\]
- What about an FO sentence that describes \(a^nb^n\)?
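The successor relation hinted at in the exercises can be computed from \(R_<\) exactly as its definition states: \(x\triangleleft y\) holds when \(x<y\) and no \(z\) falls strictly between them. A quick sketch (the function name is ours):

```python
# Derive the successor relation from the precedence relation:
# keep (x, y) in prec only if no z satisfies x < z and z < y.
def successor_pairs(D, prec):
    return {(x, y) for (x, y) in prec
            if not any((x, z) in prec and (z, y) in prec for z in D)}

D = {1, 2, 3, 4}
prec = {(i, j) for i in D for j in D if i < j}
print(sorted(successor_pairs(D, prec)))   # [(1, 2), (2, 3), (3, 4)]
```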