It's not hard to say that the possibilities for expression are infinite. But how and why are they infinite? One can imagine some procedure for constructing an infinite sentence: myriad subordinate clauses, labyrinthine conjunctions, burbling streams of adjectives… Of course, such a sentence would be quite literally incomprehensible for finite beings, but if you charged a highly skilled writer with the task of crafting a sentence of n words for some fairly high value of n, she could probably deliver within the basic rules of grammar. (Sometimes I thought David Bentley Hart was taking up such a challenge in certain passages of The Beauty of the Infinite.)
What are the finite materials out of which our utterances are constructed?
For English speakers, there are only twenty-six letters. From these letters, we can create a staggering number of words. Let the set of all these words be called W. (In set theory, the notation for “the size of W” is |W|.) If we want to cast as wide a net as we can for words that can be used, we can use the OED's “over half a million words” to let |W| ≥ 500,000. To exclude some of the most archaic words, we could take the Random House Webster's Unabridged Dictionary count and let |W| ≥ 315,000. You can choose your preferred method for constructing W, but we'll agree that it's large.
Now think of all the possible word orderings that can be created out of W. We will call these “sentences,” even though they could be total gibberish, profoundly insightful, or somewhere in between. It's much more likely that they will be gibberish than not, but the important thing is that every non-gibberish word ordering is contained somewhere in the list. To be sure, there's no infallible way to tell the difference between gibberish and non-gibberish, but let's put that to one side for now.
How many “sentences” of, say, three words are there? If |W| = k, where k is some positive integer, there are k choices for each word, so there are k³ possible “sentences.” More generally, there are kⁿ “sentences” of exactly n words. The total number of “sentences” of at most n words is therefore the sum k + k² + … + kⁿ, a finite geometric series that works out to k(kⁿ − 1)/(k − 1) whenever k > 1.
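To make the counting concrete, here is a rough Python sketch that checks the geometric-series total against a brute-force enumeration. The three-word toy vocabulary and the function names are my own inventions for illustration; a real W would be far too large to enumerate, so only the closed form is usable there.

```python
from itertools import product


def count_sentences_closed_form(k: int, n: int) -> int:
    """Count word orderings of length 1..n over a vocabulary of k words.

    Uses the geometric-series sum k + k^2 + ... + k^n.
    """
    if k == 1:
        # With a single word there is exactly one ordering per length.
        return n
    return k * (k**n - 1) // (k - 1)


def count_sentences_brute_force(words, n: int) -> int:
    """Count the same orderings by actually generating every one of them."""
    return sum(
        1
        for length in range(1, n + 1)
        for _ in product(words, repeat=length)
    )


# Hypothetical toy vocabulary, standing in for W.
toy_vocabulary = ["it", "is", "monday"]

# 3 + 9 + 27 orderings of at most three words.
print(count_sentences_brute_force(toy_vocabulary, 3))  # 39
print(count_sentences_closed_form(3, 3))               # 39

# With the OED-sized lower bound, only the closed form is practical.
print(count_sentences_closed_form(500_000, 3))
```

Even at three words, the OED-sized vocabulary already gives a count on the order of 10¹⁷, which is why nobody should try the brute-force version on anything bigger than a toy.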
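So we know how we could construct an infinite list of all possible word orderings: write down every one-word “sentence,” then every two-word “sentence,” and so on. Each length contributes only finitely many entries, so every word ordering turns up at some finite position. Even if we can't specify precisely which sentences are non-gibberish, we know that every word ordering we could make will have some position on this list. So the set of sensible sentences, whatever it may be, is at most countably infinite (and exactly countably infinite if, as seems plausible, sensible sentences can be made arbitrarily long).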
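One way to see that every word ordering really does have a position on the list is to compute that position directly. The sketch below assumes the list is ordered by length first and then by the order of words in the vocabulary; that ordering, the toy vocabulary, and the function names are all my own choices, not anything canonical.

```python
def position_on_list(sentence, vocabulary):
    """Return the 1-based position of a word ordering on the list.

    Assumes the list is sorted by length, then by vocabulary order,
    and that the vocabulary contains no repeated words.
    """
    k = len(vocabulary)
    rank = {word: i for i, word in enumerate(vocabulary)}
    position = 0
    for word in sentence:
        # Treat each word as a "digit" from 1 to k in a base-k numeral.
        position = position * k + rank[word] + 1
    return position


def sentence_at(position, vocabulary):
    """Inverse of position_on_list: recover the word ordering at a position."""
    k = len(vocabulary)
    words = []
    while position > 0:
        position, digit = divmod(position - 1, k)
        words.append(vocabulary[digit])
    return words[::-1]


# Hypothetical toy vocabulary, standing in for W.
toy_vocabulary = ["it", "is", "monday"]

print(position_on_list(["it", "is", "monday"], toy_vocabulary))  # 18
print(sentence_at(18, toy_vocabulary))  # ['it', 'is', 'monday']
```

Because the two functions undo each other, every positive integer names exactly one word ordering and vice versa, which is all that "countably infinite" asks for.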
This is, of course, very near to the problem of Borges' “The Library of Babel.” Since we are finite, we only have time to say so many words. Is everything we say somewhere on the list, so that there is no real possibility of originality?
The answer has to be “no.”
If utterances include all the inflections we use to make our communication meaningful, each word ordering really represents a grouping of possible utterances: what some of us might be inclined to call an “equivalence class.” Can you place an integer value on emphasis? On context? If “[It is Monday again]” represents all the things I can mean by this sequence of four words, there is simply no way to assign discrete values to the members of the set. So while the set of possible word orderings (given a fixed vocabulary) is countably infinite, inflection makes the set of utterances uncountably infinite, at least if emphasis and context can vary continuously.
As you can probably tell, I've never so much as looked at a book on linguistics. This is just the working out of an idle thought that popped into my head yesterday. So while the whole experiment is absurd, there are at least a few interesting questions that come up. Primarily: would the distribution of “sensible sentences” be anything like the distribution of prime numbers, especially as sentence length grows? Is there an analogy here?