FlatAssembler
Registered Member
In a paper I published in Valpovački godišnjak and Regionalne studije, I measured the collision entropy of five languages: English, German, French, Italian and Croatian. For each language, I measured the collision entropy both in a long text and in the Aspell spell-checker word list. You can see the results in the table:
Of those five languages, English and French have a far deeper orthography than the others. They also have the lowest collision entropy in both the long text and the Aspell word list. There are 5!/(2!*(5-2)!)=10 ways to choose which two of the five languages rank lowest on a given measurement. So, if we assume the depth of the orthography has no effect on the collision entropy, the probability of that happening by chance (the p-value of my observation) is (1/(5!/(2!*(5-2)!))^2)*2=(1/10^2)*2=1/50.
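To make the arithmetic above easy to check, here is a minimal Python sketch. It is only an illustration, not the code used for the paper. The function names `collision_entropy` and `chance_probability` are made up for this post. I am assuming here that collision entropy means the Rényi entropy of order 2, computed over whatever symbols you pass in (single characters in the example), and that the factor of 2 in the formula accounts for both directions (lowest or highest); adjust it if you count differently.

```python
from collections import Counter
from fractions import Fraction
from math import comb, log2


def collision_entropy(symbols):
    """Collision (Renyi order-2) entropy in bits: -log2(sum of p_i^2).

    Assumption: `symbols` is any iterable of hashable items, for example
    the characters of a text. The choice of symbol unit is up to the caller.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute the entropy of an empty sequence")
    return -log2(sum((c / total) ** 2 for c in counts.values()))


def chance_probability(n_languages=5, n_deep=2, n_measurements=2):
    """Probability that the same n_deep languages land at one extreme
    in every measurement, if the rankings were purely random.

    Assumption: the measurements are treated as independent, and the
    result is doubled to cover both extremes (lowest or highest).
    """
    ways = comb(n_languages, n_deep)  # 5!/(2!*(5-2)!) = 10
    one_direction = Fraction(1, ways ** n_measurements)
    return 2 * one_direction


if __name__ == "__main__":
    print(collision_entropy("abab"))  # two equally likely symbols
    print(chance_probability())       # the p-value from the post
```

Note that treating the long text and the Aspell word list as independent measurements is itself an assumption built into the formula; if the two rankings are correlated, the real probability is higher than 1/50.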
Now, obviously, suggesting that the depth of the orthography decreases the collision entropy (or, for that matter, any entropy, including the Shannon entropy) seems absurd in the light of historical linguistics. Historical linguistics teaches us that the way a word is spelt in a language with a deep orthography corresponds to how it was pronounced at some point in the history of the language. English spelling represents how English was pronounced at the time of the invention of the printing press. One of the basic principles of historical linguistics is the assumption that languages spoken in the past had, on average, the same statistical properties as languages spoken today. Saying that languages spoken in the past had a lower collision entropy obviously contradicts that principle.
So, what do you think?