Since the birth of the age of computers, we have often imagined what it may be like to interact with machines as we do with other humans. That is, to talk to and instruct devices merely by the use of our voices (or at least words typed on a keyboard). Of course, it is but a small part of the dream of creating true AI, yet one of great present interest for obvious reasons. Speaking, listening, and reading language, while generally clear signs of intelligence, do not necessarily require a conscious being to perform them – so long as you are prepared to accept limited context.
Camera eye of HAL 9000 from 2001: A Space Odyssey
The HAL 9000 computer from 2001: A Space Odyssey – Stanley Kubrick's landmark film and Arthur C. Clarke's novel – is an early and notable case of a fictional machine that can interpret (and produce) human language, among other abilities. Indeed, it is the machine responsible for one of the most memorable quotes in science fiction film:
David Bowman: Open the pod bay doors, HAL.
HAL 9000: I’m sorry, Dave. I’m afraid I can’t do that.
There is however no need to start worrying about villainous machines eavesdropping on everything we say (or write) yet! A machine capable of understanding natural language need not have any higher-level reasoning capabilities, as I will attempt to show in this article. Still, the association of language ability with that of independent thought is a curious one; something I will perhaps explore in time.
On a more serious note, the reality of machines actually reading human language, in speech or text form, is very likely one of the closest to fruition of the major goals of artificial intelligence. The field of natural language processing is a surprisingly mature one, dating back as far as the 1950s, yet the holy grail is still just beyond touching distance (or far beyond, depending on how you look at it). Indeed, speech recognition and text-to-speech have come a long way too, such that reasonably sophisticated and effective algorithms are commonplace on home computers today. Clearly, the processing and interpretation of language is the largest obstacle to "natural" communication with machines. More specifically, the problem lies predominantly in mapping a sentence or phrase – on the surface a mere string of letters and punctuation – to an abstract concept: something that carries meaning, and that a computer can relate to known information.
The process of “reading”
To begin, we should really define what fundamentally constitutes reading, and the processes it involves. Let's consider a very simple and straightforward process first, and develop the concept from there. A naive view of reading would be simply iterating over each word (or group of words) in a sentence and looking it up in a dictionary. Immediately, though, we see that this completely ignores the structure of the sentence – how the parts of speech fit together. So what about constructing some sort of abstract syntax tree and tagging each word (or group of words) with information about its type and role? (Let's put aside the significant issue of the ambiguity of language for now.) Now we at least have a representation of the grammar/structure of the sentence as well as the definitions of its components, yet something is clearly still lacking: this model associates no concept with the sentence. What does perfect knowledge of English grammar and vocabulary tell us without the context in which the language is used, even if that context is the entire Universe? I define context as a set of (complex) mappings between a fragment of language and the underlying concepts, abstract or concrete. Proper interpretation of the grammar and structure of a given language, combined with context, is what natural language understanding requires, in its most general (loosest) definition. Each is a difficult task in itself, and it may be that a certain degree of overlap between the two is required for things to work.
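The naive dictionary-lookup process can be sketched in a few lines. Everything here – the miniature dictionary, its parts of speech and glosses – is invented purely for illustration; a real system would need a full lexicon and, as noted above, some representation of grammar:

```python
# A toy sketch of "reading" as pure dictionary lookup: iterate over
# the words of a sentence and fetch a (part-of-speech, gloss) pair
# for each. The entries below are illustrative, not a real lexicon.

TOY_DICTIONARY = {
    "open": ("verb", "to make accessible"),
    "the": ("article", "definite article"),
    "pod": ("noun", "a small detachable compartment"),
    "bay": ("noun", "a compartment or recess"),
    "doors": ("noun", "movable barriers at an entrance"),
}

def naive_read(sentence):
    """Look up each word of a sentence in turn, ignoring structure."""
    result = []
    for token in sentence.lower().split():
        word = token.strip(".,!?;:")
        pos, gloss = TOY_DICTIONARY.get(word, ("unknown", "?"))
        result.append((word, pos, gloss))
    return result

print(naive_read("Open the pod bay doors."))
```

Note what the output lacks: it says nothing about how "pod bay doors" forms a single noun phrase, or what the sentence as a whole asks for – exactly the structural and conceptual gaps discussed above.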
How humans comprehend speech and text
I shall not delve into this area too deeply, for I am no psychologist, and even psychologists do not properly understand how we comprehend language. It is, however, important that we get a feel for how the mind processes language and where/how machines might adopt a similar approach, at some level of abstraction. Quite evidently, human listening and reading are very much holistic affairs, so trying to break the process down into discrete parts or stages won't get us far.
For a start, we must notice that the recognition of words is done in a very approximate way. Tests have shown that we fixate on key words when we read, and that most words are often determined from a few key letters. Changing a high proportion of the letters in text, if done in a certain way, barely impedes our ability to read. Indeed, the mind seems to pick out the structure and regularities in language (and even words) and often fills in the gaps for us. Evidence suggests that we do not interpret words in a dictionary-style fashion, but rather use the context (of varying scope) as well as our familiarity with a word from previous encounters, to give meaning. In this sense, reading is prone to significant errors in humans, as I’m sure we’ve all experienced. There is no reason machines should generally be any more precise in this respect, as we will see later. Do not let the fact that machines are incredibly precise at the “mechanical” level fool you – so are our brains’ neurons!
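The "few key letters" observation can be made concrete with a toy model: identify a word by its first letter, its last letter, and an unordered bag of the letters in between, so that jumbled interior spellings still resolve to the same word (the well-known scrambled-text effect). The tiny vocabulary is invented for the example; this illustrates the idea, and is in no way a model of actual human reading:

```python
# Approximate word recognition: a word's "signature" is its first
# letter, last letter, and sorted interior letters. Any jumbling of
# the interior letters yields the same signature, so the intended
# word can still be recovered.

def signature(word):
    if len(word) <= 2:
        return word
    return (word[0], word[-1], "".join(sorted(word[1:-1])))

VOCABULARY = ["reading", "machine", "language", "understand"]
LOOKUP = {signature(w): w for w in VOCABULARY}

def recognise(jumbled):
    """Recover the intended word from a jumbled spelling, if any."""
    return LOOKUP.get(signature(jumbled))

print(recognise("mahcine"))   # jumbled "machine"
print(recognise("lnaguage"))  # jumbled "language"
```

Like human reading, this scheme is deliberately imprecise: distinct words sharing a signature would collide, and only context could then disambiguate them.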
What allows us to read quickly and effectively more than anything else seems to be the ability to recognise familiar patterns and constructs within text. In fact, I would argue that previous familiarity with words and grammatical constructs plays a crucial role in interpretation of language by evoking certain points or areas of memory. Only when this process is done in a recursive (iterated) fashion does the meaning of some text truly take on form. In a loose sense, perhaps the process of reading can be described as the triggering of a web of interconnected ideas and memories.
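This web-of-ideas picture resembles what cognitive scientists call spreading activation, which is easy to sketch: reading a word activates a concept, which passes a decaying fraction of its activation on to associated concepts. The graph, weights, and parameters below are entirely made up for illustration:

```python
# A toy spreading-activation model: activation flows outward from a
# triggered concept along weighted association links, decaying as it
# spreads, until it falls below a threshold. All associations and
# weights here are invented for the example.

ASSOCIATIONS = {
    "door": {"open": 0.6, "room": 0.4},
    "open": {"door": 0.5},
    "room": {"house": 0.7},
    "house": {},
}

def activate(start, decay=0.5, threshold=0.1):
    """Spread activation outward from one concept; return the levels."""
    levels = {start: 1.0}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for neighbour, weight in ASSOCIATIONS.get(node, {}).items():
            passed = levels[node] * weight * decay
            # Only propagate if this strengthens the neighbour and
            # remains above the activation threshold.
            if passed > levels.get(neighbour, 0.0) and passed > threshold:
                levels[neighbour] = passed
                frontier.append(neighbour)
    return levels

print(activate("door"))
```

Triggering "door" here weakly activates "open" and "room", while "house" stays below threshold – a crude analogue of nearby memories being evoked while distant ones stay dormant.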
Without going too deep here, I believe this holistic view of reading is important to keep in mind when guiding an approach to machine reading. To me, the neuro-psychological specifics are to a large degree irrelevant; if we instead try to mimic the global picture and consider implementation secondary – a "top down" creation of sorts – we are more likely to succeed.
How humans learn to read
The continuous process of learning to speak, and later to read, essentially comes down to a good deal of trial and error. I mention speaking because it is invariably the first step people take towards understanding and producing language. Reading follows much more naturally after it, and indeed begins as a largely verbal/aural skill (and in fact never ceases to be one, though it may become more subliminal). It is a curious aside that in earlier times, up to and including the early Middle Ages, reading silently was a rare talent that amazed even many educated folk.
Although the way our minds learn to read is largely a hidden affair, it is clear that there is some hardwired ability for language in our brains. When our brains rapidly develop in the early years of our lives, interaction and familiarity with the external world is done largely through feel and sight. This gradually forms a core basis of knowledge/familiarity around which further development can occur. While at first only superficial understanding of the world is gained, it crucially allows the mind to build relationships between and around what is already familiar. It is apparent that, while the mind has a latent ability for reasoning and abstract thought, we are truly shaped by our environment. We ultimately learn advanced (human) skills only through a combination of evolutionary/genetic memory and interaction with the external world.
Can a machine read as we do?
What I have described so far in terms of the process of reading (and its learning) is a complex and many-layered system, though one with a surprisingly small number of core principles.
At the end of the day, natural language and its comprehension is an inherently human endeavour. For this reason, it is my belief that true NLP (or rather natural language understanding) ultimately requires strong AI. As I have already stated, however, we can achieve much without full human-level intelligence. It should be no surprise, then, that some level of compromise is needed to achieve effective NLP at present. While I have stressed that the process of reading is an overwhelmingly holistic affair (though there is some internal structure to the understanding of language), we cannot implement AI purely so. Striking a balance between a mechanical algorithm and a level of holism that allows for the manipulation of sufficiently abstract concepts is the key. It is a key, though, well hidden.
A practical solution
A note: It would seem my dedication to writing such megalithic articles as this has worn off over recent times. (This post has been in draft stage since February!) I hope readers will permit me to publish in the current state, so that I may gradually work towards the completion of this last section.
Here are a few notes on where I next intend to explore:
- Human learning. Requires evolutionary/genetic memory to start? Based on general intelligence and experience?
- Machine learning. Can be instilled with knowledge/basic rules by programmer? Based on statistical rules and large corpus of text?
- Mechanical versus “free” (informal) reasoning?
- How much of the process should be instilled beforehand, and how much learnt cumulatively – what is the trade-off?