A Matter of Semantics: Data, Information and Knowledge
"Just for fun, what are we talking about?"
We often hear people say dismissive things about semantics such as "That's mere semantics" or "let's not get all tied up in semantics", but semantics is actually important. Semantics is what words mean, and if we don't all agree on what words mean, then it becomes very hard to have a meaningful discussion.
Since this blog is about a whole new area, that of the norms and behavior of person-like machines and of making personified systems trustworthy, it is particularly important to define our terms and speak clearly. This posting is the first in a series of short articles that I intend to write on the terms I use in my work, their meanings, and their relationships.
Perhaps somewhat appropriately, the subject of this first article is words having to do with meaning and understanding, the words "data", "information", "knowledge", and "wisdom".
Data, Information, Knowledge (and Wisdom)
One of the reasons that trust is becoming a bigger issue is that computer and IT systems are climbing what's often called the "information hierarchy" and are dealing with data in more meaningful ways. There are a number of models of this hierarchy in information science and knowledge management, often spoken of under the rubric of "DIKW", standing for Data, Information, Knowledge, and Wisdom. In general, here is what is meant by the terms:
Data refers to the simplest elements of information: symbols, facts, numerical values, or signals depending upon the context. We often speak of the "raw data", meaning the pixels coming out of a camera or scanner; the numeric data such as temperature, air pressure, and wind direction that come from various sensors; unprocessed audio from a microphone; and the like. In and of itself, data is of minimal use or meaning.
Information is where meaning begins. Information is inferred from data and from the relationships between different pieces of data. Information is data about something. Stepping outside the realm of computers for a moment, journalists turn facts into informative stories by asking the "who, what, where, when" questions. Answering questions like these is the primary distinction between data and information. In the same way, a stream of X,Y mouse positions and clicks becomes menu items selected and actions performed, and signals from a microphone or key taps on a keyboard become words spoken or written. The data begins to be meaningful.
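As a rough illustration of that step from data to information, here is a minimal Python sketch. The menu layout, the `MenuItem` type, and the `(x, y, clicked)` event format are all assumptions made up for this example, not any real toolkit's API. It simply asks "what was clicked?" of a stream of raw coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuItem:
    """A hypothetical on-screen menu entry occupying a rectangle."""
    label: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # True when the point falls inside this item's rectangle.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


# Assumed layout: three stacked entries, each 100 wide and 20 tall.
MENU = [
    MenuItem("Open", 0, 0, 100, 20),
    MenuItem("Save", 0, 20, 100, 20),
    MenuItem("Quit", 0, 40, 100, 20),
]


def interpret_clicks(events, menu=MENU):
    """Turn raw (x, y, clicked) tuples into the labels that were selected."""
    selected = []
    for x, y, clicked in events:
        if not clicked:
            continue  # plain mouse movement carries no selection
        for item in menu:
            if item.contains(x, y):
                selected.append(item.label)
                break
    return selected


raw_events = [(10, 5, False), (12, 8, True), (50, 30, True), (300, 300, True)]
print(interpret_clicks(raw_events))  # ['Open', 'Save']
```

The raw tuples are data; the list of selected labels is information, because each entry now says what the click was about. The click at (300, 300) lands on nothing and so yields no information at all.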
Knowledge, on the other hand, is where the meaning comes into its own. Knowledge deals with what the information is about. We see this distinction in systems like voice recognition or text autocorrect. Dealing just with information, we can correlate certain patterns of sounds with specific phonemes or words, we can compare sequences of letters with words in a word list, or we can look at word pair frequencies to determine the most likely correct word. All of that is manipulating information. However, if we know what the speaker is talking about, or what the topic of a document is, then we have knowledge that will greatly improve our ability to choose the right transcription of the spoken word or to determine the correct word.
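The difference can be sketched in a few lines of Python. Everything here is a toy assumption: the word-pair counts, the topic vocabularies, and the `boost` value are invented for illustration and do not describe how any real recognizer or autocorrect system works.

```python
from collections import Counter

# Hypothetical word-pair frequencies, as if counted from some corpus.
BIGRAM_COUNTS = Counter({("the", "bass"): 5, ("the", "base"): 8})

# Hypothetical topic vocabularies standing in for "knowing the subject".
TOPIC_WORDS = {
    "music": {"bass", "guitar", "drum"},
    "baseball": {"base", "pitcher", "inning"},
}


def pick_by_bigram(previous, candidates):
    """Information only: choose the candidate seen most often after `previous`."""
    return max(candidates, key=lambda word: BIGRAM_COUNTS[(previous, word)])


def pick_with_topic(previous, candidates, topic, boost=10):
    """Add a crude stand-in for knowledge: favor words that fit the topic."""
    vocabulary = TOPIC_WORDS.get(topic, set())

    def score(word):
        bonus = boost if word in vocabulary else 0
        return BIGRAM_COUNTS[(previous, word)] + bonus

    return max(candidates, key=score)


candidates = ["bass", "base"]
print(pick_by_bigram("the", candidates))            # base
print(pick_with_topic("the", candidates, "music"))  # bass
```

With frequencies alone the sketch picks "base" because that pair is more common in the assumed counts. Once it is told the conversation is about music, it picks "bass". Whether a lookup table like `TOPIC_WORDS` deserves to be called knowledge is, of course, part of the question this article is raising.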
Wisdom is often cited as the next step in the hierarchy, as the point where judgement comes in. As such, it is of only marginal significance with regard to the behavior of current systems as opposed to people. Wisdom is certainly a thing that we would like autonomous systems to embody. Only when they get to this level will we truly be able to talk about systems that behave and decide ethically. For now, however, they will have to leave that to us.
In the context of autonomous and personified systems, information is valuable, but knowledge is the real power. As they accumulate not mere information about us, but knowledge, the stakes are raised on our privacy. Allow me to finish with a story from the news of the last few years.
The point-of-sale systems at Target collect all sorts of data each time someone buys an item from one of their stores. This becomes useful information when it is correlated with stock inventories, allowing them to manage those inventories. The data also becomes information about their customers when sales are tagged with a customer ID. By analyzing patterns in the information that they have about their customers, they can gain knowledge about those customers, such as which ones are behaving as if they were pregnant. They begin to know what the information means. This allows them to target promotions to have just the right content at just the right time, to capture new customers. It also led, as the news reported, to a father being angered because Target knew that his daughter was pregnant before he did. Today, management at Target has the wisdom to merely increase the number of references to items of interest to new mothers in the customized catalogs mailed to expectant mothers, rather than send them a catalog of nothing but new baby items.
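The progression in this story can be sketched in Python. This is only an illustration under loud assumptions: the customer IDs, the items, the weights, and the threshold are all invented here, and nothing in it reflects Target's actual data or methods.

```python
from collections import defaultdict

# Data: raw (customer_id, item) rows, as a register might log them.
transactions = [
    ("c1", "unscented lotion"),
    ("c1", "vitamin supplements"),
    ("c1", "cotton balls"),
    ("c2", "batteries"),
    ("c2", "unscented lotion"),
]

# Hypothetical weights for items assumed to hint at a pattern of interest.
SIGNAL_WEIGHTS = {
    "unscented lotion": 3,
    "vitamin supplements": 4,
    "cotton balls": 2,
}


def purchases_by_customer(rows):
    """Information: the same rows, now organized as being about a customer."""
    grouped = defaultdict(set)
    for customer_id, item in rows:
        grouped[customer_id].add(item)
    return dict(grouped)


def pattern_score(items):
    """Sum the weights of any signal items in one customer's purchases."""
    return sum(SIGNAL_WEIGHTS.get(item, 0) for item in items)


def flag_customers(rows, threshold=8):
    """A stand-in for knowledge: customers whose purchases fit the pattern."""
    grouped = purchases_by_customer(rows)
    return sorted(
        customer_id
        for customer_id, items in grouped.items()
        if pattern_score(items) >= threshold
    )


print(flag_customers(transactions))  # ['c1']
```

Note what the sketch cannot do: nothing in it decides whether acting on the flag is a good idea. That judgement, the wisdom in the hierarchy, is left to the people running the system.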
This distinction and the spectrum from data to wisdom will feature prominently in all of the work reported on in this blog, and our understanding of them will, no doubt, be shaped by the discussions, research and understandings that we develop. It is, for instance, debatable whether existing Machine Learning and Deep Learning techniques truly represent actual knowledge and understanding or are just very elaborate manipulations of information. If a system does not understand something, and merely recognizes patterns and categories, yet presents its results to a human who immediately understands the full meaning and implications, do we call that mere information or the beginnings of true knowledge? We will see, and I would love to hear your thoughts on the matter.