Obviously, one of the great technical innovations of our time is the World Wide Web, invented by Tim Berners-Lee. Here, he speaks about the launch of the next generation of his creation: the Semantic Web.
What is the Semantic Web? Perhaps the best way to understand the concept is to contrast it with the current Web, which is set up to help you find documents that may (or may not) have the information you want. The Semantic Web, on the other hand, would catalogue important data to indicate the type of information that data represents (places, things, people), thus enabling a new dimension of archiving and search. The Semantic Web can therefore be thought of as a "smarter," more useful resource.
The World Wide Web Consortium (W3C), founded and led by Berners-Lee, has taken the lead in developing the necessary standards to make the next generation of the Web possible. W3C has been instrumental in the development of the core infrastructure of the current Web, which has grown significantly from the original Berners-Lee specifications for URIs [Uniform Resource Identifiers], HTTP and HTML.
Recently, the W3C completed work on the standards that will enable the Semantic Web to come into existence: the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Links to these and other Semantic Web Recommendations (as W3C refers to its standards) and materials can be found here: http://www.w3.org/2001/sw/
Updegrove: Before we get into specifics, what is it like bringing your vision to the world for the second time, now that the Semantic Web is beginning to take hold?
Berners-Lee: Our work in promoting rather than developing the Semantic Web technologies has been like "déjà vu all over again" for me. Fifteen years ago, one of the hardest things to do was not to develop the initial version of HTTP, or to create a browser that was also an editor, or even to get approval for the purchase of the equipment (!). The difficult thing was to convince people that the Web was something they should adopt.
At CERN (the European Laboratory for Particle Physics, in Geneva), the killer app that got us through the technical barriers (operating system, hardware, philosophy) was making the phone book available through the Web. In the outside world, beyond lab settings, what helped the Web break through were two simultaneous developments: CERN was making the code available, free of charge or other encumbrance, to anyone who wanted it, and young developers were coming up with browser software, including multiple implementations that supported inline images.
And so with the potential licensing barriers down and the relative ease of setting up a server, things took off. But imagine, if you can, online information systems before the Web, and what it was like to try to explain the whole idea of the Web to people.
Envisioning life in the Semantic Web is a similar proposition. Some people have said, "Why do I need the Semantic Web? I have Google!"
Google is great for helping people find things, yes! But finding things more easily is not the same thing as using the Semantic Web. It's about creating things from data you've compiled yourself, or combining it with volumes (think databases, not so much individual documents) of data from other sources to make new discoveries. It's about the ability to use and reuse vast volumes of data.
Yes, Google can claim to index billions of pages, but given the format of those diverse pages, there may not be a whole lot more the search engine tool can reliably do. We're looking at applications that enable transformations, by being able to take large amounts of data and being able to run models on the fly—whether these are financial models for oil futures, discovering the synergies between biology and chemistry researchers in the Life Sciences, or getting the best price and service on a new pair of hiking boots.
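As a rough illustration of what combining data from different sources can look like (an editorial sketch, not part of Berners-Lee's answer), the Python below stores facts as subject-predicate-object triples, the shape RDF uses. The example.org URIs, the two data sets, and the `describe` helper are all assumptions made up for this sketch.

```python
# Illustrative sketch only: every URI and value below is made up.
# Each fact is a (subject, predicate, object) triple, the shape RDF uses.
EX = "http://example.org/"

# Source A: a hypothetical product catalogue.
catalogue = {
    (EX + "boot42", EX + "name", "Trail Boot"),
    (EX + "boot42", EX + "price", 120),
}

# Source B: a hypothetical, independently published review feed.
reviews = {
    (EX + "boot42", EX + "rating", 4.5),
    (EX + "boot17", EX + "rating", 3.0),
}

# Because both sources name the boot with the same URI, merging
# the two data sets is simply a set union.
merged = catalogue | reviews


def describe(graph, subject):
    """Collect every property known about one subject into a dict."""
    return {p[len(EX):]: o for s, p, o in graph if s == subject}


print(describe(merged, EX + "boot42"))
```

The point of the sketch is that no schema mapping is needed before the merge: shared identifiers do the joining, which is what makes reuse across sources cheap.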
Updegrove: As you look at the Semantic Web project now, some eight years after its inception, are you encouraged or discouraged? Does it look to you today as if you will be able to accomplish less, as much, or more than you had originally envisioned?
Berners-Lee: The Semantic Web has a whole lot more to it than the original Web. Building something which will be a firm logical foundation for interoperating business systems and query systems and so on takes more work and has to be a lot more well defined than a simple jotting down of some HTML tags! However, we have the entire URI and HTTP infrastructure to build on, of course.
One can always wish things were further along, but in fact I think the progress has been great. We were asked to hold up the query and rules work because people didn't want to start on it until the ontology work had finished, so for some we were in danger of going too fast. Now we have a good solid layer of RDF and OWL, which allows systems to be described, and data to be exchanged. OWL turned out to be more powerful than I had expected, and that is great. The query language I think will be a major step, as it will allow major databases to be exposed without one having to transfer the whole file. It will also provide a way of integrating across SQL and XQuery systems.
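To give a feel for what exposing a database "without one having to transfer the whole file" might mean (an editorial sketch, not part of the answer and not the W3C query language itself), the Python below answers a pattern-style question over a small set of triples. The data, the `ex:` names, and the `match` function are assumptions invented for illustration.

```python
# Illustrative sketch only: the data and names are invented, and this
# is plain Python, not the W3C query language itself.
def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# A hypothetical data set held by the publisher.
graph = [
    ("ex:aspirin", "ex:treats", "ex:headache"),
    ("ex:aspirin", "ex:madeBy", "ex:AcmePharma"),
    ("ex:ibuprofen", "ex:treats", "ex:headache"),
    ("ex:ibuprofen", "ex:treats", "ex:inflammation"),
]

# The client asks one question and receives only the matching rows,
# rather than downloading the whole data set.
answers = match(graph, p="ex:treats", o="ex:headache")
print([s for s, _, _ in answers])
```

In a real deployment the pattern would travel to the server and only the answers would travel back; here both sides live in one process to keep the sketch self-contained.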
I'm disappointed that we haven't seen RDF used as an export format on random applications such as desktop and enterprise applications. This may be because the RDF/XML syntax is a little off-putting. It is an irony that the RDF model itself is simpler than that of XML, but it isn't evident when you encode it in the standard syntax. The informal N3 syntax provides a learning- and more human-friendly on-ramp for export and import, and it may be that standardizing that would be a useful step. On the other hand, there is an ever-growing set of adapters from various formats to RDF.
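The remark that the RDF model is simpler than its XML encoding can be illustrated with a small editorial sketch (not part of the answer). It writes each statement on one line as subject, predicate, object, and a closing period, in the general style of the line-oriented triple notations. The example.org URIs and the `term` and `to_lines` helpers are assumptions, and the sketch deliberately ignores escaping and datatypes.

```python
# Illustrative sketch only: the URIs are invented, and the serializer
# handles just the simple cases shown (no escaping, no datatypes).
def term(value):
    """Write URIs in angle brackets and anything else as a quoted literal."""
    if value.startswith("http://") or value.startswith("https://"):
        return f"<{value}>"
    return f'"{value}"'


def to_lines(triples):
    """Emit one 'subject predicate object .' statement per line."""
    return [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]


triples = [
    ("http://example.org/alice", "http://example.org/name", "Alice"),
    ("http://example.org/alice", "http://example.org/worksFor",
     "http://example.org/acme"),
]

for line in to_lines(triples):
    print(line)
```

An application exporting its data this way only needs to know how to list its facts, which is the low barrier to entry the paragraph above is hoping for.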
I am very happy about the reception that the Semantic Web has had in specific areas where people "get it." The FOAF [Friend of a Friend] project, for example, has a great spirit, and is a quite decentralized web of information about people's business cards, CVs, and who knows who. The whole area of life sciences and healthcare has been hopping with excitement as work is done to take down the boundaries between different silos of information across the field. We had a very vibrant workshop in the area, and Semantic Web was the talk of the recent BIO-IT conference.
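The "who knows who" aspect of FOAF lends itself to a short editorial sketch (not part of the answer). The people below are invented, and the `friends` and `friends_of_friends` helpers are assumptions. The only name taken from FOAF itself is the `foaf:knows` property.

```python
# Illustrative sketch only: the people are invented. "foaf:knows" is the
# FOAF property for acquaintance; everything else is an assumption.
knows = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:dave"),
    ("ex:carol", "foaf:knows", "ex:alice"),
]


def friends(graph, person):
    """People that `person` states they know."""
    return {o for s, p, o in graph if s == person and p == "foaf:knows"}


def friends_of_friends(graph, person):
    """People two 'knows' links away, excluding direct friends and self."""
    direct = friends(graph, person)
    second = set()
    for friend in direct:
        second |= friends(graph, friend)
    return second - direct - {person}


print(sorted(friends_of_friends(knows, "ex:alice")))
```

In a decentralized setting each person would publish their own statements in their own file; the list above stands in for the result of gathering those files together.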
I think the hope for more true interactivity in terms of collaborative tools, particularly real-time collaborative tools, has yet to be realized—it's something I had hoped for in the early days, and I am still hoping to see it happen.
Updegrove: Since this is your second time around designing the Web, what did you learn from taking the Web from concept to reality the first time that may help us anticipate how the Semantic Web will become real?
Berners-Lee: The Semantic Web idea—that of having data as well as documents on the Web—has been around since the start of the Web. It is just more complicated to do.
Experience from the initial growth of the Web of documents? Well, it was a very rigid exponential growth, which couldn't be slowed or hastened. Different people "got it" in different years, and to them it seemed that the Web had "happened" all that year. It spread first among enthusiasts, and then among small sub-communities where one could get to critical mass with the momentum of a few champions. These communities (High Energy Physics for the WWW, possibly Life Sciences for Semantic Web) are full of people who have very big challenges to tackle, and are largely scientifically minded people who understand the new paradigm. These things may be very similar.
Where it is different is that there is attention from the press. We work under floodlights. Whereas the WWW took off in the hands of the converts, and others were left in blissful ignorance, the Semantic Web takes off with articles like this one, and people checking to see whether it is time for them to get involved. This has helped in some ways, hindered in others. We have to work hard to make sure that expectations are not overstated.
I think there were important landmarks in getting the Web broadly adopted. The importance of CERN's decision not to impose onerous licensing conditions on the use of Web technologies cannot be overstated. I knew of companies, big companies, that forbade their employees to pick up our work until CERN made its declaration for free use. The W3C patent policy now makes the development of new standards much safer in this respect, and it is an important aspect of the Semantic Web that it be royalty free.
Updegrove: People talk about a "killer app" for the Semantic Web, and you rightly point out that the Semantic Web itself is the killer app. Still, there has to be an incentive for people to encode semantically and create agents, so there seems to at least be a chicken and egg issue. Does a company like Google have to commit to semantic browsing before the Semantic Web takes off?
Berners-Lee: I think that for many companies it may be that the killer app is an intranet. Many of the early WWW servers were inside the firewalls. The valuable data is company-confidential, and it is much safer to experiment with new technology in private! One computer company had, I think, 100 web servers internally before it had a public one.
Similarly now, pharmaceutical companies are experimenting internally, but the company data isn't all shared. This slows uptake, as the results are not there to be linked to by others. Similarly, when I do my personal finances using Semantic Web tools, I can export the rule files—but not the data as an example!
Note that search engines for the traditional Web of documents have the task of finding relevant items in a sea of documents in (some form of more or less broken) natural language, with links. The Semantic Web is very different. Search techniques for the Semantic Web are going to be very different: It may be that the value add will be made in different ways by systems roaming around and looking for patterns, or by performing some specific types of inference, or by indexing Semantic Web data in new interesting ways. It probably won't be eigenvector-based link analysis, which drives the good hypertext search engines.
In a way, the search engines are making up, by special techniques, for the lack of machine-understandable semantics in the documents on the Web.
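For readers unfamiliar with the eigenvector-based link analysis mentioned above, here is an editorial sketch (not part of the answer) of a textbook PageRank-style power iteration. The interview does not name a specific algorithm, so this choice, along with the three-page link graph, the `link_rank` function, and its `damping` and `iterations` parameters, is an assumption. The sketch also assumes that every linked page appears as a key in `links`.

```python
# Illustrative sketch only: a tiny invented link graph and a textbook
# power iteration, not any search engine's actual ranking code.
def link_rank(links, damping=0.85, iterations=50):
    """Approximate the principal eigenvector of the damped link matrix."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            # A page with no outgoing links spreads its rank evenly.
            targets = targets or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical pages: each key links to the pages in its list.
links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
}

ranks = link_rank(links)
print(max(ranks, key=ranks.get))
```

The ranking here comes entirely from who links to whom, with no knowledge of what the pages mean, which is the contrast the answer draws with data that carries its own semantics.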
Updegrove: If the big browser companies do not come on board, what will be the value proposition that will drive semantic encoding?
Berners-Lee: The Semantic Web architecture does not involve HTML browsers as we know them. There is a new breed of generic Semantic Web browser, but they are more like unconstrained database viewing applications than hypertext browsers.
There are at least two Semantic Browser projects I know of at MIT alone.
SIMILE is a joint project conducted by the W3C, HP, MIT Libraries, and MIT CSAIL. SIMILE seeks to enhance interoperability among digital assets, schemata/vocabularies/ontologies, metadata and services. A key challenge is that the collections that must interoperate are often distributed across individual, community and institutional stores. To address this challenge in part, the SIMILE team created Piggy Bank, an extension to the Firefox Web browser that turns it into a Semantic Web browser, letting you make use of existing information on the Web in more useful and flexible ways.
The Haystack Project is investigating approaches designed to let people manage their information in ways that make the most sense to them. By removing arbitrary application-created barriers, which handle only certain information "types" and relationships as defined by the developer, we aim to let users define their most effective arrangements and connections between views of information.
Such personalization of information management will dramatically improve everyone's ability to find what they need when they need it. This includes Piggy Bank as well as what they call the universal information client.
Updegrove: Who do you expect the early adopters to be, on the encoding side? Are there some there already?
Berners-Lee: Adobe is the only one I can talk about today, but there are others on the cusp of announcement.
Updegrove: What will the Semantic Web do to browsers? Will it be likely to strengthen the influence of the major browsers, or result in new entrants?
Berners-Lee: I think you'll see a bit of both here—revitalization of competition, and clear targets for functionality, but it's a bit complicated. In short, browsers will be affected by the Semantic Web in many ways.
They may be pressured to become generic Semantic Web browsers. They may use Semantic Web metadata to accompany the human-oriented media. They may use Semantic Web metadata to select and marshal human-oriented metadata. There may be a very powerful client-side programming platform developed (as in Haystack, and RDF-Ajax applications) in which the client-side script sees the world and the display medium as a mass of RDF and SPARQL.
Updegrove: When will Web users begin to enjoy the benefits of the Semantic Web?
Berners-Lee: They already have, in applications that range from social networking (FOAF) and content description (Adobe Creative Suite) to learning about licensing constraints of Web content (Creative Commons), as well as in the widespread use of OWL in a variety of disciplines.
Note: This article originally appeared in the Technology Law Bulletin, a quarterly publication of Gesmer Updegrove LLP.