The Discovery Project & the Semantic Web

The Discovery Project is a web portal containing digitized multi-lingual philosophical documents. It is made up of two components:

  1.    Philosource: the digital libraries
  2.    Philospace: a P2P network of updated semantic annotations of Philosource


Philosource is made up of high-quality scholarly editions of philosophical texts, often primary sources such as the philosopher's own handwritten manuscripts and notes. It contains the writings of the Pre-Socratics, the Socratics, Diogenes Laertius, Modern Philosophy Texts, Nietzsche, Wittgenstein, and a Multimedia Enciclopedia of Philosophy. The Discovery Project is a portal leading to these separate sections, which share a consistent web architecture and design layout but have separate domain names "to ensure the reliability of scholarly resources". For example, the Nietzsche Source leads you to www.nietzschesource.org, and the Socratics Source leads you to www.socratics.daphnet.org. Thus, while separate web domains delineate the separate subject treatments, these sources are 100% inter-operable beneath the level of Philosource, something that will soon be clear and that makes this project especially worth discussing.

These separate sources are themselves fantastic digital libraries. While the interface is the same throughout all subject sources, the materials range from typescripts to original manuscripts, facsimiles to video lectures, and commentaries to official annotated translations. I include a screenshot below of one of the Pre-Socratic philosophers the site represents, and to whom I'll refer later: Heraclitus (i.e., Herakleitos):



I won't be discussing here the digital libraries themselves, other
than to say that they are very easy to browse and are very well
organized.

The most interesting part of this project is its use of semantic web
technologies. While the material in Philosource (the digitized books,
texts, manuscripts, videos, etc.) is distributed across the respective
subject pages (e.g., www.nietzschesource.org,
www.wittgensteinsource.org), there is more than meets the eye.
Underneath these libraries is a richly woven semantic web. That is to
say, the material across all sites making up the Discovery Project is
marked up with meaning tags that organize it according to ontological
schemas that computers can read and process. For example, suppose a
scholar is interested in "eternal recurrence" (Nietzsche's term for the
idea that the world's finite elemental makeup repeats itself, such that
history eventually repeats itself). A passage on this idea may be tagged
in the same way as a passage where a Pre-Socratic philosopher such as
Heraclitus discusses a similar concept (e.g., you cannot step in the
same river twice), or where Deleuze discusses time and repetition using
yet another distinct set of ideas, metaphors, and concepts of his own
making.


As a user of Philosource alone, however, you won't see these annotations
or commentaries. To participate in the Peer-to-Peer (P2P) network that
integrates these sources with common ontologies (tags that make up a
conceptual map of all the vocabularies marked by scholars), you need to
download the Philospace personal desktop application that scholars use
to tag and comment on the sources themselves.  This is a
"collaborate environment in which to browse, study, and enrich the
content published in the Philosource federation." Tagging leaves the
original material intact; used in combination with the desktop
application, it adds a new layer on top of the original one that
functions as a roadmap and meta-index. When scholars use Philospace in
conjunction with Philosource, ideas and concepts belonging to distinct
passages and expressed in distinct ways can be assembled instantly at
the press of a button, across all sources and all philosophers contained
within the Discovery Project portal.  The map above is meant to describe
this process.

How does this work?  That seems to me to be an essential part of any
course on digitization, as only digitization makes this new technology
available. Well, as it turns out, the Discovery Project functions as an
excellent introduction to the Semantic Web and RDF. For more complete
material, check out the World Wide Web Consortium's standards (www.w3.org), especially the section on the semantic web.

RDF stands for Resource Description Framework. In a nutshell, this
framework specifies a common (i.e., uniform) meta-linguistic way of
describing resources, each of which is named by what is known as a URI
(Uniform Resource Identifier). This uniform language allows for data
integration when ambiguities exist between terms that are lexically
different but semantically similar. Thus the lexical entry "author" is
semantically similar to the different lexical entry "creator", and
librarians and information professionals may use these distinct terms to
describe the same "piece" of identifying information. A new term
therefore needs to float over the two distinct lexical terms (i.e., it
is a meta-concept) and uniformly identify what they both mean. This
extra piece of information is a shared identifier (a URI) used in RDF
descriptions, and it serves as a meta-vocabulary term that integrates
the words "creator" and "author" into one concept.  There are W3C
standards already in place for building and sharing such vocabularies
(e.g., Web Ontology Language [OWL], Simple Knowledge Organization System
[SKOS] or the Rule Interchange Format [RIF]), yet nothing keeps creators
from using their own.
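
To make the "author"/"creator" example concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not how Philospace or any RDF library actually does it. The URIs under example.org, the `LABEL_TO_URI` table, and the `to_triple` helper are all made up for this post.

```python
# Hypothetical URI standing in for the shared "meta" term.
# It is invented for this sketch, not taken from a real vocabulary.
CREATOR_URI = "http://example.org/terms/creator"

# Two lexically different labels point at the same identifier.
LABEL_TO_URI = {
    "author": CREATOR_URI,
    "creator": CREATOR_URI,
}

def to_triple(resource_uri, label, value):
    """Turn a (label, value) record about a resource into a
    subject-predicate-object triple, normalizing the label to a URI."""
    return (resource_uri, LABEL_TO_URI[label.lower()], value)

# Two catalogues describing the same kind of fact with different words.
library_a = to_triple("http://example.org/texts/zarathustra",
                      "author", "Friedrich Nietzsche")
library_b = to_triple("http://example.org/texts/fragments",
                      "Creator", "Heraclitus")

# Once normalized, both records use the same predicate and can be merged.
same_predicate = library_a[1] == library_b[1]
print(same_predicate)
```

The point of the sketch is that integration happens at the level of the identifier: each catalogue keeps its own word, and only the mapping to the shared URI has to be agreed upon.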

In this same way, the Philospace application lets scholars use RDF to
describe similar concepts that are expressed with different lexical
entities, by tying them to common resource identifiers (URIs).

Thus, the tags scholars agree upon in using Philospace are these common resource identifiers. That is, these identifiers demarcate similar concepts, ideas, and passages throughout the philosophical materials. The "eternal recurrence" of Nietzsche is identified in this way with the "never step in the same river twice" concept of Heraclitus and/or the concepts of time/repetition in Deleuze by a common ueber-concept that now indexes these lexically and metaphorically distinct ideas under a common thread. This is to say that philosophers more or less agree that these ideas may denote a common identity.
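
As a rough sketch of how such tags can act as a meta-index, consider the following plain Python. The passage identifiers, the concept URIs, and the `build_index` helper are invented for illustration; the real Philospace data model may look quite different.

```python
from collections import defaultdict

# Hypothetical concept URIs, made up for this sketch.
RECURRENCE = "http://example.org/concepts/recurrence"
FLUX = "http://example.org/concepts/flux"

# Annotations live in their own layer: (passage id, concept URI) pairs.
# The source texts themselves are never modified.
annotations = [
    ("nietzsche/eternal-recurrence", RECURRENCE),
    ("heraclitus/river-fragment", RECURRENCE),
    ("deleuze/time-and-repetition", RECURRENCE),
    ("heraclitus/river-fragment", FLUX),
]

def build_index(pairs):
    """Group passage ids under the concept URI they were tagged with."""
    index = defaultdict(list)
    for passage, concept in pairs:
        index[concept].append(passage)
    return dict(index)

concept_index = build_index(annotations)

# One lookup gathers passages that share a concept but not a vocabulary.
related = concept_index[RECURRENCE]
print(related)
```

Looking up one identifier returns all three passages, even though none of them uses the same words as the others.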

Is this the future of the web?  It could be. Indeed, semantic web
frameworks like RDF extend to more than just words; they may be the
beginnings of web-based thinking in action. That is to say, if all of
the concepts on the web are denoted by uniform resource identifiers
(URIs), then search and retrieval engines have an extra tool in their
arsenal: subject-predicate-object reasoning. Right now, retrieval
engines work by lexical mapping, not conceptual mapping (though Google
does a good job with its thesaurus when asking users "Did you
mean..."). With conceptual mapping using RDF, users can ask logical
questions whose answers follow by implication. That is to say, the web
can become a knowledge engine by assuming some things (e.g., all of
these authors/creators/inventors have written or painted or produced
music about time and repetition) to answer questions about other things
(did creator X write/paint/sing about the history of time?). Hmm. A
long way off?
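
For the curious, subject-predicate-object reasoning can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not a real RDF store or reasoner: the triples, the predicate names `wrote_about` and `is_a`, and the `engaged_with` function are all hypothetical.

```python
# Toy knowledge base of (subject, predicate, object) triples.
# All names here are invented for illustration.
TRIPLES = {
    ("Nietzsche", "wrote_about", "eternal_recurrence"),
    ("Heraclitus", "wrote_about", "river_fragment"),
    ("Wittgenstein", "wrote_about", "language_games"),
    ("eternal_recurrence", "is_a", "time_and_repetition"),
    ("river_fragment", "is_a", "time_and_repetition"),
}

def engaged_with(creator, broad_concept, triples=TRIPLES):
    """Return True if the creator wrote about any topic that is tagged
    as an instance of broad_concept (a one-step inference)."""
    topics = {o for (s, p, o) in triples
              if s == creator and p == "wrote_about"}
    return any((topic, "is_a", broad_concept) in triples
               for topic in topics)

# Nobody stated "Nietzsche wrote about time and repetition" directly;
# the answer follows from combining two separate triples.
print(engaged_with("Nietzsche", "time_and_repetition"))
print(engaged_with("Wittgenstein", "time_and_repetition"))
```

The question asked was never stored as a fact. It is answered by chaining what was assumed, which is the step that lexical matching alone cannot take.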