All transcription is data creation.
But not all data is equal.
Some data is messy, ambiguous, and unclear.
Some data is well formed, well structured, and unambiguous.
All data creation involves the identification of data types.
Declaring data types is what we mean when we say we are "encoding" a text.
But there are different ways of encoding a text, and each method has pros and cons.
Two approaches stand out: visual encoding and semantic encoding.
While visual encoding may initially seem more intuitive and more pleasing to an editor, it has TWO fundamental problems.
For example, consider the various uses of italic rendering in a critical edition.
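The ambiguity of italics can be made concrete with a small sketch. The fragment below is invented for illustration, and it assumes that italics stand for a work title, a quotation, and a foreign term; those three uses are an assumption, not a list taken from any particular edition.

```python
import xml.etree.ElementTree as ET

# Hypothetical visually encoded fragment: every <i> only says "render in italics".
visual = (
    "<p>As he says in <i>De anima</i>, "
    "<i>the soul is the form of the body</i>, "
    "which he calls <i>entelecheia</i>.</p>"
)

# The only question a machine can answer is "what is italic?"
italic_spans = [el.text for el in ET.fromstring(visual).iter("i")]
print(italic_spans)
```

All three spans come back as the same kind of thing. Nothing in the markup records which one is a title, which a quotation, and which a foreign word, so a request such as "list every cited work" cannot be answered from this encoding.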
Consider a heading that is identified only by its position in the center of a page, surrounded by whitespace. If this rendering were lost, nothing would distinguish the heading from the surrounding text.
This makes the dream of creating multiple representations (such as print and web displays) from a **Single Source Document** impossible.
Our aim must be to de-couple form and matter so that we can take that content and instantiate it in as many different types of matter (or media) as we wish.
The better alternative is semantic encoding.
Instead of distinguishing data types by how they appear, we explicitly label data types using an agreed-upon, field-standard vocabulary.
If the data is a heading, then let's call it a heading.
If the data is a title, let's call it a title.
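A minimal sketch of what this buys us: one semantically encoded source, two renderings. The element names (`head`, `title`, `p`, `div`) are borrowed from TEI-style vocabularies as an assumption, and the functions `to_html` and `to_plain` are hypothetical helpers written for this illustration, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical semantically encoded source: elements name what the data IS.
source = (
    "<div><head>Question 1</head>"
    "<p>As he says in <title>De anima</title>, the soul is a form.</p></div>"
)
root = ET.fromstring(source)

def to_html(el):
    """Render for the web: each data type is mapped to an HTML element."""
    tag = {"div": "section", "head": "h2", "p": "p", "title": "cite"}[el.tag]
    inner = escape(el.text or "")
    for child in el:
        inner += to_html(child) + escape(child.tail or "")
    return f"<{tag}>{inner}</{tag}>"

def to_plain(el):
    """Render as plain text: headings in capitals, titles between underscores."""
    inner = el.text or ""
    for child in el:
        inner += to_plain(child) + (child.tail or "")
    if el.tag == "head":
        return inner.upper() + "\n\n"
    if el.tag == "title":
        return f"_{inner}_"
    return inner

web_view = to_html(root)
plain_view = to_plain(root)
titles = [t.text for t in root.iter("title")]
```

Because the source says "heading" and "title" rather than "centered" and "italic", each output decides for itself how those things should look, and a query such as `titles` is trivial.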
Consider the enormous variety of data required to comprehensively describe a critical text.
For example:
Such a list only scratches the surface.
This database of metadata is built automatically by "crawling" the Semantically Encoded Editions that editors produce.
This database is public and open, which means anyone can build a web application on top of it.
This means that the text, the organizing database, and the web publishing platform are modular and separable.
In short: your data is not locked in one website. It can be used by multiple applications for a plurality of purposes.
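The crawl-then-reuse idea can be sketched in a few lines. Everything here is a stand-in: the `editions` dictionary replaces files a real crawler would fetch, the `crawl` function is hypothetical, and a list of tuples replaces the actual database, whose schema is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the encoded editions a crawler would fetch.
editions = {
    "edition-a": "<text><head>Question 1</head>"
                 "<p>See <title>De anima</title>.</p></text>",
    "edition-b": "<text><head>Question 2</head>"
                 "<p>See <title>Physics</title> and <title>De anima</title>.</p></text>",
}

def crawl(editions):
    """Collect (edition id, element name, text) records from every edition."""
    records = []
    for edition_id, xml in editions.items():
        for el in ET.fromstring(xml).iter():
            if el.tag in {"head", "title"}:
                records.append((edition_id, el.tag, "".join(el.itertext())))
    return records

database = crawl(editions)

# Two independent "applications" reading the same records.
table_of_contents = [text for _, tag, text in database if tag == "head"]
cited_works = sorted({text for _, tag, text in database if tag == "title"})
```

Neither consumer touches the edition files or knows about the other. That separation is the point: the same records can feed a reading interface, a citation index, or something nobody has thought of yet.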
In order to create views where text and images can be immediately called upon and compared, a significant amount of abstraction and data-modeling is necessary.
We need to keep track of single digital files and their relationship to a "folio" or "page".
Further, we need to keep track of that folio's place within the "material hierarchy" (the codex) and the "content hierarchy" (the text).
Finally, we need to be able to track connections between our transcriptions of the text, folios, and by extension the digital images to which those transcriptions are related.
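One way to picture these relationships is as a small set of linked record types. This is only a sketch under stated assumptions: the class names, field names, and example URL are all hypothetical, and a real model would need many more properties (ordering, sides, multiple witnesses, and so on).

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Image:
    url: str                      # a single digital file

@dataclass
class Folio:
    label: str                    # e.g. "12r"
    codex: str                    # place in the material hierarchy
    images: List[Image] = field(default_factory=list)

@dataclass
class Transcription:
    text_item: str                # place in the content hierarchy
    folios: List[Folio]           # folios on which this text appears
    content: str

def images_for(transcription):
    """Follow transcription -> folios -> images."""
    return [img.url for folio in transcription.folios for img in folio.images]

# Hypothetical data: one folio, one image, one transcription.
folio = Folio("12r", "codex-a", [Image("https://example.org/codex-a/12r.jpg")])
question = Transcription("question-1", [folio], "Utrum anima sit forma corporis")
linked_images = images_for(question)
```

The transcription never stores an image URL directly. It points to folios, and folios point to images, which is what lets a viewer call up the right image for any passage of text and keeps the material and content hierarchies independent of each other.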