
TEI-XML Software


While working on STEP (Scholarly Text-Editing Platform), we have been developing a few software utilities that are useful in other settings, especially within the TEI community. We have therefore decided to make them available to everyone: TEI practitioners as well as other developers working with XML.

The left panel guides you to FIVE distinct utilities: TEI-XML Components, Explore TEI-XML Files, XML:Lang Utility, TEI-XML Array Displayer, and TEI-XML Arrays to JSON Arrays. They are distinct because they fulfill distinct missions. In the course of developing them, however, we concluded that it was desirable to package them together in one application. TEI-XML Components is by far the most ambitious of the five, but the other four are most serviceable companions. All four have therefore been made accessible inside TEI-XML Components via a menu called Explore in that app’s menu bar.

TEI-XML Components is intended to be used by anyone with an interest in TEI. Explore TEI-XML Files will be useful to anyone wishing to explore any XML file. The XML:Lang Utility will delight anyone with an interest in encoding languages while learning about them at the same time. The last two utilities, TEI-XML Array Displayer and TEI-XML Arrays to JSON Arrays, are more technical in that they reveal some of the back-end of the principal programming strategy the software relies on: the conversion of the data extracted from hundreds of TEI Guidelines files into searchable multidimensional arrays.

TEI Database

 

The illustration above shows the search pane in TEI-XML Components. One of the many advantages of this app is how it helps users navigate and search both its entire contents AND the chapters that constitute the TEI Guidelines in one fell swoop. Results are carefully typeset and hyperlinked to internal components.

 


A New Tool to Learn, Navigate, and Teach the TEI-XML Guidelines


TEI-XML Components: A Cross-Platform Software for anyone interested in TEI

Developed by André De Tienne
Director and General Editor
Peirce Edition Project, IU School of Liberal Arts


TEI-XML Components has been developed to help users learn, understand, and navigate among all the TEI components described in the TEI Guidelines swiftly and easily. The pedagogical goal is to help break the barrier that discourages many editors from embracing the TEI concept of encoding texts for the long-term benefit of humanities research. The large XML universe is daunting for most because of its many arcane rules and perceived demands for high-level technical expertise. A particular challenge is getting a good grasp of the many mutual or hierarchical interdependencies within TEI’s XML structure. TEI-XML Components aims to help users, whether beginners or advanced practitioners, understand the logic of those interdependencies through a navigational system that reveals, more readily than the online Guidelines do, connections between elements, attributes, values, attribute classes, datatypes, models, macros, and modules.


 

Conceived initially and primarily as a companion tool to the Peirce Project’s NEH-funded STEP (Scholarly Text-Editing Platform), this app has been turned into a cross-platform standalone software for Mac, Windows, and Linux. Designed as a professional tool for serious scholarship, it is neither intended nor designed for use on iOS or Android devices. It is better viewed on screens taller than 7": text encoding is work that requires larger screens.

The initial purpose of this software was to develop a tool capable of importing the myriad TEI data and metadata directly from the TEI-c.org repositories into STEP’s companion tools in order to make them not only TEI-compliant but more robustly TEI-conformant in all particulars. Along the way, the app grew into something even more useful thanks to the way in which it incorporated and consolidated the large variety of TEI components along with their metadata in a single place.

The TEI-c.org website provides users full access to all of those components via its table of contents and especially the appendices in that table. The TEI website’s interface does a very nice job displaying the details regarding those components once users have clicked their way to them. The difficulty is that information about those components is distributed across hundreds of webpages, which makes it hard to develop a clear picture of the interrelations or cross-dependencies among those components and their classes. Ordinary TEI encoders may know how to find information about a particular XML element (or tag) and look up what attributes it might accept, yet still lack a clear sense of its use rules: in what context, within what attribute class or element model, and under what imperatives or recommendations formulated to accommodate the scholarly needs or standards of a particular discipline of the humanities, as manifested in one of the TEI’s 21 modules. They would also not be likely to find out easily what kind of values certain attributes or classes thereof accept, or how to format those values in a standard syntax that an XML processor attuned to TEI XML schemas can interpret when laying out an encoded text or answering XQuery queries based on its encoding. The TEI website provides answers to such questions, but navigating it and finding one’s way across hundreds of scrollable documents takes hours of clicking, often in the wrong place for lack of intuitive logic.

TEI-XML Components makes all of this much easier. All eight classes of components are brought within a single front-end interface headed by clearly labeled tabs. Clicking these tabs (the first eight in the top row) provides instant access to all TEI components that correspond to that label. Clicking any of these components brings into view everything that the TEI Guidelines allow users to know about it. Relevant information is distributed among multiple fields that are clearly labeled. All other components that are related in one way or another to the component under review (as parent, child, sibling, class, datatype, or module) are displayed in the form of links. Users need only click such links to be instantaneously transported to the exact place that discusses it within the app.

Another service provided by TEI-XML Components is a special interface (under the + tab) that lets users create and/or check attribute values in full conformity with the standard syntax those values are expected to comply with. Such syntax is defined through datatypes (“teidata”), some of which come with regular-expression formulas that are used to validate the well-formedness of such values. The app checks automatically whether submitted values pass those algorithmic tests; if they don’t, the app explains what is wrong. It includes a sub-interface specialized in forming accurate machine-readable values for durations, dates, and times—those are especially tricky, but the app makes them a breeze.
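As a rough illustration of this kind of pattern-based checking, the Python sketch below tests a value against a regular expression and returns an explanation when the test fails. The two patterns and the function name check_value are hypothetical simplifications invented for this example; they are not the regular expressions defined in the TEI datatypes, nor the app’s own code.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# They are NOT the regular expressions defined in the TEI datatypes.
PATTERNS = {
    # ISO 8601-style duration such as P1Y2M or PT5M (simplified).
    "duration": re.compile(
        r"P(?=\d|T\d)(\d+Y)?(\d+M)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+S)?)?"
    ),
    # Year, year-month, or full date such as 2023-02-27 (simplified).
    "date": re.compile(r"\d{4}(-\d{2}(-\d{2})?)?"),
}


def check_value(kind, value):
    """Return (True, 'ok') if value matches the pattern for kind,
    otherwise (False, explanation)."""
    pattern = PATTERNS[kind]
    if pattern.fullmatch(value):
        return (True, "ok")
    return (False, f"{value!r} does not match the simplified {kind} pattern")


results = [
    check_value("duration", "P1Y2M"),
    check_value("duration", "P1DT"),
    check_value("date", "2023-02-27"),
    check_value("date", "2023-2-27"),
]
```

A real validator would take its patterns from the datatype definitions themselves rather than from a hand-written table like this one.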

TEI-XML Components is conceived as a companion tool to the TEI-c.org website. The latter is actually embedded within the app, both through certain links and, directly and conveniently, through an internal web browser. A copyright statement clarifies the extent of TEI’s intellectual property within the application (under the last tab labeled ©). Most important is that the app is fully updatable. The TEI Guidelines get updated every six months. A set of simple commands in the app’s menu bar allows users to update the app in a few minutes.

The app comes with a comprehensive, abundantly illustrated User Guide in the form of a PDF, which is viewable within the app itself via the Help menu. Every object in the app (fields, buttons, widgets), when visited by the mouse pointer, displays a helpful tooltip that briefly describes its purpose. The User Guide can be downloaded by clicking this URL.

 

 

 

Explore TEI-XML Files


Explore TEI-XML Files is a second utility, accessible from within TEI-XML Components via its Explore menu. This second utility facilitates the navigation of XML files and the extraction of encoded data from them. “Facilitation” means that users do not need to master programming languages such as XQuery to query the XML. Once an XML file has been downloaded, imported, or pasted into Explore TEI-XML Files, users need only use pull-down menus to examine complete alphabetized lists of tag elements, attributes, and values found in the XML file. Users then select any of those XML components in any order to display their related contents or related encodings (as the case may be) in the interface’s bottom field. Clicking any line in that bottom field selects and displays the related encoding in the XML file in the top field.
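For readers who wonder what such alphabetized lists amount to, here is a minimal Python sketch that collects the element names, attribute names, and attribute values found in an XML string. It is only an approximation of the idea, not the utility’s actual algorithm; the function names inventory and local_name and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def local_name(qname):
    # ElementTree writes namespaced names as "{uri}local"; keep the local part.
    return qname.rsplit("}", 1)[-1]


def inventory(xml_text):
    """Return alphabetized lists of element names, attribute names,
    and attribute values found in xml_text."""
    root = ET.fromstring(xml_text)
    elements, attributes, values = set(), set(), set()
    for node in root.iter():
        elements.add(local_name(node.tag))
        for name, value in node.attrib.items():
            attributes.add(local_name(name))
            values.add(value)
    # Sort case-insensitively so that "TEI" files next to "text".
    alphabetize = lambda items: sorted(items, key=str.lower)
    return alphabetize(elements), alphabetize(attributes), alphabetize(values)


# Hypothetical sample document, for illustration only.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<p rend="indent">A <hi rend="italic">word</hi></p>'
    "</body></text></TEI>"
)
elements, attributes, values = inventory(sample)
```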

While Explore TEI-XML Files accommodates any XML file, some of its algorithms have been optimized to handle TEI-XML files especially well, for the main impulse behind this utility comes from a desire to facilitate TEI-XML transactions, so to speak. Shortcut buttons are provided to move from a selected element, attribute, or value to their full descriptive display in TEI-XML Components.

Pages 59 to 66 of the TEI-XML Components User Guide explain and illustrate how to use Explore TEI-XML Files.

 

 

 

Explore the XML:LANG UTILITY App


The XML:Lang Utility is a third application, accessible from within TEI-XML Components via its Explore menu. This third utility may look small but provides many services. It was born from the simple desire to help XML practitioners fill in correct values for the ubiquitous xml:lang attribute—the attribute in charge of identifying the language in which any portion of a text has been written or spoken. Behind that attribute is a whole universe of global scholarship with a long history driven by the research of linguists and ethnologists. The need to identify languages correctly is paramount for research.

Identifying a language is no easy task, especially because each language tends to evolve and vary greatly across space and time. The need for standardization is global, especially when considering the duty to create sharable encodings that remain valid and trustworthy over the long haul.

The IANA (Internet Assigned Numbers Authority) maintains the so-called “Language Subtag Registry”, a large database that provides a unique identifier for thousands of languages, dialects, idioms, scripts, and orthographies. That Registry is built upon ISO 639-1, ISO 639-2, and ISO 639-3, among other standards. The W3C’s internationalization effort recommends the use of the IANA Registry for selecting codes for languages.

The @xml:lang attribute needs to comply with those codes. Such codes have a particular structure that takes into consideration genealogical dependencies among languages, their country or region of practice, their written rendition, and their variations. The Registry provides that type of information, frequently with additional comments and cross-references, for all registered languages and scripts.
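The structure of such codes can be sketched in a few lines of Python. A language tag is a hyphen-separated sequence of subtags in the order language, extended language, script, region, variant. The function split_language_tag below is a hypothetical helper written for this page: it classifies subtags by their shape alone, ignores extensions and private-use subtags, and does not check anything against the Registry.

```python
import re


def split_language_tag(tag):
    """Split a tag such as 'sgn-gss' or 'el-Latn-GR' into labeled subtags.
    Shape-based and simplified: no lookup in the Registry is performed."""
    parts = tag.split("-")
    result = {
        "language": parts[0].lower(),
        "extlang": None,
        "script": None,
        "region": None,
        "variants": [],
    }
    for part in parts[1:]:
        if re.fullmatch(r"[A-Za-z]{3}", part):
            result["extlang"] = part.lower()   # three letters: extended language
        elif re.fullmatch(r"[A-Za-z]{4}", part):
            result["script"] = part.title()    # four letters: script
        elif re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", part):
            result["region"] = part.upper()    # two letters or three digits: region
        else:
            result["variants"].append(part.lower())
    return result


greek_sign = split_language_tag("sgn-gss")
greek_latin = split_language_tag("el-Latn-GR")
```

Here greek_sign records "sgn" as the language with "gss" as its extended language, while greek_latin (an arbitrary well-formed tag chosen for illustration) separates the language "el", the script "Latn", and the region "GR".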

The Registry is a work in progress and depends on agreements among linguists and ethnologists. Classifying a language, whether dead, endangered, or alive, is a complex phylogenetic matter. The whole endeavor is utterly fascinating, and it is that fascination that brought this third application to do a lot more than merely providing the correct and well-structured code for any registered language. The app also helps users discover languages and dialects through its automated connection to the rich linguistic world of Glottolog.org, and uses TEI-XML Components’ internal web browser to display more information than is available in the Registry itself.

In the XML universe, use of the attribute @xml:lang can help disambiguate words across languages. The word “pain” for instance, if encoded <w xml:lang="fr">pain</w>, is a French word that means “bread” in English but, if encoded <w xml:lang="en">pain</w>, is an English word that means “douleur” in French. The IANA registry allows users to indicate the language of a text most precisely: an Ancient Greek word (up to 1453) will call for the "grc" code, while a modern Greek word will need the "el" code, not to mention Cappadocian Greek ("cpg"), Mycenaean Greek ("gmy"), and Romano-Greek ("rge"). A performance done in Greek sign language would be encoded "gss", and even more precisely, though not indispensably, "sgn-gss". The app makes all such distinctions very plain, and one pedagogical advantage is to excite curiosity within the minds of students and other learners, while increasing their historical and linguistic sensitivity.
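The “pain” example can be tried out with a short Python sketch that reads the xml:lang attribute of each word and groups the words by language. The wrapper element <s>, the absence of a TEI namespace, and the function name words_by_language are simplifying assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Bare sample without the TEI namespace, for illustration only.
sample = '<s><w xml:lang="fr">pain</w> <w xml:lang="en">pain</w></s>'


def words_by_language(xml_text):
    """Group the text of every <w> element by its xml:lang value."""
    by_lang = {}
    for w in ET.fromstring(xml_text).iter("w"):
        by_lang.setdefault(w.get(XML_LANG), []).append(w.text)
    return by_lang


result = words_by_language(sample)
```

The same string “pain” ends up under two different keys, "fr" and "en", which is exactly the disambiguation the attribute makes possible.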

The app comes with its own downloadable User Guide, which explains how to use the XML:Lang Utility with plenty of illustrations. That user guide is accessible directly within TEI-XML Components via a command in its Help menu, a command that displays the user guide within TEI-XML Components’ own internal PDF viewer, a viewer that comes with a menu that helps navigate each section of the guide. Following is a sample of illustrations.

 

 

 

TEI-XML Array Displayer


TEI-XML Array Displayer is a fourth utility, accessible from within TEI-XML Components via the command Display Content of TEI-XML Arrays in Tree View... in its Explore menu. This fourth utility allows users to explore the structure and contents of the multiple arrays that feed TEI-XML Components with all its TEI Guidelines data. Those arrays represent the data TEI-XML Components extracts from a multitude of TEI XML and HTML files on the TEI-c.org website. People interested in TEI technicalities may be curious about how those files get mined and how the extracted information gets distributed throughout the app. A partial answer is provided by the examination of those arrays, at least those that are not so long that displaying them in full would push the tree view to its limit.

To import arrays within this utility, users click the button List TEI-XML Arrays and enter the name of the app “TEI-XML Components” (provided by default, but this utility can import arrays from any other opened LiveCode stack). The left field will be populated instantly with the names of “custom properties” that contain arrays of data inside the app. Their names start with a lowercase c to conform to LiveCode’s syntactical recommendations (LiveCode is the IDE used to develop this app). Most of those arrays are related to TEI Modules and the elements, attributes, and values they govern.

Clicking any custom property name displays the related array within the tree view widget on the right side. The tree can be fully folded or unfolded by clicking one of the two iconic buttons above the tree. In case the array is not only very large but also multidimensional (multiple subnodes), its full unfolding will be algorithmically instantaneous, but its subsequent display may take a few seconds. A progress bar and a message alert users to that progress.
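To give a feel for what a fully unfolded tree of a multidimensional array looks like, the Python sketch below renders a nested dictionary as indented lines. It stands in for the LiveCode tree view widget only by analogy; the function name render_tree, the array name cSampleArray, and its contents are hypothetical.

```python
def render_tree(node, depth=0):
    """Return the nested dictionary as a list of indented lines,
    one line per key, with leaf values shown after a colon."""
    lines = []
    for key, value in node.items():
        indent = "  " * depth
        if isinstance(value, dict):
            lines.append(f"{indent}{key}")
            lines.extend(render_tree(value, depth + 1))
        else:
            lines.append(f"{indent}{key}: {value}")
    return lines


# Hypothetical array contents, for illustration only.
sample = {"cSampleArray": {"p": {"module": "core"}, "hi": {"module": "core"}}}
tree_lines = render_tree(sample)
print("\n".join(tree_lines))
```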

Pages 67–68 in the TEI-XML Components User Guide explain and illustrate how to use TEI-XML Array Displayer.

TEI-XML Arrays to JSON Arrays


TEI-XML Arrays to JSON Arrays is a fifth utility, accessible from within TEI-XML Components via the command Convert TEI-XML Arrays to JSON... in its Explore menu. When converted into JSON strings, TEI-XML Components arrays become useful to other applications, whether standalone or web-hosted like STEP. Plenty of other programming languages can handle JSON strings for all sorts of purposes, including building databases or retrieving data. Furthermore, TEI-XML arrays (or LiveCode arrays from other apps) can be sent to other developers via servers only after they have been converted to JSON strings. LiveCode developers can avail themselves of these JSON strings and convert them back into LiveCode arrays by using the JSONToArray function.

To import arrays within this utility, users click the button List TEI-XML Arrays... and enter the name of the app “TEI-XML Components” (provided by default, but this utility can import arrays from any other opened LiveCode stack). The left field will be populated instantly with the names of “custom properties” that contain arrays of data inside the app. Their names start with a lowercase c to conform to LiveCode’s syntactical recommendations (LiveCode is the IDE used to develop this app). Most of those arrays are related to TEI Modules and the elements, attributes, and values they govern.

Clicking any custom property name converts the related array into a JSON string that is then displayed in the center field. This may take a few seconds, or up to several minutes when dealing with very long multidimensional arrays. The button Toggle JSON Indentation toggles on and off to allow you to switch from plain string view to prettified view and back.

The JSON string will also be saved instantaneously within four distinct files: two text files (.txt) and two JSON (.json) files. One of each pair will consist of a single string of JSON text, and the other will be the “prettified” version, that is, the indented version, which is far more readable than the single string version.
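The four-file output can be approximated in Python as follows. This is a sketch under stated assumptions, not the app’s own routine: the function name write_json_files, the file-naming scheme with a _pretty suffix, and the sample data are invented for the example, which writes into a temporary folder rather than the Documents folder.

```python
import json
import tempfile
from pathlib import Path


def write_json_files(data, folder, name):
    """Write data as a single-line and as an indented JSON string,
    each as a .txt and a .json file. Return the four paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    compact = json.dumps(data, ensure_ascii=False)           # single string
    pretty = json.dumps(data, ensure_ascii=False, indent=2)  # "prettified"
    paths = []
    for suffix in (".txt", ".json"):
        for label, text in (("", compact), ("_pretty", pretty)):
            path = folder / f"{name}{label}{suffix}"  # hypothetical naming scheme
            path.write_text(text, encoding="utf-8")
            paths.append(path)
    return paths


# Hypothetical sample data, written to a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    written = [p.name for p in write_json_files({"p": {"module": "core"}}, tmp, "cSample")]
```

Both versions carry the same data: loading either file with json.loads yields the same structure, and only the whitespace differs.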

An array that takes a long time to convert will likely be too long for center field display. When that is the case, the utility will alert you that it will not display the JSON string, but it will also assure you that not everything is lost: the inability to display does not impact the ability to create the four files.

The third field on the right side, labeled “List of JSON tokens,” provides a simpler view of the array, token by token, or item by item. That list is generated automatically. If the field is blank, that means that the JSON string was deemed to be too large to be formatted and displayed. The threshold for display is an initial prettified string of no longer than 300,000 characters. The threshold for creating a file of JSON tokens is 600,000 characters. This means that JSON strings longer than 600,000 characters will not be “tokenized” at all, either for display or for file creation.
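The two thresholds can be summarized in a small Python sketch. The constant and function names are hypothetical, and the sketch assumes that both limits are measured on the length of the prettified string, which the description above states explicitly only for the display limit.

```python
DISPLAY_LIMIT = 300_000     # longest prettified string whose tokens are displayed
TOKEN_FILE_LIMIT = 600_000  # longest string for which a token file is created


def token_plan(pretty_json):
    """Decide, from the string length alone, whether the token list
    is displayed and whether a token file is written."""
    length = len(pretty_json)
    return {
        "display_tokens": length <= DISPLAY_LIMIT,
        "write_token_file": length <= TOKEN_FILE_LIMIT,
    }


small = token_plan("x" * 300_000)
medium = token_plan("x" * 300_001)
large = token_plan("x" * 600_001)
```

Under these assumptions, a string between the two limits is tokenized to a file but not displayed, and a string above 600,000 characters is not tokenized at all.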

The app automatically creates the folder that will receive all of those JSON and text files. The name of that folder is TEIJSONStrings. That folder is placed within the Documents folder of your computer.

Pages 69–71 in the TEI-XML Components User Guide explain and illustrate how to use TEI-XML Arrays to JSON Arrays.

Download TEI-XML Components


There are five versions of TEI-XML Components: one for macOS (64 bit), two for Windows (32 and 64 bit), and two for Linux (32 and 64 bit). The Mac and Windows versions provide identical functionalities and work in the same way. The Linux versions work the same except for the app’s internal browser, which is not compatible; the app in this case launches the Linux computer’s default browser instead. All versions include the FOUR companion apps: Explore TEI-XML Files, XML:Lang Utility, TEI-XML Array Displayer, and TEI-XML Arrays to JSON Arrays. The software was developed on a Mac, which gives the interface an aesthetic quality that unfortunately cannot be matched in Windows or Linux.

The software is provided free of charge under a BY-NC-ND Creative Commons License defined within a license document that accompanies the software.

Click one of the buttons below to download the software. Version 1.3.6 released February 27, 2023 (in phase with TEI Guidelines v. 4.5.0).

On a Mac, double-click the zip file. Install the application preferably inside the Documents folder, not within the Applications or Program folder.

Once the software is installed, open the folder “TEI-XML Components 1.3.6”; within it are a Read_Me_First file, a license file, and the application. RIGHT-CLICK the application’s icon (a modified version of the TEI icon) and select Open in the dropdown menu. You will likely need to give permission to the software to run on your Mac. Newer Macs come with greater security features: in case the installation gets rejected, follow the easy instructions in the Read_Me_First file to overcome the restriction. When the application starts up, a splash screen will come and go quickly, and then the application will come into view.

On Windows, double-click the zip file. Install the Windows32 or Windows64 folder preferably inside the Documents folder.

Once the software is installed, open the folder; within it there are a Read_Me_First file, a license file, and the folder “TEI-XML Components 1.3.6” that contains the application. Follow the installation instructions in the Read_Me_First file.

In Windows, if the zip file or folder appears in green letters in the directory, that's because it remains encrypted. Right-click it, choose “Properties” at the bottom of the pop-up dialog, click the “Advanced...” button under the “General” tab, uncheck the checkbox “Encrypt contents to secure data,” then click OK. This will unencrypt the file, and the filename will turn black. You will need to give the software permission to run on your computer.

On Linux, double-click the zip file. Install the Linux32 or Linux64 folder preferably inside the Documents folder.

Once the software is installed, open the folder; within it there are a Read_Me_First file, a license file, and the folder “TEI-XML Components 1.3.6” that contains the application. Follow the installation instructions in the Read_Me_First file.

When launching the Linux version of the software, permission to execute the application file may be required. Grant it via the command chmod +x "TEI-XML Components" (the quotation marks are needed because the file name contains a space).

For all other details, please read the User Guide, accessible through the Help menu in TEI-XML Components, or viewable at this URL.


 
