Appraising and Cataloguing the Collection
The Takazawa Collection is technically a manuscript collection: its contents were assembled by the donor, a recognized Japanese expert on the subject. Because he had received some materials from other persons, the donor obtained their permission to include them in his donation to the University of Hawaii, and he selected the materials that were worthy of preservation.
When the shipment arrived in Honolulu, Takazawa performed a careful survey of the entire collection in order to identify materials whose public release or indiscriminate use might violate the privacy of individuals. Some of these items are personal papers, to which the original donors wished to restrict access for a certain number of years. Others contained the names of persons whose privacy might be violated if those names were disclosed inappropriately. During his initial visit, Takazawa and Steinhoff went through the outline bibliography item by item, and Steinhoff prepared a restriction form for all items that Takazawa felt contained sensitive material. These were primarily trial records, manuscripts and notebooks, and personal correspondence. In consultation with University of Hawaii archivist James Cartwright, we agreed to handle these materials according to the following procedures.
- A set of user policies and a use application and agreement form were drawn up, which must be read and signed by all persons who are given access to materials in the collection.
- Four boxes of personal papers were sealed for designated periods of time, according to standard U.S. archival practice. Some of these were opened in 2000, and others remain sealed until later dates.
- Another subset of materials was designated for restricted use until the year 2025, because they contain the names of private persons who might be harmed by their indiscriminate release. The restrictions permit legitimate scholarly use of the materials, but prohibit copying them or using personal names in publications based on them. They have been placed in specially marked archival boxes so that they can be easily identified, and the restrictions are noted in the database, website, and bibliographies.
Since the individual items in a special manuscript collection would not be included in the main University of Hawaii catalog, we planned to catalog the Takazawa Collection into a relational database on a standalone personal computer and to include Japanese characters, romanized Japanese, and English. This would provide maximum flexibility to format the materials for print publication or other uses. The basic principle we followed in developing the database for the Takazawa Collection was that the records should contain every piece of information normally contained in a standard U.S. library MARC cataloguing record, entered in an equivalent format based on the same cataloguing rules. If at any time in the future the University of Hawaii should want to convert all or part of the Takazawa Collection database into the format of its main collection, this should be a relatively straightforward programming task of re-formatting equivalent data fields with MARC codes and adding standard subject codes. Steinhoff is not a trained librarian, so to achieve this compatibility she consulted the University’s Japanese cataloguer, Dr. Hisami Springer, while developing the cataloguing system for the collection.
Even before the actual cataloguing components of the database were developed, systems were devised to facilitate the tracking of the materials, minimize error in the Japanese data entry, and provide contextual information to assist researchers. These systems utilize the basic concepts of relational database design to minimize redundant data entry and capture each element of information in the most appropriate form and location. To keep track of the materials as they were catalogued, both the basic contents of the numbered outline bibliography and the list of shipping boxes that was keyed to the outline bibliography were entered as linked tables in the database. These tables were updated as boxes were sorted and emptied, or as the material in files was catalogued into the database and placed in archival boxes.
To minimize errors and discrepancies in the entry of Japanese names and terms, a master dictionary was created into which all Japanese names of authors, editors, translators and publishers were entered once, in Japanese characters and romanization. The dictionary also contains the names of specific organizations, social movements, and key concepts, with appropriate romanization and English translations. We began the dictionary by entering all the relevant terms contained in a dictionary of postwar social movements of the left that Takazawa and his associates had previously published in Japan. Since the published dictionary contains information about variant names, nicknames, and other relationships among entries, this information was also entered and cross-linked in the database through a variants table.
All new names that arose during cataloguing as authors, editors, translators, or publishers, were first entered into the dictionary and then entered in the item’s cataloguing record through a “look-up” procedure. The system facilitated consistent, correct Japanese data entry, and also served as a built-in indexing system. Similarly, detailed information about publishers was entered once in both Japanese and romanization, and subsequently was accessed through a “look-up” list. Since many of the publishers are either social movement organizations or small publishers that specialize in left-wing materials, this procedure not only minimized errors in data entry, but provided automatic indexing by publisher.
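The dictionary look-up described above is a standard relational-database pattern: each name is entered once, in both scripts, and every later record stores only the name's ID. The sketch below illustrates the idea in SQL via Python's sqlite3; the table and field names are hypothetical, since the original system was built in Microsoft Access with its own naming.

```python
import sqlite3

# Minimal sketch of the master-dictionary look-up, with illustrative
# (not original) table and field names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dictionary (
    dict_id INTEGER PRIMARY KEY,  -- stable Dictionary ID
    kanji   TEXT,                 -- name in Japanese characters
    romaji  TEXT                  -- romanization
);
CREATE TABLE item_authors (       -- one row per named participant per item
    item_id INTEGER,
    dict_id INTEGER REFERENCES dictionary(dict_id),
    role    TEXT                  -- author, editor, translator...
);
""")

# The name is entered once, in both Japanese characters and romanization...
conn.execute("INSERT INTO dictionary VALUES (1, '高沢皓司', 'Takazawa Kōji')")
# ...and a catalog record stores only the Dictionary ID chosen via look-up.
conn.execute("INSERT INTO item_authors VALUES (101, 1, 'author')")

# The join recovers the consistent spelling for any record, and doubles
# as an index of everything linked to that name.
row = conn.execute("""
    SELECT d.romaji FROM item_authors a
    JOIN dictionary d ON d.dict_id = a.dict_id
    WHERE a.item_id = 101
""").fetchone()
print(row[0])  # Takazawa Kōji
```

Because the Japanese text exists in exactly one row, a typo can only occur once and is corrected once, which is the error-minimization property the paragraph above describes.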
In addition to developing an extensive set of linked tables to contain the data, Steinhoff created a series of data input forms for the students to use. These forms were preset with the correct language and font for each data entry field, so the coders could quickly switch from English or romanized Japanese to Japanese character fields. Publishers were recorded on the main data entry forms, but all authors, editors, translators and other named participants were entered into a separate authors table, using an authors data entry form. This facilitated the entry of multiple authors and of various types of authorship for a single item, while retaining formatting flexibility for later bibliographies and simultaneously generating author data for indexing. All data entry of names of authors, editors, and publishers on the various data entry forms was managed through “look-ups” to the database dictionary. The look-up was not directly to the dictionary table, but rather to a query that sorted the names in alphabetical order. The data entry forms allowed the cataloguer to scroll through an alphabetized listing of the names in the dictionary, displayed in both Japanese characters and romanization. Clicking on the desired name would enter the Dictionary ID number for that name into the record being catalogued. Similarly, we used a look-up list containing the Japanese and English year dates in order to record publication dates consistently, regardless of whether the item used the Japanese or English dates. Most of the forms also allowed the coder to move quickly to a separate data input form to add a new name to the dictionary or publisher list, or to add authors for a publication, and then return to the basic data entry form for the record they were working on.
The cataloguing system assigned a single, consecutive number to each item as it was catalogued, and that number also appears on an archival quality sticker affixed to the item (or for very fragile items, penciled on the item). The core table that assigns the item numbers sequentially also records the storage location of the item and its link to the outline bibliography and to any sub-collection, as well as the date it was catalogued. All detailed cataloguing information about the item that appears on separate tables is linked by the catalog item number.
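The core table just described can be sketched as follows; the column names are invented for illustration and the detail tables for each material type would reference the same item number.

```python
import sqlite3

# Sketch of the core item table: consecutive item numbers plus location
# and linkage data. Field names are hypothetical, not the Access originals.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE core_items (
    item_id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- consecutive catalog number
    storage_box    TEXT,   -- archival box holding the item
    outline_no     TEXT,   -- link to the outline bibliography
    subcollection  TEXT,   -- sub-collection, if any
    date_cataloged TEXT
)
""")
conn.execute(
    "INSERT INTO core_items (storage_box, outline_no, subcollection, date_cataloged)"
    " VALUES ('Box 12', 'OB-47', NULL, '1996-03-15')")
new_id = conn.execute("SELECT last_insert_rowid()").fetchone()[0]
print(new_id)  # 1 — the same number written on the item's archival sticker
```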
Books, manuscripts, pamphlets, maps, and posters were catalogued as individual items. A separate table and input form were developed for each type of material, in order to include the necessary cataloguing fields specific to that type of material and to streamline the data input process. Whole folders of loose miscellaneous materials were also catalogued as items, but the folder record contains additional information about whether the folder contains specific types of material (such as pamphlets, handbills or clippings), and the number of materials within the folder. Artifacts, letters, and sets of photographs were catalogued with a single record at the box or set level, following standard practice for manuscript collections.
For serials, the basic data for the serial title was entered first on a table with its own separate number series. Thereafter, each issue was catalogued separately with its own item number and linked to the serial title’s number through a look-up to the serial title table. The issue records contain volume and date information. Special issue titles, which occur quite frequently and often highlight particular social movements or events, were also recorded. This same two-level system was also applied to audio-visual materials, clipping sets, legal documents, and handbill sets.
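The two-level model amounts to a pair of linked tables: titles in one number series, issues in the regular item-number series. The following sketch uses illustrative names and values, not those of the actual database.

```python
import sqlite3

# Sketch of the two-tier serial model: serial titles in their own number
# series, individual issues linked to a title by its ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE serial_titles (
    serial_id INTEGER PRIMARY KEY,   -- separate number series for titles
    title     TEXT
);
CREATE TABLE serial_issues (
    item_id       INTEGER PRIMARY KEY,  -- ordinary catalog item number
    serial_id     INTEGER REFERENCES serial_titles(serial_id),
    volume        TEXT,
    issue_date    TEXT,
    special_title TEXT                  -- frequent special-issue titles
);
""")
conn.execute("INSERT INTO serial_titles VALUES (502, '情況')")
conn.execute("INSERT INTO serial_issues VALUES (2041, 502, '3(2)', '1969-02', NULL)")

# Every issue of a title can be pulled together regardless of where the
# issues are physically stored:
issues = conn.execute(
    "SELECT i.item_id FROM serial_issues i "
    "JOIN serial_titles t ON t.serial_id = i.serial_id "
    "WHERE t.title = '情況'").fetchall()
print(issues)
```

The same title/item split carries over unchanged to audio-visual materials, clipping sets, legal documents, and handbill sets.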
The information input into the relational database does not immediately produce a fully formatted bibliographic entry. The information is stored in several different tables of the database, linked through common numeric codes. This provides efficiency in cleaning data, and maximum flexibility for later manipulation of the data. We have used the database to produce bibliographic records in various formats and layouts, add annotations and keywords, generate indexes, and produce formatted data with special sorting codes for the database-driven website.
Prior to shipping the material to Honolulu in 120 boxes, Takazawa prepared an “Outline Bibliography” in Japanese describing in general the contents of the collection. Most materials from his own files were placed in numbered manila envelopes, and the outline contains a brief description of the content and an estimate of the number of items in each one. A number of boxes and some files were labeled “not organized” in the outline bibliography. The sub-collections of materials he obtained from other donors were also generally labeled “not sorted.” During two early visits, Takazawa sorted some of these materials into labeled manila envelopes, but he did not complete the task.
Takazawa recommended that we start the cataloguing with the books, since they would provide the broadest overview of the kinds of materials that are in the collection, and would also be the basic references users would consult. He personally organized the books, using his own sense of how they should be organized for what we then expected to be simply a printed bibliography with a fixed order. His plan included some materials that are not technically books, but are bibliographies and other general reference materials. These items were catalogued in the precise order and numeric sequence that Takazawa developed, which incorporates a basic organization by topic and time period but includes some misplaced or ambiguously placed items. A small number of books were either discovered later in the unsorted materials or were contributed by Takazawa in subsequent years. These were simply added to the end of the books as they were catalogued, with item numbers that no longer continued in the original sequence.
Once the books were catalogued, sorting and cataloguing proceeded simultaneously. Since we did not have sufficient work space to sort the entire collection at once, we continued to work through one genre of material at a time. We began work on the serials by producing a list of about 120 serial titles that were either listed in the outline bibliography or were found loose in two boxes labeled “magazines.” When we showed the list to Takazawa, thinking it was complete, he said that this was only a fraction of the serials and we would have to go through every envelope to find the rest. The students carried out a systematic inventory of all the envelopes and boxes, cataloguing and storing serial items and adding serial titles as they went. When the task was finished we had found nearly a thousand different serial titles. However, we then realized that although the database could now link every issue of a serial title, the issues were not properly stored together by title because of the incremental way they had been identified. There was no alternative except to reorganize all the serials by title into new storage boxes and re-enter the location information.
Takazawa had promised to come to Honolulu again to sort the pamphlets as he had the books, but he was unable to do so and we had to catalog the pamphlets and all other materials without his assistance. By this time it was also clear that the cataloguing of items in numeric sequence was not important, since we could easily create a new sort order in the database for any presentation of the material. We now had two organizational models: the single item model used for the books, and the two level model used for serials, in which individual issues or items were linked to a general title or set. We adapted these two models to catalogue all of the other types of materials. For the pamphlets, we ended up using both models in tandem.
About a third of the pamphlets were loose in boxes, and we catalogued these just as they came from the boxes. When we reported that we had found over 300 pamphlets, Takazawa again informed us that there were many more buried in the file envelopes. He thought there were about a thousand, and the total number we eventually found was 1,002. Although it had made sense to remove all the serials from the files and store them by serial title, we were more reluctant to separate pamphlets from the other materials in the same envelope, because of the topical and contextual relations among the materials that had been filed together originally. The solution was to treat each file envelope as a folder or series of folders of material on a particular topic. Pamphlets were left physically in the folder, but also catalogued independently as pamphlets, with their location in the folder carefully recorded. We also retained the original number from the outline bibliography on all material found in that envelope, including serial issues, so that it is still possible to reconstruct exactly what was in one envelope even if it is now stored and filed in several places by type of material.
We were also beginning to identify and separate out additional types of material that required different cataloguing information and storage arrangements. These included clipping files (many of which were in the form of large scrapbooks or bound volumes), handbills (some of which were valuable collections on a specific topic), manuscripts by various authors, legal documents, letters, photographs, maps, posters, and artifacts. From this point on, one graduate student in library and information science became the chief sorter. She identified how the material should be classified, and then prepared it for cataloguing and storage before turning it over to another student to input the data into the computer.
Some of the categories of material were physically distinctive and posed no great sorting problems, but others were ambiguous. In these hand-produced ephemeral materials, the distinctions between a handbill, a pamphlet, a manuscript, and a serial were not always clear, but we developed basic working definitions and in a few cases cross-catalogued materials. At the crudest level, a pamphlet had to have a cover and title, and it usually had a price. A serial had to have some kind of a date and clear evidence that it was intended for serial production, even if the collection only had one issue and we could not be sure whether any more were ever produced. A manuscript had to have one or more authors and a title if possible, and it should seem to be some sort of a statement, essay, or other literary product. Anything that appeared to have been produced for direct distribution or posting in public places was treated as a handbill. Material that did not quite fit these categories was simply left in its topical folder, including isolated newspaper clippings and handbills that did not constitute a significant “set.”
The sorting became even more difficult when we had finished the material that was pre-sorted into labeled manila envelopes and moved into miscellaneous boxes of unsorted material, including sub-collections from other donors. We tried to sort these materials into topical or chronological sets as much as possible, but some folders still contain only minimal identification. As people began to use the collection materials, it also became apparent that the label on an envelope or the listing in the outline bibliography did not always fit every item inside. Following standard practice for manuscript collections, we have not tried to re-sort these materials because they reflect the categories created by the donor.
Teams of graduate students with educated native Japanese reading ability and non-natives with advanced Japanese reading skills, who were trained by Steinhoff and advised by library professionals Cartwright and Springer, catalogued and processed the materials. In addition to entering cataloguing information into the computer in Japanese, romanized Japanese, and English, the students had to prepare the items for storage. This included labeling the item with the catalog item number assigned by the computer and putting it into an appropriate archival storage container, whose number and type were also recorded in the item record in the database. Most materials other than books were placed in archival folders and boxes, and some fragile materials were wrapped completely in acid-free and lignin-free paper before being placed in archival folders and boxes.
The teams were organized for efficient and accurate entry of the Japanese and romanization, but with the secondary aim of providing educational opportunities for advanced students of Japanese. While native Japanese graduate students with college education in Japan are more efficient than non-native speakers at Japanese word-processing, cataloguing the Takazawa Collection provided a rare opportunity for advanced students of Japanese to work with natural language materials and improve their reading and character recognition skills. We therefore used teams that maximized both efficiency and learning opportunity.
For books and pamphlets the team included one native Japanese speaker to handle the Japanese data entry, and one non-native speaker with advanced Japanese language skills to handle the physical care and placement of the item. For serials, because the Japanese was entered once for the serial title and the cataloguing of individual issues required recognition but not production of Japanese, we were able to use a ratio of two non-native speakers to one native speaker. In this case the native speaker worked independently to enter serial titles in advance, and then two non-native students entered the issues and handled the physical care and placement of the items. As we moved to cataloguing more specialized materials, the work required two native Japanese speakers: one to sort the material into appropriate categories (e.g., manuscript, handbill, folder, legal document, or letter) and prepare the materials for storage, and a second to enter the data into the computer. Non-native speakers then worked on other preservation tasks that required Japanese reading ability, but not input of Japanese data or writing Japanese labels by hand.
The database was backed up after every data entry session using five rotating sets, first of floppy disks and then Zip disks as the database grew larger. Initially we also used a weekly tape back-up, but the Zip disks were faster and easier to use. Steinhoff frequently moved copies of the current database to another computer in order to program new elements or prepare reports, which provided another level of backup protection at a different site. After the data for one type of collection item was entered, the printout was checked against the original materials to ensure accuracy in the data entry and in the physical labeling and placement of the materials. The errors were then corrected in the database.
Many items required special archival preservation, which was carried out under the direction of University library specialists. Under the guidance of the University music library’s audiovisual technician Alexis Weatherl, we made one or more use copies of audio-visual materials (phonograph records, audio tapes, and video tapes) in order to protect the original. For sound recordings and audio tapes a master preservation copy was made on high quality open-reel tape. Photographs and stickers were placed individually in archival polyethylene sleeves for protection and then stored in archival boxes. Maps and posters were stored flat in acid-free and lignin-free folders in archival newspaper boxes. Textile artifacts (demonstration banners and headbands) were carefully wrapped in acid-free and lignin-free tissue and stored flat in archival boxes. When Steinhoff wanted to use some large-format organizational newspapers from the 1960s for a research project, it immediately became apparent that the originals were too fragile for photocopying and intensive use. We then identified a number of newspapers for preservation microfilming, which was carried out under the supervision of University preservationist Lynn Davis using a local vendor who had been trained to handle Asian language materials. Other text materials that Takazawa had identified as particularly rare and requiring preservation were photocopied on acid-free and lignin-free paper to safeguard the originals. The University’s preservation department also encapsulated one particularly valuable set of handbills from the 1960 Ampo period in Mylar, so that they can be used without damaging the fragile originals.
From the outset of the project, we planned to produce annotations for the bibliographies to make the specialized materials more accessible to potential users. Takazawa also foresaw the need to associate keywords with the items in order to produce the indexes for the printed bibliographies. Steinhoff developed the initial system for annotating and keywording the entries as the book cataloguing neared completion. Using a series of queries to bring together and format all the necessary elements from several different tables, she programmed a bibliographic entry format in both Japanese characters and romanization. A new data entry form was then produced that would display the bilingual bibliographic entry, with space below it for entering the English annotation. She created a new table to collect the keywords and an associated data entry form. By embedding the keyword form on the same form used for annotations, the item number of the book could be entered automatically for each keyword entry. The keyword entry form permitted multiple entries per book, with several displayed on the form at one time.
Steinhoff and Takazawa produced the initial book annotations together in Tokyo, using a printout of the bibliography and a laptop computer displaying the annotation and keyword entry form. Takazawa read the printout and dictated in Japanese the general content of the annotation and the necessary keywords. Steinhoff listened to the Japanese and entered the annotations and keywords into the database in English. The system worked because Takazawa was completely familiar with the books he had donated to the collection. He could instantly supply context and background in the annotations and specify keywords to link the item to people, organizations, and events. The method was efficient because any uncertainties could be cleared up immediately, and the interaction encouraged Takazawa to add interesting tidbits of background information (which he sometimes later edited out!). It also gave Takazawa an opportunity to proofread the bibliography and correct some data errors. However, he soon tired of waiting while the keywords were entered, so to save time he tried to ensure that the annotations were clear enough that Steinhoff could enter keywords later.
This system was adapted some time later to enter annotations and keywords for the serial titles, but it quickly became apparent that Takazawa could not recall the serial titles as well as he could the books, without physically seeing and handling them. The task was set aside with the expectation that Takazawa would be able to work on it when he came to Honolulu again. Unfortunately, his trip was postponed after he won the Kōdansha Non-Fiction Prize in 1999 for his book Shukumei: Yodogo Bōmeishatachi no Himitsu Kosaku. Just before he was to come to Honolulu the following year he suffered a stroke. By that time plans were well underway to produce a website to display the collection bibliographies, so Steinhoff had digital pictures taken of one cover from each serial in the collection, with the dual aim of posting the covers on the website and taking copies to Japan to jog Takazawa’s memory for the annotations. He was still too ill that summer to work on the annotations, so Steinhoff reluctantly realized that she would have to do the annotations and keywords herself, with help from the students working in the collection and other Japanese who were familiar with the materials.
After having produced the book annotations with Takazawa and worked with the collection materials for several years, Steinhoff could produce annotations and keywords, but not as well nor as quickly as Takazawa could. She has had to rely much more on internal evidence in the materials themselves, and on other published sources. Gomi Masahiko, the founder of the Mosakusha Bookstore that distributes such materials in Shinjuku Sanchome, Tokyo, has helped by identifying people and organizations associated with some of the more obscure serials (including those he himself donated to the collection). The student cataloguers assisted by entering brief descriptions as they catalogued the pamphlets, but without knowing the names and context of the social movements, they could not produce the kind of annotations and keywords that would make the materials accessible to users. Fortunately, as more of the materials in the collection were catalogued, clues found in one type of material could help with the annotation of related items in another part of the collection. It is particularly important that the annotations and keywords make these links to assist future users who will have no direct knowledge of the movements they are studying.
As Steinhoff began to work more intensively with annotations and keywords, some new elements needed to be programmed to make the keyword system work efficiently and effectively. The initial system developed to enter keywords for the books had simply captured the keyword terms for each item as they were entered. Since Steinhoff was entering them on the fly while listening to Takazawa’s dictation in Japanese, they had been entered in a jumble of romanization and English. There was at that point no method for standardizing the keyword entries and linking them to the dictionary. However, the intent from the beginning was to link the keyword entries to the existing Dictionary Table in Access, which already contained every author’s name plus a number of organization names and keyword topics that had been entered at the beginning of the project, and a field to enter the Dictionary ID had already been programmed into the table that accepted the keyword references for each item. By using a query to sort the keyword entries alphabetically, the sorted list could be compared with the alphabetically sorted Dictionary query to find the correct dictionary numbers for the keyword entries and enter them. Alphabetical sorting streamlined the process by clustering all the entries for the same term, as long as they had been entered in the same way and not misspelled. Variants of the same term could be identified and coded, but some of the keyword entries were new names or terms that were not yet in the dictionary.
The dictionary was in active use for cataloguing and new entries were being added continuously as the coders needed to record names that were not already in the dictionary. Hence the first problem was to develop keyword terms that would eventually be part of the same dictionary, and assign them Dictionary ID numbers, but not interfere with the cataloguing. Since the key to the entire dictionary system was the integrity of its ID numbers, both the dictionary numbers used in the cataloguing and the new ones assigned to keywords had to be completely stable. Access has an extremely rigid formula for the automatic assignment of ID numbers to prevent duplicate numbering. The cataloguing was carried out in an area without network connections, while Steinhoff was doing annotations and keywords in a different building or at home, so there was no way to link the computers into a network and enter new dictionary terms from both activities into the same copy of the database. Under normal circumstances, there would be no way to create an independent stream of new dictionary entries on a different copy of the database and retain those numbers when the two sets of data were merged.
Serendipity intervened. It turned out that during a major upgrade of the database program and operating system and a corresponding switch to a different computer, the master dictionary table had reset itself to a higher number series. Consequently, there was a large gap in the ID numbers listed in the dictionary, and when the coders doing the cataloguing added a new name to the dictionary, it was automatically assigned a number in the new series. There were several thousand unassigned numbers in the middle of the series which could be used to add new keywords. Steinhoff created an empty version of the Dictionary Table structure, but with ID numbers to be assigned manually instead of automatically as was the case with other tables in the database. She created an entry form for adding new keywords to the dictionary and set the first entry to begin with the first of the missing numbers in the original Dictionary Table. She then entered all of the new keywords that appeared to warrant separate entries, assigning them sequential Dictionary ID numbers in the vacant number series. When these records were appended to a copy of the original Dictionary Table, Access accepted their newly assigned ID numbers because those numbers had theoretically never been assigned—they had just been skipped over.
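The trick of filling the vacated number gap can be illustrated with a small sketch. The ID values below are invented for illustration; the point is simply that manually assigned numbers in the unused middle range never collide with either the old series or the reset series.

```python
import sqlite3

# Sketch of filling a vacant ID gap: the autonumber series jumped ahead
# during the upgrade (here, from 4000 to 9001), and new keyword terms were
# given manually assigned IDs in the unused middle range. All numbers and
# terms are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary (dict_id INTEGER PRIMARY KEY, term TEXT)")
conn.execute("INSERT INTO dictionary VALUES (4000, 'last pre-upgrade entry')")
conn.execute("INSERT INTO dictionary VALUES (9001, 'first post-upgrade entry')")

# New keyword entries, numbered by hand starting at the first skipped number:
next_id = 4001
for term in ["Ampo struggle", "Sanrizuka"]:
    conn.execute("INSERT INTO dictionary VALUES (?, ?)", (next_id, term))
    next_id += 1

# The database accepts the gap IDs because they were never assigned before.
ids = [r[0] for r in conn.execute("SELECT dict_id FROM dictionary ORDER BY dict_id")]
print(ids)  # [4000, 4001, 4002, 9001]
```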
Another small problem arose with the coding of keywords for serials and other materials that had been catalogued in a two-tier system. The first tier representing the serial title (or the set for certain other types of materials) had been catalogued using a separate number series, while the individual serial issues (or other units) had been catalogued with individual Item ID numbers. However, it was primarily the serial titles or set titles that were being assigned keywords, not the individual issues or items within the sets. Each series was automatically numbered starting with 1, so the series numbers all overlapped, and also overlapped with item ID numbers. The solution was to assign new “dummy” item ID numbers to the serial titles or sets, using a distinct number series that started well above the numbers that were being assigned to individual items. These dummy ID numbers were assigned in blocks for each type of material with two-tier cataloguing, using numbers in the 50,000 or 60,000 range. The assignment was done globally for each set, using a query with an expression that added 50,000, 61,000, or whatever starting number had been assigned to that block, to the existing series number for the record. Thus a serial title with the Serial ID 502 would have 50,000 added to it in the expression to produce the dummy Item ID number 50502. The query was then used to make a new table, after which the expression field was renamed as the Item ID. This process was generally combined with the use of queries and expressions to create new fields with the formatted bibliographic data in kanji and romanization, which also had to be made permanent by creating a new version of the table. When the data entry form for the addition of annotations and keywords was created from the new table, it would display the formatted bibliographic entry in both languages along with the dummy ID number, and the dummy ID would automatically be entered on each keyword data record.
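The query expression described above is just an arithmetic offset. A minimal sketch, using the 50,000 block and the Serial ID 502 example from the text:

```python
# Sketch of the "dummy ID" expression: each serial title's series number
# gets a block offset added so it cannot collide with real item numbers
# or with the overlapping series of other two-tier material types.
OFFSET = 50_000  # block reserved here for serial titles (illustrative)

def dummy_item_id(series_number: int, offset: int = OFFSET) -> int:
    """Equivalent of the Access query expression: offset + series number."""
    return offset + series_number

print(dummy_item_id(502))  # 50502, as in the example above
```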
The next step was to enhance the keyword entry form with a look-up to the alphabetically sorted dictionary query, so that one could find the term in the master dictionary while viewing the full entry for a particular item, and click on it to enter the dictionary number into the record. The space for writing in keywords was left on the form, but now only needed to be used when the appropriate keyword was not yet in the dictionary. New terms could be added to the dictionary periodically, using a variation of the procedures already developed. By setting the sorted query of keyword entries for individual items to display only those for which the Dictionary ID field was blank, it was a simple matter to determine which new keywords needed to be added to the dictionary, to enter them and assign new Dictionary ID numbers, and then to add the new Dictionary ID codes to the keyword entries.
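The blank-field filtering step can be sketched as follows; the field names and records are illustrative, not the project's actual schema:

```python
# Sketch: find hand-written keywords not yet in the dictionary by
# selecting keyword entries whose Dictionary ID is still blank.
# Field names and data are illustrative.
keyword_entries = [
    {"item_id": 50502, "text": "hijack", "dict_id": 7041},
    {"item_id": 233, "text": "underground press", "dict_id": None},
    {"item_id": 233, "text": "hijack", "dict_id": 7041},
]

# Distinct keyword texts with no Dictionary ID assigned yet
needs_dictionary = sorted({e["text"] for e in keyword_entries
                           if e["dict_id"] is None})
```

Once the new terms are entered in the dictionary with fresh Dictionary ID numbers, a second pass can write those IDs back onto the matching keyword entries.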
It did not matter if the table containing the keyword entries for particular items contained the actual text of the keyword, as long as it contained the correct Dictionary ID number and associated it with the Item ID number. These two numbers constituted the indexing link between an item in the collection and a particular keyword reference. Since all Dictionary entries were already coded by type into categories such as Japanese person’s name, foreign person’s name, organization name, publication name, concept, social movement, or action, the terms could readily be separated into different indexes as needed. In the course of adding keywords, some new categories of dictionary type were created for historical time period, geographic unit, and genre to assist researchers looking for materials that would cut across the other dictionary types.
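The two-number indexing link and the separation of terms by dictionary type can be sketched together; the entries, types, and ID numbers below are illustrative:

```python
# Sketch: resolve (item_id, dict_id) link pairs through the dictionary
# and group them by dictionary type to yield separate indexes.
# All data and field names are illustrative.
dictionary = {
    7041: {"term": "haijakku", "type": "concept"},
    3005: {"term": "Nihon Sekigun", "type": "organization"},
}
links = [(50502, 7041), (50502, 3005)]   # (item_id, dict_id) pairs

by_type = {}
for item_id, dict_id in links:
    entry = dictionary[dict_id]
    by_type.setdefault(entry["type"], []).append((entry["term"], item_id))
```

Because the link table stores only the two ID numbers, the displayed text always comes from the dictionary, so a correction to a term there propagates to every index automatically.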
Handling Multilingual Data Issues
The question at the outset of the project in 1993 was how best to accomplish the goals of cataloguing and of publishing bibliographies of the Takazawa Collection, with a very limited budget and mainly student labor. Four considerations were paramount in the initial selection of a system for cataloguing the collection.
- Need for a Relational Database. We planned to catalog the collection in a stand-alone system, but to meet American library cataloguing standards as much as possible. This consideration placed the emphasis on library cataloguing as the primary activity, with production of bibliographies as a secondary goal. Steinhoff had extensive experience using relational databases on personal computers for data processing of complex surveys and the desktop production of directories for publication from the data. The Takazawa Collection project was a similar application, although it posed additional language problems. The project’s first requirement was therefore to have a full-featured relational database program to manage the data. At the same time, we did not want a program so complex that the project would be completely dependent on a professional database programmer. In addition to the financial considerations, the application would require considerable familiarity with Japanese, which would severely limit the pool of qualified programmers.
- Need for Japanese characters. Both American cataloguing standards for Japanese materials and the bibliographies we planned to publish would require bilingual entries in Japanese and English. The materials were written in Japanese, but the collection was located in an American environment and the publications would be aimed primarily at an American academic audience. We intended to include Japanese characters in both the catalog records and the published bibliographies, so we needed a system that could handle the Japanese character input into a relational database program and provide printed Japanese output.
- Need for romanization with macron vowels. To meet American standards and the needs of the American audience, the catalog records and published bibliographies also required parallel romanized Japanese entries to provide the readings of the Japanese character text, and English annotations. There are several different systems for rendering Japanese phonetically into western alphabets, most of which rely on different spelling conventions. However, the American academic publishing standards for Japanese bibliographic entries, as specified in The Chicago Manual of Style, call for the Hepburn system of romanization as it is used in Kenkyūsha’s New Japanese-English Dictionary. Use of this romanization system ensured that scholars would have correct bibliographic information for publication purposes for all items in the Takazawa Collection. Instead of special spelling conventions, this romanization system uses macrons over vowels to represent the elongated vowel sounds of Japanese. Hence our system also needed to be able to input and print out macron vowels.
- Need for an American system for service and support. A fourth important consideration was that since the project was to be conducted in an American university, which would own the computer and be responsible for its repair, technical support, and upgrading, the computer and software needed to be models that could be serviced and supported easily in Honolulu. That consideration ruled out the use of Japanese computers and software, which at that time used proprietary systems that were not compatible with American ones. In addition, using a Japanese system at that time would have meant that all the hardware and software manuals as well as the computer interface would be in Japanese rather than English.
These considerations channeled our initial choice of hardware and software, which in turn have constrained all subsequent decisions. Because macrons are not part of the standard English character set for personal computers and printers, our application was not just bilingual in Japanese and English, but multilingual. The computer and relational database program selected had to be able to accommodate Japanese characters, English text, and macron vowels within the same database, with relatively simple keyboard transitions from one to another to facilitate rapid data entry. Moreover, the computer system needed to fit into an American academic library environment and be capable of growing and changing with new technological developments that could not be foreseen in the early 1990s.
The four primary considerations meant that we were seeking a desktop computer sold in the American market, with full-featured relational database software and appropriate language capability to handle both Japanese scripts and macron vowels. We hoped to find both hardware and software from companies likely to continue to support these products, so that the system we selected would be upgradeable and would not become a dead-end solution. Even after Japanese products had been ruled out, that left both Macintosh and IBM-type personal computer systems as reasonable alternatives. The choice narrowed to which of the two could provide more appropriate software to meet the relational database and language requirements of the project. There were a number of products available for Japanese word-processing, but finding a relational database program that could accept Japanese, and an input method to produce the Japanese, posed more difficult problems.
The Macintosh had excellent language capability through the Japanese Language Pack, a product sold by Apple Computer that enhanced the operating system and was designed to make Macintosh computers competitive in the Japanese market. However, in early 1993 there was not a good relational database software product available for the Macintosh that would meet the project’s needs. The available consumer-level products were not fully relational, and the high-end relational database products required a professional programmer.
In contrast, IBM-type personal computers were not as well-equipped to handle Japanese and macron vowels, but good relational database software was available. Microsoft had just released version 1.0 of its Access relational database program, which was designed to be extremely easy for even non-programmers to use. It worked within the Microsoft Windows environment and was designed to integrate smoothly with Microsoft’s popular word-processing software, MS Word. Steinhoff had previously worked with professional database programmers on projects, but was not a programmer herself. Since MS Access had just been released, there were no “Access programmers” available. However, the program proved to be so user-friendly that Steinhoff could program Access herself to meet the project’s needs. The question was whether suitable Japanese language software was available that would work with Access.
The Japanese writing system uses a combination of two parallel phonetic scripts plus the Japanese version of Chinese characters, called kanji, which usually occur in two-character combinations to form words. Writing on a computer in modern Japanese requires 52 hiragana script characters, 52 katakana script characters and a minimum of about 2,000 different kanji characters.
Personal computers use a relatively standard coding system to represent the letters, numbers, and other symbols needed. When a keyboard key or key combination is pressed, the computer registers the information about the corresponding location in the code and stores the code information. The computer’s software then uses sets of fonts keyed to that code to display the coded character on the screen or on a printed page. Usually the fonts simply display the standard characters in different typefaces, but they can also be used to produce completely different representations, including all sorts of graphic symbols, so long as the user knows what will be produced when a certain key or key combination is pressed on the keyboard.
Until the development of Unicode in the late 1990s, personal computers worked with a limited set of “single-byte” codes that represented the row and column location of 256 cells in a two-dimensional matrix. As long as the necessary characters and symbols would fit within the standard matrix, different countries could develop their own “code pages” so that a slightly different keyboard would produce the characters needed for a particular language and the fonts for that language would display the expected characters.
Japanese and other Chinese character-based languages require a much greater number of different characters and symbols, which would not fit within the standard matrix for one-byte language encoding. Systems were developed that used two-byte codes, which could accommodate a much greater number of characters using the same code notation and the same basic matrix. With software that recognized the two-byte codes and fonts containing the characters they represented, it became possible to produce full Japanese text using hiragana, katakana, and kanji on a standard western keyboard. The process is more complex than straightforward English typing, because of the very complicated Japanese writing system. The user first types in letter combinations on the regular keyboard that correspond to the hiragana script characters, and the hiragana appears on the screen. Tapping the space bar brings up possible Japanese characters that match the phonetic script. When the correct character combination is found, the user presses the enter key and the hiragana is transformed into the desired kanji. Japanese personal computers had such systems built in, but initially they were not available for American computers except through a few proprietary systems for word-processing.
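The one-byte versus two-byte distinction can be verified directly with Python's built-in Shift-JIS codec, a member of the same encoding family discussed here:

```python
# Sketch: ASCII letters occupy one byte each under Shift-JIS, while
# hiragana characters occupy two bytes each, illustrating the
# two-byte encoding principle described above.
ascii_text = "JIS"
kana_text = "ひらがな"   # four hiragana characters

assert len(ascii_text.encode("shift_jis")) == 3   # one byte per letter
assert len(kana_text.encode("shift_jis")) == 8    # two bytes per character
```

Software that does not recognize the two-byte sequences sees only pairs of meaningless single-byte characters, which is why both the input method and the fonts had to understand the same code.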
Just as the decisions were being made for the Takazawa Collection project, a small company called Twinbridge, which originally produced Chinese language software designed to work in any English Windows program, released a Japanese version of its software. The Twinbridge software employed the standard system for encoding Japanese characters based on the Japanese “code page” and the two-byte codes. It could be set for JIS code or a very similar alternative called Shift-JIS. Once the Twinbridge system was installed on a computer using the Windows operating system, it could be opened as an application that was available when needed. From anywhere in another Windows application that required Japanese text, the user could call up the Twinbridge input window with a hotkey combination. Thereafter, key combinations on the English keyboard that matched the romanization of the Japanese phonetic hiragana syllabary would be reproduced in the Twinbridge window in hiragana. The space bar was used to select kanji characters, and pressing the enter key moved the text from the Twinbridge window into the main application. If the application was set to a Twinbridge Japanese font, the text would appear in Japanese. Twinbridge could then be toggled off to return to English text entry.
The combination of MS Access and Twinbridge Japanese offered a powerful relational database that could handle both English and Japanese. Standard Japanese software and fonts also contain most of the characters in the English character set. Because the two-byte coding space is so much larger, there is also room in it for the one-byte English characters in their normal code location. However, since the Japanese fonts are designed for the occasional English word to be written in a line of Japanese text, they are not proportioned well for extended English text. We therefore used the regular English keyboard and fonts for the English text.
Neither the English character set included in the Japanese code page nor the standard English character set in American personal computers includes macron vowels. They cannot be entered directly from the keyboard to produce the Hepburn romanization of Japanese, but it is still possible to generate macrons by using a font that substitutes the macron vowels for some other characters in the set. Initially the problem of macron vowels was handled with a stopgap solution that Steinhoff had used on previous projects. An asterisk was inserted after the vowel as a flag, so that at a later time the asterisked vowels could be converted to macron vowels with a special routine and a set of replacement fonts for macron vowels.
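The asterisk-flag stopgap can be sketched as a simple replacement routine. Note that the original procedure substituted special replacement fonts rather than Unicode characters, which did not yet exist for this purpose, so this Unicode-based version is an analogy rather than the project's actual routine:

```python
# Sketch: convert asterisk-flagged vowels ("o*") into macron vowels,
# analogous to the conversion routine described above. The original
# used replacement fonts; Unicode characters stand in for them here.
MACRONS = {"a*": "ā", "e*": "ē", "i*": "ī", "o*": "ō", "u*": "ū",
           "A*": "Ā", "E*": "Ē", "I*": "Ī", "O*": "Ō", "U*": "Ū"}

def convert_flags(text):
    for flagged, macron in MACRONS.items():
        text = text.replace(flagged, macron)
    return text

print(convert_flags("Kenkyu*sha"))   # -> Kenkyūsha
```

Because the asterisk never follows a vowel in ordinary romanized text, the flag could be inserted during rapid data entry without ambiguity and resolved in a single batch pass later.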
About a year into the project, however, Steinhoff discovered that a small local company in Honolulu called Hawaiian Graphics had created a set of special fonts to generate the diacritical marks used in the Hawaiian language, which happened to include all of the macron vowels needed for Japanese. Hawaiian Graphics fonts were complete English fonts that resembled standard computer fonts, but they replaced the umlaut characters in the standard English character set with macrons. The program included an input method licensed from another small company in Quebec, which enabled the user to input any macron vowel by pressing the ALT key and then the vowel. If any regular font was used, the character appeared as a vowel with an umlaut over it. If any of the proprietary Hawaiian Graphics fonts were used, the vowels appeared on screen and also printed as macron vowels.
This three-part system of the MS Access relational database, Twinbridge Japanese for Japanese language input and display, and Hawaiian Graphics for macron vowels met the project’s needs for a number of years. We moved smoothly through several software upgrades and one hardware upgrade. By the late 1990s, however, the limitations of using proprietary programs to produce Japanese characters and macron vowels were becoming apparent. The system worked fine for stand-alone computer use, but the files could not be read on computers that did not have the same language processing and font software. The growing practice of transferring files by disk or e-mail attachment rather than printed copy was rapidly rendering the system obsolete. This was of course a much more general problem of technology utilization in an increasingly borderless world. The basis for its solution was the development of Unicode, which encodes all the letters, symbols, and characters required for all of the world’s languages in one massive code table, rather than substituting one language’s code page for another in the same limited space.
Developing the Multilingual Website
With a multilingual, database-based website, it would not be possible to use a conventional search engine to find related items, for several reasons. First, search engines generally cannot search interactive websites in which the content is not constantly present in html pages but is instead stored in a database and called up selectively at the user’s request; a search engine can only index html page content, not the underlying database content.
Second, we judged that the use of a normal free text search box would be unworkable for our multilingual database even if a search engine were able to index all the material. The material is in English, Japanese, and one specific variant of romanized Japanese. Although Japanese search engines can search both Japanese and English text, American search engines generally do not have the capacity to search in Japanese, and neither could handle our macron vowels. We could not predict which language a user would want to use to input the request, let alone which romanization system. A search request using a romanization with different spelling would not produce the desired result, even if the material was available. Most users would not be able to input requests using the same romanization system as the database, because they would not have any way to enter the macron vowels even if the search engine could recognize and find them. And generating all the spelling variations for different romanization systems for every possible term in the database would be a formidable, unproductive task. Moreover, the appropriate words would not necessarily appear directly in the bibliographic entry.
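The spelling-variation problem can be illustrated concretely. The variants below render one Japanese word under several common conventions (Hepburn with macrons, macron-stripped Hepburn, circumflex-style Kunrei, and keyboard-style "wāpuro" romanization); a literal text search for any one spelling misses all the others:

```python
# Sketch: the same Japanese word (労働, "labor") under different
# romanization conventions. No two spellings match, so free-text
# search on any one of them fails to retrieve the rest.
variants = {
    "hepburn_macron": "rōdō",
    "hepburn_plain": "rodo",
    "kunrei_circumflex": "rôdô",
    "wapuro": "roudou",
}

assert len(set(variants.values())) == 4   # all four spellings distinct
```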
Effective search tools for this website would require human intervention with some knowledge of the materials and their underlying association: in short, an organized indexing system using controlled vocabulary. The elements of an indexing system were already built into the database through the keywords and other references keyed to the Dictionary Table. All authors and the personal names of some publishers were already linked to the Dictionary Table, and supplemented with keyword references to the same names. Keywords for organizations and a variety of other terms were already in the dictionary and linked to relevant items. All publishers were listed in a separate table, with a link to every relevant item. The problem was simply to devise an efficient system of indexes from these elements so that users of the website could use the terms to find specific items in the collection.
We built a simple, straightforward three-level indexing system, with first level retrieval based on the bilingual kana and romanization sort system that had already been developed for browsing the bibliographies. Publishers were already listed in a separate table and thus constituted a separate index. We divided the dictionary into several components and created separate indexes by Japanese name, non-Japanese name, organization, and keyword, to reduce the number of items that would be retrieved from the selection of a letter or kana character. For organizations and keywords the dictionary contained an English translation of the term or name, so we added a separate sort based on the English words.
The first level of the index system includes the selection menu on which the user selects the index and then picks a letter or character from one of the available lines. This brings up a list of all the terms associated with that letter or character, sorted in the appropriate order. The database table from which the list is derived was constructed from fields in the Dictionary Table or Publishers Table. The first letter and first kana character had already been added to each record in these tables. In a query sorted on the hiragana name field, the kana sort order is then added manually. The display is formatted so that all of the available versions of the term (kanji, rōmaji, and English if available) appear across the page, but the items listed and their sort order vary depending upon how the selection was made. A user looking down the appropriate column will find the items sorted in the correct order, but can also read across the other columns to find the corresponding language entries for the same term. In this way, the first level of the indexes serves also as a pronouncing guide for the kanji and as a bilingual glossary when the English translation is available. Clicking on the underlined kanji for any term brings up the second level index.
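The first-level retrieval described above can be sketched as a filter-and-sort over the dictionary-derived table; the records, field names, and terms below are illustrative:

```python
# Sketch: a first-level index list filtered by the user's chosen
# initial letter and sorted for display. Records and field names
# are illustrative, not the project's actual schema.
dictionary_terms = [
    {"kanji": "赤軍", "romaji": "Sekigun", "english": "Red Army",
     "first_letter": "S"},
    {"kanji": "三里塚", "romaji": "Sanrizuka", "english": "Sanrizuka",
     "first_letter": "S"},
]

def first_level(letter, sort_key="romaji"):
    rows = [t for t in dictionary_terms if t["first_letter"] == letter]
    return sorted(rows, key=lambda t: t[sort_key])
```

Selecting a different sort key (for example, the English field) reorders the same rows, which is how one table can serve the separately sorted kanji, romanization, and English views.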
The second level of the index system displays a list of abbreviated entries for all of the items in the collection that are linked to the selected name or keyword. The display of this link list gives the title of the item in kanji and romanization plus its Item ID and the type of item it is (book, serial, etc.). For names it also shows the type of reference, so the user can see whether the listing is for an author, an editor or translator, or is a keyword reference to the item. For the organization index and the two name indexes, the link list data tables for the second level index were constructed by combining the relevant records from the Authors Table (which records all types of authorship participation by individuals and organizations for an item) with records from the keyword table associated with the names of persons and organizations by their dictionary numbers. For the keyword or publishers index the link list records come from a single source. In each case, however, the link list tables were built incrementally, by using a query to identify the records for a specific type of material that link to the Dictionary (or Publisher) ID numbers on the first level index list, adding any necessary identifiers in both Japanese and English from a look-up table, and then appending the records to the link list for that particular index. In this manner the index lists are able to display entries from all the different types of materials in the collection, despite the fact that they were originally catalogued into separate tables with a different array of fields. The prior assignment of dummy ID numbers for serial titles and other materials with two-tier cataloguing ensured that every item being indexed had a unique Item ID number, but care had to be taken to include the dummy Item ID field when creating link lists from multiple sources, even if the original link between tables was based on its series number, as in the publisher field for a serial title.
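The combination of authorship records and keyword references into one link list can be sketched as follows; the tables, field names, and ID numbers are illustrative:

```python
# Sketch: build a second-level link list for one dictionary entry by
# combining authorship records with keyword references, as described
# above. All data and field names are illustrative.
authors = [{"dict_id": 9001, "item_id": 120, "role": "author"}]
keywords = [{"dict_id": 9001, "item_id": 50502}]   # keyword reference

def link_list(dict_id):
    rows = [(a["item_id"], a["role"]) for a in authors
            if a["dict_id"] == dict_id]
    rows += [(k["item_id"], "keyword") for k in keywords
             if k["dict_id"] == dict_id]
    return sorted(rows)
```

Because every record carries a unique Item ID, including the dummy IDs assigned to serial titles, entries from differently structured source tables can sit side by side in one list.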
Clicking on the underlined ID number link of an item displayed in the second level abbreviated list of items brings up the full bibliographic display and annotation for that item, as it appears in the bibliography. The programming for this part of the index uses both the item type field to determine which bibliography format to use for the display, and the item ID to find the correct item in that bibliography.