Background

Tsugio Sekiguchi, a Japanese linguist of German grammar, wrote a gigantic manuscript of 25,000 pages over 30 years, collecting example sentences. Thanks to his eldest son, Ikuya, the manuscript is preserved at Keio University, Hamamatsu University School of Medicine, and Osaka University. However, all the researchers who were in charge of the manuscript have retired, and it seems that no one uses it now. Some examples from the manuscript were once published on the website of the Cybermedia Center, Osaka University, but that page is no longer available.

I happened to obtain all the digitized data of the manuscript stored at Osaka University, so I would like to try to decipher it and make it publicly available as my social responsibility. Note that the manuscript at Osaka University is not the original but a copy.

Work Record

2000: The manuscript was deposited at Osaka University and some of it was digitized.
April 2018: I obtained the digitized data.
May 2018: Started deciphering the index.
January 2019: Finished deciphering the index; started uploading it to the website.
July 2021: Converted all image files from TIFF to GIF.
August 2021: Deployed the website with all the manuscripts.

Digitization of the Index

Making the manuscript publicly available on the web is a tough job. Although all the pages have been digitized, we only have image files (TIFF). Of course, that is better than nothing, because there are 25,000 pages! TIFF was the best format available when they were scanned.

All the index pages were written with a word processor, so there is no machine-readable text data. I used Google's excellent OCR software, Tesseract. However, the index is written in several languages, which made it difficult.

In addition, Sekiguchi used old-style Japanese characters, which Tesseract cannot read, so I wrote a replacement program in Python. I worked on it every weekday and finished after more than six months.
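The replacement program itself is not shown here, but the core idea can be sketched as a character-for-character mapping from old-style (kyūjitai) to modern (shinjitai) kanji. The table below is a tiny illustrative sample; a real table would cover several hundred characters:

```python
# Sketch of a kyujitai (old-style) to shinjitai (modern) kanji normalizer.
# The mapping is a small illustrative sample, not the full table.
KYUJITAI_TO_SHINJITAI = {
    "學": "学",  # study
    "國": "国",  # country
    "會": "会",  # meeting
    "體": "体",  # body
    "獨": "独",  # alone (as in 獨逸, "Germany")
}

def normalize(text: str) -> str:
    """Replace every old-style character found in the mapping; keep the rest."""
    return "".join(KYUJITAI_TO_SHINJITAI.get(ch, ch) for ch in text)

print(normalize("獨逸語の學習"))  # -> 独逸語の学習
```

Running the replacement after OCR keeps the transcription searchable with modern character forms while the images preserve the original spelling.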

As of August 2021, TIFF files cannot be displayed on the website as they are. When the images were previously available on the Osaka University Cybermedia Center page, they were probably displayed through an application written in C#, but adding an application slows the site down, and more importantly, I don't have the skills for that, so I simply converted everything to GIF. Since there are several thousand image files, I used an automatic conversion tool called Image Tuner, but it still took a month.

I also tried to build the site with a static site generator, but the number of files was too large and I failed; it took me three days to finish the site in the end. Hugo can apparently handle that many files, but my technical skills are not up to the task.
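For readers who prefer a scriptable route, the same batch conversion can be done with the Pillow library instead of a GUI tool like Image Tuner. This is a hedged sketch, not the method actually used; the directory names are hypothetical:

```python
# Batch-convert TIFF scans to GIF using Pillow (pip install Pillow).
# An illustrative alternative to a GUI converter; directory names are examples.
from pathlib import Path
from PIL import Image

def convert_tiff_to_gif(src_dir: str, dst_dir: str) -> int:
    """Convert every .tif/.tiff file in src_dir to a .gif in dst_dir.

    Returns the number of files converted. Pillow reduces the image
    to a 256-colour palette automatically when saving as GIF.
    """
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for tif in sorted(Path(src_dir).glob("*.tif*")):
        with Image.open(tif) as im:
            im.save(out / (tif.stem + ".gif"), format="GIF")
        count += 1
    return count
```

A script like this can be left running unattended, which matters when the collection has several thousand pages.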

Digitization of the Main Contents

To tell the truth, the index pages were not written by Sekiguchi himself; a researcher wrote them while copyediting. I have not yet reached the main contents.

Sekiguchi wrote in many styles, not only typewriter but also Sütterlin and Fraktur. I can read both scripts because I learned German from a textbook written by Sekiguchi, but generally speaking they are very difficult for modern readers. The Japanese is also written in the old style. I can read it without hesitation, since I learned from Sekiguchi's textbook published before WWII, but it is not easy for modern Japanese readers either.

Since there are 25,000 pages, digitizing one page per day would take 68 years. My life's deadline will come before I finish the work.

A page of the manuscript

Plans from Now On

I will move ahead slowly. Please wait.

Translation

At the beginning I translated all the pages myself, but now I mostly rely on Google Translate, DeepL, Mirai Honyaku, and so on. I ask for your understanding.

In the Near Future

I am doing a lot of trial and error. One thing I have tried since learning about Digital Humanities is reassembling the transcribed text in XML following TEI (Text Encoding Initiative) and rendering it as appropriate (Test Page). Another is a digital-collection approach that uses a viewer called OpenSeadragon to load the image and show the transcribed text in the space beside it (Test Page). Eventually I would like to combine the two: load the images with OpenSeadragon and supply the text as TEI-compliant XML. I have not yet achieved this for lack of technical skill, but I hope to make a big change in the near future. At a minimum I would like to adopt OpenSeadragon; how to present the TEI text, and how to build a database on a static site without loading all the images at once, are the issues at hand. (Added June 5th, 2024)
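Since the TEI encoding is still at the trial stage, here is a minimal sketch of how one transcribed page might be assembled as TEI-style XML with Python's standard library. The line contents and title are placeholders, and a fully valid TEI header would need more elements (publicationStmt, sourceDesc) than shown here:

```python
# Build a minimal TEI-style XML skeleton for one transcribed page
# using only the standard library. All content strings are placeholders;
# a valid TEI header requires more child elements than this sketch shows.
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

def q(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"

def tei_page(title: str, page_no: int, lines: list[str]) -> ET.Element:
    tei = ET.Element(q("TEI"))
    header = ET.SubElement(tei, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    text = ET.SubElement(tei, q("text"))
    body = ET.SubElement(text, q("body"))
    ET.SubElement(body, q("pb"), n=str(page_no))  # page-break marker
    for line in lines:
        ET.SubElement(body, q("p")).text = line
    return tei

doc = tei_page("Sekiguchi manuscript (sample)", 1,
               ["First transcribed line.", "Second transcribed line."])
print(ET.tostring(doc, encoding="unicode"))
```

An XML file in this shape could then be parsed in the browser and rendered beside the OpenSeadragon image viewer, one `pb` element per scanned page.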

In the Future

No one will manage this project after I die. Therefore, I hope to publish it as a book, which I will typeset with TeX or something similar, or that someone will take over my work.