Current Awareness-E
No.411 2021.04.22
E2372
2020 NDL Digital Library Cafe
Electronic Information Distribution Division, Electronic Information Department: Haruka Suzuki, Michiko Takahashi
On December 10, 2020 and January 15, 2021, the National Diet Library (NDL) held the 2020 NDL Digital Library Cafe. The event is a lecture series for the general public: each session takes up a theme in research or recent trends related to digital libraries and invites experts in the field as lecturers for an enjoyable discussion (see E2081). This was the first time the event was held online, and about 20 people took part in each session, including some from distant locations.
● The 1st "Utilization and Challenges of Web Archives: From WARP and Domestic and Foreign Cases"
The first theme was "Utilization and Challenges of Web Archives: From WARP and Domestic and Foreign Cases". NDL staff introduced the Web Archiving Project (WARP) and cases of web archive provision in Japan and overseas. After presentations by Mr. Kunihiko Ueshima of Japan Data Exchange Co., Ltd. and Mr. Masayuki Asahara of the Center for Corpus Development, National Institute for Japanese Language and Linguistics, a discussion was held with the participants.
On the current status and development of WARP, the NDL Kansai-kan Electronic Library Division introduced the recent expansion of collection targets, including the collection of private websites with permission, and institutional archives as an example of the use of WARP's persistent identifiers (PIDs). The division also noted that the issues facing web archiving are said to be shifting from technological development to utilization in academic research (see CA1893), and introduced three cases: Perma.cc, a cited-document preservation service in the legal field, as an example of dealing with broken links and content changes in cited documents; the UK Web Archive of the British Library (BL), as an example of providing easy-to-use secondary data sets; and the Archives Unleashed project, in which the Internet Archive and researchers at Canadian universities play a central role, as a development project for data set creation tools.
Regarding the market value of web archives, Mr. Ueshima stated that, in addition to the public data that is the final product, the data created at each stage of target selection, collection, organization, and storage each have their own utility value. He also described WARP as rich in data content, and stated that there is room for development in expanding the types of data sets offered and in providing custom-made aggregates, which are not currently offered.
Drawing on his experience developing the "Kokugoken Japanese Web Corpus", Mr. Asahara stated that the usefulness of WARP for academic research lies in the quality maintained by controlling the collection targets and in the provision of large-scale text data. He also commented that the "Parliamentary Minutes Search System" (see E2240) is often used in linguistic research, and that continued archiving will yield important data for future linguistic researchers.
In the discussion, participants suggested that, in order to expand the use of data, it would be effective to provide general-purpose open data sets that can be used in a variety of ways regardless of trends and current affairs, and to create and publish a correspondence table between standard metadata vocabularies such as Data Catalog Vocabulary (DCAT) and the items output from WARP, so that the data can be used for integrated analysis of huge and diverse data.
● The 2nd "New Year Project: Humanities in 2021"
The second theme was "New Year Project: Humanities in 2021". After Yuta Hashimoto of the National Museum of Japanese History (hereinafter "Rekihaku"), Naoki Kofu of Chiba University, Akihiro Kameda of Chiba University, and Natsuko Yoshiga of Saga University introduced their work, a discussion was held with the participants.
Mr. Hashimoto discussed "Reprinting Together" (Minna de Honkoku; see E2353). He showed that supporting IIIF (see CA1989) in 2019 and improving interoperability with digital archives had various positive effects, such as leading to the provision of materials from regional material archives to "Reprinting Together" and increasing the possibility of access to overseas materials related to Japan.
Mr. Kofu introduced a text conversion project for the "Engi-shiki" using the Text Encoding Initiative (TEI). This is an attempt to turn the "Engi-shiki", a historical source on ancient Japanese administration, into machine-analyzable data by transcribing it as text and marking it up with TEI. He pointed out that, while the resulting data will make it possible to verify previous research and conduct new research, securing specialized knowledge and manpower and managing the project as a whole will be issues for anyone implementing a similar project.
Mr. Kameda introduced the possibilities of using Wikidata, which is Linked Data (see CA1746), taking as an example the linkage between Wikidata and khirin, a research database of the National Museum of Japanese History (Rekihaku). He concluded that Wikidata is useful as a hub for connecting various data, although some reservation about its reliability is needed and the design must be carefully devised when linking databases in order to ensure persistence.
The "Ogi Domain Diary Database", which Ms. Yoshiga was involved in constructing, turns into data the "Diary Catalog", a summary of the domain's administrative diaries from the Edo period, and can be searched by catalog text or by keyword. Extracting as search keywords the vocabulary specific to the region and the era that appears frequently in the text was a major challenge, but she explained that it was solved through the enthusiastic participation of local citizens who can read local materials.
In the discussion, participants shared the problem that Japanese text data is difficult to obtain, whether modern text or historical documents, and their experience that linking with Wikidata and with the time information analysis software HuTime was useful. Many opinions on project management were also raised: narrowing the scope of the target materials makes a project easier to set up, securing manpower and expertise is important, and it is also important to develop people who can take part in projects and oversee them as a whole.
Throughout the discussion, many collaborations between data providers, experts, and citizens who use the data were discussed, and it was an opportunity to realize the importance of creating a system that connects "people" in their respective positions.
Ref:
"2020 NDL Digital Library Cafe". NDL Lab. https://lab.ndl.go.jp/event/digicafe2020/
"National Diet Library Data URI". NDL. https://www.ndl.go.jp/jp/dlib/standards/lod/uri.html
Perma.cc. https://perma.cc/
"More than 9 million broken links on Wikipedia are now rescued". Internet Archive Blogs. 2018-10-01. http://blog.archive.org/2018/10/01/more-than-9-million-broken-links-on-wikipedia-are-now-rescued/
UK Web Archive. https://www.webarchive.org.uk/ukwa/
The Archives Unleashed Project. https://archivesunleashed.org/
"Use of custom-made aggregates". Statistics Bureau, Ministry of Internal Affairs and Communications. https://www.stat.go.jp/info/tokumei/order.html
Kokugoken Japanese Web Corpus. https://bonten.ninjal.ac.jp/
"Data Catalog Vocabulary (DCAT) - Version 2". W3C. https://www.w3.org/TR/vocab-dcat/
Naoki Kofu, Makoto Goto. Application of TEI to "Engi-shiki" and sharing and distribution of text data of Japanese history materials. Bulletin of the National Museum of Japanese History. 2019, (218), p. 315-327. https://www.rekihaku.ac.jp/outline/publication/ronbun/ronbun9/pdf/218005.pdf
Text Encoding Initiative. https://tei-c.org/
Wikidata. https://www.wikidata.org/wiki/Wikidata:Main_Page
khirin. https://khirin-ld.rekihaku.ac.jp/
Ogi Domain Diary Database. https://crch.dl.saga-u.ac.jp/nikki/
HuTime. http://www.hutime.jp/
Toru Aoike. 2018 NDL Digital Library Cafe