[Interview]
Utilizing medical big data beyond the boundaries of individual doctors and hospitals
Interview with Masaru Kitsuregawa (Director, National Institute of Informatics / Professor, Institute of Industrial Science, University of Tokyo)
In November 2017, the "Medical Big Data Research Center" (hereinafter "Research Center") was newly established at the National Institute of Informatics (NII). In recent years, the development of artificial intelligence (AI) using medical big data has raised great expectations for supporting medical practice, equalizing access to medical care, improving its quality, and generating evidence. Expectations are especially high for medical images: Japan has more imaging devices installed and takes more images than other countries, and academic societies are promoting the collection of high-quality image data labeled with correct diagnoses. We asked Masaru Kitsuregawa, who has long been a leader in big data research in Japan, about the prospects for its utilization.
--Please tell us why the Research Center was established.
Kitsuregawa: We aim to help solve problems in the medical field by using cutting-edge information technologies such as networking, cloud computing, security, and AI.
We are currently working on two major projects. One is building a cloud platform that collects medical image big data (Figure). The other is developing an AI platform that analyzes large numbers of the collected medical images and helps doctors make diagnoses. Both are being carried out in collaboration with three academic societies (the Japan Gastroenterological Endoscopy Society, the Japanese Society of Pathology, and the Japan Radiological Society) whose projects were selected under AMED's "ICT Infrastructure Construction / Artificial Intelligence Implementation Research Project such as Clinical Research". We have also begun discussions with the Japanese Ophthalmological Society.
Figure: Conceptual diagram of the Medical Image Big Data Cloud Platform
Each academic society collects the data accumulated at universities and hospitals, anonymizes it, and unifies its format. The data is then uploaded from each society's server to the "Medical Image Big Data Cloud Platform" built by NII and stored there, where researchers can analyze it in the cloud.
--Research and development of AI for medical diagnostic imaging is already progressing overseas. What is the significance of doing this work in Japan?
Kitsuregawa: The performance of AI is determined by data, and Japan holds a large amount of high-quality image data. By taking advantage of this, we aim to create high-performance image recognition AI. In addition, disease tendencies vary by ethnicity, and living environments also have an effect. We believe it is highly significant to conduct research and development using domestic data in order to build an early lesion detection system suited to the characteristics of Japanese people. Of course, the framework we are building is designed with a global perspective in mind.
--A report issued in June 2017 by the Ministry of Health, Labour and Welfare's "AI Utilization Promotion Council in the Field of Health and Medical Care" named diagnostic imaging support as a priority area for AI development. Japan is strong in developing diagnostic medical devices and runs a trade surplus in them.
Kitsuregawa: I think a path to industrialization will be easy to find, but research and development are still underway.
Collecting huge amounts of medical image data over Japan's fastest ultra-high-speed network
--Please tell us about the two projects the Research Center is currently working on. First, what does building the cloud platform involve?
Kitsuregawa: To make use of medical big data, we of course first need a platform to collect and store it. That platform must be able to transfer and store data securely, and it must also allow huge amounts of data to be transferred and retrieved smoothly when the data is used.
In this project, we use SINET 5, an academic information network built and operated by NII. Its 100 Gbps ultra-high-speed lines span all of Japan and currently connect more than 850 universities and research institutes. That line speed is roughly 1,000 times that of a home fiber-optic connection.
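The 1,000-fold figure can be made concrete with a rough calculation. The Python sketch below is illustrative only: the 1 TB dataset size, decimal units, and zero protocol overhead are assumptions rather than figures from the interview, and the home-line rate is simply derived from the stated 1,000:1 ratio.

```python
# Assumptions (not from the interview): a hypothetical 1 TB image dataset,
# decimal units (1 TB = 10**12 bytes), and no protocol overhead.
SINET_BPS = 100e9            # 100 Gbps, as stated for SINET 5
HOME_BPS = SINET_BPS / 1000  # "about 1/1000" of SINET 5, i.e. roughly 100 Mbps


def transfer_seconds(size_bytes: float, bits_per_second: float) -> float:
    """Ideal transfer time: total bits divided by the line rate."""
    return size_bytes * 8 / bits_per_second


if __name__ == "__main__":
    dataset = 1e12  # hypothetical 1 TB dataset
    print(f"SINET 5:   {transfer_seconds(dataset, SINET_BPS):.0f} s")
    print(f"Home line: {transfer_seconds(dataset, HOME_BPS) / 3600:.1f} h")
```

Under these assumptions, the same dataset that moves in about 80 seconds over SINET 5 would take roughly 22 hours over a home line.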
――Do you need such a fast line to handle medical big data?
Kitsuregawa: The 100 Gbps is not monopolized by one user; it is shared by everyone. Especially these days, many disciplines are becoming data-intensive. Looking back on the history of ICT, it has evolved toward ever richer media: from text to voice, images, and video. This was driven not only by faster communication but also by faster processors and the development of large-capacity storage technology. As the ICT environment grows, new applications and devices that use it emerge.
Today's medical devices have become remarkably IT-enabled and are already producing enormous amounts of data. As this history shows, it is inevitable that more and more data will be generated in the future, and communication technologies exceeding 100 Gbps are also emerging. For example, the government has said that 8K video will first be put to practical use in medicine. Wide-area networks will likely play a major role in applications such as 8K transmission of surgical video and telemedicine.
--According to one estimate, a single person produces about 1 million gigabytes of healthcare data over a lifetime. In big data research we tend to focus on analyzing the collected data, but building the infrastructure is also important.
Kitsuregawa: Recording people's activities as digital data is called lifelogging. Living healthily is important, but health is not the only goal in life. I believe that anonymized analysis of life data is the ultimate theme for humankind in achieving the Sustainable Development Goals (SDGs) advocated by the United Nations. It will be a huge amount of data, but the time when we can take on that challenge may not be so far away. For us IT people, the dream keeps growing.
About 120,000 cases of image data to be registered, with further expansion from the next fiscal year
――Next, please tell us about the current status of medical image data collection.
Kitsuregawa: The cloud platform was built and began operating in November 2017. We aim to start registering image data by the end of this year.
――On what scale do you plan to collect medical image data?
Kitsuregawa: The academic societies have told us that their targets for image data registration this fiscal year are 10,000 cases for gastrointestinal endoscopy and 110,000 cases for pathology. We hope each society will expand the scale of collection in the future with the cooperation of more hospitals.
――Some estimate that AI reaches an acceptable level of performance after learning from about 5,000 labeled examples, and performance comparable to humans after learning from about 10 million.
Kitsuregawa: The performance of AI depends on the quality and quantity of data. Medical image AI has been studied for a long time, but the biggest problem has been the small amount of data. This time, the academic societies are collecting data on a scale that was difficult to achieve in the past, so I am looking forward to seeing how much the performance improves.
――How do you view this project in light of recent progress in deep learning?
Kitsuregawa: Focusing on images, and taking up AI analysis as a project in collaboration with the academic societies, is very timely. Image recognition accuracy has improved dramatically thanks to deep learning, which has developed rapidly in recent years. Deep learning is applied to many fields, but its performance on images is overwhelmingly high compared with, for example, language processing.
--Can the analysis handle everything? There seem to be issues.
Kitsuregawa: There are many diseases, and each presents in many patterns. Besides typical lesions, there are many cases that are difficult even for doctors to distinguish. I have had long discussions with the doctors in this project, and it is not easy. The work is still in its infancy, and handling a wide range of diseases lies ahead. However, even for rare cases that an individual doctor might see only a few times in a career, aggregating data from all over Japan could greatly improve the analysis. As with humans, the more an AI learns, the smarter it becomes. The strength of big data lies in bringing out the value of the long tail.
--Are there any plans for research and development of diagnostic support systems for data other than images?
Kitsuregawa: Medicine has many kinds of big data, so it is possible. In fact, in a Cabinet Office project, my laboratory at the University of Tokyo is building a system to analyze medical claims (receipt) data from all over Japan in collaboration with the Medical Economics Research Organization. We have about 200 billion records and are proceeding with the analysis, and we are discovering many facts that had not been recognized before. I think big data of this kind exists nowhere else in the world. Japan has a great deal of valuable information.
Analysis AI: aiming for practical use within a few years
――What is the current status of the analysis AI, the Research Center's other project?
Kitsuregawa: We have only just begun collecting and analyzing data. Going forward, in cooperation with the academic societies, we will strengthen the collection infrastructure and improve recognition accuracy step by step based on the collected image data. Within two to three years, I would like to reach a level where doctors say, "This will be useful."
--The Ministry of Health, Labour and Welfare's "Process Chart for the Utilization of AI" also sets 2020 as the target for building an image database.
Kitsuregawa: As I mentioned at the beginning, this project is all about data. Each academic society is currently making great efforts to collect it. We are holding in-depth discussions with the societies on what kind of data to collect, including the design of a target roadmap.
――What is the significance of NII's participation in this project?
Kitsuregawa: NII is one of the few research institutes in the world that conducts IT research and operates IT services at the same time. These days many research institutes also do IT research, but most of it applies IT within their original fields. Today IT is subdivided into more than 40 areas, and AI is just one of them. Consider, for example, a next-generation "IT hospital": it would require not only image processing but a wide variety of IT, including language processing to read the text in medical charts, human-machine interfaces to present information to users, software engineering to build the AI, and spoken dialogue technology that listens carefully to patients' complaints. NII is the only national research institute in Japan devoted exclusively to comprehensive IT research, so it covers almost every area of IT from fundamentals to applications and can integrate the latest knowledge from each. I think it is this comprehensive research capability across IT as a whole that led to our being asked to participate.
――It seems that not only NII but also many universities are participating in the research.
Kitsuregawa: The biggest feature of this project is that it is an all-Japan effort. Besides NII, faculty members from the University of Tokyo, Nagoya University, Kyushu University, and elsewhere who have experience developing medical diagnostic imaging AI are participating, and we would like to expand further. NII is an inter-university research institute, and we are building the system with an open research environment in mind. The academic societies have told me they hope as many hospitals as possible will participate in the future, so I feel the IT side needs to design a venue that can widely draw on expertise from across Japan. I would like to create an environment where diverse researchers can bring in different methodologies, look for better methods, and try out various approaches.
--How does the AI currently perform, and what are your plans for using it once it is complete?
Kitsuregawa: I can't give exact figures because they have not been published yet, but the recognition rate is quite good. We have not yet set a specific plan for its use, but we plan to discuss it with AMED and the academic societies that are supporting us.
The fascination of medical big data
――Finally, from your perspective as an informatics expert, what makes medical big data fascinating?
Kitsuregawa: We have conducted big data research in many fields, such as mobility, telecommunications, finance, media, the environment, disasters, and policy, but the most difficult and most rewarding is the analysis of data about people. That includes not only medical care but also, for example, education: I hear that in the United States, analyzing students' learning progress has significantly reduced dropout rates. My laboratory at the University of Tokyo operates a 25-petabyte big data analysis platform for global environmental data. There is a great deal of big data, but my dream is ultimately to achieve integration centered on people. Medical data on human health is one of the most important themes.
--Your research group at the University of Tokyo is building an analysis system that integrates healthcare big data from medical care, long-term care, and health checkups. I understand you will soon work with Nabari City, Mie Prefecture, to formulate specific measures aimed at evidence-based community-based integrated care.
Kitsuregawa: Medical data can be used in many ways. The initiative in Nabari City aims to use it for public administration. For example, by analyzing which medical institutions residents visit, we can quantitatively assess whether medical services are adequate and how far residents have to travel to distant institutions. In addition, analysis of large amounts of data linking medical and long-term care is making it clear that introducing end-of-life care helps reduce medical expenses. Specific applications are also advancing, such as a system that analyzes changes in the health status of people aged 75 and over and examines the medical costs borne by local governments. Rising medical costs are a major issue, and we believe evidence-based measures using data will become increasingly important.
--Are there any technical difficulties?
Kitsuregawa: This problem is common to all fields, but organizing terminology and ontologies is actually the most troublesome issue.
――I hear that variant expressions and typographical errors must be cleaned up before data can be used to train AI. Is promoting standardization important for reducing these costs?
Kitsuregawa: Of course, standardization is important. However, whether de jure or de facto, it generally takes a considerable amount of time. Technology is also advancing extremely fast these days, and new expressions appear one after another. To begin with, terminology cannot be settled while a technology is still emerging, so this is an endless battle. Many people complain that data is dirty when its notation is inconsistent. When they do, I say, "I have never seen clean data in my life. If you have clean data, please show it to me." You have to recognize that what you can achieve despite the irregularity, and how well you can reduce the noise, is where your skill shows.
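As a rough illustration of the kind of notation cleanup discussed here, the Python sketch below folds character width, case, and spacing, then maps known variants to one canonical term. The synonym dictionary and the function name `normalize_term` are hypothetical; a real project would draw on a curated medical terminology, and this sketch does not address the harder ontology problem.

```python
import unicodedata

# Hypothetical synonym dictionary mapping variant expressions to a canonical term.
# In practice this would come from a curated medical terminology, not a hand-written dict.
SYNONYMS = {
    "gastric carcinoma": "gastric cancer",
    "stomach cancer": "gastric cancer",
    "stomach ca.": "gastric cancer",
}


def normalize_term(raw: str) -> str:
    """Reduce notation noise in a free-text term and map it to a canonical form."""
    # NFKC folds full-width letters and ideographic spaces (common in Japanese text) to ASCII.
    text = unicodedata.normalize("NFKC", raw)
    # Case-fold, then collapse runs of whitespace into single spaces.
    text = " ".join(text.casefold().split())
    # Known variants map to the canonical term; unknown terms pass through normalized.
    return SYNONYMS.get(text, text)


if __name__ == "__main__":
    for term in ["Gastric  Carcinoma", "ＳＴＯＭＡＣＨ　ＣＡＮＣＥＲ", "stomach ca.", "Colon polyp"]:
        print(f"{term!r} -> {normalize_term(term)!r}")
```

Unknown terms are returned in normalized form rather than rejected, so new expressions can be reviewed and added to the dictionary over time instead of being silently dropped.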
--The revised Personal Information Protection Law and the Next Generation Medical Infrastructure Law have clarified the standards for handling data. This should make research easier to carry out.
Kitsuregawa: I think it is epoch-making that medical data can now be used on an opt-out basis. I expect the system to be fine-tuned in the future, but I hope it becomes a framework that feeds the benefits back to the patients who provide the data. We hope that IT-based, high-precision diagnostic imaging support can contribute to this.
--Thank you.
(End)
Masaru Kitsuregawa graduated from the Department of Electronic Engineering, Faculty of Engineering, University of Tokyo, in 1978, and completed the doctoral course in the Department of Information Engineering, Graduate School of Engineering, University of Tokyo, in 1983 (Doctor of Engineering). He became director of the Center for Strategic Information Fusion, Institute of Industrial Science, at the same university in 2003; director of the Ministry of Education, Culture, Sports, Science and Technology in 2008; director of the Institute for Collaborative Research on Earth Observation Data in 2010; and director of the National Institute of Informatics in 2013. His many awards include the 2009 ACM SIGMOD Edgar F. Codd Innovations Award, IEEE Fellow and ACM Fellow in 2012, the Medal with Purple Ribbon in 2013, the 21st Century Invention Award in 2015, and Chevalier of the Légion d'honneur in 2016. Under the Cabinet Office's Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST), he developed a high-speed database engine based on out-of-order execution. A leading figure in big data research and utilization, he has led national projects such as the Ministry of Education, Culture, Sports, Science and Technology's "Information Explosion Project" and the Ministry of Economy, Trade and Industry's "Information Grand Voyage" project. His lifework is "Yoshimoto Engineering" (the study of laughter).