Dr. Peter Bajcsy from NIST gave a talk entitled “To Measure or Not To Measure Terabyte-Sized Images” today at the UMD CS Department.
Abstract and Bio
This talk will elaborate on the basic question “To Measure or Not To Measure Terabyte-Sized Images?”, as William Shakespeare might have posed it had he been a bench scientist at NIST. The question is a dilemma for many traditional scientists who operate imaging instruments capable of acquiring very large quantities of images: manual analyses of terabyte-sized images, combined with insufficient software and computational hardware resources, prevent scientists from making new discoveries, increasing the statistical confidence of data-driven conclusions, and improving the reproducibility of reported results.
The motivation for our work comes from experimental systems for imaging and analyzing human pluripotent stem cell cultures at spatial and temporal coverages that lead to terabyte-sized image data. The objective of such an unprecedented cell study is to characterize specimens at high statistical significance in order to guide the repeatable growth of high-quality stem cell colonies. To pursue this objective, multiple computer and computational science problems have to be overcome, including image correction (flat-field, dark current, and background), stitching, segmentation, tracking, re-projection, feature extraction, data-driven modeling, and the representation of large images for interactive visualization and measurement in a web browser.
I will outline and demonstrate web-based solutions deployed at NIST that have enabled new insights in cell biology using TB-sized images. Interactive access to about 3TB of image and image feature data is available at https://isg.nist.gov/deepzoomweb/.
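A side note from me: the correction step listed first in the abstract (flat-field, dark current) is a standard operation, and a minimal, generic sketch of it for one field of view might look like the following. This is my own illustration, not NIST's implementation, and the array names are just placeholders.

```python
import numpy as np

def flat_field_correct(raw, dark, flat):
    """Generic flat-field / dark-current correction for one field of view.

    raw  -- acquired image (H x W array)
    dark -- dark-current image taken with the shutter closed
    flat -- image of a uniformly illuminated blank target
    """
    raw, dark, flat = (a.astype(np.float64) for a in (raw, dark, flat))
    gain = flat - dark                      # per-pixel illumination/sensor gain
    gain[gain == 0] = np.finfo(float).eps   # avoid division by zero
    corrected = (raw - dark) / gain
    return corrected * gain.mean()          # rescale back to the original intensity range
```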
Bio: Peter Bajcsy received his Ph.D. in Electrical and Computer Engineering in 1997 from the University of Illinois at Urbana-Champaign (UIUC) and an M.S. in Electrical and Computer Engineering in 1994 from the University of Pennsylvania (UPENN). He worked for machine vision, government contracting, and research and educational institutions before joining the National Institute of Standards and Technology (NIST) in 2011. At NIST, he has been leading a project focused on applying computational science to biological metrology, specifically stem cell characterization at very large scales. Peter’s area of research is large-scale image-based analyses and syntheses using mathematical, statistical, and computational models, leveraging computer science fields such as image processing, machine learning, computer vision, and pattern recognition. He has co-authored more than 27 journal papers, eight books or book chapters, and close to 100 conference papers.
Links
- https://isg.nist.gov/deepzoomweb/
- https://github.com/NIST-ISG
- https://isg.nist.gov/deepzoomweb/activities
- Cell Segmentation Survey: https://isg.nist.gov/deepzoomweb/resources/survey/index.html
Questions
Thank you, Dr. Bajcsy, for the great talk and live demo; it was very impressive. I have a question about rendering: I noticed that when the movie is recorded frame by frame, the image is not rendered in real time. Which factor limits the rendering rate, and what is the greatest challenge in rendering terabyte-sized images?
Answer: Bandwidth!
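That answer squares with a quick back-of-the-envelope estimate: a Deep Zoom style viewer only fetches the tiles covering the current viewport, but recording a movie frame by frame means streaming tiles continuously. The sketch below is my own; the tile size and per-tile compressed size are assumptions, not figures from the talk.

```python
import math

def viewport_tile_bytes(viewport_w, viewport_h, tile=256, bytes_per_tile=40_000):
    """Rough bytes needed to fill one viewport with compressed image tiles."""
    tiles_x = math.ceil(viewport_w / tile) + 1   # +1 for partial tiles at the edges
    tiles_y = math.ceil(viewport_h / tile) + 1
    return tiles_x * tiles_y * bytes_per_tile

# A 1366x768 viewport needs ~28 tiles, i.e. roughly 1 MB per rendered frame;
# at 30 frames per second that is already ~270 Mbit/s of sustained bandwidth.
per_frame = viewport_tile_bytes(1366, 768)
print(f"{per_frame / 1e6:.1f} MB/frame, {per_frame * 8 * 30 / 1e6:.0f} Mbit/s at 30 fps")
```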
Messy Note
Diabetes, heart disease, musculoskeletal disorders, and age-related macular degeneration are among today’s major health concerns.
To measure or not to measure the entire sample is a problem.
Measuring the entire sample raises three fundamental problems: scale, complexity, and speed.
Scale: Imaging Tsunami
- mXRF and XRD: 1 TB of data used to take 16 years to acquire
- now: 1 TB of data in 3 min
Complexity: Image Models.
Given 2 TB of acquired image data in 2 minutes:
moving the 2 TB from the microscope to a computer -> 66 min over 1 Gbit/s bandwidth
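For reference, here is a back-of-the-envelope helper for this kind of estimate (my own sketch, not from the talk). Note that 2 TB over a nominal 1 Gbit/s link works out to roughly 4.5 hours, so the 66-minute figure in my note presumably assumes a faster effective transfer rate.

```python
def transfer_minutes(data_bytes, link_bits_per_s):
    """Idealized transfer time, ignoring protocol overhead and disk throughput."""
    return data_bytes * 8 / link_bits_per_s / 60

# 2 TB over a nominal 1 Gbit/s link:
print(f"{transfer_minutes(2e12, 1e9):.0f} min")  # ~267 min, i.e. about 4.5 hours
```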
Fundamental Problem: transform from images to scientific insights.
- Scale: Nano to centimeter physical scale; TB- to PB-sized digital datasets
- Complexity: Many instruments, Sample variety and Many models
- Speed: Validate models, Explore and Discover
Potential Applications:
- Astronomy
- Chemistry
- Medicine
- Fire Research
- Physics
- Biology
- Materials Science
- Forensic Science
Case Study
Age-Related Macular Degeneration (AMD): 11 million people affected in the US; the leading cause of vision loss in adults. The global cost is estimated at $343 billion, including $255 billion in direct health care costs.
Stem cell engineering of the retina is needed…
Safety of Carbon Nanotubes.
54 laboratory animal studies.
Carbon nanotubes can cause adverse pulmonary effects, including inflammation and granulomas.
Challenges:
Scale: Nano to centimeter physical scale, TB- to PB-sized digital datasets
Challenge: Data spraying, fast processing, limited transfer.
Complexity: Many instruments, sample variety, many models.
Challenge: Multi-modal image fusion, image object characterization & modeling
Speed: Validation of models, exploration, discovery.
Challenge: Comparison across models, rendering images, search over image feature space.
Astronomy: Sloan Digital Sky Survey
Medicine: 2D histology slide
Earth Science: GIS visualization
Existing solutions are highly application-specific and impractical to adopt for new applications.
Current limits in Bio and Material Sciences
- Data reside on hard drives; there is a large sample variety and many imaging modalities.
- No trusted tools to collect measurements.
Approach: Scientist’s Perspective
Sample, one small field of view = ~mega-pixel image
Create one large field of view per imaging modality ~ hundred giga-pixel image
Fusion + rapid analyses <=> auto analyses
Metrology Perspective
Parameters of Data Acquisition and many small image fields of view
Calibration and image fusion
hierarchical partitioning
phenomenon models at each level (e.g., K = \frac{1}{2}mv^2)
Parametrized Models and algorithms
rapid and collaborative tools
parametrization, display, validation and optimization
How to do Hierarchical Partitioning
5mm ~115K pixels -> 1366×768 pixels
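My reading of that line: the full-resolution image of a 5 mm wide sample spans roughly 115,000 pixels, while a typical display is only 1366×768, so the viewer needs a multi-resolution pyramid. A quick sketch (my own illustration) of how many 2× down-sampling levels that implies:

```python
import math

full_width = 115_000   # pixels across the 5 mm sample (from the note above)
screen_width = 1366    # typical laptop display width

# Each pyramid level halves the resolution; count levels until the image fits on screen.
levels = math.ceil(math.log2(full_width / screen_width))
print(levels)  # 7 levels of 2x down-sampling
```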
Herbert Simon (Nobel Prize in Economics, 1978): “Complexity frequently takes the form of hierarchy.”
Scale: logical partitioning, parallel algorithms, selective transfer.
Complexity: experimental design and fusion algorithms, object modeling.
Speed: model hierarchies, local processing, collaborative measurements.
Approach: Building Blocks
Transform many small FoVs into one large FoV.
Corrections, Stitching, Re-Projection, Segmentation, Tracking, Feature Extraction, Prediction Modeling
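Of these blocks, stitching is what turns the many small fields of view into one large one, and it hinges on estimating the translation between overlapping tiles. Below is a minimal, generic phase-correlation sketch of that offset estimation; it is not the NIST implementation.

```python
import numpy as np

def translation_offset(tile_a, tile_b):
    """Estimate the relative (dy, dx) translation between two equally sized,
    overlapping tiles using phase correlation."""
    cross_power = np.fft.fft2(tile_a) * np.conj(np.fft.fft2(tile_b))
    cross_power /= np.abs(cross_power) + 1e-12      # keep only the phase information
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Shifts larger than half the tile wrap around; map them to negative offsets.
    if dy > tile_a.shape[0] // 2:
        dy -= tile_a.shape[0]
    if dx > tile_a.shape[1] // 2:
        dx -= tile_a.shape[1]
    return dy, dx
```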
Transform one large field of view for rapid analyses.
2.8MB/image = 0.677TB (17% sampled)
Image understanding
Accuracy, Uncertainty, Robustness and Sensitivity
Scalability of image computations
The answer is to measure the entire sample.