The post [Talk Summary] HandSight: A Touch-Based Wearable System to Increase Information Accessibility for People with Visual Impairments appeared first on Fusing Data and AI into VR.


HandSight: A Touch-Based Wearable System to Increase Information Accessibility for People with Visual Impairments

Please refer to http://www.leestearns.com for more details. An incomplete outline is summarized below:

Related Work

- OrCam, Access Lens, OmniTouch, VizLens (UIST 2016), ForeSee, Google Glass, eSight, NuEyes, IrisVision

Reading / Exploring Text

- Advantages of Touch-Based Reading
- Does not require framing an overhead camera
- Allows direct access to spatial information
- Provides better control over pace and rereading

- New Challenges
- How to precisely trace a line of text?
- How to support physical navigation?

HandSight HoloLens version

- Augment rather than replace
- Camera resolution too low
- Turning the head to look at desired content was uncomfortable
- Voice commands are cumbersome and imprecise
- Fixed 2D
- Screen billboards

- Fixed 3D
- Vertical and horizontal mode

- Finger tracking design
- Follows where the finger is pointing in 3D

- Three participants

Findings

- Finger-worn camera
- [+] flexible, allows hands-free use
- [-] Requires moving finger to read

- HoloLens for low-vision
- Low contrast due to transparency
- Narrow view, center of vision

Handheld Camera / Mobile Phone

- 6 low vision participants
- participants were more successful and positive about their experience
- [+] Better camera
- [+] More usable interactions
- [-] No longer hands-free

Conclusion / Strengths and Weaknesses of 3D AR

- [+] Enables new interactions not possible with other approaches
- [+] Good for multitasking

Design space exploration: AR magnification & enhancement

Implementation and evaluation

On-body Input using finger-worn sensors

- Preprocessing
- Coarse-Grained Classification
- Textures
- Feature Extraction
- Localization (SVM)
- Accelerometer
- Gyroscope
- Magnetometer

- Fine-Grained Classification
- Geometric Verification
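The coarse-grained localization step above (a classifier over accelerometer/gyroscope/magnetometer features) can be illustrated with a small sketch. The talk describes an SVM; the code below substitutes a numpy nearest-centroid classifier and entirely synthetic IMU-like features, purely to show the train/classify flow, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(7)
locations = ["palm", "wrist", "thigh"]

# Synthetic 9-D feature vectors (3-axis accel + gyro + magnetometer),
# one cluster per body location -- stand-ins for real sensor features.
centers = rng.standard_normal((3, 9)) * 3.0
X = np.vstack([c + rng.standard_normal((50, 9)) * 0.5 for c in centers])
y = np.repeat(np.arange(3), 50)

# Nearest-centroid "training": average the features of each location.
centroids = np.array([X[y == k].mean(axis=0) for k in range(3)])

def classify(feature):
    """Coarse-grained localization: pick the closest location centroid."""
    d = np.linalg.norm(centroids - feature, axis=1)
    return locations[int(d.argmin())]

sample = centers[1] + rng.standard_normal(9) * 0.5
print(classify(sample))
```

A real system would replace the centroid rule with the SVM over extracted features, but the overall structure (per-location training data, then per-frame classification) is the same.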

Within-person classification Experiment

- Coarse-Grained 99.1%, 88.0%, 96.4%

Interface Designs

- Real-time processing (~60 fps)
- Five applications: clock, health & activities, daily summary, notifications, and location-specific gestures on the body
- 12 Visually Impaired participants
- 5 participants preferred location-independent gestures
- 6 participants loved the location-specific gestures on the palm
- 1 participant loved location-specific gestures on the body

- Mitigate camera framing issues
- Demonstrated feasibility with high accuracy

Color Recognition

- Limitations: can recognize only colors, not patterns
- Does not allow users to quickly inspect multiple locations
- Accuracy is affected by ambient lighting and distance
- Both colors and visual patterns
- Deep convolutional activation features (DeCAF)
- Dense SIFT features combined in an Improved Fisher Vector (IFV)
- Highly controlled dataset – risks overfitting, limits robustness
- Solid, striped, checkered, dotted, zigzag, textured; rotations (30° increments), scales (1–4)
- An End-to-End Deep Learning Approach
- Fine-tuning the classifier with ~half of the HandSight images (N=36 per class) increases the accuracy to 96.5%
- Identify multiple colors in a single image
- User-configurable level of detail:
- Two datasets of fabric pattern images
- 529 images from HandSight
- 77,052 external

Future Work

- Alternative or supplementary camera locations
- Camera on the user’s finger or wrist
- Camera on the User’s Upper Body
- Wider field of view, more contextual information
- Easier to localize and track hand/finger position
- Spatial exploration of documents and other surfaces
- Maps, charts, and graphs are hard to explain (a very interesting deep learning topic)
- Translating to other languages


The post [Talk Summary] A large-scale analysis of YouTube videos depicting everyday thermal camera use appeared first on Fusing Data and AI into VR.

Both the talk and the slides are of top quality. Please refer to http://www.cs.umd.edu/~mattm/#pubs for more details about Matt’s slides and papers.

Next is a brief outline.

Energy audits and thermometric surveying are time and labor intensive.

- Missing insulation
- Air leakage
- Moisture Intrusion

Solutions:

- Sealing air leaks
- Adding insulation
- Improving lighting
- Increasing efficiency of appliances

Research Thread I: YouTube Study

- Dataset Generation
- Keywords, search and expand keywords, filter and validate data, compile full dataset

- SMIDGen
- first 200 results, KLD (Kullback-Leibler Divergence and word occurrence)
- infrared, lepton, thermal, flir
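The keyword-expansion idea behind SMIDGen can be illustrated with a toy KLD computation. The sketch below uses my own made-up mini-corpora (not the paper's data) to show how a candidate term's word distribution that diverges sharply from the background signals a useful query keyword:

```python
import math
from collections import Counter

def kld(p_counts, q_counts, vocab):
    """KL divergence D(P || Q) over a shared vocabulary, with
    add-one smoothing so no probability is zero."""
    p_total = sum(p_counts.values()) + len(vocab)
    q_total = sum(q_counts.values()) + len(vocab)
    d = 0.0
    for w in vocab:
        p = (p_counts[w] + 1) / p_total
        q = (q_counts[w] + 1) / q_total
        d += p * math.log(p / q)
    return d

# Toy word-count "corpora": a generic background, a thermal-specific
# result set, and a generic result set.
background = Counter("the a video of the a the camera video".split())
thermal = Counter("thermal flir lepton infrared thermal camera".split())
generic = Counter("the a video of the camera".split())

vocab = set(background) | set(thermal) | set(generic)
# The thermal-specific result set diverges more from the background
# than the generic one -- a signal for keeping its query keywords.
print(kld(thermal, background, vocab) > kld(generic, background, vocab))  # True
```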

- Qualitative Coding
- Informal exploration
- Outdoor recreation
- Small Electronics
- Vehicles
- Emergency Applications
- Health and Wellness
- Research

- Can a thermal camera see through water?
- What types of misconceptions do people have about thermal cameras?
- “Insulation does not cover all the way to the corner of the house.”
- 7.7% response rate
- 72% use thermal cameras to perform DIY
- 86% users audit single-family residential homes
- 17% just being curious
- 6% claims against landlords or contractors
- 60% reported investing in renovations or retrofits
- fewer engaged in conservation behaviors
- Research outcomes
- Characterization of common thermal camera use

- Novice Smartphone Field Study
- Professional Thermography Study
- Temporal Thermography Study I
- Temporal Thermography Study II
- Conclusion and Future Work

Study Design

- Pre-Study questionnaire
- Introduction Meeting in local cafe
- Hardware/Software Overview
- 4-page Thermographic Inspection Guide
- Mission: Investigate your home with your thermal camera for signs
- Survey: Weekly Questionnaire
- Semi-Structured Interview
- “It was pretty clear to me …”

- Post-study questionnaire
- We qualitatively coded the survey, interview, …

Thermography

- Home
- 572 photos, AVG = 57.2, SD = 52.27

- Workplace
- 405 photos, AVG = 49.5, SD = 18.02
- Sleep mode: unused for the weekend but still appeared hot

- Community

Field activities

- Indoor
- Walls
- Electronics
- Light Fixtures
- People/Pets
- Outdoor
- Windows
- Doors
- Ceilings
- Play/Experiments

Semi-Structured Interviews

- Knowledge Discovery
- Potential Benefits
- All participants considered the thermal camera a valuable investigative tool.
- Most participants suggested that it supported decision-making.
- “I have been meaning to contact my landlord….”
- All participants described capturing imagery that they could comfortably interpret and imagery that they did not understand
- I don’t know how much it is related to …
- Locus of Control
- If I took a picture that showed the issue, … don’t have solution to fix…

How can we scale thermographic assessments?

- Energy Auditing Backpack
- Car-mounted thermography cameras
- No human perspective in the automated thermography literature

Thread II: UAVs

- Research Question:
- How is thermography currently being used by professional…

- Recruit participants using listserv
- 10 participants (1 Female)
- Semi-Structured Interviews
- Presentation of Design Probes
- Observational Case Study
- You are responsible for a small fleet of UAVs. The UAVs fly around semi-autonomously collecting thermal data and generate historical reports showing thermal performance over time
- Mid-fi prototype: 3D reconstruction, anomaly detection, thermal analysis
- We qualitatively coded the interview
- Required knowledge
- Client interactions
- Challenges
- Automatic anomaly detection
- Model generation
- Automated approaches lack control of environment
- Data overload: how to manage orders of magnitude more data?
- Energy auditing is a social process
- Observational Case Study
- An assessment of professional energy auditing and thermography’s role therein.
- A critical examination of emerging automated solutions

Thread III – Easy-to-deploy thermographic sensor kit

- GPS Unit and high capacity battery
- Thermal camera
- Motion sensor
- The energy auditor used the system to audit Hornbake library.
- “Using the tool is easy if I know what I am going to look at.”
- Temporal data may make transient environmental conditions easier to identify
- Revised visualization
- Thermal analysis
- Overnight 1, 2, Full Campaign
- Novice Study
- 5 participants
- “There are some very cold spots in the office, but it’s hard to tell why.”
- Interactive reporting
- 4 of 5 were positive about receiving the easy-to-read, automatically generated report
- 4 of 5 liked having longitudinal data and the additional depth the report provided by comparison to thermograms alone.

- data privacy
- “If it were just a local network I use in my house, it’s totally fine.”

- personal confidence
- It has made me more general repairs

- post-mission attitudes

- Professional Study
- Raising Awareness
- Providing Reliable Data
- Relationship Building
- Coverage and Installation
- Motivating Action

- Temporal analysis provides more specific insights in the case of insulation performance


The post [Summary] Google I/O and Microsoft Build 2018 appeared first on Fusing Data and AI into VR.

Google announced email auto-completion, photo auto-spot, auto-colorization, better sound synthesis, memorable Q&A, Android P, text from images, and style match.

Microsoft presented its Fluent Design, Azure AI-enabled edge devices (phone, drone, infrastructure, IoT), Microsoft 365, collaboration with Alexa, Visual Studio Live Share, HoloLens Remote Assist and Layout.


The post [Summary] StackGAN: Text to Photo-realistic Image Synthesis appeared first on Fusing Data and AI into VR.

StackGAN is the first model to generate 256×256 images with photo-realistic details from text descriptions.

Generative Adversarial Networks (GANs) were originally proposed by Ian Goodfellow. A GAN pits a generator network against a discriminator: the generator is trained to fool the discriminator by progressively improving the generated images.

The main difficulty in generating high-resolution images with GANs is that the support of the natural image distribution and the support of the implied model distribution may not overlap in high-dimensional pixel space.

In StackGAN, there are two stages of GAN procedure. The Stage-I generator draws a low-resolution image by sketching rough shape and basic colors of the object from the given text and painting the background from a random noise vector. Conditioned on Stage-I results, the Stage-II generator corrects defects and adds compelling details into Stage-I results, yielding a more realistic high-resolution image.

In Stage-I, StackGAN does not use the text embedding directly as the condition; instead it applies a fully connected layer to produce a Gaussian distribution (regularized towards N(0, I)) and samples the conditioning variables from it. This is because the dimensionality of the embedding space is usually much higher than the amount of text data: using the embedding directly as the condition would make the latent conditioning space sparse and discontinuous.
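This Conditioning Augmentation step can be sketched in a few lines of numpy. Here the fully connected layer is represented by plain random matrices (`W_mu`, `W_sigma`), and the dimensions (1024 → 128) are illustrative assumptions rather than the paper's exact sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def conditioning_augmentation(text_embedding, W_mu, W_sigma):
    """Map a high-dimensional text embedding to a low-dimensional
    Gaussian and sample a smooth conditioning vector from it
    (the reparameterization trick: c = mu + sigma * eps)."""
    mu = W_mu @ text_embedding            # mean of the conditioning Gaussian
    log_sigma = W_sigma @ text_embedding  # log std, for numerical stability
    eps = rng.standard_normal(mu.shape)   # eps ~ N(0, I)
    return mu + np.exp(log_sigma) * eps

embed = rng.standard_normal(1024)         # e.g. a 1024-d sentence embedding
W_mu = rng.standard_normal((128, 1024)) * 0.01
W_sigma = rng.standard_normal((128, 1024)) * 0.01
c_hat = conditioning_augmentation(embed, W_mu, W_sigma)
print(c_hat.shape)  # (128,)
```

Because the conditioning vector is sampled rather than deterministic, small perturbations of the embedding yield nearby conditions, which densifies the conditioning manifold.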

In the generator, instead of deconvolution layers, it uses upsampling followed by several 3×3 convolutions.

In the discriminator, it uses several convolutions with stride 2 and combines the result with the spatially replicated (resized) embedding.

In Stage-II, it combines the downsampled samples from Stage-I with the augmented embedding (sampled from the Gaussian) as input; after several residual blocks, it uses the same upsampling technique to obtain the final images.

With more labels, the generator can decompose a complicated distribution into several simple, low-dimensional distributions.

It obtains state-of-the-art inception scores (IS), with 28.47% and 20.30% improvements. The Inception Score is a metric for automatically evaluating the quality of image generative models [Salimans et al., 2016]. This metric was shown to correlate well with human scoring of the realism of images generated from the CIFAR-10 dataset. The IS uses an Inception v3 network pre-trained on ImageNet and calculates a statistic of the network’s outputs when applied to generated images.

$IS(G) = \exp \left( \mathbb{E}_{x \sim p_g} \left[ D_{KL} \left( p(y|x) \,\|\, p(y) \right) \right] \right)$

where $x \sim p_g$ indicates that x is an image sampled from p_g, $D_{KL}(p(y|x) \,\|\, p(y))$ is the KL-divergence between the two distributions, p(y|x) is the conditional class distribution, and p(y) is the marginal class distribution.

The authors who proposed the IS aimed to codify two desirable qualities of a generative model into a metric:

- The images generated should contain clear objects (i.e., the images are sharp rather than blurry), so p(y|x) should have low entropy. In other words, the Inception network should be highly confident that there is a single object in the image.
- The generative algorithm should output a high diversity of images from all the different classes in ImageNet, so p(y) should have high entropy.
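Under these definitions the IS can be computed in a few lines. The numpy sketch below feeds in hand-made class posteriors for a hypothetical 3-class classifier (not real Inception outputs) to show that sharp, diverse posteriors score high while uniform ones score 1:

```python
import numpy as np

def inception_score(p_yx):
    """Compute IS from an (N, K) matrix of class posteriors p(y|x).

    Each row is the softmax output of the pre-trained classifier for
    one generated image; p(y) is estimated as the row average."""
    p_y = p_yx.mean(axis=0, keepdims=True)                   # marginal p(y)
    kl = (p_yx * (np.log(p_yx) - np.log(p_y))).sum(axis=1)   # KL(p(y|x) || p(y))
    return float(np.exp(kl.mean()))

# Sharp, diverse predictions (each image confidently a different class)
sharp = np.array([[0.98, 0.01, 0.01],
                  [0.01, 0.98, 0.01],
                  [0.01, 0.01, 0.98]])
# Blurry predictions (uniform posteriors)
blurry = np.full((3, 3), 1.0 / 3.0)

print(inception_score(sharp))   # ≈ 2.7, bounded above by the class count
print(inception_score(blurry))  # ≈ 1.0, the minimum possible score
```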

The code of StackGAN is released on GitHub here.

Conditional GAN is one of the earliest variants of GAN:

$\max_D \; \mathbb{E}_{x \sim p_{data}} \left[ \log D(x|y) \right] + \mathbb{E}_{x \sim p_G} \left[ \log \left( 1 - D(x|y) \right) \right]$

where the condition y could be pictures or labels.

In pix2pix, it uses paired pictures as the condition.

In CycleGAN and DiscoGAN, without using paired data, they transfer styles across different domains and sizes.

[1] Mirza, Mehdi, and Simon Osindero. “Conditional generative adversarial nets.” arXiv preprint arXiv:1411.1784 (2014).
[2] Isola, Phillip, et al. “Image-to-image translation with conditional adversarial networks.” arXiv preprint (2017).
[3] Zhu, Jun-Yan, et al. “Unpaired image-to-image translation using cycle-consistent adversarial networks.” arXiv preprint arXiv:1703.10593 (2017).
[4] Kim, Taeksoo, et al. “Learning to discover cross-domain relations with generative adversarial networks.” arXiv preprint (2017).


The post [Summary] Omnipresence 3D for Multiview Mixed Reality appeared first on Fusing Data and AI into VR.

This is a great leap beyond my prior work, VideoFields. At the time, I had only three surveillance video cameras with which to create an immersive virtual environment, and it worked from a research point of view. Today, Fortem’s Omnipresence 3D software makes a big move: it seamlessly integrates 57 leading brands of security, automation, and IT equipment, e.g., video cameras and recorders, access control, video analytics, GIS, GPS, radar, sonar, gunshot detection, etc. In addition, it alerts users to abnormal situations, provides the most relevant and actionable information needed to make timely and correct decisions, and features a unique immersive 3D technology that eliminates manual steps to increase productivity and manage incidents better and faster.

For more information, please visit: http://deep3d.ninja/portfolio/omnipresence-3d/


The post [Summary] PointNet, PointNet++, and PU-Net appeared first on Fusing Data and AI into VR.

Instead of 3D convolution, PointNet directly consumes point clouds, which respects the permutation invariance of points in the input.

A point cloud is an unordered set of vectors. Each point Pi is a vector of its (x, y, z) coordinate plus extra feature channels such as color, normal etc.

For classification, the proposed deep network outputs k scores for the k candidate classes. For semantic segmentation, the model outputs n × m scores, one for each of the n points and each of the m semantic subcategories.

Our input is a subset of points from a Euclidean space, with three key properties:

• **Unordered**. Unlike pixel arrays in images or voxel arrays in volumetric grids, point cloud is a set of points without specific order. In other words, a network that consumes N 3D point sets needs to be invariant to N! permutations of the input set in data feeding order.

• **Interaction among points**. The points are from a space with a distance metric. It means that points are not isolated, and neighboring points form a meaningful subset. Therefore, the model needs to be able to capture local structures from nearby points, and the combinatorial interactions among local structures.

• **Invariance under transformations**. As a geometric object, the learned representation of the point set should be invariant to certain transformations. For example, rotating and translating all points together should not modify the global point cloud category nor the segmentation of the points.

The max pooling layer works as a symmetric function to aggregate information from all the points.
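That symmetric-function idea can be sketched in a few lines of numpy. The shared per-point MLP here is a stand-in with random (untrained) weights, only to demonstrate that max pooling makes the global feature invariant to the order of the input points:

```python
import numpy as np

rng = np.random.default_rng(42)
# Random stand-in weights for the shared per-point MLP (3 -> 64 -> 1024).
W1, W2 = rng.standard_normal((64, 3)), rng.standard_normal((1024, 64))

def global_feature(points):
    """points: (N, 3) array. Apply a shared per-point MLP, then
    max-pool over the point dimension, yielding a permutation-
    invariant global feature of fixed size (1024,)."""
    h = np.maximum(W1 @ points.T, 0.0)   # (64, N), ReLU
    h = np.maximum(W2 @ h, 0.0)          # (1024, N)
    return h.max(axis=1)                 # max pooling over points

cloud = rng.standard_normal((100, 3))
shuffled = cloud[rng.permutation(100)]
print(np.allclose(global_feature(cloud), global_feature(shuffled)))  # True
```

Shuffling the rows permutes the columns of the intermediate feature matrix, and the column-wise max is unaffected, which is exactly the permutation invariance the paper requires.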

The architecture combines local and global information and uses two joint alignment networks that align both the input points and the point features.

However, by design PointNet does not capture local structures induced by the metric space points live in, limiting its ability to recognize fine-grained patterns and generalizability to complex scenes. We introduce a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, our network is able to learn local features with increasing contextual scales.

The key idea is to learn multilevel features per point and expand the point set via a multibranch convolution unit implicitly in feature space. The expanded feature is then split into a multitude of features, which are reconstructed into an upsampled point set.

Many of the top papers here use multiview projection and solve the 3D point cloud classification in 2D space.

| Algorithm | ModelNet40 Classification (Accuracy) | ModelNet40 Retrieval (mAP) | ModelNet10 Classification (Accuracy) | ModelNet10 Retrieval (mAP) |
|---|---|---|---|---|
| SO-Net [34] | 93.4% | | 95.7% | |
| Minto et al. [33] | 89.3% | | 93.6% | |
| RotationNet [32] | 97.37% | | 98.46% | |
| LonchaNet [31] | | | 94.37% | |
| Achlioptas et al. [30] | 84.5% | | 95.4% | |
| PANORAMA-ENN [29] | 95.56% | 86.34% | 96.85% | 93.28% |
| 3D-A-Nets [28] | 90.5% | 80.1% | | |
| Soltani et al. [27] | 82.10% | | | |
| Arvind et al. [26] | 86.50% | | | |
| LonchaNet [25] | | | 94.37% | |
| 3DmFV-Net [24] | 91.6% | | 95.2% | |
| Zanuttigh and Minto [23] | 87.8% | | 91.5% | |
| Wang et al. [22] | 93.8% | | | |
| ECC [21] | 83.2% | | 90.0% | |
| PANORAMA-NN [20] | 90.7% | 83.5% | 91.1% | 87.4% |
| MVCNN-MultiRes [19] | 91.4% | | | |
| FPNN [18] | 88.4% | | | |
| PointNet [17] | 89.2% | | | |
| Klokov and Lempitsky [16] | 91.8% | | 94.0% | |
| LightNet [15] | 88.93% | | 93.94% | |
| Xu and Todorovic [14] | 81.26% | | 88.00% | |
| Geometry Image [13] | 83.9% | 51.3% | 88.4% | 74.9% |
| Set-convolution [11] | 90% | | | |
| PointNet [12] | 77.6% | | | |
| 3D-GAN [10] | 83.3% | | 91.0% | |
| VRN Ensemble [9] | 95.54% | | 97.14% | |
| ORION [8] | | | 93.8% | |
| FusionNet [7] | 90.8% | | 93.11% | |
| Pairwise [6] | 90.7% | | 92.8% | |
| MVCNN [3] | 90.1% | 79.5% | | |
| GIFT [5] | 83.10% | 81.94% | 92.35% | 91.12% |
| VoxNet [2] | 83% | | 92% | |
| DeepPano [4] | 77.63% | 76.81% | 85.45% | 84.18% |
| 3DShapeNets [1] | 77% | 49.2% | 83.5% | 68.3% |


The post Gradient, Circulation, Laplacian, Divergence, Jacobian, Hessian, and Trace appeared first on Fusing Data and AI into VR.

- In mathematics, the **gradient** is a multi-variable generalization of the derivative. While a derivative can be defined on functions of a single variable, for functions of several variables the gradient takes its place. The gradient is a vector-valued function, as opposed to a derivative, which is scalar-valued. The gradient (or gradient vector field) of a scalar function *f*(*x*₁, *x*₂, *x*₃, …, *x*ₙ) is denoted ∇*f*, where ∇ (the nabla symbol) denotes the vector differential operator, del. The notation grad *f* is also commonly used for the gradient. The gradient of *f* is defined as the unique vector field whose dot product with any unit vector **v** at each point *x* is the directional derivative of *f* along **v**.
- In physical terms, the **divergence** of a three-dimensional vector field is the extent to which the vector field flow behaves like a source at a given point. It is a local measure of its “outgoingness” – the extent to which there is more of some quantity exiting an infinitesimal region of space than entering it. If the divergence is nonzero at some point, then there is compression or expansion at that point. (Note that we are imagining the vector field to be like the velocity vector field of a fluid in motion when we use the terms *flow* and so on.) Let *x*, *y*, *z* be a system of Cartesian coordinates in 3-dimensional Euclidean space, and let **i**, **j**, **k** be the corresponding basis of unit vectors. The divergence of a continuously differentiable vector field **F** = *U* **i** + *V* **j** + *W* **k** is defined as the scalar-valued function div **F** = ∂*U*/∂*x* + ∂*V*/∂*y* + ∂*W*/∂*z*.
- In mathematics and physics, a **scalar field** associates a scalar value to every point in a space – possibly physical space. The scalar may either be a (dimensionless) mathematical number or a physical quantity. In a physical context, scalar fields are required to be independent of the choice of reference frame, meaning that any two observers using the same units will agree on the value of the scalar field at the same absolute point in space (or spacetime) regardless of their respective points of origin. Examples used in physics include the temperature distribution throughout space, the pressure distribution in a fluid, and spin-zero quantum fields, such as the Higgs field. These fields are the subject of scalar field theory.
- In mathematics, the **Laplace operator** or **Laplacian** is a differential operator given by the divergence of the gradient of a function on Euclidean space. It is usually denoted by the symbols ∇·∇, ∇², or Δ. The Laplacian Δ*f*(*p*) of a function *f* at a point *p*, up to a constant depending on the dimension, is the rate at which the average value of *f* over spheres centered at *p* deviates from *f*(*p*) as the radius of the sphere grows. In a Cartesian coordinate system, the Laplacian is given by the sum of second partial derivatives of the function with respect to each independent variable. In other coordinate systems, such as cylindrical and spherical coordinates, the Laplacian also has a useful form.
- In linear algebra, the **trace** of an *n*-by-*n* square matrix *A* is defined to be the sum of the elements on the main diagonal (the diagonal from the upper left to the lower right) of *A*. The trace is a linear mapping.
- In mathematics, the **field trace** is a particular function defined with respect to a finite field extension *L*/*K*, which is a *K*-linear map from *L* onto *K*.
- In fluid dynamics, **circulation** is the line integral around a closed curve of the velocity field. Circulation is normally denoted Γ (Greek uppercase gamma). Circulation was first used independently by Frederick Lanchester, Wilhelm Kutta, and Nikolai Zhukovsky.
- In vector calculus, the **Jacobian matrix** is the matrix of all first-order partial derivatives of a vector-valued function. When the matrix is a square matrix, both the matrix and its determinant are referred to as the **Jacobian** in the literature. The Jacobian generalizes the gradient of a scalar-valued function of multiple variables, which itself generalizes the derivative of a scalar-valued function of a single variable. In other words, the Jacobian of a scalar-valued multivariate function is the gradient, and that of a scalar-valued function of a single variable is simply its derivative. The Jacobian can also be thought of as describing the amount of “stretching”, “rotating”, or “transforming” that a transformation imposes locally. For example, if (*x*′, *y*′) = **f**(*x*, *y*) is used to transform an image, the Jacobian **J**_f(*x*, *y*) describes how the image in the neighborhood of (*x*, *y*) is transformed.
- In mathematics, the **Hessian matrix** or **Hessian** is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term “functional determinants”. The Hessian matrix of a convex function is positive semi-definite. Refining this property allows us to test whether a critical point *x* is a local maximum, local minimum, or saddle point, as follows. If the Hessian is positive definite at *x*, then *f* attains an isolated local minimum at *x*. If the Hessian is negative definite at *x*, then *f* attains an isolated local maximum at *x*. If the Hessian has both positive and negative eigenvalues, then *x* is a saddle point for *f*. Otherwise the test is inconclusive. This implies that, at a local minimum (respectively, a local maximum), the Hessian is positive semi-definite (respectively, negative semi-definite).

The diagram I drew was inspired by Yun Wang. The PowerPoint source file is published here: TheMatrix


The post [Summary] Talk by Dr. Chakareski: Networked Virtual and Augmented Reality: The New Frontier appeared first on Fusing Data and AI into VR.

Star Trek’s transporter illustrates the great potential of virtual human teleportation. We are motivated by applying super-human-like vision to break barriers in remote sensing, monitoring, localization, navigation, and scene understanding. It is well acknowledged that VR and AR applications are a foundational use case of 5G technology. However, moving from 2D passive sensing towards 3D immersive interaction requires enormous bandwidth. In the era of the Internet of Things (IoT), this is hyper data intensive – a huge volume of data is required, especially for 360-degree video streaming.

We envision real-time IoT sensing and UAV-enabled dynamic sensor placement: projective geometry + distortion-rate theory + online sensor scheduling.

Lagrange problem formulation:

$V^{*,\lambda}(s) = \min_y \left[ c(h, y) + \lambda d(x, y) + \gamma \sum_{s'} p(s' \mid s, y) \, V^{*,\lambda}(s') \right], \quad \forall s$

Post-decision State (PDS) Learning:

It captures system state after action takes place, but prior to unknown dynamics.

PDS value function

$V^{*,\lambda}(x) = \min_{\alpha \in A} \left\{ \ldots \right\}$

PDS learning: update one state at a time => limit on convergence rate

Packet arrivals l_t and channel states h_t are independent of the PDS queue backlog x_t

=> use one observation of (l_t, h_t) via the PDS s_t = (x_t, h_t) to update all PDSs s = (x, h_t)

There is no need to visit a state s to update it.

Q-learning updates one state–action pair (x, y) per observation. PDS learning instead updates a value for the post-decision state, and virtual experience extends this to updating many post-decision states at once (e.g., a 3×3 grid of states per observation).
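For context, the baseline that PDS and virtual-experience learning improve on is plain tabular Q-learning, which touches only the single visited state–action pair per observation. A minimal sketch on a toy 3-state chain MDP (the environment is entirely made up for illustration, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9

def step(s, a):
    """Toy dynamics: action 1 moves right; reward 1 at the last state."""
    s2 = min(s + a, n_states - 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

s = 0
for _ in range(5000):
    a = rng.integers(n_actions)            # explore uniformly
    s2, r = step(s, a)
    # Q-learning touches only the visited pair (s, a); PDS learning
    # would instead update a post-decision-state value, letting one
    # observation of the random dynamics update many states at once.
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = s2 if s2 < n_states - 1 else 0     # reset at the terminal state

print(Q.argmax(axis=1))   # learned greedy policy: prefer "move right"
```

This one-pair-per-step update is exactly the convergence-rate bottleneck that post-decision-state learning is designed to relax.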

In Viewport-Adaptive Navigable 360-Degree Video Delivery, the authors investigate the impact of various spherical-to-plane projections and quality arrangements on the video quality displayed to the user, showing that the cube map layout offers the best quality for the given bit-rate budget. An evaluation with a dataset of users navigating 360-degree videos demonstrates that **segments need to be short enough to enable frequent view switches**.

Virtual and augmented reality (VR/AR) have the potential to advance our society. Presently limited to offline operation and synthetic content, and targeting gaming and entertainment, they are expected to reach their potential when deployed online and with real remote scene content. This will require novel holistic solutions that will push the frontiers in sensing, compression, networking, and machine learning, to overcome the considerable challenges ahead. My long-term research objective is UAV-IoT-deployed ubiquitous VR/AR immersive communication that can enable **virtual human teleportation** to any corner of the world. Thereby, we can achieve a broad range of technological and societal advances that will enhance energy conservation, quality of life, and the global economy, as illustrated in Figure 1 below.

I am investigating **fundamental problems** at the intersection of signal acquisition and representation, communications and networking, (embedded) sensors and systems, and rigorous machine learning for stochastic control that arise in this context. I envision a future where UAV-IoT-deployed immersive communication systems **will help break existing barriers** in remote sensing, monitoring, localization and navigation, and scene understanding. The presentation will outline some of my present and envisioned investigations. Interdisciplinary applications will be highlighted.


The post Estimated Cost of Per Atom Function in Real-time Shaders on the GPU appeared first on Fusing Data and AI into VR.

This may not be accurate, but it is mostly correct in my experience.

Some intuitions are:

- Abs and saturate are free. (Why is clamp in GLSL not free? I doubt it.)
- **Log, exp, and sqrt are almost free!** (That’s why Kernel Foveated Rendering is fast.)
- Sin and cos are super fast!
- smoothstep is more expensive than expected.
- I would suggest a cheap replacement for Gaussians:

    float cubicPulse(float c, float w, float x)
    {
        x = abs(x - c);  // fabs is C/C++; HLSL/GLSL use abs
        if (x > w) return 0.0;
        x /= w;
        return 1.0 - x * x * (3.0 - 2.0 * x);
    }
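To sanity-check that this cheap cubic tracks a Gaussian, here is a quick Python port and comparison. The width-to-sigma matching (w ≈ 2.5σ) is my rough eyeballed fit, not a derived constant:

```python
import math

def cubic_pulse(c, w, x):
    """Python port of the shader cubicPulse: a smooth bump centered
    at c with half-width w, built from cheap ops only."""
    x = abs(x - c)
    if x > w:
        return 0.0
    x /= w
    return 1.0 - x * x * (3.0 - 2.0 * x)

def gaussian(c, sigma, x):
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

# Compare over [-1, 1]: the max pointwise difference stays small for
# a suitably matched sigma (here w = 1.0 against sigma = 0.4).
err = max(abs(cubic_pulse(0.0, 1.0, t / 100.0) - gaussian(0.0, 0.4, t / 100.0))
          for t in range(-100, 101))
print(round(err, 3))
```

The cubic additionally reaches exactly zero at |x − c| = w, which a true Gaussian never does; for shader falloffs that compact support is usually a feature, not a bug.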

Here is the full grouped list:

- Cost 0 (Almost free)
- abs(x), saturate(x)

- Cost 1
- floor(x), ceil(x), round(x), frac(x), exp2(x), dot(a, b), min(a, b), max(a, b), sin(x), cos(x), sincos(x), sqrt(x), rsqrt(x)

- Cost 1.5
- faceforward(n, i, ng)

- Cost 2
- clamp(x, a, b), exp(x), log(x), log10(x), cross(a, b), step(a, x), lerp(a, b, f), length(v), distance(a, b)

- Cost 2.5
- reflect(i, n)

- Cost 3
- any(x), pow(x, y), sign(x), normalize(v)

- Cost 4
- all(x), fmod(x, y), mul(m, pos), transpose(M)

- Greater or equal to 5
- 7: smoothstep(min, max, x)
- 10: acos
- 11: asin
- 16: atan
- 22: atan2

One of my remaining questions is:

**How fast is texture sampling on a modern GPU?**

- One option is to measure with NVIDIA ShaderPerf: https://developer.nvidia.com/nvidia-shaderperf
- I guess it’s around 20.

Any further experiments and feedback are welcome.


The post Tianqi Chen published WebGL-based deep learning… appeared first on Fusing Data and AI into VR.

This is fully possible given existing neural network code on ShaderToy…

But still, real-time deep learning at the browser side is clever, and ambitious… and they did it!

Paper: https://arxiv.org/pdf/1802.04799.pdf

Code: https://github.com/dmlc/nnvm/blob/master/tutorials/from_mxnet_to_webgl.py

