The post [Summary] Google I/O and Microsoft Build 2018 appeared first on Fusing Data and AI into VR.

Google announced email auto-completion, photo auto-enhancement and auto-colorization, better sound synthesis, conversational Q&A, Android P, text extraction from images, and style match.

Microsoft presented its Fluent Design, Azure AI-enabled edge devices (phones, drones, infrastructure, IoT), Microsoft 365, a collaboration with Alexa, Visual Studio Live Share, and HoloLens Remote Assist and Layout.


The post [Summary] StackGAN: Text to Photo-realistic Image Synthesis appeared first on Fusing Data and AI into VR.

StackGAN is the first model to generate 256×256 images with photo-realistic details from text descriptions.

The Generative Adversarial Network (GAN) was originally proposed by Ian Goodfellow et al. It consists of a generator network and a discriminator network: the generator is trained to fool the discriminator by improving the images it generates, while the discriminator learns to tell real images from generated ones.

The main difficulty in generating high-resolution images with GANs is that the support of the natural image distribution and the support of the implied model distribution may not overlap in high-dimensional pixel space.

In StackGAN, there are two stages of GAN procedure. The Stage-I generator draws a low-resolution image by sketching rough shape and basic colors of the object from the given text and painting the background from a random noise vector. Conditioned on Stage-I results, the Stage-II generator corrects defects and adds compelling details into Stage-I results, yielding a more realistic high-resolution image.

In Stage-I, StackGAN does not use the text embedding directly as the condition. Instead, it applies a fully connected layer to obtain the parameters of a normal distribution and uses samples drawn from it as the condition. This is because the embedding space is usually of much higher dimensionality than the available text data can cover; if the embedding were used directly as the condition, the latent variables in the latent space would be sparse.
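This conditioning trick (called Conditioning Augmentation in the paper) can be sketched with the reparameterization below. This is a minimal numpy sketch: the dimensions are made up, and random matrices stand in for the learned fully connected layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 1024-d text embedding compressed to a 128-d condition.
embed_dim, cond_dim = 1024, 128
W_mu = rng.normal(0, 0.01, (cond_dim, embed_dim))      # stands in for a learned FC layer
W_logvar = rng.normal(0, 0.01, (cond_dim, embed_dim))  # stands in for a learned FC layer

def conditioning_augmentation(text_embedding):
    """Sample a smooth conditioning vector c ~ N(mu(e), diag(sigma(e)^2))."""
    mu = W_mu @ text_embedding
    logvar = W_logvar @ text_embedding
    eps = rng.standard_normal(cond_dim)
    # Reparameterization trick: sampling stays differentiable w.r.t. mu and logvar.
    return mu + np.exp(0.5 * logvar) * eps

e = rng.standard_normal(embed_dim)   # pretend text embedding
c = conditioning_augmentation(e)
print(c.shape)                       # (128,)
```

Because nearby embeddings map to overlapping Gaussians, the conditioning manifold becomes smooth instead of sparse.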

In the generator, instead of using deconvolution (transposed convolution), upsampling is done with several 3×3 convolutions.

The discriminator uses several convolutions with stride 2 to downsample the image, and combines the result with a resized (spatially replicated) text embedding.

In Stage-II, the generator takes the downsampled Stage-I result and the augmented embedding (sampled from the Gaussian) as input; after several residual blocks, it uses the same upsampling technique to obtain the final image.

With more conditioning labels, the generator can decompose a complicated distribution into several simpler, lower-dimensional distributions.

It obtained the state-of-the-art Inception Score (IS), with 28.47% and 20.30% improvements over previous methods. The Inception Score is a metric for automatically evaluating the quality of image generative models [Salimans et al., 2016]. This metric was shown to correlate well with human scoring of the realism of generated images from the CIFAR-10 dataset. The IS uses an Inception v3 network pre-trained on ImageNet and calculates a statistic of the network's outputs when applied to generated images.

$IS(G) = \exp\left( \mathbb{E}_{x \sim p_g} \left[ D_{KL}\left( p(y|x) \,\|\, p(y) \right) \right] \right)$

where $x \sim p_g$ indicates that $x$ is an image sampled from $p_g$, $D_{KL}(p \,\|\, q)$ is the KL-divergence between the distributions $p$ and $q$, $p(y|x)$ is the conditional class distribution, and $p(y)$ is the marginal class distribution.

The authors who proposed the IS aimed to codify two desirable qualities of a generative model into a metric:

- The generated images should contain clear objects (i.e. the images are sharp rather than blurry), so $p(y|x)$ should have low entropy. In other words, the Inception network should be highly confident that there is a single object in the image.
- The generative algorithm should output a high diversity of images from all the different classes in ImageNet, so $p(y)$ should have high entropy.
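The two desiderata above can be checked directly from a matrix of predicted class probabilities. A minimal numpy sketch of the IS formula, using toy probability matrices rather than real Inception v3 outputs:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS(G) = exp( E_x [ KL(p(y|x) || p(y)) ] ) for an (n_images, n_classes)
    matrix of predicted class probabilities."""
    p_y = probs.mean(axis=0)   # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Sharp and diverse predictions -> high score (upper bound: number of classes).
sharp_diverse = np.eye(4)
print(inception_score(sharp_diverse))   # ~4.0

# Uniform (blurry) predictions -> score of 1.
blurry = np.full((4, 4), 0.25)
print(inception_score(blurry))          # ~1.0
```

Sharp per-image predictions raise the KL term (low entropy of p(y|x)), while diversity spreads out p(y) (high entropy), so both qualities push the score up.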

The code of StackGAN is released on GitHub.

Conditional GAN is one of the earliest variants of GAN:

$\max_D \left\{ \mathbb{E}_{x \sim p_{data}} \log D(x|y) + \mathbb{E}_{x \sim p_G} \log\left(1 - D(x|y)\right) \right\}$

where the condition $y$ could be images, labels, or other auxiliary information.
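As a toy illustration (not a trained model), the discriminator objective above can be evaluated on hand-picked scores:

```python
import numpy as np

def d_objective(d_real, d_fake):
    """Monte-Carlo estimate of E[log D(x|y)] + E[log(1 - D(x|y))],
    which the discriminator maximizes; toy numbers, not a trained model."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminator that scores real conditioned pairs high and fakes low
# achieves a larger objective than a coin-flipping one.
good = d_objective(d_real=np.array([0.9, 0.8]), d_fake=np.array([0.1, 0.2]))
coin = d_objective(d_real=np.array([0.5, 0.5]), d_fake=np.array([0.5, 0.5]))
print(good > coin)   # True
```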

In pix2pix, paired images are used as the condition.

In CycleGAN and DiscoGAN, styles are transferred across different domains without using paired data.

[1] Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014).
[2] Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." arXiv preprint arXiv:1611.07004 (2017).
[3] Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." arXiv preprint arXiv:1703.10593 (2017).
[4] Kim, Taeksoo, et al. "Learning to discover cross-domain relations with generative adversarial networks." arXiv preprint arXiv:1703.05192 (2017).


The post [Summary] Omnipresence 3D for Multiview Mixed Reality appeared first on Fusing Data and AI into VR.

This is definitely a great leap beyond my prior work, VideoFields. At the time, I was offered only three surveillance video cameras and wanted to create an immersive virtual environment from them, which worked from the research point of view. Today, Fortem's Omnipresence 3D software makes a big move: it seamlessly integrates 57 leading brands of security, automation, and IT equipment, e.g., video cameras and recorders, access control, video analytics, GIS, GPS, radar, sonar, and gunshot detection. In addition, it alerts users to abnormal situations, provides the most relevant and actionable information needed to make timely and correct decisions, and features a unique immersive 3D technology that eliminates manual steps to increase productivity and manage incidents better and faster.

For more information, please visit: http://deep3d.ninja/portfolio/omnipresence-3d/


The post [Summary] PointNet, PointNet++, and PU-Net appeared first on Fusing Data and AI into VR.

Instead of using 3D convolution, PointNet directly consumes point clouds, which respects the permutation invariance of points in the input.

A point cloud is an unordered set of vectors. Each point $P_i$ is a vector of its (x, y, z) coordinates plus extra feature channels such as color and normal.

For classification, the proposed deep network outputs k scores for the k candidate classes. For semantic segmentation, the model outputs n × m scores: one for each of the n points and each of the m semantic subcategories.

The input is a subset of points from a Euclidean space. It has three main properties:

• **Unordered**. Unlike pixel arrays in images or voxel arrays in volumetric grids, a point cloud is a set of points without a specific order. In other words, a network that consumes a set of N 3D points needs to be invariant to the N! permutations of the input set in data feeding order.

• **Interaction among points**. The points are from a space with a distance metric. It means that points are not isolated, and neighboring points form a meaningful subset. Therefore, the model needs to be able to capture local structures from nearby points, and the combinatorial interactions among local structures.

• **Invariance under transformations**. As a geometric object, the learned representation of the point set should be invariant to certain transformations. For example, rotating and translating all points together should not modify the global point cloud category nor the segmentation of the points.

The max pooling layer works as a symmetric function to aggregate information from all the points.
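The permutation invariance of the max-pooled global feature is easy to verify numerically. A minimal sketch in which a random linear layer stands in for the learned shared MLP:

```python
import numpy as np

rng = np.random.default_rng(1)

def shared_mlp(points, W):
    """Apply the same per-point transform to every point: (n, 3) -> (n, k)."""
    return np.maximum(points @ W, 0.0)   # ReLU

def global_feature(points, W):
    """Max pooling over points: a symmetric, permutation-invariant aggregate."""
    return shared_mlp(points, W).max(axis=0)

W = rng.normal(size=(3, 16))             # stands in for learned MLP weights
cloud = rng.normal(size=(128, 3))
shuffled = cloud[rng.permutation(128)]

# Reordering the points does not change the global feature.
print(np.allclose(global_feature(cloud, W), global_feature(shuffled, W)))  # True
```

Any symmetric function (max, sum, mean) would give invariance; PointNet uses max pooling.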

PointNet also contains a structure that combines local and global information, and two joint alignment networks that align both the input points and the point features.

However, by design PointNet does not capture local structures induced by the metric space the points live in, limiting its ability to recognize fine-grained patterns and to generalize to complex scenes. PointNet++ addresses this with a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, the network learns local features with increasing contextual scales.
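The nested partitioning starts by sampling well-spread centroids; farthest point sampling is the standard choice for this step. A small numpy sketch (simplified; the real pipeline follows sampling with ball-query grouping and a local PointNet):

```python
import numpy as np

def farthest_point_sampling(points, m, seed=0):
    """Pick m well-spread centroids from an (n, 3) cloud by iteratively
    choosing the point farthest from all centroids chosen so far."""
    n = len(points)
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(n))]          # start from a random point
    dist = np.full(n, np.inf)                # distance to nearest chosen centroid
    for _ in range(m - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))    # farthest remaining point
    return points[chosen]

cloud = np.random.default_rng(2).normal(size=(1024, 3))
centroids = farthest_point_sampling(cloud, 32)
print(centroids.shape)   # (32, 3)
```

Already-chosen points have distance 0, so `argmax` never re-selects them; the centroids end up covering the cloud more evenly than uniform random sampling.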

In PU-Net, the key idea is to learn multilevel features per point and expand the point set via a multi-branch convolution unit implicitly in feature space. The expanded feature is then split into a multitude of features, which are reconstructed into an upsampled point set.
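A shape-level sketch of this expand-and-split idea, with random matrices standing in for PU-Net's learned branch convolutions and coordinate regressor (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)

def expand_features(feat, r):
    """Expand (n, c) point features to (r*n, c) with r hypothetical branch
    transforms, mimicking PU-Net's multi-branch expansion in feature space."""
    n, c = feat.shape
    branches = [np.maximum(feat @ rng.normal(0, 0.1, (c, c)), 0) for _ in range(r)]
    return np.concatenate(branches, axis=0)   # (r*n, c)

def reconstruct_coords(expanded, W_out):
    """Regress 3D coordinates from each expanded feature."""
    return expanded @ W_out                   # (r*n, 3)

feat = rng.normal(size=(256, 64))             # features of 256 input points
up = reconstruct_coords(expand_features(feat, 4), rng.normal(0, 0.1, (64, 3)))
print(up.shape)                               # (1024, 3): a 4x upsampled cloud
```

Each branch learns a different transform, so the r copies of a point land in different feature-space locations and reconstruct to distinct upsampled points.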

Many of the top papers here use multiview projection and solve the 3D point cloud classification in 2D space.

| Algorithm | ModelNet40 Classification (Accuracy) | ModelNet40 Retrieval (mAP) | ModelNet10 Classification (Accuracy) | ModelNet10 Retrieval (mAP) |
|---|---|---|---|---|
| SO-Net [34] | 93.4% | | 95.7% | |
| Minto et al. [33] | 89.3% | | 93.6% | |
| RotationNet [32] | 97.37% | | 98.46% | |
| LonchaNet [31] | | | 94.37% | |
| Achlioptas et al. [30] | 84.5% | | 95.4% | |
| PANORAMA-ENN [29] | 95.56% | 86.34% | 96.85% | 93.28% |
| 3D-A-Nets [28] | 90.5% | 80.1% | | |
| Soltani et al. [27] | 82.10% | | | |
| Arvind et al. [26] | 86.50% | | | |
| LonchaNet [25] | | | 94.37% | |
| 3DmFV-Net [24] | 91.6% | | 95.2% | |
| Zanuttigh and Minto [23] | 87.8% | | 91.5% | |
| Wang et al. [22] | 93.8% | | | |
| ECC [21] | 83.2% | | 90.0% | |
| PANORAMA-NN [20] | 90.7% | 83.5% | 91.1% | 87.4% |
| MVCNN-MultiRes [19] | 91.4% | | | |
| FPNN [18] | 88.4% | | | |
| PointNet [17] | 89.2% | | | |
| Klokov and Lempitsky [16] | 91.8% | | 94.0% | |
| LightNet [15] | 88.93% | | 93.94% | |
| Xu and Todorovic [14] | 81.26% | | 88.00% | |
| Geometry Image [13] | 83.9% | 51.3% | 88.4% | 74.9% |
| Set-convolution [11] | 90% | | | |
| PointNet [12] | 77.6% | | | |
| 3D-GAN [10] | 83.3% | | 91.0% | |
| VRN Ensemble [9] | 95.54% | | 97.14% | |
| ORION [8] | | | 93.8% | |
| FusionNet [7] | 90.8% | | 93.11% | |
| Pairwise [6] | 90.7% | | 92.8% | |
| MVCNN [3] | 90.1% | 79.5% | | |
| GIFT [5] | 83.10% | 81.94% | 92.35% | 91.12% |
| VoxNet [2] | 83% | | 92% | |
| DeepPano [4] | 77.63% | 76.81% | 85.45% | 84.18% |
| 3DShapeNets [1] | 77% | 49.2% | 83.5% | 68.3% |


The post Gradient, Circulation, Laplacian, Divergence, Jacobian, Hessian, and Trace appeared first on Fusing Data and AI into VR.

- In mathematics, the **gradient** is a multi-variable generalization of the derivative. While a derivative can be defined on functions of a single variable, for functions of several variables the gradient takes its place. The gradient is a vector-valued function, as opposed to a derivative, which is scalar-valued. The gradient (or gradient vector field) of a scalar function f(x₁, x₂, x₃, …, xₙ) is denoted ∇f, where ∇ (the nabla symbol) denotes the vector differential operator, del. The notation grad f is also commonly used for the gradient. The gradient of f is defined as the unique vector field whose dot product with any unit vector **v** at each point x is the directional derivative of f along **v**.
- In physical terms, the **divergence** of a three-dimensional vector field is the extent to which the vector field flow behaves like a source at a given point. It is a local measure of its "outgoingness" – the extent to which there is more of some quantity exiting an infinitesimal region of space than entering it. If the divergence is nonzero at some point, then there is compression or expansion at that point. (Note that we are imagining the vector field to be like the velocity vector field of a fluid in motion when we use terms such as *flow*.) Let x, y, z be a system of Cartesian coordinates in 3-dimensional Euclidean space, and let **i**, **j**, **k** be the corresponding basis of unit vectors. The divergence of a continuously differentiable vector field **F** = U **i** + V **j** + W **k** is defined as the scalar-valued function div **F** = ∂U/∂x + ∂V/∂y + ∂W/∂z.
- In mathematics and physics, a **scalar field** associates a scalar value to every point in a space – possibly physical space. The scalar may either be a (dimensionless) mathematical number or a physical quantity. In a physical context, scalar fields are required to be independent of the choice of reference frame, meaning that any two observers using the same units will agree on the value of the scalar field at the same absolute point in space (or spacetime) regardless of their respective points of origin. Examples used in physics include the temperature distribution throughout space, the pressure distribution in a fluid, and spin-zero quantum fields, such as the Higgs field. These fields are the subject of scalar field theory.
- In mathematics, the **Laplace operator** or **Laplacian** is a differential operator given by the divergence of the gradient of a function on Euclidean space. It is usually denoted by the symbols ∇·∇, ∇², or Δ. The Laplacian Δf(p) of a function f at a point p is, up to a constant depending on the dimension, the rate at which the average value of f over spheres centered at p deviates from f(p) as the radius of the sphere grows. In a Cartesian coordinate system, the Laplacian is given by the sum of second partial derivatives of the function with respect to each independent variable: Δf = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z². In other coordinate systems, such as cylindrical and spherical coordinates, the Laplacian also has a useful form.
- In linear algebra, the **trace** of an n-by-n square matrix A is defined to be the sum of the elements on the main diagonal (the diagonal from the upper left to the lower right) of A. The trace is a linear mapping.
- In mathematics, the **field trace** is a particular function defined with respect to a finite field extension L/K; it is a K-linear map from L onto K.
- In fluid dynamics, **circulation** is the line integral around a closed curve of the velocity field. Circulation is normally denoted Γ (Greek uppercase gamma). It was first used independently by Frederick Lanchester, Wilhelm Kutta, and Nikolai Zhukovsky.
- In vector calculus, the **Jacobian matrix** is the matrix of all first-order partial derivatives of a vector-valued function. When the matrix is a square matrix, both the matrix and its determinant are referred to as the **Jacobian** in the literature. The Jacobian generalizes the gradient of a scalar-valued function of multiple variables, which itself generalizes the derivative of a scalar-valued function of a single variable. In other words, the Jacobian of a scalar-valued multivariate function is the gradient, and that of a scalar-valued function of a single variable is simply its derivative. The Jacobian can also be thought of as describing the amount of "stretching", "rotating", or "transforming" that a transformation imposes locally. For example, if (x′, y′) = **f**(x, y) is used to transform an image, the Jacobian **J**_f(x, y) describes how the image in the neighborhood of (x, y) is transformed.
- In mathematics, the **Hessian matrix** or **Hessian** is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him; Hesse originally used the term "functional determinants". The Hessian matrix of a convex function is positive semi-definite. Refining this property allows us to test whether a critical point x is a local maximum, local minimum, or saddle point, as follows. If the Hessian is positive definite at x, then f attains an isolated local minimum at x. If the Hessian is negative definite at x, then f attains an isolated local maximum at x. If the Hessian has both positive and negative eigenvalues, then x is a saddle point for f. Otherwise the test is inconclusive. This implies that, at a local minimum (respectively, a local maximum), the Hessian is positive semi-definite (respectively, negative semi-definite).
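These operators are easy to sanity-check numerically with finite differences. For example, for the scalar field f(x, y, z) = x² + 3yz, the gradient is (2x, 3z, 3y) and the Laplacian, which equals the trace of the Hessian, is 2:

```python
import numpy as np

def f(p):
    x, y, z = p
    return x**2 + 3*y*z            # a simple scalar field

def gradient(f, p, h=1e-6):
    """Central-difference gradient of a scalar field at point p."""
    p = np.asarray(p, float)
    g = np.zeros(3)
    for i in range(3):
        e = np.zeros(3); e[i] = h
        g[i] = (f(p + e) - f(p - e)) / (2 * h)
    return g

def hessian(f, p, h=1e-4):
    """Second-order central differences; the Laplacian is the trace."""
    p = np.asarray(p, float)
    H = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            ei = np.zeros(3); ei[i] = h
            ej = np.zeros(3); ej[j] = h
            H[i, j] = (f(p+ei+ej) - f(p+ei-ej) - f(p-ei+ej) + f(p-ei-ej)) / (4*h*h)
    return H

p = (1.0, 2.0, 3.0)
print(gradient(f, p))            # ≈ [2, 9, 6]
print(np.trace(hessian(f, p)))   # Laplacian of f: ≈ 2
```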

The diagram I drew is inspired by Yun Wang. The PowerPoint source file is published here: TheMatrix


The post [Summary] Talk by Dr. Chakareski: Networked Virtual and Augmented Reality: The New Frontier appeared first on Fusing Data and AI into VR.

The Star Trek teleport presents great potential for virtual human teleportation. We are motivated by applying super-human-like vision to break barriers in remote sensing, monitoring, localization, navigation, and scene understanding. It is well acknowledged that VR and AR applications are a foundation of 5G technology. However, moving from 2D passive sensing towards 3D immersive interaction requires enormous bandwidth. In the era of the Internet of Things (IoT), this is hyper data-intensive – a huge volume of data is required, especially for 360-degree video streaming.

We envision real-time IoT sensing and UAV-enabled dynamic sensor placement: projective geometry + distortion-rate theory + online sensor scheduling.

Lagrange problem formulation:

$V^{*,\lambda}(s) = \min_y \left[ c(h, y) + \lambda d(x, y) + \gamma \sum_{s'} p(s' \mid s, y)\, V^{*,\lambda}(s') \right], \quad \forall s$

Post-decision State (PDS) Learning:

It captures system state after action takes place, but prior to unknown dynamics.

PDS value function

$V^{*,\lambda}(x) = \min_{\alpha \in A} \left\{ \dots \right\}$

PDS learning updates one state at a time, which limits the convergence rate.

Packet arrivals $l_t$ and channel states $h_t$ are independent of the PDS queue backlog $x_t$, so an observation of $(l_t, h_t)$ at PDS $s_t = (x_t, h_t)$ can be used to update all PDSs $s = (x, h_t)$. There is no need to visit a state $s$ to update it.

Q-learning updates one state-action pair $(x, y)$ per observation. PDS learning instead updates the post-decision state directly, and virtual-experience learning goes further: a single observation updates a whole neighborhood of post-decision states (e.g., a 3×3 grid) at once.
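A toy illustration of the virtual-experience idea (the cost, queue dynamics, and learning rate below are made up for illustration, not Dr. Chakareski's actual formulation): because the observed arrival is independent of the backlog, one sample can update the value of every backlog state in the table.

```python
import numpy as np

# Toy value table over queue backlogs x in {0..4}.
V = np.zeros(5)
alpha, l = 0.5, 1              # learning rate and one observed arrival sample

def cost(x):
    return float(x) + 1.0      # hypothetical holding cost

# Virtual experience: the observed arrival l applies to EVERY backlog state,
# so a single sample updates the whole table (no need to visit each state).
for x in range(5):
    x_next = min(x + l, 4)     # made-up queue dynamics with a capped buffer
    V[x] = (1 - alpha) * V[x] + alpha * (cost(x) + V[x_next])

print(V)   # every entry moved after one observation
```

Plain Q-learning would have touched only the single visited state; here all five values change from one observation.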

In Viewport-Adaptive Navigable 360-Degree Video Delivery, the authors investigate the impact of various spherical-to-plane projections and quality arrangements on the video quality displayed to the user, showing that the cube map layout offers the best quality for the given bit-rate budget. An evaluation with a dataset of users navigating 360-degree videos demonstrates that **segments need to be short enough to enable frequent view switches**.

Virtual and augmented reality (VR/AR) have the potential to advance our society. Presently limited to offline operation and synthetic content, and targeting gaming and entertainment, they are expected to reach their potential when deployed online and with real remote scene content. This will require novel holistic solutions that will push the frontiers in sensing, compression, networking, and machine learning, to overcome the considerable challenges ahead. My long-term research objective is UAV-IoT-deployed ubiquitous VR/AR immersive communication that can enable **virtual human teleportation** to any corner of the world. Thereby, we can achieve a broad range of technological and societal advances that will enhance energy conservation, quality of life, and the global economy, as illustrated in Figure 1 below.

I am investigating **fundamental problems** at the intersection of signal acquisition and representation, communications and networking, (embedded) sensors and systems, and rigorous machine learning for stochastic control that arise in this context. I envision a future where UAV-IoT-deployed immersive communication systems **will help break existing barriers** in remote sensing, monitoring, localization and navigation, and scene understanding. The presentation will outline some of my present and envisioned investigations. Interdisciplinary applications will be highlighted.


The post Estimated Cost of Per Atom Function in Real-time Shaders on the GPU appeared first on Fusing Data and AI into VR.

This may not be accurate, but it is mostly correct in my experience.

Some intuitions are:

- abs and saturate are free. (Why is clamp in GLSL not free? I doubt it.)
- **log, exp, and sqrt are almost free!** (That's why Kernel Foveated Rendering is fast.)
- sin and cos are super fast!
- smoothstep is more expensive than expected.
- I would suggest a cheap replacement for Gaussians:

```glsl
float cubicPulse(float c, float w, float x)
{
    x = abs(x - c);                        // distance from the pulse center (abs, not C's fabs, in shader code)
    if (x > w) return 0.0;                 // outside the support window
    x /= w;                                // normalize to [0, 1]
    return 1.0 - x * x * (3.0 - 2.0 * x);  // smoothstep-shaped falloff
}
```
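A quick numerical check of the replacement, ported to Python: both functions peak at 1 at the center, but the cubic pulse reaches exactly 0 at the window edge instead of only decaying asymptotically.

```python
import math

def cubic_pulse(c, w, x):
    """Python port of the shader function: a cheap, compactly supported
    stand-in for a Gaussian bump centered at c with half-width w."""
    x = abs(x - c)
    if x > w:
        return 0.0
    x /= w
    return 1.0 - x * x * (3.0 - 2.0 * x)

def gaussian(c, s, x):
    return math.exp(-((x - c) ** 2) / (2 * s * s))

# Both peak at 1 at the center; the cubic pulse is exactly 0 for |x - c| > w.
print(cubic_pulse(0.5, 0.2, 0.5), gaussian(0.5, 0.1, 0.5))   # 1.0 1.0
print(cubic_pulse(0.5, 0.2, 0.9))                            # 0.0
```

The compact support is what makes it cheap in a shader: no exp, and the branch lets whole pixels skip the polynomial entirely.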

Here is the full grouped list:

- Cost 0 (Almost free)
- abs(x), saturate(x)

- Cost 1
- floor(x), ceil(x), round(x), frac(x), exp2(x), dot(a, b), min(a, b), max(a, b), sin(x), cos(x), sincos(x), sqrt(x), rsqrt(x)

- Cost 1.5
- faceforward(n, i, ng)

- Cost 2
- clamp(a, b), exp(x), log(x), log10(x), cross(a, b), step(a, x), lerp(a, b, f), length(v), distance(a, b)

- Cost 2.5
- reflect(i, n)

- Cost 3
- any(x), pow(x, y), sign(x), normalize(v)

- Cost 4
- all(x), fmod(x, y), mul(m, pos), transpose(M)

- Greater or equal to 5
- 7: smoothstep(min, max, x)
- 10: acos
- 11: asin
- 16: atan
- 22: atan2

One remaining question of mine is:

**How fast is texture sampling on a modern GPU?**

- One option is to measure it with NVIDIA ShaderPerf: https://developer.nvidia.com/nvidia-shaderperf
- My guess is around 20.

Any further experiments and feedback are welcome.


The post Tianqi Chen published WebGL-based deep learning… appeared first on Fusing Data and AI into VR.

This is fully possible given the existing neural network code on ShaderToy…

But still, real-time deep learning at the browser side is clever, and ambitious… and they did it!

Paper: https://arxiv.org/pdf/1802.04799.pdf

Code: https://github.com/dmlc/nnvm/blob/master/tutorials/from_mxnet_to_webgl.py


The post The Web Version of My AI Chatbot – LJAI appeared first on Fusing Data and AI into VR.

What makes LJAI different is the virtual-to-real mapping of our own cat. Recently, we have acquired over 120 users and thousands of chat messages.

http://veritayuan.com/lingjiang

- Compared with the WeChat version, this one only supports text messages for now and requires Google authentication.
- It’s responsive on mobile devices and PC.

The chatbot now supports arbitrary chatting, face recognition (age, gender, score), idioms, riddles, twisters, map location search, zodiac, naming, English-Chinese dictionaries, and so on.

To chat with the WeChat version, please scan the QR codes below:

The repository is now hosted on Github as a private repository. We will publish it when it’s ready. Here are some pieces of code which might be of interest to some of my readers.

```python
re_url = re.compile(u'(http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)')
re_punctuation = re.compile(u"[\s+\.\!\/_,$%^*(+\"\']+|[+——！，。？、~@#￥%……&*（）]+")
re_points24 = re.compile("([+-]*\d+)[^\d]+([+-]*\d+)[^\d]+([+-]*\d+)[^\d]+([+-]*\d+)")
re_braces = re.compile(u"[\(\)（）]")
re_phone = re.compile(r'''
    # don't match beginning of string, number can start anywhere
    (\d{3})  # area code is 3 digits (e.g. '800')
    \D*      # optional separator is any number of non-digits
    (\d{3})  # trunk is 3 digits (e.g. '555')
    \D*      # optional separator
    (\d{4})  # rest of number is 4 digits (e.g. '1212')
    \D*      # optional separator
    (\d*)    # extension is optional and can be any number of digits
    $        # end of string
    ''', re.VERBOSE)
re_english = re.compile(u'([a-z\sA-Z&@0-9\'\"\.]+)')
re_home = re.compile(u'([^\d]+)(\d+[\.。]?\d*)[$刀￥]?([^\d]*)')
re_color_hex = re.compile(u"#[a-fA-F\d]{6}")
states = [u"AL", u"AK", u"AZ", u"AR", u"CA", u"CO", u"CT", u"DC", u"DE", u"FL",
          u"GA", u"HI", u"ID", u"IL", u"IN", u"IA", u"KS", u"KY", u"LA", u"ME",
          u"MD", u"MA", u"MI", u"MN", u"MS", u"MO", u"MT", u"NE", u"NV", u"NH",
          u"NJ", u"NM", u"NY", u"NC", u"ND", u"OH", u"OK", u"OR", u"PA", u"RI",
          u"SC", u"SD", u"TN", u"TX", u"UT", u"VT", u"VA", u"WA", u"WV", u"WI", u"WY"]
states_reg_str = u"(Maryland"
for s in states:
    states_reg_str += u"|\,\s*" + s
    states_reg_str += u"|" + s + u"\s*\d{5}"
states_reg_str += u")"
# print states_reg_str
re_states = re.compile(states_reg_str)
re_gender1 = re.compile(u"男.*女")
re_gender2 = re.compile(u"女.*男")
re_zodiac = re.compile(u"(\d{1,2})[^\d]+(\d{1,2}).*星座")
re_moe = re.compile(u"")
```


The post Exploration of the IEEE-754 Floating Point Standard appeared first on Fusing Data and AI into VR.

```matlab
c = zeros(3,1);
x = 1; while 1 + x > 1, x = x / 2; c(1) = c(1) + 1; end  % 53
x = 1; while x + x > x, x = 2 * x; c(2) = c(2) + 1; end  % 1024
x = 1; while x + x > x, x = x / 2; c(3) = c(3) + 1; end  % 1075
```

The counters are **53, 1024, and 1075** for the three loops above.

- According to IEEE-754 (double precision), the machine epsilon is $2^{-52}$. The test $1 + x > 1$ holds for $x = 2^{0}, 2^{-1}, \dots, 2^{-52}$, but $1 + 2^{-53}$ is rounded back to 1, so the counter stops at 53.
- According to IEEE-754, the exponent $e$ satisfies $-1022 \le e \le 1023$, so the largest power of two reached is $2^{1023}$; the loop body runs once for each exponent from 0 to 1023, i.e. 1024 times. The final doubling overflows, leaving $x$ at Inf, and Inf + Inf > Inf is false.
- Initially, I supposed the third counter would be 1023 because realmin in Matlab is $2^{-1022}$. However, Matlab proved me wrong. After reading Reference 1 by Cleve Moler, it is possible to have values less than realmin: when a computation tries to produce a value smaller than realmin, it is said to underflow. This involves one of the optional, and controversial, aspects of the IEEE standard: many, but not all, machines allow exceptional denormal (subnormal) floating-point numbers in the interval between eps × realmin and realmin. The smallest positive subnormal number is $2^{-1074}$ = eps × realmin, so the counter is 1075.
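The same experiment can be reproduced in Python, whose floats are IEEE-754 doubles:

```python
# The three MATLAB loops above, reproduced with Python floats
# (which are IEEE-754 double precision).
c = [0, 0, 0]

x = 1.0
while 1.0 + x > 1.0:      # stops when 1 + x rounds back to 1
    x /= 2; c[0] += 1     # -> 53

x = 1.0
while x + x > x:          # stops when x overflows to inf (inf + inf == inf)
    x += x; c[1] += 1     # -> 1024

x = 1.0
while x + x > x:          # stops when x underflows past the subnormals to 0
    x /= 2; c[2] += 1     # -> 1075

print(c)   # [53, 1024, 1075]
```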

