

KineticFlow™ is Ghost's neural network that processes raw camera inputs to develop scene understanding. Grounded in physics, the network detects the surfaces and objects in a scene and calculates the distance, velocity, and motion vector of every pixel.
Unlike algorithms that rely on image-based object recognition, KineticFlow detects obstacles and road users from the physical properties of objects and of light in motion, enabling object detection without requiring object recognition.
Just as humans use multiple cues to establish depth with vision, KineticFlow fuses together multiple mono and stereo computer vision algorithms into a single neural network. The network analyzes video sequences from multiple cameras over time to improve object detection and manage occlusions.
Autonomous driving maneuvers require an accurate understanding of the driving scene – the arrangement of roads and objects and how they move relative to the car. This understanding begins with detection and measurement – discovering the objects and road elements, and estimating the distance, velocity, and motion direction of each object.
To produce this, most systems leverage several computer vision algorithms organized into a perception pipeline in which each stage builds on the previous stage's outputs. The legacy pipeline typically runs in three stages:
1. Detect objects, road, and features: use AI to detect and identify each object type as well as the road, sky, and barriers. Example algorithms: panoptic segmentation, object detection, semantic segmentation, outline/pattern matching.
2. Estimate object distance: use the recognized object type to infer a probable width and calculate distance with trigonometry. Example algorithms: pictorial depth estimation, stereo disparity.
3. Determine velocity and motion direction: use pixel expansion, or compare distance over multiple frames, to calculate velocity and motion. Example algorithms: optical flow, multi-frame object tracking and comparison.
A minimal sketch of this pipeline follows.
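The sketch below assumes a pinhole camera model, a fixed assumed vehicle width, and hypothetical helper names (`detect_objects`, `estimate_distance`, `estimate_velocity`) standing in for the real algorithms; it illustrates the structure of the legacy pipeline, not any particular implementation.

```python
# Minimal sketch of the legacy three-stage perception pipeline.
# All constants and helper names are illustrative assumptions.

FOCAL_LENGTH_PX = 2800        # assumed lens focal length expressed in pixels
ASSUMED_CAR_WIDTH_M = 1.8     # "probable width" looked up from the recognized object type
FRAME_INTERVAL_S = 1 / 30     # assumed 30 fps camera

def detect_objects(frame):
    """Stage 1: image-based recognition (stand-in for a trained detector or
    segmentation network). Returns (label, pixel_width) pairs."""
    raise NotImplementedError("placeholder for an image-based object detector")

def estimate_distance(pixel_width, assumed_width_m=ASSUMED_CAR_WIDTH_M):
    """Stage 2: pinhole-model trigonometry. Distance is proportional to the
    assumed real-world width divided by the observed width in pixels."""
    return FOCAL_LENGTH_PX * assumed_width_m / pixel_width

def estimate_velocity(prev_distance_m, curr_distance_m, dt_s=FRAME_INTERVAL_S):
    """Stage 3: differentiate distance across frames. A negative value means
    the object is closing on the ego vehicle."""
    return (curr_distance_m - prev_distance_m) / dt_s
```

Because each stage consumes the previous stage's output, an error introduced during recognition propagates directly into the distance and velocity estimates.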
This legacy pipeline has been widely adopted for L2 advanced driver-assistance systems (ADAS). ADAS features are built with the fundamental assumption that a backup driver will remain attentive at all times, which significantly relaxes the requirements for object recognition reliability, distance estimation accuracy, and overall resiliency to failures. Many systems operate on just a single camera, using a single processing chip. This technology was never designed for L4 autonomous driving without a human safety backup.
To address some of its shortcomings, developers have tried adding sensors such as LiDAR to verify distance and velocity measurements. However, this simply ends up trading unresolved computer vision challenges for other issues, adding expense, reliability concerns, and sensor fusion complexity.
The legacy pipeline's fundamental shortcomings remain:
- Image-based object recognition cannot be trained for every possible object and struggles to recognize objects that are uncommon, rotated, flipped, or partially occluded.
- Distance estimates are only as good as the object recognition feeding them: mistaking a 1.7-meter-wide car for a 2.1-meter-wide car skews the distance estimate by more than 20% (a worked example follows this list).
- A pipeline implemented with a single camera or processing chip is susceptible to failure on component loss.
- Accuracy degrades with distance, poor visibility, and low-lighting conditions because image-based object recognition requires high-quality images.
- Running in real time in the car requires high-performance compute and/or specialized ASICs, adding cost and reducing flexibility.
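To make the second point concrete, here is the arithmetic behind the width-mistake example; the 1.7 m and 2.1 m figures come from the text, and the pinhole relationship is the standard one assumed in the pipeline sketch above.

```python
# Pinhole model: distance = focal_length_px * assumed_width_m / pixel_width,
# so the distance estimate scales linearly with the assumed object width.
true_width_m = 1.7
assumed_width_m = 2.1

relative_distance_error = assumed_width_m / true_width_m - 1
print(f"{relative_distance_error:.1%}")   # ~23.5% overestimate of the true distance
```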
KineticFlow is trained on the universal physics properties that dictate how all objects behave and move and how light bounces off them, removing the dependency on traditional image-based object recognition. This approach:
- Reduces the risk of misrecognized or unrecognized objects.
- Reduces the measurement errors that come from estimates based on object type.
- Perceives obstacles in low light or when only partially revealed, because explicit recognition is not required.
Complex computer vision algorithms that require minutes per frame on high-powered CPUs are trained in the data center and converted into the KineticFlow neural network, enabling real-time execution on the road in milliseconds per frame on low-power system-on-a-chip (SoC) processors (a sketch of this offline-to-onboard pattern follows this list). As a result:
- Intensive algorithms such as high-definition stereo vision can now run in real time in the car.
- Mobile SoCs require a fraction of the power of the CPUs/GPUs typically used for autonomy, increasing vehicle range and fuel efficiency.
- By forgoing custom ASICs, algorithm updates can be delivered over the air regularly for evergreen improvement.
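Ghost does not spell out the training mechanics here, but one common pattern that fits this description is offline "teacher-student" distillation: a slow, high-quality algorithm (for example, an exhaustive stereo matcher) produces dense labels in the data center, and a compact network is trained to reproduce them before being exported for onboard inference. The sketch below assumes PyTorch and an ONNX export target purely for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical compact "student" network: per-pixel depth from a stacked
# stereo pair (6 input channels). The architecture is illustrative only.
student = nn.Sequential(
    nn.Conv2d(6, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

def training_step(stereo_pair, teacher_depth):
    """One offline step: the slow, minutes-per-frame algorithm supplies the
    dense depth target; the student learns to approximate it in real time."""
    optimizer.zero_grad()
    prediction = student(stereo_pair)
    loss = loss_fn(prediction, teacher_depth)
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, export a frozen graph for a low-power in-vehicle SoC runtime.
example_input = torch.randn(1, 6, 480, 640)
torch.onnx.export(student, example_input, "depth_student.onnx")
```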
The universal nature of KineticFlow's physics-based algorithms makes automated training in the data center possible, with training data sets that can be verified for completeness and are orders of magnitude smaller than those required by other approaches:
- The physics-based approach enables smaller training sets that can be assembled and verified for completeness across every dimension of the parameter space.
- All training data is automatically labeled in the data center and validated against ground truth.
- New sensor generations, vehicle configurations, and features can be rapidly retrained and validated.
By integrating multiple computer vision algorithms, including physics-based detection, stereo disparity, and a 3D variant of optical flow, into a single neural network, KineticFlow detects the objects in a scene and the distance, velocity, and motion path of every pixel:
- The algorithms naturally reinforce one another, collectively providing higher-confidence outputs than any one of them can generate alone.
- Because the strengths of each algorithm vary across distances, speeds, and lighting conditions, every output includes a confidence range, enabling the driving program to take accuracy into account (an illustrative fusion sketch follows this list).
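KineticFlow's fusion is learned inside a single network, so the following is only an intuition for why combining cues yields higher-confidence outputs with an attached confidence range: a classical inverse-variance (confidence-weighted) fusion of per-pixel depth estimates, with all names and shapes assumed.

```python
import numpy as np

def fuse_depth_estimates(depth_maps, variances):
    """Combine per-pixel depth maps from different cues (e.g., stereo
    disparity, motion-derived depth) by inverse-variance weighting.
    Lower variance means higher confidence and therefore a larger weight."""
    depths = np.stack(depth_maps)            # shape: (n_cues, H, W)
    weights = 1.0 / np.stack(variances)      # per-pixel confidence weights
    fused_depth = (weights * depths).sum(axis=0) / weights.sum(axis=0)
    fused_variance = 1.0 / weights.sum(axis=0)   # never larger than the best single cue
    return fused_depth, fused_variance
```

In a scheme like this, the fused variance doubles as the confidence range attached to each output, which is what lets the driving program weigh accuracy across distances, speeds, and lighting conditions.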
While mono or stereo vision fails completely with the loss or occlusion of a single camera, KineticFlow can fall back to single-camera mode, producing a subset of information that still enables safe driving in most operational design domains when coupled with radar.
KineticFlow leverages the resolution, dynamic range, and speed of modern 48-megapixel camera sensors, enabling long-distance perception and obviating the need for multiple cameras with different fields of view:
- It automatically corrects for occluded objects and unusual or missing frames by combining data over time, including managing temporary occlusions by analyzing stereo video streams.
- 48-megapixel sensors equip computer vision algorithms to operate at the distances required for high-speed highway driving (a back-of-the-envelope resolution example follows).
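As a rough check on the long-range claim, assume the 48-megapixel sensor is 8,000 pixels wide behind a lens with a 30-degree horizontal field of view; both figures are assumptions for illustration, not Ghost's specifications.

```python
import math

H_PIXELS = 8000                    # assumed horizontal resolution of a 48 MP sensor
H_FOV_RAD = math.radians(30)       # assumed horizontal field of view
CAR_WIDTH_M = 1.8
DISTANCE_M = 200

# Small-angle approximation: ground footprint of one pixel at a given range.
meters_per_pixel = DISTANCE_M * H_FOV_RAD / H_PIXELS
pixels_across_car = CAR_WIDTH_M / meters_per_pixel

print(f"{meters_per_pixel * 100:.1f} cm per pixel; "
      f"~{pixels_across_car:.0f} pixels across a car at {DISTANCE_M} m")
# Roughly 1.3 cm per pixel and ~140 pixels across a car at 200 m, which leaves
# ample signal for detection and stereo matching at highway stopping distances.
```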