

KineticFlow™ is Ghost's neural network that processes raw camera inputs to develop scene understanding. Grounded in physics, the network detects the surfaces and objects in a scene and calculates the distance, velocity, and motion vector of every pixel.
Unlike algorithms that rely on image-based object recognition, KineticFlow detects obstacles and road users from the physical properties of objects and of light in motion, enabling object detection without requiring object recognition.
Just as humans use multiple cues to establish depth with vision, KineticFlow fuses together multiple mono and stereo computer vision algorithms into a single neural network. The network analyzes video sequences from multiple cameras over time to improve object detection and manage occlusions.
Autonomous driving maneuvers require an accurate understanding of the driving scene – the arrangement of roads and objects and how they move relative to the car. This understanding begins with detection and measurement – discovering the objects and road elements, and estimating the distance, velocity, and motion direction of each object.
To produce this, most systems leverage several computer vision algorithms organized into a perception pipeline in which each stage builds on the previous stage's outputs. The legacy pipeline typically runs in three stages:
1. Detect objects, road, and features: use AI to detect and identify each object type as well as the road, sky, and barriers. Example algorithms: panoptic segmentation, object detection, semantic segmentation, outline/pattern matching.
2. Estimate object distance: use the recognized object type to infer a probable width and calculate distance with trigonometry. Example algorithms: pictorial depth estimation, stereo disparity.
3. Determine velocity and motion direction: use pixel expansion, or compare distance over multiple frames, to calculate velocity and motion. Example algorithms: optical flow, multi-frame object tracking and comparison.
A minimal sketch of this pipeline follows.
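The sketch below assumes a pinhole camera model, a fixed assumed vehicle width, and hypothetical helper names (`detect_objects`, `estimate_distance`, `estimate_velocity`) standing in for the real algorithms; it illustrates the structure of the legacy pipeline, not any particular implementation.

```python
# Minimal sketch of the legacy three-stage perception pipeline.
# All constants and helper names are illustrative assumptions.

FOCAL_LENGTH_PX = 2800        # assumed lens focal length expressed in pixels
ASSUMED_CAR_WIDTH_M = 1.8     # "probable width" looked up from the recognized object type
FRAME_INTERVAL_S = 1 / 30     # assumed 30 fps camera

def detect_objects(frame):
    """Stage 1: image-based recognition (stand-in for a trained detector or
    segmentation network). Returns (label, pixel_width) pairs."""
    raise NotImplementedError("placeholder for an image-based object detector")

def estimate_distance(pixel_width, assumed_width_m=ASSUMED_CAR_WIDTH_M):
    """Stage 2: pinhole-model trigonometry. Distance is proportional to the
    assumed real-world width divided by the observed width in pixels."""
    return FOCAL_LENGTH_PX * assumed_width_m / pixel_width

def estimate_velocity(prev_distance_m, curr_distance_m, dt_s=FRAME_INTERVAL_S):
    """Stage 3: differentiate distance across frames. A negative value means
    the object is closing on the ego vehicle."""
    return (curr_distance_m - prev_distance_m) / dt_s
```

Because each stage consumes the previous stage's output, an error introduced during recognition propagates directly into the distance and velocity estimates.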
This legacy pipeline has been widely adopted for L2 advanced driver-assistance systems (ADAS). ADAS features are built with the fundamental assumption that a backup driver will remain attentive at all times, which significantly relaxes the requirements for object recognition reliability, distance estimation accuracy, and overall resiliency to failures. Many systems operate on just a single camera, using a single processing chip. This technology was never designed for L4 autonomous driving without a human safety backup.
To address some of its shortcomings, developers have tried adding sensors such as LiDAR to verify distance and velocity measurements. However, this simply ends up trading unresolved computer vision challenges for other issues, adding expense, reliability concerns, and sensor fusion complexity.
The legacy pipeline's fundamental shortcomings remain:
- Image-based object recognition cannot be trained for every possible object and struggles to recognize objects that are uncommon, rotated, flipped, or partially occluded.
- Distance estimates are only as good as the object recognition feeding them: mistaking a 1.7-meter-wide car for a 2.1-meter-wide car skews the distance estimate by more than 20% (a worked example follows this list).
- A pipeline implemented with a single camera or processing chip is susceptible to failure on component loss.
- Accuracy degrades with distance, poor visibility, and low-lighting conditions because image-based object recognition requires high-quality images.
- Running in real time in the car requires high-performance compute and/or specialized ASICs, adding cost and reducing flexibility.
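To make the second point concrete, here is the arithmetic behind the width-mistake example; the 1.7 m and 2.1 m figures come from the text, and the pinhole relationship is the standard one assumed in the pipeline sketch above.

```python
# Pinhole model: distance = focal_length_px * assumed_width_m / pixel_width,
# so the distance estimate scales linearly with the assumed object width.
true_width_m = 1.7
assumed_width_m = 2.1

relative_distance_error = assumed_width_m / true_width_m - 1
print(f"{relative_distance_error:.1%}")   # ~23.5% overestimate of the true distance
```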
KineticFlow is trained on the universal physics properties that dictate how all objects behave and move and how light bounces off them, removing the dependency on traditional image-based object recognition. This approach:
- Reduces the risk of misrecognized or unrecognized objects.
- Reduces the measurement errors that come from estimates based on object type.
- Perceives obstacles in low light or when only partially revealed, because explicit recognition is not required.
Complex computer vision algorithms that require minutes per frame on high-powered CPUs are trained in the data center and converted into the KineticFlow neural network, enabling real-time execution on the road in milliseconds per frame on low-power system-on-a-chip (SoC) processors (a sketch of this offline-to-onboard pattern follows this list). As a result:
- Intensive algorithms such as high-definition stereo vision can now run in real time in the car.
- Mobile SoCs require a fraction of the power of the CPUs/GPUs typically used for autonomy, increasing vehicle range and fuel efficiency.
- By forgoing custom ASICs, algorithm updates can be delivered over the air regularly for evergreen improvement.
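Ghost does not spell out the training mechanics here, but one common pattern that fits this description is offline "teacher-student" distillation: a slow, high-quality algorithm (for example, an exhaustive stereo matcher) produces dense labels in the data center, and a compact network is trained to reproduce them before being exported for onboard inference. The sketch below assumes PyTorch and an ONNX export target purely for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical compact "student" network: per-pixel depth from a stacked
# stereo pair (6 input channels). The architecture is illustrative only.
student = nn.Sequential(
    nn.Conv2d(6, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

def training_step(stereo_pair, teacher_depth):
    """One offline step: the slow, minutes-per-frame algorithm supplies the
    dense depth target; the student learns to approximate it in real time."""
    optimizer.zero_grad()
    prediction = student(stereo_pair)
    loss = loss_fn(prediction, teacher_depth)
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, export a frozen graph for a low-power in-vehicle SoC runtime.
example_input = torch.randn(1, 6, 480, 640)
torch.onnx.export(student, example_input, "depth_student.onnx")
```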
The universal nature of KineticFlow's physics-based algorithms makes automated training in the data center possible, with training data sets that can be verified for completeness and are orders of magnitude smaller than those required by other approaches:
- The physics-based approach enables smaller training sets that can be assembled and verified for completeness across every dimension of the parameter space.
- All training data is automatically labeled in the data center and validated against ground truth.
- New sensor generations, vehicle configurations, and features can be rapidly retrained and validated.
By integrating multiple computer vision algorithms, including physics-based detection, stereo disparity, and a 3D variant of optical flow, into a single neural network, KineticFlow detects the objects in a scene and the distance, velocity, and motion path of every pixel:
- The algorithms naturally reinforce one another, collectively providing higher-confidence outputs than any one of them can generate alone.
- Because the strengths of each algorithm vary across distances, speeds, and lighting conditions, every output includes a confidence range, enabling the driving program to take accuracy into account (an illustrative fusion sketch follows this list).
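KineticFlow's fusion is learned inside a single network, so the following is only an intuition for why combining cues yields higher-confidence outputs with an attached confidence range: a classical inverse-variance (confidence-weighted) fusion of per-pixel depth estimates, with all names and shapes assumed.

```python
import numpy as np

def fuse_depth_estimates(depth_maps, variances):
    """Combine per-pixel depth maps from different cues (e.g., stereo
    disparity, motion-derived depth) by inverse-variance weighting.
    Lower variance means higher confidence and therefore a larger weight."""
    depths = np.stack(depth_maps)            # shape: (n_cues, H, W)
    weights = 1.0 / np.stack(variances)      # per-pixel confidence weights
    fused_depth = (weights * depths).sum(axis=0) / weights.sum(axis=0)
    fused_variance = 1.0 / weights.sum(axis=0)   # never larger than the best single cue
    return fused_depth, fused_variance
```

In a scheme like this, the fused variance doubles as the confidence range attached to each output, which is what lets the driving program weigh accuracy across distances, speeds, and lighting conditions.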
While mono or stereo vision fails completely with the loss or occlusion of a single camera, KineticFlow can fall back to single-camera mode, producing a subset of information that still enables safe driving in most operational design domains when coupled with radar.
KineticFlow leverages the resolution, dynamic range, and speed of modern 48-megapixel camera sensors, enabling long-distance perception and obviating the need for multiple cameras with different fields of view:
- It automatically corrects for occluded objects and unusual or missing frames by combining data over time, including managing temporary occlusions by analyzing stereo video streams.
- 48-megapixel sensors equip computer vision algorithms to operate at the distances required for high-speed highway driving (a back-of-the-envelope resolution example follows).
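As a rough check on the long-range claim, assume the 48-megapixel sensor is 8,000 pixels wide behind a lens with a 30-degree horizontal field of view; both figures are assumptions for illustration, not Ghost's specifications.

```python
import math

H_PIXELS = 8000                    # assumed horizontal resolution of a 48 MP sensor
H_FOV_RAD = math.radians(30)       # assumed horizontal field of view
CAR_WIDTH_M = 1.8
DISTANCE_M = 200

# Small-angle approximation: ground footprint of one pixel at a given range.
meters_per_pixel = DISTANCE_M * H_FOV_RAD / H_PIXELS
pixels_across_car = CAR_WIDTH_M / meters_per_pixel

print(f"{meters_per_pixel * 100:.1f} cm per pixel; "
      f"~{pixels_across_car:.0f} pixels across a car at {DISTANCE_M} m")
# Roughly 1.3 cm per pixel and ~140 pixels across a car at 200 m, which leaves
# ample signal for detection and stereo matching at highway stopping distances.
```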