At the Google Cloud Next conference in San Francisco this week, the search giant said it will partner with Nvidia to integrate the new Tesla P4 graphics card into its Google Cloud services for artificial-intelligence (AI) inference tasks.
Nvidia is known for its leadership in artificial intelligence, though most of that attention comes from the incredible compute density required to train AI models. Systems like the DGX-2 pair the fastest available graphics chips with the most demanding data infrastructure to accelerate the construction of AI models. Those models can target new voice-recognition algorithms, image matching, video-content scanning and autonomous driving.
The graphics chips previously used for gaming and 3D rendering were the linchpin of this development acceleration for AI. AMD offers similar GPUs of its own, and Intel has recently announced its intent to offer discrete graphics solutions for AI and gaming in 2020, but the mantle of AI leadership rests indisputably with Nvidia. Intel is building more dedicated accelerators through acquisitions such as Nervana and Movidius, but GPUs will remain the dominant solution thanks to their programming flexibility. Nvidia spent a decade bridging the gap for AI and deep learning with the software tools that created this market.
AI inference is the process of taking the large data models that AI training creates and compressing them into a form that can run on lower-power, lower-cost hardware. The inference model then accepts new data from a client (a picture taken on a phone, a voice command), applies the model and arrives at the most likely result (recognizing a face in that photo, detecting a wake command in that voice).
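The compression step described above can be made concrete with a toy example of post-training quantization, one common way to shrink a model for low-power hardware: float32 weights are mapped to one-byte integers. This is a minimal sketch for illustration, not Nvidia's actual pipeline; the weight values and single-scale scheme are assumptions.

```python
# Toy post-training quantization: float32 weights -> int8 codes plus a
# shared scale factor, so each weight costs 1 byte instead of 4.

def quantize_int8(weights):
    """Map float weights into the int8 range [-127, 127] with one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference-time math."""
    return [x * scale for x in q]

weights = [0.12, -0.57, 0.33, 0.91, -0.08]  # hypothetical layer weights
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

The round trip loses at most half a quantization step per weight, which is the accuracy-for-efficiency trade inference hardware exploits.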
Nvidia does not have the same reputation in AI inference, a market that tends to be associated with smartphones or internet-of-things (IoT) solutions like smart security cameras. However, Nvidia has a wide assortment of inference solutions and an accelerated software stack for developers, similar to the one that built its lead in training. Today the majority of artificial-intelligence inference tasks still occur in the cloud, with results distributed to end devices.
The chips used for inference range from flagship $8,000 graphics cards down to Jetson-branded development boards whose processors are similar to the one used in the Nintendo Switch gaming console. Applications for those options range from autonomous robotics to drones, though Nvidia does not have a direct solution for the smallest devices, such as smartphones. Instead it partnered with Arm to create AI building blocks that can be integrated into custom chip designs.
Having a developer-friendly software solution, which Nvidia calls TensorRT, is a critical component for any company offering AI solutions to the market. It allows a coder to take complex trained models and target different classes of GPUs with the resulting inference models. With TensorRT, Nvidia can keep customers inside a single Nvidia-powered ecosystem, from AI training to AI inference, both simplifying the development process and maintaining customer involvement.
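The idea of taking one trained model and targeting different classes of hardware can be sketched generically. This is a hedged illustration of the role such a tool plays, not TensorRT's real API; the target names and precision table are invented for the example.

```python
# Illustrative sketch: one trained model, several inference targets, each
# with precision settings suited to its hardware class. All names here are
# assumptions for the example, not TensorRT identifiers.

TARGET_PRECISION = {
    "datacenter_gpu": "fp16",  # server-class cards favor throughput
    "edge_gpu": "int8",        # Jetson-class boards favor power efficiency
}

def build_inference_config(model_name, target):
    """Pair a trained model with precision settings for one GPU class."""
    if target not in TARGET_PRECISION:
        raise ValueError("unknown target: " + target)
    return {"model": model_name, "target": target,
            "precision": TARGET_PRECISION[target]}

configs = [build_inference_config("speech_model", t) for t in TARGET_PRECISION]
```

The point of the design is that the developer writes the model once and lets the tool handle per-target tuning, which is what keeps customers inside a single vendor ecosystem.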
The Tesla P4 product that Alphabet's Google is using in its cloud-based AI systems is a simple GPU by most enterprise standards, but Google can take advantage of its specific design parameters to create a cost- and power-efficient solution for mass deployment.
There are hundreds of instances of cloud-based AI-inference applications. Microsoft uses Nvidia graphics chips for Bing's visual search capability, allowing users to search for pictures with text or similar imagery. SAP uses Nvidia hardware to measure the appearance of its clients' advertising during sporting events, all in real time, allowing for adjustments on the fly. A company called Valossa analyzes videos to determine content and context, connecting what is shown on screen with expected emotional responses to improve search and recommendations.
Google Cloud will now offer similar performance and capability to its customers courtesy of the Nvidia Tesla P4.
Intel has been promoting its Xeon family of processors for AI inference, arguing that customers benefit from using the hardware they already have in their data centers for those tasks. Nvidia, however, offers significant performance and latency advantages for inference models that require speed and interactivity. The latency of these AI calculations, the time between a request and its result, is crucial for real-time interaction with consumers.
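The latency argument above comes down to tail behavior: an interactive service lives or dies by its slowest responses, not its average throughput. A minimal sketch, using made-up per-request timings, shows how p50 and p99 latency are read from a set of samples.

```python
# Nearest-rank percentile over hypothetical per-request latencies (ms).
# The sample values are invented for illustration.

def percentile(samples, p):
    """Return the nearest-rank p-th percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[rank]

latencies_ms = [12, 11, 13, 12, 45, 11, 12, 14, 12, 90]  # hypothetical
p50 = percentile(latencies_ms, 50)  # typical request
p99 = percentile(latencies_ms, 99)  # worst-case tail a user actually feels
```

Here the median looks fine while the 99th percentile is several times worse, which is exactly the gap that decides whether an inference service feels interactive.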
Both Intel and Xilinx offer a semi-customizable solution for AI called an FPGA (field-programmable gate array). Though FPGAs can be power efficient and high performance, they aren't as flexible as the programmable solutions built around graphics chips. Every time an AI training model or algorithm is updated, the resulting inference model changes as well. Porting those new models to FPGAs can take days or weeks, while Nvidia GPUs benefit from nearly instant rollout.
Some companies are known to update their facial-recognition AI algorithms multiple times a day during development. Only a programmable, high-performance solution like a GPU works at that pace.
Though most AI discussion today centers on training and the massive compute each model requires, the market for AI inference will be significantly larger as consumers and devices ramp up AI integration.
Nvidia needs to prove that it has the product line in place to convert its leadership in training into a foothold in inference. Though the company no longer makes mobile processors for smartphones, the combination of a sizeable GPU portfolio and custom processors like Tegra integrated on Jetson platforms lets it address a huge part of the growing market. The design win with Google Cloud validates Nvidia's efforts and product capability in the inference field, continuing its drive to lead in AI.