Generative AI | Databricks Blog

Turbocharged Training: Optimizing the Databricks Mosaic AI Stack With FP8

March 21, 2024 by Mihir Patel, Cheng Li, Davis Blalock and Saaketh Narayan in Mosaic Research

At Databricks, we believe that the best companies in the world, in every sector, will have AI-powered systems that are trained and customized...

Fast, Secure and Reliable: Enterprise-grade LLM Inference

March 20, 2024 by Linden Li, Jeffrey Chen, Megha Agarwal, Margaret Qian and Daya Khudia in Mosaic Research

After a whirlwind year of developments in 2023, many enterprises are eager to adopt increasingly capable generative AI models to supercharge their...

Fine-Grained Human Feedback

February 27, 2024 by Prithviraj (Raj) Ammanabrolu in Mosaic Research

(This post written in collaboration with Zeqiu (Ellen) Wu and Yushi Hu, both PhD students affiliated with the University of Washington, and...

LIMIT: Less Is More for Instruction Tuning

February 10, 2024 by Aditi Jha and Jacob Portes in Mosaic Research

How should you finetune a large language model for general-purpose question answering? One intriguing approach is that of supervised finetuning on a small...

US Air Force Hackathon: How Large Language Models Will Revolutionize USAF Flight Test

February 9, 2024 by Jordan Conner, Luis Moros, Riley Livermore, Danny Riley, Troy Soileau, Ben Faircloth, Tim Lortz and Li Yu in Generative AI

[DISTRIBUTION STATEMENT A. Approved for public release; Distribution is unlimited 412TW-PA-24004] The views expressed are those of the author and do not reflect...

OLMo Is Here, Powered by Databricks

February 1, 2024 by Jonathan Frankle in Mosaic Research

As Chief Scientist (Neural Networks) at Databricks, I lead our research team toward the goal of giving everyone the ability to build and...

Serving Quantized LLMs on NVIDIA H100 Tensor Core GPUs

January 31, 2024 by Nikhil Sardana, Julian Quevedo and Daya Khudia in Mosaic Research

Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more...

Building and Customizing GenAI with Databricks: LLMs and Beyond

January 22, 2024 by Ari Kaplan, Emily Hutson and Nicolas Pelaez in Generative AI

Generative AI has opened new worlds of possibilities for businesses and is being emphatically embraced across organizations. According to a recent MIT Tech...

LLM Training and Inference with Intel Gaudi 2 AI Accelerators

January 4, 2024 by Abhi Venigalla and Daya Khudia in Mosaic Research

At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or...

Integrating NVIDIA TensorRT-LLM with the Databricks Inference Stack

December 21, 2023 by Linden Li, Megha Agarwal, Kobie Crawford and Daya Khudia in Mosaic Research

Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface that integrates with a web server for fast, efficient LLM inference. In this post, we're highlighting some key areas where our collaboration with NVIDIA has been particularly important.