TensorOpera's New Move: Fox-1 Small Language Model Enhances Generative AI Experience

TensorOpera, the company behind "Your Generative AI Platform at Scale," has introduced its latest innovation, the TensorOpera Fox-1. This small language model (SLM) with 1.6 billion parameters aims to advance the generative AI field by improving scalability and giving developers greater ownership of their models. Fox-1 distinguishes itself by outperforming other SLMs from industry giants such as Apple, Google, and Alibaba, offering developers and enterprises an efficient way to create and deploy generative AI models across a variety of infrastructures without demanding significant resources.

Small language models like Fox-1, which have fewer than 2 billion parameters, are making significant strides in AI by delivering powerful capabilities with reduced computational and data requirements. This efficiency is crucial for deploying AI applications across a range of platforms, from mobile devices to servers, while maintaining high performance.

Salman Avestimehr, Co-Founder and CEO of TensorOpera and Dean’s Professor of ECE and CS at the University of Southern California, highlighted the significance of this development: “The launch of Fox-1 and its integration into TensorOpera’s AI platform is a major step towards our vision of providing an integrated edge-cloud platform for Generative AI. This would enable seamless training, creation, and deployment of generative AI applications across a wide range of platforms and devices, ranging from powerful GPUs in cloud settings to edge devices like smartphones and AI-equipped PCs, enhancing efficiency, privacy, and personalization.”

Fox-1 was developed from scratch, trained on 3 trillion tokens of text and code data using an 8K sequence length. It features a decoder-only transformer structure with 16 attention heads and grouped query attention, making it significantly deeper than its competitors—78% deeper than Google’s Gemma-2B, 33% deeper than Alibaba’s Qwen1.5-1.8B, and 15% deeper than Apple’s OpenELM-1.1B. In benchmarks such as MMLU, ARC Challenge, TruthfulQA, and GSM8k, Fox-1 has shown superior or comparable performance to other SLMs in its class, including Gemma-2B, Qwen1.5-1.8B, and OpenELM-1.1B.
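For readers who want to sanity-check the depth comparisons, the arithmetic is straightforward. The layer counts in the sketch below are assumptions chosen to be consistent with the percentages quoted above, not figures from the announcement itself; consult each model card for authoritative numbers.

```python
# Rough arithmetic behind the "deeper than" comparisons above.
# All layer counts are assumed for illustration; check each model card.
layers = {
    "Fox-1-1.6B": 32,
    "Gemma-2B": 18,
    "Qwen1.5-1.8B": 24,
    "OpenELM-1.1B": 28,
}

fox_depth = layers["Fox-1-1.6B"]
for name, depth in layers.items():
    if name != "Fox-1-1.6B":
        extra = (fox_depth / depth - 1) * 100
        print(f"Fox-1 is ~{extra:.0f}% deeper than {name} "
              f"({fox_depth} vs {depth} layers)")
# Prints roughly 78%, 33%, and 14%, broadly in line with the figures above.
```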

According to Tong Zhang, Chief AI Scientist at TensorOpera and Professor of CS at UIUC, "SLMs can effectively incorporate data in special domains in the pre-training phase, which leads to optimized expert models that can be trained and deployed in a decentralized fashion." Integrating such domain-specific SLMs into architectures like Mixture of Experts (MoE) and model federation systems extends their utility further: several expert SLMs can be combined into a single, more capable system that handles complex tasks, as sketched below.
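To make the model-federation idea concrete, here is a minimal, purely illustrative sketch of routing prompts across domain-expert SLMs. The router, experts, and scoring function are hypothetical stand-ins, not TensorOpera's actual architecture.

```python
# Hypothetical sketch of routing across domain-expert SLMs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExpertSLM:
    name: str
    domain: str
    generate: Callable[[str], str]  # stand-in for a real model call

def route(prompt: str, experts: list[ExpertSLM],
          score: Callable[[str, ExpertSLM], float]) -> ExpertSLM:
    """Pick the expert whose domain best matches the prompt."""
    return max(experts, key=lambda e: score(prompt, e))

# Toy scoring: does the expert's domain tag appear in the prompt?
def keyword_score(prompt: str, expert: ExpertSLM) -> float:
    return float(expert.domain in prompt.lower())

experts = [
    ExpertSLM("fox-med", "medical", lambda p: f"[medical answer to: {p}]"),
    ExpertSLM("fox-law", "legal", lambda p: f"[legal answer to: {p}]"),
]
best = route("Summarize this legal contract", experts, keyword_score)
print(best.name, "->", best.generate("Summarize this legal contract"))
```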

Fox-1 achieves an impressive throughput of over 200 tokens per second on the TensorOpera model serving platform, surpassing Gemma-2B and matching Qwen1.5-1.8B in identical deployment environments. This high throughput is largely due to its architectural design, which includes Grouped Query Attention (GQA) for more efficient query processing. By dividing its query heads into groups that each share a single key-value head, Fox-1 shrinks the key-value cache, reducing inference latency and response times.
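The mechanism is easy to see in a few lines. In the sketch below, the 16 query heads come from the description above, while the number of key-value heads (4) is an assumption for demonstration; the grouping shows why the key-value cache, and with it memory traffic at inference time, shrinks.

```python
# Minimal grouped-query-attention (GQA) sketch. 16 query heads match the
# description above; 4 key-value heads is an assumed figure for illustration.
import torch
import torch.nn.functional as F

batch, seq, head_dim = 1, 8, 64
n_q_heads, n_kv_heads = 16, 4        # 4 query heads share each KV head
group = n_q_heads // n_kv_heads

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)  # 4x smaller KV cache
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# Expand each KV head across its group of query heads, then attend as usual.
k = k.repeat_interleave(group, dim=1)  # (1, 16, seq, head_dim)
v = v.repeat_interleave(group, dim=1)
attn = F.scaled_dot_product_attention(q, k, v)  # causal masking omitted
print(attn.shape)  # torch.Size([1, 16, 8, 64])
```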

The integration of Fox-1 into both the TensorOpera AI Platform and the TensorOpera FedML Platform enhances its versatility, enabling deployment and training across both cloud and edge computing environments. This empowers AI developers to train and build their models and applications on the cloud using the comprehensive capabilities of the TensorOpera AI Platform, and then deploy, monitor, and personalize these solutions directly onto smartphones and AI-enabled PCs via the TensorOpera FedML Platform. This approach offers cost efficiency, enhanced privacy, and personalized user experiences within a unified ecosystem that facilitates seamless collaboration between cloud and edge environments.

Fox-1 is released under the Apache 2.0 license through the TensorOpera AI Platform and Hugging Face, making it freely available to the community for both production and research use.
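Since the weights are published on Hugging Face, loading the model with the transformers library should follow the usual pattern. The repository id below is an assumption; confirm the exact id on TensorOpera's Hugging Face page before use.

```python
# Loading Fox-1 from Hugging Face with the transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tensoropera/Fox-1-1.6B"  # assumed repo id; verify before use
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Small language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```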

For more information and to get started with Fox-1, visit TensorOpera’s blog post.

About TensorOpera, Inc.

TensorOpera, Inc. (formerly FedML, Inc.) is an innovative AI company based in Palo Alto, California, in the heart of Silicon Valley. TensorOpera specializes in developing scalable and secure AI platforms, offering two flagship products tailored for enterprises and developers. The TensorOpera® AI Platform, available at TensorOpera.ai, is a comprehensive generative AI platform for model deployment and serving, model training and fine-tuning, AI agent creation, and more. It supports launching training and inference jobs on a serverless/decentralized GPU cloud, experiment tracking for distributed training, and enhanced security and privacy measures. The TensorOpera® FedML Platform, accessible at FedML.ai, leads in federated learning and analytics with zero-code implementation. It includes a lightweight, cross-platform Edge AI SDK suitable for edge GPUs, smartphones, and IoT devices. Additionally, it offers a user-friendly MLOps platform to streamline decentralized machine learning and deployment in real-world applications. Founded in February 2022, TensorOpera has quickly grown to support a large number of enterprises and developers worldwide.