Computer Vision Models

Explore state-of-the-art computer vision model architectures, immediately usable for training with your custom dataset.

Deploy select models (e.g., YOLOv8, CLIP) using the Roboflow Hosted API, or on your own hardware using Roboflow Inference.

Segment Anything Model (SAM)

Segment Anything (SAM) is an image segmentation model developed by Meta Research, capable of doing zero-shot segmentation.

Instance Segmentation

Deploy with Roboflow

YOLOv8 Pose Estimation

The YOLOv8 pose estimation model allows you to detect keypoints in an image.

Keypoint Detection

Deploy with Roboflow

YOLOv8

YOLOv8 is a state-of-the-art object detection and image segmentation model created by Ultralytics, the developers of YOLOv5.

Object Detection

Deploy with Roboflow

YOLOv8 Instance Segmentation

The state-of-the-art YOLOv8 model comes with support for instance segmentation tasks.

Instance Segmentation

Deploy with Roboflow

YOLOv9

YOLOv9 is an object detection model architecture released on February 21st, 2024.

Object Detection

Deploy with Roboflow

GroundingDINO

Grounding DINO is a zero-shot object detection model made by combining a Transformer-based DINO detector and grounded pre-training.

Object Detection

Deploy with Roboflow

YOLO-World

YOLO-World is a zero-shot object detection model.

Object Detection

Deploy with Roboflow

Tesseract

Tesseract is a highly popular OCR engine and project, now primarily developed open-source.

Deploy with Roboflow

YOLOv5 Instance Segmentation

YOLOv5 Instance Segmentation is a version of YOLOv5 that can be used for instance segmentation tasks.

Instance Segmentation

Deploy with Roboflow

YOLOv5 Classification

YOLOv5 Classification is a version of the YOLOv5 model used in single-label and multi-label image classification.

Deploy with Roboflow

YOLOv5

A very fast and easy-to-use PyTorch model that achieves state-of-the-art (or near state-of-the-art) results.

Object Detection

Deploy with Roboflow

Detectron2

Detectron2 is a model zoo of its own for computer vision models, written in PyTorch.

Object Detection

Deploy with Roboflow

Mask RCNN

Mask RCNN is a convolutional neural network for instance segmentation.

Instance Segmentation

Deploy with Roboflow

OpenAI CLIP

CLIP (Contrastive Language-Image Pre-Training) is a multimodal zero-shot image classifier that achieves strong results in a wide range of domains with no fine-tuning. It applies advancements in large-scale transformers like GPT-3 to the vision arena.

Deploy with Roboflow
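
As a rough illustration of what zero-shot classification means here, the sketch below compares one image embedding against the text embeddings of a few candidate prompts and turns the similarities into probabilities. The vectors are toy values made up for this example (not real CLIP outputs), and the function names and the temperature of 100 are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, temperature=100.0):
    """Score each text prompt against the image; no training on the labels."""
    labels = list(label_embeddings)
    sims = [cosine_similarity(image_embedding, label_embeddings[l]) for l in labels]
    probs = softmax([temperature * s for s in sims])
    return dict(zip(labels, probs))


# Toy embeddings (assumed values); a real model would produce these
# from an image encoder and a text encoder.
image_embedding = [0.9, 0.1, 0.2]
label_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}

scores = zero_shot_classify(image_embedding, label_embeddings)
best_label = max(scores, key=scores.get)
print(best_label)
```

Because the labels are only text prompts, changing the set of classes means changing the dictionary keys and their embeddings, not retraining a model.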

YOLOv8 Classification

An image classification model built using YOLOv8.

Deploy with Roboflow

EasyOCR

Deploy with Roboflow

LLaVA-1.5

LLaVA is an open source multimodal language model that you can use for visual question answering. It also has limited support for object detection.

Object Detection

Deploy with Roboflow

DETR

Detection Transformer (DETR) is an end-to-end object detection model implemented using the Transformer architecture.

Object Detection

Deploy with Roboflow

YOLOv7

YOLOv7 is a state-of-the-art object detection model.

Object Detection

Deploy with Roboflow

YOLOv7 Instance Segmentation

YOLOv7 Instance Segmentation lets you perform segmentation tasks with the YOLOv7 model.

Instance Segmentation

Deploy with Roboflow

Vision Transformer

The Vision Transformer applies the Transformer architecture used by natural language processing models such as BERT to images, treating image patches as input tokens.

Deploy with Roboflow

YOLOX

YOLOX is a high-performance object detection model.

Object Detection

Deploy with Roboflow

Faster R-CNN

One of the most accurate object detection algorithms, but it requires a lot of compute at inference time. A good choice if you can do processing asynchronously on a server.

Object Detection

Deploy with Roboflow

EfficientNet

EfficientNet is a family of image classification models from Google AI that train comparatively quickly on small amounts of data, making the most of limited datasets.

Deploy with Roboflow

YOLOv3 PyTorch

Though it is no longer the most accurate object detection algorithm, YOLO v3 is still a very good choice when you need real-time detection while maintaining excellent accuracy. PyTorch version.

Object Detection

Deploy with Roboflow

YOLOv3 Keras

Though it is no longer the most accurate object detection algorithm, YOLO v3 is still a very good choice when you need real-time detection while maintaining excellent accuracy. Keras implementation.

Object Detection

Deploy with Roboflow

FastSAM

FastSAM is an image segmentation model trained using 2% of the data in the Segment Anything Model SA-1B dataset.

Instance Segmentation

Deploy with Roboflow

MT-YOLOv6

MT-YOLOv6 is a YOLO based model released in 2022.

Object Detection

Deploy with Roboflow

Surya

Surya is a Python package designed for OCR and document layout analysis.

Deploy with Roboflow

YOLACT

A simple, fully convolutional model for real-time instance segmentation.

Instance Segmentation

Deploy with Roboflow

CogVLM

CogVLM shows strong performance in Visual Question Answering (VQA) and other vision tasks.

Vision-Language

Deploy with Roboflow

YOLOv4 PyTorch

YOLOv4 has emerged as the best real time object detection model. YOLOv4 carries forward many of the research contributions of the YOLO family of models along with new modeling and data augmentation techniques. This implementation is in PyTorch.

Object Detection

Deploy with Roboflow

ByteTrack

ByteTrack is a multi-object tracking computer vision algorithm.

Object Detection

Deploy with Roboflow

MMOCR

MMOCR is an Optical Character Recognition model zoo implemented with the MMDetection package.

Deploy with Roboflow

QwenVL

Qwen-VL is an LMM developed by Alibaba Cloud. Qwen-VL accepts images, text, and bounding boxes as inputs. The model can output text and bounding boxes. Qwen-VL naturally supports English, Chinese, and multilingual conversation.

Vision-Language

Deploy with Roboflow

DocTR

DocTR is an Optical Character Recognition tool powered by deep learning.

Object Detection

Deploy with Roboflow

YOLO-NAS

YOLO-NAS is an object detection model developed by Deci that reports state-of-the-art performance compared to YOLOv5, YOLOv7, and YOLOv8.

Object Detection

Deploy with Roboflow

SegFormer

SegFormer is a computer vision framework used in semantic segmentation tasks, implemented with transformers.

Semantic Segmentation

Deploy with Roboflow

Scaled YOLOv4

Scaled YOLOv4 is an extension of the YOLOv4 research implemented in the YOLOv5 PyTorch framework.

Object Detection

Deploy with Roboflow

YOLOR

YOLOR (You Only Learn One Representation) is an object detection model that uses both implicit and explicit knowledge to make predictions.

Object Detection

Deploy with Roboflow

YOLOv5 Oriented Bounding Boxes

YOLOv5-OBB is a variant of YOLOv5 that supports oriented bounding boxes. This model is designed to yield predictions that better fit objects that are positioned at an angle.

Object Detection

Deploy with Roboflow
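
To make the idea of an oriented bounding box concrete, here is a minimal sketch that converts a box given as center, width, height, and rotation angle into its four corner points, then compares its area with the axis-aligned box needed to enclose the same object. The (cx, cy, width, height, angle) parameterization and the function names are assumptions for illustration; the model's actual output format may differ.

```python
import math


def obb_corners(cx, cy, width, height, angle_degrees):
    """Return the four corners of a box rotated about its center."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    # Corner offsets before rotation, listed in order around the box.
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]


def enclosing_axis_aligned_box(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# A long, thin object rotated by 45 degrees (illustrative values).
corners = obb_corners(50, 50, 40, 10, 45)
x_min, y_min, x_max, y_max = enclosing_axis_aligned_box(corners)

oriented_area = 40 * 10
axis_aligned_area = (x_max - x_min) * (y_max - y_min)
print(oriented_area, round(axis_aligned_area, 1))
```

For an angled object, the axis-aligned box covers much more background than the oriented one, which is the gap oriented bounding boxes are meant to close.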

DETIC

Detic is an open source segmentation model developed by Meta Research and released in 2022.

Instance Segmentation

Deploy with Roboflow

OneFormer

OneFormer is a state-of-the-art multi-task image segmentation framework that is implemented using transformers.

Instance Segmentation

Deploy with Roboflow

MetaCLIP

MetaCLIP is a zero-shot classification and embedding model developed by Meta AI.

Deploy with Roboflow

YOLOS

YOLOS looks at patches of an image to form "patch tokens", which are used in place of the traditional wordpiece tokens in NLP.

Object Detection

Deploy with Roboflow
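
The patch-token idea can be sketched in a few lines: cut the image into non-overlapping square patches and flatten each patch into one vector, so the image becomes a sequence much like a sentence of word tokens. This sketch assumes a single-channel image stored as nested lists and a hypothetical helper name; a real model would also apply a learned projection and position embeddings, which are omitted here.

```python
def image_to_patch_tokens(image, patch_size):
    """Split a 2D image into non-overlapping patches, each flattened to a list."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    tokens = []
    # Walk the patch grid row by row, left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[row][col]
                     for row in range(top, top + patch_size)
                     for col in range(left, left + patch_size)]
            tokens.append(patch)
    return tokens


# A 4x4 toy "image" whose pixel values are 0..15 (illustrative only).
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patch_tokens(image, patch_size=2)

for token in tokens:
    print(token)
```

With a 4x4 image and 2x2 patches, the result is a sequence of four tokens of four values each.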

BakLLaVA

BakLLaVA is an LMM developed by LAION, Ontocord, and Skunkworks AI. BakLLaVA uses a Mistral 7B base augmented with the LLaVA 1.5 architecture.

Vision-Language

Deploy with Roboflow

L2CS-Net

L2CS-Net is a gaze estimation model that enables you to calculate where someone is looking and in what direction someone is looking.

Object Detection

Deploy with Roboflow

MobileNet SSD v2

This architecture provides good realtime results on limited compute. It's designed to run in realtime (30 frames per second) even on mobile devices.

Object Detection

Deploy with Roboflow

CoDet

CoDet is an open vocabulary zero-shot object detection model.

Object Detection

Deploy with Roboflow

ResNet 32

A fast, simple convolutional neural network that gets the job done for many tasks, including classification.

Deploy with Roboflow

Grounded SAM

GroundedSAM combines Grounding DINO with the Segment Anything Model to identify and segment objects in an image given text captions.

Instance Segmentation

Deploy with Roboflow

Grounded EdgeSAM

Grounded EdgeSAM is a combination of Grounding DINO, a zero-shot object detection model, and EdgeSAM, a fast zero-shot image segmentation model.

Instance Segmentation

Deploy with Roboflow

SAM-CLIP

Use Grounding DINO, Segment Anything, and CLIP to label objects in images.

Instance Segmentation

Deploy with Roboflow

MobileNet V2 Classification

MobileNet is a GoogleAI model well-suited for on-device, real-time classification (distinct from MobileNetSSD, Single Shot Detector). This implementation leverages transfer learning from ImageNet to your dataset.

Deploy with Roboflow

ResNet 34

A fast, simple convolutional neural network that gets the job done for many tasks, including classification.

Deploy with Roboflow

YOLOv4 Darknet

YOLOv4 has emerged as the best real time object detection model. YOLOv4 carries forward many of the research contributions of the YOLO family of models along with new modeling and data augmentation techniques. This implementation is in Darknet.

Object Detection

Deploy with Roboflow

EfficientDet (D7) Tensorflow 2

A scalable, state-of-the-art object detection model, implemented here within the TensorFlow 2 Object Detection API.

Object Detection

Deploy with Roboflow

EfficientDet

EfficientDet achieves the best performance in the fewest training epochs among object detection model architectures, making it a highly scalable architecture especially when operating with limited compute.

Object Detection

Deploy with Roboflow

YOLOv4 Tiny

The tiny and fast version of YOLOv4, good for training and deployment on limited compute resources and for getting a feel for your dataset.

Object Detection

Deploy with Roboflow

RTMDet

RTMDet is an efficient real-time object detector, with self-reported metrics outperforming the YOLO series. It achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, making it one of the fastest and most accurate object detectors available at the time of writing.

Object Detection

Deploy with Roboflow

DINOv2

DINOv2 is a self-supervised method for training computer vision models developed by Meta Research and released in April 2023.

Object Detection

Deploy with Roboflow

Kosmos-2

Kosmos-2 is a multimodal language model capable of object detection and grounding text in images.

Object Detection

Deploy with Roboflow

OWLv2

OWLv2 is a transformer-based object detection model developed by Google Research. OWLv2 is the successor to OWL ViT.

Object Detection

Deploy with Roboflow

FastViT

FastViT is a fast image classification model developed by Apple.

Deploy with Roboflow

OWL ViT

OWL-ViT is a transformer-based object detection model developed by Google Research.

Object Detection

Deploy with Roboflow

ALBEF

Deploy with Roboflow

BLIPv2

BLIPv2 is a multimodal model developed by Salesforce Research.

Deploy with Roboflow

BLIP

Deploy with Roboflow

GPT-4 with Vision

GPT-4 with Vision is a multimodal language model developed by OpenAI.

Object Detection

Deploy with Roboflow

VLPart

VLPart, developed by Meta Research, is an object detection and segmentation model that works with an open vocabulary.

Object Detection

Deploy with Roboflow

YOLO-NAS Pose

YOLO-NAS Pose is a keypoint detection model developed by Deci AI.

Keypoint Detection

Deploy with Roboflow

SigLIP

SigLIP is an image embedding model defined in the "Sigmoid Loss for Language Image Pre-Training" paper.

Deploy with Roboflow

MobileCLIP

MobileCLIP is an image embedding model developed by Apple and introduced in the "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" paper.

Deploy with Roboflow

BioCLIP

BioCLIP is a vision foundation model for the Tree of Life.

Deploy with Roboflow

RemoteCLIP

RemoteCLIP is a zero-shot classification model for remote sensing.

Deploy with Roboflow

AltCLIP

AltCLIP is a zero-shot image classification model.

Deploy with Roboflow

YOLOv8 Oriented Bounding Boxes

You can retrieve bounding boxes whose edges match an angled object by training an oriented bounding boxes object detection model, such as YOLOv8's Oriented Bounding Boxes model.

Object Detection

Deploy with Roboflow

Anthropic Claude 3

Vision-Language

Deploy with Roboflow

ResNet-50

Deploy with Roboflow

Google Gemini

Gemini is a family of Large Multimodal Models (LMMs) developed by Google DeepMind focused specifically on multimodality.

Vision-Language

Deploy with Roboflow

TrOCR

TrOCR is a Transformer-based OCR model developed by Microsoft Research.

Deploy with Roboflow

Visual Question Answering

Image Similarity

Image Captioning

Zero-shot Detection

Real-Time Vision

Image Embedding

LLMs with Vision Capabilities

Multimodal Vision

Foundation Vision