Lightweight Visual Language Model (VLM) Inference Tool optimized for Jetson Edge Devices and x86 platforms. Supports real-time inference for USB/RTSP cameras, VOD videos, and live streams with motion detection, frame deduplication, and efficient resource management.
- 🎥 Multi-source support: USB cameras, RTSP streams, VOD files, live network streams
- 🚀 Motion-gated inference (only run inference when motion detected)
- 🎯 Frame deduplication (skip similar frames via L2 feature comparison)
- 📊 Real-time performance monitoring (encoding/inference time, frame metrics)
- 🔧 Jetson-optimized: Tailored for ARM64 architecture and limited edge resources
- 🎛️ Configurable parameters: Compression quality, inference interval, motion threshold
- 🪵 Debug mode for troubleshooting (--debug flag)
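To illustrate the frame-deduplication idea (skipping frames whose L2 feature distance to the previous frame is small), here is a minimal NumPy sketch. The feature (a coarse 8x8 grayscale thumbnail) and the threshold value are assumptions for illustration; vinfer's actual feature extractor and its `DEDUP_THRESHOLD` (see Known Issues) may differ.

```python
import numpy as np

DEDUP_THRESHOLD = 0.05  # hypothetical value; vinfer's actual threshold may differ

def frame_feature(frame: np.ndarray) -> np.ndarray:
    """Reduce a frame to a small fixed-size feature vector: an 8x8 grid of
    grayscale block means, normalized to [0, 1]."""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame.astype(float)
    h, w = gray.shape
    # Crop to multiples of 8, then average each of the 8x8 blocks
    blocks = gray[: h - h % 8, : w - w % 8].reshape(8, h // 8, 8, w // 8)
    return blocks.mean(axis=(1, 3)).ravel() / 255.0

def is_duplicate(feat_prev: np.ndarray, feat_cur: np.ndarray,
                 threshold: float = DEDUP_THRESHOLD) -> bool:
    """Skip the frame if the L2 distance to the previous feature is small."""
    return float(np.linalg.norm(feat_cur - feat_prev)) < threshold
```

An unchanged scene yields a distance near zero (frame skipped), while any visible change pushes the distance well above the threshold.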
- Python 3.8+
- OpenCV (cv2)
- NumPy
- psutil
- Ollama (v0.1.40+) [Optional]
- YOLO [Optional]
- FFmpeg (for frame extraction from streams/files)
- Jetson Nano/Xavier NX/Orin (JetPack 6.0+)
- Minimum 8GB RAM
Our install_deps.sh script supports flexible dependency installation with optional Ollama backend, and is compatible with both sh (dash) and bash on Ubuntu/Jetson systems.
| Scenario | Command |
|---|---|
| Install only core dependencies (ffmpeg, python3-pip, pipx) | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh` |
| Install core dependencies + Ollama backend | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh -s -- --backend ollama` |
| Show script help (check parameters) | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh -s -- --help` |
- For better compatibility (especially on Jetson), you can replace `sh` with `bash` (recommended):

  ```sh
  # Install core dependencies + Ollama (bash execution)
  curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh | bash -s -- --backend ollama
  ```
To avoid breaking system dependencies (e.g., JetPack's pre-built OpenCV), use `--system-site-packages` to reuse the system's OpenCV:
- If you do not plan to use YOLO models in the future, we recommend installing with the following command:

  ```sh
  pipx install --system-site-packages vinfer
  ```

- If you plan to use YOLO models in the future, we strongly recommend installing with the following commands:

  ```sh
  pip install --system-site-packages vinfer
  pip install ultralytics --no-deps
  pip install matplotlib pillow polars psutil pyyaml requests scipy ultralytics-thop
  ```
Install with full dependencies (includes OpenCV) if your system doesn't have a pre-configured OpenCV:
```sh
pipx install "vinfer[full]"
```

```sh
# Create 4GB swap file
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make swap permanent (survives reboot) [Optional]
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

```sh
# For Jetson Orin (set 16GB GPU memory)
sudo nvpmodel -m 0
sudo jetson_clocks

# For Jetson Nano (set max performance mode)
sudo nvpmodel -m 0
sudo jetson_clocks
```

```sh
# Recommended lightweight model for Jetson
ollama pull qwen3.5:2b
```

```sh
# Basic USB camera (device ID 0) with debug logs
vinfer cam --usb-dev 0 --debug

# USB camera with motion detection (infer only on motion)
vinfer cam --usb-dev 0 --motion-gate --motion-threshold 500

# USB camera with frame deduplication (skip similar frames)
vinfer cam --usb-dev 0 --dedup --interval 2.0

# Basic USB camera (device ID 0) with YOLO
vinfer cam --usb-dev 0 --model "yolo"

# Basic USB camera (device ID 0) with YOLO26 in detection task
vinfer cam --usb-dev 0 --model "yolo" --yolo-version 26 --yolo-task "detection"
```

```sh
# Basic RTSP stream (default credentials)
vinfer cam --rtsp-host 192.168.1.10 --rtsp-user admin --rtsp-pass password --debug

# RTSP with custom compression (320x240) and JPG quality (80)
vinfer cam --rtsp-host 192.168.1.10 --compress-size 320x240 --jpg-quality 80

# Simple RTSP stream (default credentials) with YOLO
vinfer cam -H 192.168.1.10 -m "yolo"

# Simple RTSP stream (default credentials) with YOLO11 in pose task
vinfer cam -H 192.168.1.10 -m "yolo" -yv 11 -yt "pose"
```

```sh
# Local video file (analyze every 30 frames)
vinfer analyze --type vod --file /path/to/video.mp4 --start 0 --step 30

# Network VOD URL (e.g., MP4 stream)
vinfer analyze --type vod --url https://example.com/video.mp4 --debug

# HLS live stream (e.g., .m3u8)
vinfer analyze --type live --url https://example.com/stream.m3u8 --interval 1.0
```

| Subcommand | Description |
|---|---|
| `cam` | Real-time camera inference (USB/RTSP) |
| `analyze` | Offline video/live stream analysis |
| Argument | Short | Description | Default |
|---|---|---|---|
| `--model` | `-m` | Ollama model name or `yolo` | `qwen3.5:2b` |
| `--compress-size` | `-s` | Frame compression resolution (WxH) | `480x360` |
| `--jpg-quality` | `-q` | JPG compression quality (0-100) | `70` |
| `--motion-gate` | `-g` | Enable motion detection (infer only on motion) | `False` |
| `--motion-threshold` | `-T` | Minimum motion area (pixels) | `500` |
| `--dedup` | `-D` | Enable frame deduplication (disabled if motion-gate is on) | `False` |
| `--interval` | `-i` | Inference interval (seconds/frame) | `1.0` |
| `--debug` | `-d` | Enable verbose debug logging | `False` |
| `--Prompt` | `-r` | User-defined prompt | |
| `--accelerate` | `-a` | Accelerate reasoning speed | `False` |
| `--version` | `-v` | Show vinfer version | |
| `--yolo-version` | `-yv` | YOLO version (8, 11, or 26) | `8` |
| `--yolo-task` | `-yt` | YOLO task (`detection`, `segment`, `classify`, `pose`, `obb`) | `detection` |
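To make `--motion-gate` and `--motion-threshold` concrete: the threshold is the minimum number of changed pixels between consecutive frames before inference is triggered. The sketch below uses plain frame differencing with NumPy; vinfer's actual detector (which may use OpenCV background subtraction or contour areas) is an assumption, and `diff_thresh` is a hypothetical parameter.

```python
import numpy as np

def motion_area(prev_gray: np.ndarray, cur_gray: np.ndarray, diff_thresh: int = 25) -> int:
    """Count pixels whose intensity changed by more than diff_thresh.
    Simplified stand-in for vinfer's motion detector."""
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return int((diff > diff_thresh).sum())

def should_infer(prev_gray: np.ndarray, cur_gray: np.ndarray,
                 motion_threshold: int = 500) -> bool:
    """Gate inference: only run the model when enough pixels changed
    (mirrors the --motion-threshold default of 500)."""
    return motion_area(prev_gray, cur_gray) >= motion_threshold
```

For example, a 30x30 changed region (900 pixels) exceeds the default threshold of 500 and would trigger inference, while a static scene would not.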
| Argument | Short | Description |
|---|---|---|
| `--rtsp-host` | `-H` | RTSP server IP/domain (enables RTSP mode) |
| `--rtsp-user` | `-U` | RTSP authentication username |
| `--rtsp-pass` | `-P` | RTSP authentication password |
| `--usb-dev` | `-u` | USB camera device ID (0 = /dev/video0) |
| `--show-preview` | `-p` | Start live preview window |
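The RTSP options above are typically combined into a single stream URL. A minimal sketch of that assembly, assuming the conventional `rtsp://user:pass@host:port/path` form; the default port 554 is RTSP's standard, but the stream path (`stream1` here) is a placeholder that varies by camera vendor:

```python
from urllib.parse import quote

def build_rtsp_url(host: str, user: str = "", password: str = "",
                   port: int = 554, path: str = "stream1") -> str:
    """Assemble a typical RTSP URL from --rtsp-host/--rtsp-user/--rtsp-pass.
    Credentials are percent-encoded so special characters survive."""
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}@" if user else ""
    return f"rtsp://{auth}{host}:{port}/{path}"
```

Percent-encoding matters in practice: a password containing `@` or `:` would otherwise break URL parsing.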
| Argument | Short | Description |
|---|---|---|
| `--type` | `-t` | Analysis type (`vod`/`live`) |
| `--file` | `-f` | Local VOD file path |
| `--url` | `-u` | Network VOD/live stream URL |
| `--start` | `-st` | Start frame number (0-based) |
| `--step` | `-sp` | Inference frame interval |
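The `analyze` subcommand relies on FFmpeg for frame extraction (see Requirements). A sketch of how an extraction command honoring `--start`/`--step` can be built, using FFmpeg's `select` filter; the exact invocation vinfer uses is an assumption (for RTSP it also adds a timeout, per Troubleshooting), and `out_pattern` is a hypothetical output template:

```python
def ffmpeg_extract_cmd(src: str, start: int, step: int,
                       out_pattern: str = "frame_%06d.jpg") -> list:
    """Build an FFmpeg argv that decodes `src` and keeps every `step`-th
    frame beginning at 0-based frame `start` (frames start, start+step, ...)."""
    vf = f"select='gte(n,{start})*not(mod(n-{start},{step}))'"
    return [
        "ffmpeg", "-hide_banner",
        "-i", src,
        "-vf", vf,
        "-vsync", "vfr",  # emit only the selected frames, no duplicates
        out_pattern,
    ]
```

For example, `ffmpeg_extract_cmd("video.mp4", 0, 30)` selects frames 0, 30, 60, ..., matching the `--start 0 --step 30` usage example above.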
- Symptom: Cannot uninstall `sympy` 1.9
- Solution:

  ```sh
  sudo apt remove python3-sympy -y
  ```

- Symptom: `numpy` version conflict
- Solution: install the specified version:

  ```sh
  sudo pip3 install numpy==1.23.5
  ```
- Symptom: `EOFError`/`IOError` when reading frames from RTSP/live streams
- Solutions:
  - Increase RTSP timeout: add `-stimeout 20000000` to the FFmpeg command (the code already includes this)
  - Check network stability (RTSP streams require low latency)
  - Use TCP for RTSP: `--rtsp-transport tcp` (enabled by default in code)

- Symptom: Orphaned FFmpeg/Ollama processes consuming resources
- Solutions:
  - The code includes `kill_all_ffmpeg()` and `stop_ollama_serve()` for cleanup
  - Manually kill zombie processes:

    ```sh
    # Kill all FFmpeg processes
    sudo pkill -f ffmpeg
    # Restart Ollama service
    sudo systemctl restart ollama
    ```
- Symptom: `Out of memory` errors or slow inference
- Solutions:
  - Use smaller models (`qwen3.5:2b` instead of 7B)
  - Increase swap space (see Installation > Jetson Configuration)
  - Reduce frame resolution (`--compress-size 320x240`)
  - Increase inference interval (`--interval 2.0` or higher)
- Symptom: `Frame extraction failed`, unable to perform inference
- Solutions:
  - Verify RTSP URL/USB device accessibility
  - Check FFmpeg installation (`ffmpeg -version`)
  - For RTSP: ensure the camera is online and credentials are correct

- Symptom: `Continuous inference exception: [error message]`
- Solutions:
  - Enable debug mode (`--debug`) to see detailed error logs
  - Check Ollama service status (`sudo systemctl status ollama`)
  - Verify the model is pulled (`ollama list` to check installed models)
- Model size: avoid 7B+ models (e.g., qwen3.5:7b) on Jetson Nano/Xavier NX; use `qwen3.5:2b` for stable performance
- Inference speed: 2B models run at ~1-2 FPS on Jetson Orin, ~0.5 FPS on Jetson Nano
- Preview window: may be slow on Jetson Nano (disable with `--no-preview` if needed)
- RTSP latency: RTSP streams may have 1-3 s latency (normal for TCP transport)
- Frame deduplication: may skip valid frames in low-motion scenarios (adjust `DEDUP_THRESHOLD` if needed)
- Motion detection: sensitive to lighting changes (tune `--motion-threshold` for your environment)
This project is licensed under the MIT License - see the LICENSE file for details.
- Ultralytics YOLO for the end-to-end computer vision platform
- Ollama for lightweight LLM inference
- OpenCV for computer vision processing
- NVIDIA Jetson for edge AI platform support