Lightweight Visual Language Model (VLM) Inference Tool optimized for Jetson Edge Devices and x86 platforms. Supports real-time inference for USB/RTSP cameras, VOD videos, and live streams with motion detection, frame deduplication, and efficient resource management.
- 🎥 Multi-source support: USB cameras, RTSP streams, VOD files, live network streams
- 🚀 Motion-gated inference (only run inference when motion detected)
- 🎯 Frame deduplication (skip similar frames via L2 feature comparison)
- 📊 Real-time performance monitoring (encoding/inference time, frame metrics)
- 🔧 Jetson-optimized: Tailored for ARM64 architecture and limited edge resources
- 🎛️ Configurable parameters: Compression quality, inference interval, motion threshold
- 🪵 Debug mode for troubleshooting (--debug flag)
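To illustrate the frame-deduplication idea (skipping frames whose L2 feature distance to the previous frame is small), here is a minimal NumPy sketch. The feature (a coarse 8x8 grayscale thumbnail) and the threshold value are assumptions for illustration; vinfer's actual feature extractor and its `DEDUP_THRESHOLD` (see Known Issues) may differ.

```python
import numpy as np

DEDUP_THRESHOLD = 0.05  # hypothetical value; vinfer's actual threshold may differ

def frame_feature(frame: np.ndarray) -> np.ndarray:
    """Reduce a frame to a small fixed-size feature vector: an 8x8 grid of
    grayscale block means, normalized to [0, 1]."""
    gray = frame.mean(axis=2) if frame.ndim == 3 else frame.astype(float)
    h, w = gray.shape
    # Crop to multiples of 8, then average each of the 8x8 blocks
    blocks = gray[: h - h % 8, : w - w % 8].reshape(8, h // 8, 8, w // 8)
    return blocks.mean(axis=(1, 3)).ravel() / 255.0

def is_duplicate(feat_prev: np.ndarray, feat_cur: np.ndarray,
                 threshold: float = DEDUP_THRESHOLD) -> bool:
    """Skip the frame if the L2 distance to the previous feature is small."""
    return float(np.linalg.norm(feat_cur - feat_prev)) < threshold
```

An unchanged scene yields a distance near zero (frame skipped), while any visible change pushes the distance well above the threshold.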
- Python 3.8+
- OpenCV (cv2)
- NumPy
- psutil
- Ollama (v0.1.40+) [Optional]
- YOLO [Optional]
- FFmpeg (for frame extraction from streams/files)
- Jetson Nano/Xavier NX/Orin (JetPack 6.0+)
- Minimum 8GB RAM
Our install_deps.sh script supports flexible dependency installation with optional Ollama backend, and is compatible with both sh (dash) and bash on Ubuntu/Jetson systems.
| Scenario | Command |
|---|---|
| Install only core dependencies (ffmpeg, python3-pip, pipx) | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh` |
| Install core dependencies + Ollama backend | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh -s -- --backend ollama` |
| Show script help (check parameters) | `curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh \| sh -s -- --help` |
- For better compatibility (especially on Jetson), you can replace `sh` with `bash` (recommended):

  ```sh
  # Install core dependencies + Ollama (bash execution)
  curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh | bash -s -- --backend ollama
  ```
To avoid breaking system dependencies (e.g., JetPack's pre-built OpenCV), use `--system-site-packages` to reuse the system's OpenCV:
- If you do not plan to use YOLO models in the future, we recommend installing with the following command:

  ```sh
  pipx install --system-site-packages vinfer
  ```

- If you plan to use YOLO models in the future, we strongly recommend installing with the following commands:

  ```sh
  pip install --system-site-packages vinfer
  pip install ultralytics --no-deps
  pip install matplotlib pillow polars psutil pyyaml requests scipy ultralytics-thop
  ```
Install with full dependencies (includes OpenCV) if your system doesn't have a pre-configured OpenCV:
```sh
pipx install "vinfer[full]"
```

```sh
# Create 4GB swap file
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make swap permanent (survives reboot) [Optional]
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

```sh
# For Jetson Orin (set 16GB GPU memory)
sudo nvpmodel -m 0
sudo jetson_clocks

# For Jetson Nano (set max performance mode)
sudo nvpmodel -m 0
sudo jetson_clocks
```

```sh
# Recommended lightweight model for Jetson
ollama pull qwen3.5:2b
```

```sh
# Basic USB camera (device ID 0) with debug logs
vinfer cam --usb-dev 0 --debug

# USB camera with motion detection (infer only on motion)
vinfer cam --usb-dev 0 --motion-gate --motion-threshold 500

# USB camera with frame deduplication (skip similar frames)
vinfer cam --usb-dev 0 --dedup --interval 2.0

# Basic USB camera (device ID 0) with YOLO
vinfer cam --usb-dev 0 --model "yolo"

# Basic USB camera (device ID 0) with YOLO26 in detection task
vinfer cam --usb-dev 0 --model "yolo" --yolo-version 26 --yolo-task "detection"
```

```sh
# Basic RTSP stream (default credentials)
vinfer cam --rtsp-host 192.168.1.10 --rtsp-user admin --rtsp-pass password --debug

# RTSP with custom compression (320x240) and JPG quality (80)
vinfer cam --rtsp-host 192.168.1.10 --compress-size 320x240 --jpg-quality 80

# Simple RTSP stream (default credentials) with YOLO
vinfer cam -H 192.168.1.10 -m "yolo"

# Simple RTSP stream (default credentials) with YOLO11 in pose task
vinfer cam -H 192.168.1.10 -m "yolo" -yv 11 -yt "pose"
```

```sh
# Local video file (analyze every 30 frames)
vinfer analyze --type vod --file /path/to/video.mp4 --start 0 --step 30

# Network VOD URL (e.g., MP4 stream)
vinfer analyze --type vod --url https://example.com/video.mp4 --debug

# HLS live stream (e.g., .m3u8)
vinfer analyze --type live --url https://example.com/stream.m3u8 --interval 1.0
```

| Subcommand | Description |
|---|---|
| `cam` | Real-time camera inference (USB/RTSP) |
| `analyze` | Offline video/live stream analysis |
| Argument | Short | Description | Default |
|---|---|---|---|
| `--model` | `-m` | Ollama model name or `yolo` | `qwen3.5:2b` |
| `--compress-size` | `-s` | Frame compression resolution (WxH) | `480x360` |
| `--jpg-quality` | `-q` | JPG compression quality (0-100) | `70` |
| `--motion-gate` | `-g` | Enable motion detection (infer only on motion) | `False` |
| `--motion-threshold` | `-T` | Minimum motion area (pixels) | `500` |
| `--dedup` | `-D` | Enable frame deduplication (disabled if motion-gate is on) | `False` |
| `--interval` | `-i` | Inference interval (seconds/frame) | `1.0` |
| `--debug` | `-d` | Enable verbose debug logging | `False` |
| `--Prompt` | `-r` | User-defined prompt | |
| `--accelerate` | `-a` | Accelerate reasoning speed | `False` |
| `--version` | `-v` | Show vinfer version | |
| `--yolo-version` | `-yv` | YOLO version (8, 11, or 26) | `8` |
| `--yolo-task` | `-yt` | YOLO task (`detection`, `segment`, `classify`, `pose`, `obb`) | `detection` |
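To make `--motion-gate` and `--motion-threshold` concrete: the threshold is the minimum number of changed pixels between consecutive frames before inference is triggered. The sketch below uses plain frame differencing with NumPy; vinfer's actual detector (which may use OpenCV background subtraction or contour areas) is an assumption, and `diff_thresh` is a hypothetical parameter.

```python
import numpy as np

def motion_area(prev_gray: np.ndarray, cur_gray: np.ndarray, diff_thresh: int = 25) -> int:
    """Count pixels whose intensity changed by more than diff_thresh.
    Simplified stand-in for vinfer's motion detector."""
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return int((diff > diff_thresh).sum())

def should_infer(prev_gray: np.ndarray, cur_gray: np.ndarray,
                 motion_threshold: int = 500) -> bool:
    """Gate inference: only run the model when enough pixels changed
    (mirrors the --motion-threshold default of 500)."""
    return motion_area(prev_gray, cur_gray) >= motion_threshold
```

For example, a 30x30 changed region (900 pixels) exceeds the default threshold of 500 and would trigger inference, while a static scene would not.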
| Argument | Short | Description |
|---|---|---|
| `--rtsp-host` | `-H` | RTSP server IP/domain (enables RTSP mode) |
| `--rtsp-user` | `-U` | RTSP authentication username |
| `--rtsp-pass` | `-P` | RTSP authentication password |
| `--usb-dev` | `-u` | USB camera device ID (0 = /dev/video0) |
| `--show-preview` | `-p` | Start live preview window |
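The RTSP options above are typically combined into a single stream URL. A minimal sketch of that assembly, assuming the conventional `rtsp://user:pass@host:port/path` form; the default port 554 is RTSP's standard, but the stream path (`stream1` here) is a placeholder that varies by camera vendor:

```python
from urllib.parse import quote

def build_rtsp_url(host: str, user: str = "", password: str = "",
                   port: int = 554, path: str = "stream1") -> str:
    """Assemble a typical RTSP URL from --rtsp-host/--rtsp-user/--rtsp-pass.
    Credentials are percent-encoded so special characters survive."""
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}@" if user else ""
    return f"rtsp://{auth}{host}:{port}/{path}"
```

Percent-encoding matters in practice: a password containing `@` or `:` would otherwise break URL parsing.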
| Argument | Short | Description |
|---|---|---|
| `--type` | `-t` | Analysis type (`vod`/`live`) |
| `--file` | `-f` | Local VOD file path |
| `--url` | `-u` | Network VOD/live stream URL |
| `--start` | `-st` | Start frame number (0-based) |
| `--step` | `-sp` | Inference frame interval |
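The `analyze` subcommand relies on FFmpeg for frame extraction (see Requirements). A sketch of how an extraction command honoring `--start`/`--step` can be built, using FFmpeg's `select` filter; the exact invocation vinfer uses is an assumption (for RTSP it also adds a timeout, per Troubleshooting), and `out_pattern` is a hypothetical output template:

```python
def ffmpeg_extract_cmd(src: str, start: int, step: int,
                       out_pattern: str = "frame_%06d.jpg") -> list:
    """Build an FFmpeg argv that decodes `src` and keeps every `step`-th
    frame beginning at 0-based frame `start` (frames start, start+step, ...)."""
    vf = f"select='gte(n,{start})*not(mod(n-{start},{step}))'"
    return [
        "ffmpeg", "-hide_banner",
        "-i", src,
        "-vf", vf,
        "-vsync", "vfr",  # emit only the selected frames, no duplicates
        out_pattern,
    ]
```

For example, `ffmpeg_extract_cmd("video.mp4", 0, 30)` selects frames 0, 30, 60, ..., matching the `--start 0 --step 30` usage example above.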
- Symptom: Cannot uninstall `sympy` 1.9
- Solution:

  ```sh
  sudo apt remove python3-sympy -y
  ```

- Symptom: `numpy` version conflict
- Solution: install the specified version:

  ```sh
  sudo pip3 install numpy==1.23.5
  ```
- Symptom: `EOFError`/`IOError` when reading frames from RTSP/live streams
- Solutions:
  - Increase RTSP timeout: add `-stimeout 20000000` to the FFmpeg command (the code already includes this)
  - Check network stability (RTSP streams require low latency)
  - Use TCP for RTSP: `--rtsp-transport tcp` (enabled by default in code)

- Symptom: Orphaned FFmpeg/Ollama processes consuming resources
- Solutions:
  - The code includes `kill_all_ffmpeg()` and `stop_ollama_serve()` for cleanup
  - Manually kill zombie processes:

    ```sh
    # Kill all FFmpeg processes
    sudo pkill -f ffmpeg
    # Restart Ollama service
    sudo systemctl restart ollama
    ```
- Symptom: `Out of memory` errors or slow inference
- Solutions:
  - Use smaller models (`qwen3.5:2b` instead of 7B)
  - Increase swap space (see Installation > Jetson Configuration)
  - Reduce frame resolution (`--compress-size 320x240`)
  - Increase inference interval (`--interval 2.0` or higher)
- Symptom: `Frame extraction failed`, unable to perform inference
- Solutions:
  - Verify RTSP URL/USB device accessibility
  - Check FFmpeg installation (`ffmpeg -version`)
  - For RTSP: ensure the camera is online and credentials are correct

- Symptom: `Continuous inference exception: [error message]`
- Solutions:
  - Enable debug mode (`--debug`) to see detailed error logs
  - Check Ollama service status (`sudo systemctl status ollama`)
  - Verify the model is pulled (`ollama list` to check installed models)
- Model size: avoid 7B+ models (e.g., qwen3.5:7b) on Jetson Nano/Xavier NX; use `qwen3.5:2b` for stable performance
- Inference speed: 2B models run at ~1-2 FPS on Jetson Orin, ~0.5 FPS on Jetson Nano
- Preview window: may be slow on Jetson Nano (disable with `--no-preview` if needed)
- RTSP latency: RTSP streams may have 1-3 s latency (normal for TCP transport)
- Frame deduplication: may skip valid frames in low-motion scenarios (adjust `DEDUP_THRESHOLD` if needed)
- Motion detection: sensitive to lighting changes (tune `--motion-threshold` for your environment)
This project is licensed under the MIT License - see the LICENSE file for details.
- Ultralytics YOLO for the end-to-end computer vision platform
- Ollama for lightweight LLM inference
- OpenCV for computer vision processing
- NVIDIA Jetson for edge AI platform support