VisionInfer

Lightweight Visual Language Model (VLM) Inference Tool optimized for Jetson Edge Devices and x86 platforms. Supports real-time inference for USB/RTSP cameras, VOD videos, and live streams with motion detection, frame deduplication, and efficient resource management.

Features

  • 🎥 Multi-source support: USB cameras, RTSP streams, VOD files, live network streams
  • 🚀 Motion-gated inference (only run inference when motion detected)
  • 🎯 Frame deduplication (skip similar frames via L2 feature comparison)
  • 📊 Real-time performance monitoring (encoding/inference time, frame metrics)
  • 🔧 Jetson-optimized: Tailored for ARM64 architecture and limited edge resources
  • 🎛️ Configurable parameters: Compression quality, inference interval, motion threshold
  • 🪵 Debug mode for troubleshooting (--debug flag)
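The frame-deduplication idea above (skip similar frames via L2 feature comparison) can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation; the feature extraction, the `DEDUP_THRESHOLD` value, and the normalization are assumptions:

```python
import numpy as np

DEDUP_THRESHOLD = 0.05  # hypothetical RMS-distance cutoff; tune per scene


def frame_feature(frame):
    """Reduce a frame to a small, normalized feature vector.

    Here we simply downsample by striding and flatten; a real
    implementation might use a resized grayscale image or an
    embedding from a small model.
    """
    small = frame[::8, ::8].astype(np.float32)
    return small.ravel() / 255.0


def is_duplicate(prev_feat, cur_feat, threshold=DEDUP_THRESHOLD):
    """Skip the current frame if its normalized L2 (RMS) distance to
    the previously accepted frame's feature is below the threshold."""
    rms = np.linalg.norm(cur_feat - prev_feat) / np.sqrt(cur_feat.size)
    return rms < threshold
```

In a capture loop, a frame would only be forwarded to inference when `is_duplicate(...)` returns False, and its feature would then replace the stored reference feature.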

Requirements

General Requirements

  • Python 3.8+
  • OpenCV (cv2)
  • NumPy
  • psutil
  • Ollama (v0.1.40+) [Optional]
  • YOLO [Optional]
  • FFmpeg (for frame extraction from streams/files)

Jetson-Specific Requirements

  • Jetson Nano/Xavier NX/Orin (JetPack 6.0+)
  • Minimum 8GB RAM

Installation

Dependency Installation Script

Our install_deps.sh script supports flexible dependency installation with an optional Ollama backend, and is compatible with both sh (dash) and bash on Ubuntu/Jetson systems.

Basic Usage

  • Install only core dependencies (ffmpeg, python3-pip, pipx):
    curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh | sh
  • Install core dependencies + the Ollama backend:
    curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh | sh -s -- --backend ollama
  • Show script help (list available parameters):
    curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh | sh -s -- --help

Compatibility Note

  • For better compatibility (especially on Jetson), replace sh with bash (recommended):
    # Install core dependencies + Ollama (bash execution)
    curl -fsSL https://raw.githubusercontent.com/machinefi/VisionInfer/refs/heads/main/install_deps.sh | bash -s -- --backend ollama

Install VisionInfer

For Jetson (Pre-installed System OpenCV with CUDA)

To avoid breaking system dependencies (e.g., JetPack's pre-built OpenCV), use --system-site-packages to reuse the system's OpenCV:

  • If you do not plan to use YOLO models, install with pipx:
    pipx install --system-site-packages vinfer
  • If you plan to use YOLO models, install with pip and add the YOLO dependencies manually (ultralytics is installed with --no-deps so it does not pull in its own OpenCV over the system build):
    pip install --system-site-packages vinfer
    pip install ultralytics --no-deps
    pip install matplotlib pillow polars psutil pyyaml requests scipy ultralytics-thop

For Other Systems (No Special OpenCV)

Install with full dependencies (includes OpenCV) if your system doesn't have a pre-configured OpenCV:

pipx install vinfer[full]

Jetson Resource Configuration

Increase Swap Space [Optional]

# Create 4GB swap file
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Make swap permanent (survive reboot) [Optional]
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

Set Maximum Performance Mode (Jetson Orin/Nano)

# Select the highest power mode and lock clocks at maximum frequency
# (applies to both Orin and Nano)
sudo nvpmodel -m 0
sudo jetson_clocks

Pull Optimized Model (Jetson)

# Recommended lightweight model for Jetson
ollama pull qwen3.5:2b

Quick Start

USB Camera Inference

# Basic USB camera (device ID 0) with debug logs
vinfer cam --usb-dev 0 --debug

# USB camera with motion detection (infer only on motion)
vinfer cam --usb-dev 0 --motion-gate --motion-threshold 500

# USB camera with frame deduplication (skip similar frames)
vinfer cam --usb-dev 0 --dedup --interval 2.0

# Basic USB camera (device ID 0) with YOLO
vinfer cam --usb-dev 0 --model "yolo"

# Basic USB camera (device ID 0) with YOLO26 in Detection task 
vinfer cam --usb-dev 0 --model "yolo" --yolo-version 26 --yolo-task "detection"
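Motion gating, as enabled by --motion-gate --motion-threshold 500, boils down to comparing consecutive frames and counting the changed area. A minimal numpy-only sketch of that logic (the real tool works on OpenCV frames; the per-pixel difference cutoff of 30 and the function names are assumptions):

```python
import numpy as np


def motion_area(prev_gray, cur_gray, pixel_diff=30):
    """Count pixels whose intensity changed by more than `pixel_diff`
    between two consecutive grayscale frames."""
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return int(np.count_nonzero(diff > pixel_diff))


def should_infer(prev_gray, cur_gray, motion_threshold=500):
    """Gate inference: run the model only when the changed area
    (in pixels) meets the --motion-threshold value."""
    return motion_area(prev_gray, cur_gray) >= motion_threshold
```

A static scene yields a motion area of 0 and no inference; a person walking through a 100x100 frame easily exceeds the default 500-pixel threshold.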

RTSP Camera Inference

# Basic RTSP stream (default credentials)
vinfer cam --rtsp-host 192.168.1.10 --rtsp-user admin --rtsp-pass password --debug

# RTSP with custom compression (320x240) and JPG quality (80)
vinfer cam --rtsp-host 192.168.1.10 --compress-size 320x240 --jpg-quality 80

# Simple RTSP stream (default credentials) with YOLO
vinfer cam -H 192.168.1.10 -m "yolo"

# Simple RTSP stream (default credentials) with YOLO11 in pose task
vinfer cam -H 192.168.1.10 -m "yolo" -yv 11 -yt "pose"

VOD (Video File) Analysis

# Local video file (analyze every 30 frames)
vinfer analyze --type vod --file /path/to/video.mp4 --start 0 --step 30

# Network VOD URL (e.g., MP4 stream)
vinfer analyze --type vod --url https://example.com/video.mp4 --debug

Live Stream Analysis

# HLS live stream (e.g., .m3u8)
vinfer analyze --type live --url https://example.com/stream.m3u8 --interval 1.0
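The --interval flag throttles how often frames are sent to the model. The underlying idea is a simple rate limiter; a sketch under assumed names (this class is illustrative, not part of the vinfer API):

```python
import time


class IntervalGate:
    """Allow at most one inference every `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock           # injectable clock, useful for testing
        self.last = float("-inf")    # time of the last allowed inference

    def ready(self):
        """Return True (and record the time) if enough time has passed
        since the last allowed inference; otherwise False."""
        now = self.clock()
        if now - self.last >= self.interval:
            self.last = now
            return True
        return False
```

In a capture loop this would look like: `if gate.ready(): run_inference(frame)`, so frames arriving faster than the interval are simply dropped from the inference path.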

Command Reference

Core Subcommands

| Subcommand | Description |
| --- | --- |
| cam | Real-time camera inference (USB/RTSP) |
| analyze | Offline video/live stream analysis |

Common Arguments

| Argument | Short | Description | Default |
| --- | --- | --- | --- |
| --model | -m | Ollama model name or "yolo" | qwen3.5:2b |
| --compress-size | -s | Frame compression resolution (WxH) | 480x360 |
| --jpg-quality | -q | JPG compression quality (0-100) | 70 |
| --motion-gate | -g | Enable motion detection (infer only on motion) | False |
| --motion-threshold | -T | Minimum motion area (pixels) | 500 |
| --dedup | -D | Enable frame deduplication (disabled if motion-gate is on) | False |
| --interval | -i | Inference interval (seconds/frame) | 1.0 |
| --debug | -d | Enable verbose debug logging | False |
| --Prompt | -r | User-defined prompt | |
| --accelerate | -a | Accelerate reasoning speed | False |
| --version | -v | Show vinfer version | |
| --yolo-version | -yv | YOLO version (8, 11, 26) | 8 |
| --yolo-task | -yt | YOLO task (detection, segment, classify, pose, obb) | detection |

Cam Subcommand Arguments

| Argument | Short | Description |
| --- | --- | --- |
| --rtsp-host | -H | RTSP server IP/domain (enables RTSP mode) |
| --rtsp-user | -U | RTSP authentication username |
| --rtsp-pass | -P | RTSP authentication password |
| --usb-dev | -u | USB camera device ID (0 = /dev/video0) |
| --show-preview | -p | Start live preview window |

Analyze Subcommand Arguments

| Argument | Short | Description |
| --- | --- | --- |
| --type | -t | Analysis type (vod/live) |
| --file | -f | Local VOD file path |
| --url | -u | Network VOD/live stream URL |
| --start | -st | Start frame number (0-based) |
| --step | -sp | Inference frame interval |

Troubleshooting

Common Issues & Solutions

Cannot uninstall sympy

  • Symptom: "Cannot uninstall sympy 1.9" error during dependency installation
  • Solution: remove the apt-managed package first:

    sudo apt remove python3-sympy -y

numpy version conflict

  • Symptom: numpy version conflict during dependency installation
  • Solution: install the pinned version:

    sudo pip3 install numpy==1.23.5

EOF Error During Frame Extraction

  • Symptom: EOFError/IOError when reading frames from RTSP/live streams
  • Solutions:
    • Increase RTSP timeout: Add -stimeout 20000000 to FFmpeg command (code already includes this)
    • Check network stability (RTSP streams require low latency)
    • Use TCP for RTSP: --rtsp-transport tcp (enabled by default in code)

Zombie Processes (FFmpeg/Ollama)

  • Symptom: Orphaned FFmpeg/Ollama processes consuming resources
  • Solutions:
    • The code includes kill_all_ffmpeg() and stop_ollama_serve() for cleanup
    • Manually kill zombie processes:
      # Kill all FFmpeg processes
      sudo pkill -f ffmpeg
      
      # Restart Ollama service
      sudo systemctl restart ollama
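The cleanup helpers mentioned above can be approximated with the standard library alone by scanning /proc for processes by command name. This is a Linux-only sketch; the tool's actual kill_all_ffmpeg() may be implemented differently (e.g. via psutil), and the function names here are assumptions:

```python
import os
import signal


def find_pids_by_name(name, proc_root="/proc"):
    """Return PIDs whose command name matches `name` (Linux /proc scan)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/self
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def kill_all_by_name(name):
    """Send SIGTERM to every matching process; return how many were found."""
    pids = find_pids_by_name(name)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # already gone
    return len(pids)
```

Calling `kill_all_by_name("ffmpeg")` is roughly equivalent to `pkill -x ffmpeg`, but keeps the cleanup inside the Python process where it can be logged and retried.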

Resource Exhaustion (Jetson)

  • Symptom: Out of memory errors or slow inference
  • Solutions:
    • Use smaller models (qwen3.5:2b instead of 7b)
    • Increase swap space (see Installation > Jetson Configuration)
    • Reduce frame resolution (--compress-size 320x240)
    • Increase inference interval (--interval 2.0 or higher)

Frame Extraction Failure

  • Symptom: Frame extraction failed, unable to perform inference
  • Solutions:
    • Verify RTSP URL/USB device accessibility
    • Check FFmpeg installation (ffmpeg -version)
    • For RTSP: Ensure camera is online and credentials are correct

Continuous Inference Errors

  • Symptom: Continuous inference exception: [error message]
  • Solutions:
    • Enable debug mode (--debug) to see detailed error logs
    • Check Ollama service status (sudo systemctl status ollama)
    • Verify model is pulled (ollama list to check installed models)

Known Limitations

Jetson-Specific Limitations

  • Model Size: Avoid 7B+ models (e.g., qwen3.5:7b) on Jetson Nano/Xavier NX—use qwen3.5:2b for stable performance
  • Inference Speed: 2B models run at ~1-2 FPS on Jetson Orin, ~0.5 FPS on Jetson Nano
  • Preview Window: May be slow on Jetson Nano (disable with --no-preview if needed)

General Limitations

  • RTSP Latency: RTSP streams may have 1-3s latency (normal for TCP transport)
  • Frame Deduplication: May skip valid frames in low-motion scenarios (adjust DEDUP_THRESHOLD if needed)
  • Motion Detection: Sensitive to lighting changes (tune --motion-threshold for your environment)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
