Immich + cn-clip + RapidOCR + InsightFace
~~The plan was to migrate to ente-io/ente, since I need S3 to store photos.~~
ente still lacks too many features, so I now mount S3 with juicedata/juicefs instead.
Project Overview
This project is an AI capability enhancement solution for the Immich photo management system. It mainly extends Immich’s native features with the following components:
- inference-gateway: A gateway service written in Go, responsible for intelligently routing Immich's machine learning requests
- mt-photos-ai: An AI service based on Python and FastAPI, integrating RapidOCR and the cn-clip model
- Immich functional extensions: OCR text recognition search, single-media AI data reprocessing, and hybrid ranking that combines OCR full-text vector scoring with CLIP vector scoring
- Adding zhparser Chinese word segmentation to PostgreSQL
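The hybrid ranking mentioned above combines an OCR full-text score with a CLIP vector score. One plausible way to do this is a weighted sum of min-max normalized scores, sketched below. The `Candidate` type, the `alpha` weight, and the normalization are assumptions for illustration, not necessarily the formula this project uses.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    asset_id: str
    ocr_score: float   # full-text relevance of the OCR text (hypothetical scale)
    clip_score: float  # CLIP similarity between query and image (hypothetical scale)


def _normalize(values: list[float]) -> list[float]:
    """Min-max normalize to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(candidates: list[Candidate], alpha: float = 0.5) -> list[str]:
    """Return asset ids ordered by alpha * OCR + (1 - alpha) * CLIP, best first."""
    if not candidates:
        return []
    ocr = _normalize([c.ocr_score for c in candidates])
    clip = _normalize([c.clip_score for c in candidates])
    scored = [
        (alpha * o + (1 - alpha) * s, c.asset_id)
        for c, o, s in zip(candidates, ocr, clip)
    ]
    # Python's sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [asset_id for _, asset_id in scored]
```

Setting `alpha=1.0` ranks purely by OCR text relevance and `alpha=0.0` purely by CLIP similarity; values in between trade one off against the other.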
Main Features
1. OCR Text Recognition and Search
- Use RapidOCR to recognize text in images
- Support for mixed Chinese and English text recognition
- Implement search functionality based on image text content
2. CLIP Image Vector Processing
- Use the cn-clip model for more accurate Chinese image-text matching
- Support semantic search to improve search accuracy
3. Single Media AI Data Reprocessing
- Support regenerating OCR data for a single image/video
- Support regenerating CLIP vector data for a single image/video
- Provide manual refresh capability for inaccurate recognition results
System Architecture
┌─────────────┐      ┌───────────────────┐      ┌───────────────┐
│             │      │                   │      │               │
│   Immich    │─────▶│ inference-gateway │─────▶│   Immich ML   │
│   Server    │      │   (Go gateway)    │      │    Server     │
│             │      │                   │      │               │
└─────────────┘      └─────────┬─────────┘      └───────────────┘
                               │
                               │ OCR/CLIP requests
                               ▼
                     ┌───────────────────┐
                     │                   │
                     │   mt-photos-ai    │
                     │ (Python service)  │
                     │                   │
                     └───────────────────┘
Component Details
inference-gateway
A gateway service written in Go, with main responsibilities:
- Receiving machine learning requests from Immich
- Forwarding OCR and CLIP requests to the mt-photos-ai service based on request type
- Forwarding other machine learning requests (such as face recognition) to Immich's native machine learning service
- Handling authentication and data format conversion
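The routing rule described above can be sketched as a small pure function. It is written in Python for brevity, although the gateway itself is in Go. The task names (`"ocr"`, `"clip"`) and the fall-through to Immich's native service are assumptions for illustration; the real gateway may key its routing on different fields of the request.

```python
# Hypothetical task names; the gateway's actual routing keys may differ.
MT_PHOTOS_TASKS = {"ocr", "clip"}


def pick_upstream(task: str, immich_api: str, mt_photos_api: str) -> str:
    """Return the base URL of the service that should handle `task`."""
    if task.strip().lower() in MT_PHOTOS_TASKS:
        # OCR and CLIP requests go to the mt-photos-ai service.
        return mt_photos_api
    # Everything else (for example face recognition) goes to Immich's native ML service.
    return immich_api
```

For example, with the addresses used in the configuration section, `pick_upstream("ocr", "http://localhost:3003", "http://localhost:8060")` selects the mt-photos-ai address, while any other task falls through to the Immich address.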
mt-photos-ai
An AI service written in Python with FastAPI, providing:
- OCR text recognition API (based on RapidOCR)
- CLIP vector processing API (based on cn-clip)
- GPU acceleration support
Deployment Instructions
Environment Requirements
- Docker and Docker Compose
- NVIDIA GPU (optional, but recommended for accelerated processing)
- Sufficient storage space
Configuration Instructions
- inference-gateway Configuration
IMMICH_API=http://localhost:3003 # Immich API address
MT_PHOTOS_API=http://localhost:8060 # mt-photos-ai service address
MT_PHOTOS_API_KEY=mt_photos_ai_extra # API key
PORT=8080 # Gateway listening port
- mt-photos-ai Configuration
CLIP_MODEL=ViT-B-16 # CLIP model name
CLIP_DOWNLOAD_ROOT=./models/clip # Model download path
DEVICE=cuda # Inference device: cuda or cpu
HTTP_PORT=8060 # Service listening port
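As a rough illustration of how a Python service could read the mt-photos-ai variables above, the sketch below loads them into a typed config object. The `AIServiceConfig` class, the `load_config` helper, and the fallback defaults (in particular defaulting `DEVICE` to `cpu`) are assumptions, not the service's actual startup code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AIServiceConfig:
    clip_model: str
    clip_download_root: str
    device: str
    http_port: int


def load_config(env: Mapping[str, str] = os.environ) -> AIServiceConfig:
    """Build the config from environment variables, using assumed defaults."""
    device = env.get("DEVICE", "cpu").lower()  # assumed default: cpu
    if device not in ("cuda", "cpu"):
        raise ValueError(f"DEVICE must be 'cuda' or 'cpu', got {device!r}")
    return AIServiceConfig(
        clip_model=env.get("CLIP_MODEL", "ViT-B-16"),
        clip_download_root=env.get("CLIP_DOWNLOAD_ROOT", "./models/clip"),
        device=device,
        http_port=int(env.get("HTTP_PORT", "8060")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.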
Deployment Steps
- Clone the repository:
git clone https://github.com/<your-username>/immich-all-in-one.git
cd immich-all-in-one
- Start the service:
docker-compose up -d
Instructions
- Configure Immich to Use a Custom ML Service
MACHINE_LEARNING_URL=http://inference-gateway:8080
- OCR Search Usage
  - Use the ocr: prefix in the Immich search bar to perform OCR searches
  - For example: ocr:invoice will search for photos containing the word "invoice" in the image
- Single Media AI Data Reprocessing
  - On the photo details page, click the menu options
  - Select "Regenerate OCR Data" or "Regenerate CLIP Vector"
  - The system will reprocess the AI data for that photo
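To make the ocr: search prefix concrete, here is a minimal sketch of how a query string could be split into a search mode and the text to look for. The `ParsedQuery` type, the mode labels, and the case-insensitive prefix match are assumptions; the actual parsing in the extended Immich search may differ.

```python
from typing import NamedTuple


class ParsedQuery(NamedTuple):
    mode: str  # "ocr" or "default" (hypothetical labels)
    text: str


OCR_PREFIX = "ocr:"


def parse_search_query(raw: str) -> ParsedQuery:
    """Split a search-bar string into a mode and the remaining query text."""
    query = raw.strip()
    if query.lower().startswith(OCR_PREFIX):
        # Drop the prefix and any whitespace that follows it.
        return ParsedQuery("ocr", query[len(OCR_PREFIX):].strip())
    return ParsedQuery("default", query)
```

With this sketch, `parse_search_query("ocr:invoice")` yields the mode `"ocr"` and the text `"invoice"`, while a query without the prefix is passed through unchanged in `"default"` mode.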
Developer Guide
inference-gateway (Go)
Compile and run:
cd inference-gateway
go build
./inference-gateway
mt-photos-ai (Python)
Development environment setup:
cd mt-photos-ai
pip install -r requirements.txt
python -m app.main
License
This project is open-sourced under the MIT License.
Acknowledgements
- Immich - Open-source self-hosted photo and video backup solution
- RapidOCR - Cross-platform OCR library based on PaddleOCR
- cn-clip - Chinese multimodal contrastive learning pre-trained model