- Fix FaceLandmarkerResult unpacking error by returning a (result, None)
tuple when no face is detected in the detect() and detect_for_video() methods
- Fix insightface FaceAnalysis.get() invalid-parameter error by setting det_thresh
via the det_model attribute instead of passing it as an argument
- Add mesh3d None check to properly trigger insightface fallback detection
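The first and third fixes above can be sketched as follows. This is a minimal illustration, not the project's actual code: DummyDetector stands in for the real MediaPipe FaceLandmarker, and the function and variable names are assumptions.

```python
# Sketch of the tuple-return and mesh3d None-check fixes. DummyDetector
# is a stand-in for MediaPipe's FaceLandmarker (an assumption for this
# sketch); the real detect() returns a FaceLandmarkerResult.

class DummyDetector:
    def __init__(self, finds_face):
        self.finds_face = finds_face

    def detect(self, image):
        # Return landmarks when a face is found, an empty result otherwise.
        return [(0.5, 0.5)] if self.finds_face else []


def detect(detector, image):
    result = detector.detect(image)
    if not result:
        # Before the fix this path returned a bare None, which broke
        # `result, mesh3d = extractor.detect(img)` at the call site.
        return result, None
    mesh3d = [(x, y, 0.0) for x, y in result]  # stand-in for the real 3D mesh
    return result, mesh3d


def needs_insightface_fallback(mesh3d):
    # The added None check: trigger the InsightFace fallback detection
    # when MediaPipe produced no 3D mesh.
    return mesh3d is None
```

With the tuple return, callers can always unpack safely and use the `mesh3d is None` case to decide when the fallback detector should run.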
- Add 'dreamidv_wan_faster' module for accelerated inference
- Support 'origin' and 'faster' pipeline types in RunningHub_DreamID_V_Loader
- Add 'face_detection_threshold' parameter to control face detection sensitivity
- Improve video sampling with skip-frame handling and FFmpeg-based audio sync
- Enhance face landmark alignment and video data extraction logic
Root cause: We modified LMKExtractor to accept a custom detection threshold
(min_detection_confidence=0.3), but the original code used MediaPipe defaults.
The changed threshold produced different detection results, causing quality
degradation, especially on longer videos (241 frames vs 81 frames).
Fix:
- Restore original LMKExtractor with no custom parameters (uses MediaPipe defaults)
- Set use_insightface=False by default to use original detection behavior
- Keep hybrid detector available but not enabled by default
The hybrid detector was cropping face regions before MediaPipe detection,
then attempting to map coordinates back. However, only the 2D landmarks
(lmks) were being transformed; lmks3d and trans_mat were not adjusted.
This produced incorrect pose/mask generation, leading to face distortion.
Fix: Always run MediaPipe on the FULL image (like original code).
InsightFace is now only used to verify face presence when MediaPipe fails.
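The coordinate-mapping gap can be sketched with simplified geometry; the names lmks, lmks3d, and trans_mat follow the description above, but the functions are illustrative, not the project's code. The adopted fix avoids this mapping entirely by detecting on the full frame.

```python
# Sketch of the crop-to-full-image mapping bug described above.

def map_back_2d_only(lmks, crop_origin):
    # The buggy path: only the 2D landmarks are shifted back into
    # full-image coordinates; lmks3d and trans_mat keep crop coordinates,
    # so downstream pose/mask generation is inconsistent.
    ox, oy = crop_origin
    return [(x + ox, y + oy) for x, y in lmks]


def map_back_consistent(lmks, lmks3d, trans_mat, crop_origin):
    # What a correct mapping would also require: shift the 3D landmarks
    # and the translation column of the 4x4 transform by the same offset.
    ox, oy = crop_origin
    lmks = [(x + ox, y + oy) for x, y in lmks]
    lmks3d = [(x + ox, y + oy, z) for x, y, z in lmks3d]
    trans_mat = [row[:] for row in trans_mat]
    trans_mat[0][3] += ox
    trans_mat[1][3] += oy
    return lmks, lmks3d, trans_mat
```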
The root cause of video quality degradation was re-encoding:
- prehandle_video was re-encoding the video using imageio.get_writer
- Video re-encoding causes quality loss, especially noticeable with more frames
- 241 frames showed worse degradation than 81 frames due to accumulated encoding artifacts
Fix:
- prehandle_video now only performs face detection, no video re-encoding
- Use original video directly as ref_video_path
- This matches the original project behavior
- Changed prehandle_video to keep all frames instead of filtering
- For frames without a detected face, carry forward the previous frame's result
- This ensures ref_video, mask, pose all have consistent frame counts
- Fixes video quality degradation issue when frame_num > actual detected faces
- Removed frame insertion logic that was breaking video temporal coherence
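The keep-all-frames strategy above can be sketched like this; the `face_results` list with None entries for failed detections is an assumed representation, not the project's actual data structure.

```python
# Sketch: keep every frame and carry forward the previous detection for
# frames where no face was found, so ref_video, mask and pose videos
# stay frame-aligned.

def fill_missing_detections(face_results):
    filled, last = [], None
    for det in face_results:
        if det is None:
            det = last  # reuse the previous frame's result
        else:
            last = det
        filled.append(det)
    # Leading frames before the first detection remain None and must be
    # handled by the caller (e.g. skipped or backfilled).
    return filled
```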
- Filter out invalid bboxes (negative coords, out of bounds)
- Filter out too small faces (< 30px)
- Sort candidates by score * area for better selection
- Try top 3 candidates instead of just largest
- Fall back gracefully when MediaPipe fails on cropped face
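The candidate-selection rules above can be sketched as follows; the bbox tuple format (x1, y1, x2, y2, score) and the function name are assumptions for illustration.

```python
# Sketch of the bbox filtering and ranking described above: drop invalid
# or tiny boxes, rank by score * area, keep the top 3 for MediaPipe to try.

MIN_FACE_PX = 30  # faces smaller than this are too small to landmark

def select_candidates(bboxes, img_w, img_h, top_k=3):
    valid = []
    for x1, y1, x2, y2, score in bboxes:
        if x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h:
            continue  # negative or out-of-bounds coordinates
        w, h = x2 - x1, y2 - y1
        if w < MIN_FACE_PX or h < MIN_FACE_PX:
            continue  # too small to extract reliable landmarks
        valid.append(((x1, y1, x2, y2), score * w * h))
    # Rank by detection score weighted by area, best first.
    valid.sort(key=lambda c: c[1], reverse=True)
    return [box for box, _ in valid[:top_k]]
```

Each surviving candidate is then cropped and handed to MediaPipe in turn, moving to the next one when landmark extraction fails.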
- Remove use_insightface parameter from node inputs
- Set use_insightface=True as default, no user toggle needed
- Simplify user experience with automatic best detection method
- Auto-detect InsightFace models from {ComfyUI}/models/insightface
- Support offline usage without downloading models from GitHub
- Add model_name parameter (default: buffalo_l)
- Check multiple possible paths for model location
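The offline model lookup can be sketched as below; the candidate paths mirror the description above, but the exact layout of a ComfyUI install (and the function name) are assumptions.

```python
import os

# Sketch: probe likely locations for a local InsightFace model pack
# before letting the library fall back to its download path.

def find_insightface_root(comfyui_dir, model_name="buffalo_l"):
    candidates = [
        os.path.join(comfyui_dir, "models", "insightface"),
        os.path.join(comfyui_dir, "models", "insightface", "models"),
        os.path.expanduser("~/.insightface"),  # the library's default root
    ]
    for root in candidates:
        if (os.path.isdir(os.path.join(root, "models", model_name))
                or os.path.isdir(os.path.join(root, model_name))):
            return root
    return None  # not found locally; allow the library default behavior
```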
- Add HybridLMKExtractor that uses InsightFace for robust face detection
and MediaPipe for detailed landmark extraction
- Add 'use_insightface' boolean parameter to Sampler nodes (default: false)
- InsightFace provides better detection on challenging videos with
side profiles, motion blur, or occlusion
- Falls back to MediaPipe-only mode if InsightFace is not installed
- Add insightface and onnxruntime as optional dependencies in requirements.txt
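The optional-dependency fallback can be sketched like this; `make_extractor` and its return value are illustrative stand-ins for the project's extractor construction.

```python
# Sketch: treat insightface as optional. If it fails to import, stay in
# MediaPipe-only mode rather than breaking node registration.

try:
    from insightface.app import FaceAnalysis  # optional dependency
    HAS_INSIGHTFACE = True
except ImportError:
    FaceAnalysis = None
    HAS_INSIGHTFACE = False


def make_extractor(use_insightface=False):
    if use_insightface and not HAS_INSIGHTFACE:
        print("insightface not installed, falling back to MediaPipe-only mode")
        use_insightface = False
    # Stand-in for constructing the real (hybrid or plain) extractor.
    return {"hybrid": use_insightface}
```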
- Fix FaceLandmarker return type inconsistency (return tuple instead of single object when no face detected)
- Add mesh3d None check in LMKExtractor to handle empty detections
- Add debug logging for face detection failures
- Store face_results in prehandle_video to avoid re-detection inconsistency
- Add face_detection_threshold parameter to Sampler nodes (default 0.5, range 0.1-1.0)
- Lower threshold allows more face detections on challenging videos
- Pass pre-computed face_results to generate_pose_and_mask_videos to ensure frame count consistency
- Add 'custom' option to size selector with custom_width/custom_height inputs
- Add portrait resolution presets (480*832, 720*1280)
- Add VIDEO output type that creates video with audio from source
- Auto-detect and copy audio track from input video using ffmpeg
- Use VideoFromFile for ComfyUI native VIDEO object support
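The audio-copy step above can be sketched as an ffmpeg stream-copy mux; the function names and paths are illustrative, but the ffmpeg flags are standard.

```python
import shutil
import subprocess

# Sketch: mux the source video's audio track into the freshly rendered
# (silent) output with ffmpeg stream copy, so neither stream is re-encoded.

def build_mux_cmd(rendered_path, source_path, out_path):
    return [
        "ffmpeg", "-y",
        "-i", rendered_path,   # video from the sampler output
        "-i", source_path,     # original input, used as the audio source
        "-map", "0:v:0",
        "-map", "1:a:0?",      # trailing '?' tolerates a source with no audio
        "-c", "copy",          # stream copy: no re-encoding, no quality loss
        out_path,
    ]


def mux_audio(rendered_path, source_path, out_path):
    if shutil.which("ffmpeg") is None:
        return False  # ffmpeg unavailable; keep the silent video
    proc = subprocess.run(build_mux_cmd(rendered_path, source_path, out_path),
                          capture_output=True)
    return proc.returncode == 0
```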
Changed T5EncoderModel.__init__ default device parameter from
torch.cuda.current_device() to None, with runtime resolution.
This allows module import and node registration on systems without
NVIDIA GPU while maintaining full functionality when GPU is available.
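The lazy device resolution can be sketched as below; `resolve_device` is an illustrative helper, and the torch import is guarded so the sketch also runs where torch is absent.

```python
# Sketch: instead of evaluating torch.cuda.current_device() in a default
# argument (which runs at import time and fails on CPU-only machines),
# accept device=None and resolve it only when actually needed.

def resolve_device(device=None):
    if device is not None:
        return device  # caller supplied an explicit device
    try:
        import torch
        if torch.cuda.is_available():
            return torch.cuda.current_device()  # an int GPU index
    except ImportError:
        pass
    return "cpu"
```

Because the default is now None, importing the module and registering the node never touches CUDA; the GPU is only queried when the encoder actually runs.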