A high-performance image stacker for DSO astrophotography, written in C/CUDA. VNG debayering, Moffat star detection, Lanczos-3 warp, and mean / kappa-sigma / median / AAWA / entropy (HDR) integration — with CUDA, Metal, and CPU backends.
Pre-built binaries are available, or you can build from source for your exact GPU architecture.
GPU builds require the NVIDIA CUDA 12.x runtime. Any CUDA 12.x minor version works.
Debian / Ubuntu
# Install the cuda-keyring package (sets up the NVIDIA apt repository)
# Replace <distro> with: ubuntu2404, ubuntu2204, debian12, etc.
wget https://developer.download.nvidia.com/compute/cuda/repos/<distro>/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
# Install the runtime libraries (no compiler needed)
sudo apt-get install cuda-cudart-12-9 libnpp-12-9
RHEL / Fedora
sudo dnf config-manager --add-repo \
  https://developer.download.nvidia.com/compute/cuda/repos/<distro>/x86_64/cuda-<distro>.repo
sudo dnf install cuda-cudart-12-9 libnpp-12-9
On Windows, download a CUDA Toolkit 12.x installer from the CUDA Toolkit Archive. During installation, select Custom and enable at minimum the CUDA runtime (cudart) and NPP libraries, the same runtime components installed on Linux above.
Alternatively, install silently from PowerShell after downloading the installer:
.\cuda_12.9.1_windows.exe -s cudart_12.9 npp_12.9 Display.Driver -n
macOS quarantines files downloaded from the internet. Since the binaries are not Apple-notarized, you need to clear the quarantine attribute before macOS will allow them to run.
mkdir -p ~/DSOStacker && curl -fL \
  https://github.com/gs18113/gpu-dso-stacker/releases/latest/download/dso-stacker-gui-macos-arm64-metal.tar.gz \
  | tar xz -C ~/DSOStacker \
  && xattr -cr ~/DSOStacker \
  && chmod +x ~/DSOStacker/DSOStacker ~/DSOStacker/_internal/bin/dso_stacker
Replace metal with cpu in the URL if you don't need Metal acceleration.
# GPU build (CUDA 12 toolkit required)
cmake -B build -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
cmake --build build --parallel $(nproc)
# CPU-only build (no CUDA toolkit needed)
cmake -B build -DCMAKE_BUILD_TYPE=Release -DDSO_ENABLE_CUDA=OFF
cmake --build build --parallel $(nproc)
# Enable RAW camera file support (requires libraw-dev)
cmake -B build ... -DDSO_ENABLE_LIBRAW=ON
Prerequisites: CUDA Toolkit 12.x, CFITSIO, libtiff, libpng, CMake ≥ 3.18, LibRaw (optional)
# Install dependencies via vcpkg
vcpkg install cfitsio tiff libpng libraw --triplet x64-windows
# Configure
cmake -B build -G "Visual Studio 17 2022" -A x64 `
  -DCMAKE_TOOLCHAIN_FILE="$env:VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake" `
  -DCMAKE_CUDA_COMPILER="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.6/bin/nvcc.exe"
# Build
cmake --build build --config Release --parallel
Prerequisites: Visual Studio 2022, CUDA Toolkit 12.x, vcpkg
# Apple Silicon: Metal scaffold backend
cmake -B build -DCMAKE_BUILD_TYPE=Release \
  -DDSO_ENABLE_CUDA=OFF \
  -DDSO_ENABLE_METAL=ON
cmake --build build --parallel
# Select Metal backend at runtime
./build/dso_stacker -f frames.csv -o stacked.fits --backend metal
From raw FITS or camera RAW files to a finished stack — debayering, alignment, calibration, and integration in one pipeline.
Every compute-heavy stage runs on CUDA — VNG debayer, Moffat convolution, Lanczos warp, and kappa-sigma / median / AAWA / entropy integration use double-buffered stream overlap for maximum GPU utilization.
Moffat PSF convolution with adaptive sigma threshold detects stars per frame. Optional Levenberg-Marquardt Gaussian centroid fitting refines positions to ~0.01-0.05 pixel accuracy. Triangle-matching + RANSAC computes alignment transforms — auto-selecting from projective, bilinear, bisquared, or bicubic models based on star density for optimal field-curvature correction. Per-frame quality scoring (FWHM, roundness, star count) with optional automatic rejection of poor sub-frames.
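To make the Moffat PSF mentioned above more concrete, here is a minimal Python sketch of the standard Moffat profile, I(r) = (1 + (r/alpha)^2)^(-beta), sampled into a normalized convolution kernel. The function names, the kernel radius, and the alpha and beta values are illustrative assumptions; the parameters the stacker actually uses are not stated here.

```python
import math


def moffat_kernel(radius, alpha, beta):
    """Sample a Moffat profile on a (2*radius+1)^2 grid and normalize it to sum 1.

    Hypothetical helper for illustration; not the project's API.
    """
    size = 2 * radius + 1
    kernel = []
    for y in range(size):
        row = []
        for x in range(size):
            r2 = (x - radius) ** 2 + (y - radius) ** 2
            row.append((1.0 + r2 / alpha ** 2) ** -beta)
        kernel.append(row)
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def moffat_fwhm(alpha, beta):
    """Full width at half maximum of a Moffat profile (standard closed form)."""
    return 2.0 * alpha * math.sqrt(2.0 ** (1.0 / beta) - 1.0)


# Assumed example parameters: 7x7 kernel, alpha = 2 px, beta = 2.5
kernel = moffat_kernel(3, 2.0, 2.5)
print(f"center weight: {kernel[3][3]:.4f}, FWHM: {moffat_fwhm(2.0, 2.5):.2f} px")
```

Convolving a frame with such a kernel acts as a matched filter for star-shaped blobs, which is why a sigma threshold on the convolved image picks out stars rather than single hot pixels.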
Load CR2, NEF, ARW, DNG, and 12 other RAW formats directly via LibRaw. Raw Bayer mosaic extraction with per-channel black subtraction. Build with -DDSO_ENABLE_LIBRAW=ON.
Bayer pattern auto-detection from FITS BAYERPAT keyword or RAW metadata. VNG demosaic produces separate R, G, B planes; all warp and integration stages run per-channel.
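As an illustration of what a Bayer pattern string encodes, the sketch below maps a pixel coordinate to its color channel, assuming the common convention that the four letters describe the top-left 2×2 tile in row-major order. The function name is hypothetical, and the stacker's own handling (including any FITS row-order adjustments) may differ.

```python
def bayer_channel(pattern, x, y):
    """Return 'R', 'G' or 'B' for pixel (x, y) under a 2x2 pattern such as 'RGGB'.

    Assumes the pattern lists the top-left tile row by row.
    """
    pattern = pattern.upper()
    if sorted(pattern) != ["B", "G", "G", "R"]:
        raise ValueError(f"not a valid Bayer pattern: {pattern!r}")
    return pattern[(y % 2) * 2 + (x % 2)]


# Print the channel layout of a 4x4 corner for an RGGB sensor
for y in range(4):
    print(" ".join(bayer_channel("rggb", x, y) for x in range(4)))
```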
Sub-pixel dithering recovery via Fruchter & Hook drizzle algorithm. 2× or 3× output resolution with configurable drop fraction. Bayer Drizzle operates on raw CFA data for artifact-free full-color super-resolution.
Dark, flat, bias, and darkflat master generation via winsorized mean, median, or kappa-sigma stacking. Applied before debayering. Dead-pixel guard and flat normalization built-in.
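The calibration step can be pictured with the textbook formula: subtract the dark master, then divide by the flat master normalized to unit mean. The sketch below works on flat lists of pixel values. The function name, the `min_gain` threshold, and the choice to skip division for near-zero flat pixels are assumptions made for illustration; the project's exact dead-pixel guard is not described here.

```python
def apply_calibration(light, dark, flat, min_gain=1e-6):
    """Return (light - dark) / (flat / mean(flat)) per pixel.

    Pixels whose normalized flat falls below min_gain are only dark-subtracted,
    a stand-in for a dead-pixel guard (assumed behaviour).
    """
    if not (len(light) == len(dark) == len(flat)):
        raise ValueError("light, dark and flat must have the same length")
    mean_flat = sum(flat) / len(flat)
    if mean_flat <= 0.0:
        raise ValueError("flat master has non-positive mean")
    out = []
    for l, d, f in zip(light, dark, flat):
        gain = f / mean_flat  # flat normalized to unit mean
        if gain < min_gain:
            out.append(l - d)  # guard: avoid dividing by ~0
        else:
            out.append((l - d) / gain)
    return out


calibrated = apply_calibration(
    light=[110.0, 60.0, 210.0, 50.0],
    dark=[10.0, 10.0, 10.0, 10.0],
    flat=[1.0, 0.5, 2.0, 0.0],
)
print(calibrated)
```

In this example the first three pixels see the same true signal through different vignetting, so they come out equal after flat division, while the fourth (a dead flat pixel) is left dark-subtracted only.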
FITS (FP32), TIFF (FP32/FP16/INT16/INT8 + none/zip/lzw/rle), PNG (8/16-bit). Format detected from file extension.
Full OpenMP-parallelized CPU path via --cpu. Metal backend scaffolded for Apple Silicon. CPU-only builds require no NVIDIA GPU or CUDA runtime.
Camera, auto (gray-world), or manual per-channel white balance applied to raw Bayer mosaic before demosaicing for accurate color rendering.
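A gray-world estimate can be sketched as follows: average each color channel over the raw mosaic, then scale red and blue so their means match green. Normalizing to green, the function names, and the row-major pattern convention are assumptions for illustration, not a description of the stacker's internals.

```python
def gray_world_gains(mosaic, pattern="RGGB"):
    """Per-channel gains that equalize channel means (green gain fixed at 1).

    Expects a mosaic of at least 2x2 pixels with non-zero channel means.
    """
    sums = {"R": 0.0, "G": 0.0, "B": 0.0}
    counts = {"R": 0, "G": 0, "B": 0}
    for y, row in enumerate(mosaic):
        for x, value in enumerate(row):
            channel = pattern[(y % 2) * 2 + (x % 2)]
            sums[channel] += value
            counts[channel] += 1
    means = {c: sums[c] / counts[c] for c in sums}
    return {c: means["G"] / means[c] for c in means}


def apply_gains(mosaic, gains, pattern="RGGB"):
    """Multiply every mosaic pixel by the gain of its own color channel."""
    return [
        [value * gains[pattern[(y % 2) * 2 + (x % 2)]] for x, value in enumerate(row)]
        for y, row in enumerate(mosaic)
    ]


# 4x4 RGGB mosaic with a strong blue cast: R = 50, G = 100, B = 200
mosaic = [[50.0, 100.0, 50.0, 100.0], [100.0, 200.0, 100.0, 200.0]] * 2
gains = gray_world_gains(mosaic)
balanced = apply_gains(mosaic, gains)
print(gains)
```

Applying the gains on the mosaic, before demosaicing, means the interpolation works on already balanced values.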
Per-channel or RGB background calibration normalizes sky brightness across frames before stacking. Essential for sessions with varying sky conditions.
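One simple way to picture background normalization is an additive offset that moves each frame's median sky level onto the reference frame's. The sketch below does only that, using the median as the background estimate. Both the estimator and the purely additive model are assumptions; the stacker may also rescale or use a more robust statistic.

```python
import statistics


def normalize_background(frame, reference):
    """Shift `frame` so its median matches the median of `reference`.

    Hypothetical helper: median as sky estimate, additive correction only.
    """
    offset = statistics.median(reference) - statistics.median(frame)
    return [value + offset for value in frame]


reference = [10.0, 11.0, 12.0, 10.0, 50.0]   # sky around 11, one star at 50
brighter = [20.0, 21.0, 22.0, 20.0, 60.0]    # same scene under a brighter sky
normalized = normalize_background(brighter, reference)
print(normalized)
```

Without a step like this, outlier rejection during integration could mistake frames with brighter sky for outliers.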
A PySide6 desktop app wrapping the CLI. Drag-and-drop frame management, all stacking options, and YAML project save/load.
Pre-built GUI bundles are available on the Releases page for Linux, Windows, and macOS. No Python installation required — just download, extract, and run.
If you built the CLI from source, you can run the GUI directly with Python:
# Install Python deps
pip install PySide6 pyyaml
# Launch (expects ./build/dso_stacker to exist)
python src/GUI/main.py
10 × 4656×3520 frames, star-detection mode. RTX 30/40-series GPU.
Point it at a 2-column CSV frame list and choose your options.
filepath, is_reference
/data/frame1.fits, 1
/data/frame2.fits, 0
/data/frame3.fits, 0
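For larger sessions it can be convenient to generate the frame list with a short script. The sketch below mirrors the two-column layout shown above, with the header row and a single frame flagged as reference. The helper name is hypothetical, and it assumes file paths contain no commas, since quoting rules for the parser are not documented here.

```python
import io


def write_frame_list(paths, reference, out):
    """Write 'filepath, is_reference' rows, flagging exactly one reference frame."""
    if reference not in paths:
        raise ValueError("reference frame must be one of the listed paths")
    out.write("filepath, is_reference\n")
    for path in paths:
        out.write(f"{path}, {1 if path == reference else 0}\n")


frames = ["/data/frame1.fits", "/data/frame2.fits", "/data/frame3.fits"]
buffer = io.StringIO()            # swap for open("frames.csv", "w") to write a file
write_frame_list(frames, frames[0], buffer)
csv_text = buffer.getvalue()
print(csv_text, end="")
```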
GPU stack (default)
dso_stacker -f frames.csv -o stacked.fits
CPU-only
dso_stacker -f frames.csv -o stacked.fits --cpu
Color OSC camera (RGGB sensor)
dso_stacker -f frames.csv -o stacked.fits --bayer rggb --kappa 2.5 --iterations 5
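The `--kappa` and `--iterations` values in the command above suggest iterative kappa-sigma clipping. As a rough sketch of that general technique for a single output pixel: compute the mean and standard deviation of the samples, drop those further than kappa standard deviations from the mean, and repeat. The function name, the use of the population standard deviation, and the mean as the clipping center are assumptions; the stacker's exact variant is not specified here.

```python
import statistics


def kappa_sigma_mean(samples, kappa=2.5, iterations=5):
    """Mean of one pixel's samples after iterative outlier rejection."""
    kept = list(samples)
    for _ in range(iterations):
        if len(kept) < 3:
            break  # too few samples to estimate a spread
        mean = statistics.fmean(kept)
        sigma = statistics.pstdev(kept)
        if sigma == 0.0:
            break  # all samples identical
        survivors = [v for v in kept if abs(v - mean) <= kappa * sigma]
        if len(survivors) == len(kept):
            break  # converged: nothing rejected this pass
        kept = survivors
    return statistics.fmean(kept)


# Ten frames' values for one pixel; the last frame caught a satellite trail
pixel = [100.0, 101.0, 99.0, 100.5, 99.5, 100.0, 100.2, 99.8, 100.1, 500.0]
clipped = kappa_sigma_mean(pixel)
print(f"plain mean: {statistics.fmean(pixel):.2f}, clipped mean: {clipped:.2f}")
```

With only a handful of frames a single outlier inflates sigma enough to survive clipping, which is one reason rejection-based integration benefits from larger stacks.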
With calibration frames
dso_stacker -f frames.csv -o stacked.fits \
--bias bias_frames.txt \
--dark dark_frames.txt \
--flat flat_frames.txt \
--save-master-frames ./masters
16-bit TIFF with ZIP compression
dso_stacker -f frames.csv -o stacked.tiff --bit-depth 16 --tiff-compression zip
| Extension | Format | Bit depths | Compression |
|---|---|---|---|
| .fits .fit .fts | FITS | f32 (always) | none |
| .tif .tiff | TIFF | f32, f16, 16, 8 | none / zip / lzw / rle |
| .png | PNG | 16, 8 | lossless DEFLATE |
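Extension-based detection as summarized in the table can be sketched with a small lookup. The mapping is taken from the table; the function name and the case-insensitive matching are assumptions.

```python
import os

# Extension -> output format, as listed in the table above
FORMATS = {
    ".fits": "FITS", ".fit": "FITS", ".fts": "FITS",
    ".tif": "TIFF", ".tiff": "TIFF",
    ".png": "PNG",
}


def detect_format(path):
    """Return 'FITS', 'TIFF' or 'PNG' based on the file extension."""
    ext = os.path.splitext(path)[1].lower()  # case-insensitive: an assumption
    if ext not in FORMATS:
        raise ValueError(f"unsupported output extension: {ext!r}")
    return FORMATS[ext]


for name in ("stacked.fits", "stacked.tiff", "preview.PNG"):
    print(name, "->", detect_format(name))
```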
Six stages, two execution paths. Pass --cpu to run everything with OpenMP instead of CUDA.
| # | Stage | GPU path (default) | CPU path (--cpu) |
|---|---|---|---|
| 1 | Debayering | VNG demosaic → luminance CUDA | VNG demosaic → luminance OpenMP |
| 2 | Star Detection | Moffat PSF conv + σ threshold CUDA | Moffat PSF conv + σ threshold OpenMP |
| 2b | Centroid Refinement (optional) | LM Gaussian fitting CUDA | LM Gaussian fitting OpenMP |
| 3 | Alignment | Triangle matching + DLT CPU CUDA | Triangle matching + DLT CPU |
| 4 | Debayering (warp) | VNG → lum or R/G/B CUDA | VNG → lum or R/G/B OpenMP |
| 5 | Lanczos-3 Warp | nppiRemap + coord-map kernel CUDA | 6-tap backward-map warp OpenMP |
| 6 | Integration | Mini-batch mean / κ-σ / median / AAWA / entropy CUDA | Mean / κ-σ / median / AAWA / entropy OpenMP |
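To show where the "6-tap" in the warp stage comes from, here is a one-dimensional sketch of Lanczos-3 resampling: the kernel has support (-3, 3), so each output sample blends the six nearest input samples. The function names, the edge clamping, and the weight normalization are illustrative assumptions; a 2-D warp would typically apply the same weights along both axes.

```python
import math


def lanczos3(x):
    """Lanczos kernel with a = 3: sinc(x) * sinc(x / 3) for |x| < 3, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= 3.0:
        return 0.0
    px = math.pi * x
    return 3.0 * math.sin(px) * math.sin(px / 3.0) / (px * px)


def sample_1d(signal, pos):
    """Interpolate `signal` at fractional index `pos` using 6 Lanczos-3 taps."""
    base = math.floor(pos)
    total = 0.0
    weight_sum = 0.0
    for i in range(base - 2, base + 4):          # the 6 taps around pos
        w = lanczos3(pos - i)
        idx = min(max(i, 0), len(signal) - 1)    # clamp at the borders (assumed)
        total += w * signal[idx]
        weight_sum += w
    return total / weight_sum                    # normalize so flat input stays flat


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(f"ramp sampled at 3.5: {sample_1d(ramp, 3.5):.4f}")
```

In a backward-mapped warp, `pos` would be the source coordinate obtained by applying the alignment transform to each output pixel.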
370+ tests across 21 suites. GPU tests auto-skip (exit 77) when no CUDA device is found.
| Suite | Tests | Coverage |
|---|---|---|
| test_cpu | 51 | CSV parser, FITS I/O, integration (mean, kappa-sigma, median, AAWA, entropy), Lanczos CPU |
| test_gpu | 5 | GPU Lanczos |
| test_star_detect | 31 | CCL + CoM, Moffat conv + threshold |
| test_centroid_lm | 9 | LM Gaussian centroid fitting (CPU) |
| test_frame_quality | 9 | FWHM, roundness, background, composite scoring |
| test_ransac | 23 | DLT homography, triangle matching, RANSAC |
| test_transform | 16 | Polynomial transform eval, fit, auto-select |
| test_debayer_cpu | 16 | VNG debayer: all 4 Bayer patterns + edge cases |
| test_integration_gpu | 11 | GPU mini-batch kappa-sigma, median, AAWA, entropy |
| test_calibration | 34 | Dark/flat apply, masters, winsorized mean, median, kappa-sigma |
| test_color | 33 | Color output, fits_save_rgb, Bayer detection |
| test_white_balance | 16 | Bayer color LUT, wb_apply_bayer, wb_auto_compute |
| test_image_io | 26 | Format detection, FITS, TIFF, PNG, auto-stretch |
| test_raw_io | 11 | Extension detection, FITS fallback, RAW dispatch |
| test_background | 11 | bg_compute_stats, bg_normalize_cpu |
| test_drizzle | 10 | Drizzle init, identity 2x, subpixel shift, Bayer channels |
| test_audit | 3 | Integration stability, CCL large-frame, Lanczos baseline |
| test_pipeline_cpu | 21 | CPU pipeline end-to-end, calibration, color, drizzle |
| test_pipeline_backend | 8 | Backend dispatch, selection logic |
| test_numerical | 16 | Numerical precision, edge cases |
| test_cross_stage | 11 | Cross-stage integration tests |
cd build && ctest --output-on-failure -V