Figure 1. Example input images. The red area represents an empty region. The application can combine these images while accounting for their rotation.
Figure 2. The preprocessed input images. This rotation step is necessary to combine the images accurately. The green frame marks the overlap region between the input images.
This application is designed around the width $w_c$ and height $h_c$ of the overlap region. Constraining the overlap in this way limits the search space and prevents the search from settling on an overly small, suboptimal overlap region.
However, this approach is not always applicable, specifically when $\min(h_1, h_2) < h_c$ or $\min(w_1, w_2) < w_c$. To address this issue, I reinterpret $w_c$ and $h_c$ as shown in the figure above. As a result, the application can handle images of arbitrary sizes.
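The original constraint and the case where it fails can be sketched as follows. This is a minimal illustration rather than the application's actual code: it assumes that $w_c$ and $h_c$ act as minimum overlap sizes, that $w_1, h_1, w_2, h_2$ are the sizes of the two (already rotated) images, and that candidate placements are integer pixel offsets of image 2 relative to image 1. The function names `overlap_size`, `is_applicable`, and `candidate_offsets` are hypothetical.

```python
def overlap_size(w1, h1, w2, h2, dx, dy):
    """Size of the overlap when image 2 is shifted by (dx, dy) relative to image 1."""
    ow = max(0, min(w1, dx + w2) - max(0, dx))
    oh = max(0, min(h1, dy + h2) - max(0, dy))
    return ow, oh


def is_applicable(w1, h1, w2, h2, w_c, h_c):
    """False exactly when min(h1, h2) < h_c or min(w1, w2) < w_c."""
    return min(w1, w2) >= w_c and min(h1, h2) >= h_c


def candidate_offsets(w1, h1, w2, h2, w_c, h_c):
    """Offsets whose overlap is at least w_c wide and h_c high (assumed constraint)."""
    if not is_applicable(w1, h1, w2, h2, w_c, h_c):
        # No placement can reach the required overlap size.
        return []
    return [
        (dx, dy)
        for dx in range(-(w2 - w_c), w1 - w_c + 1)
        for dy in range(-(h2 - h_c), h1 - h_c + 1)
    ]


if __name__ == "__main__":
    offsets = candidate_offsets(4, 4, 3, 3, 2, 2)
    print(len(offsets))                         # number of placements kept
    print(candidate_offsets(4, 4, 3, 1, 2, 2))  # image 2 is shorter than h_c
```

Under these assumptions, the search space shrinks as $w_c$ and $h_c$ grow, and it becomes empty as soon as one image is smaller than the required overlap in either dimension, which is the situation the reinterpretation of $w_c$ and $h_c$ is meant to handle.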