Differential cryptanalysis of block ciphers requires the identification of differential characteristics with high probability. For block ciphers that have a large block size and a large number of rounds, identifying these differential trails is a computationally intensive task. Matsui first proposed a branch-and-bound algorithm to search for differential trails in 1994. There have been numerous improvements made to the branch-and-bound algorithm since then, such as improving its efficiency by bounding the number of active s-boxes, incorporating a meet-in-the-middle approach, and adapting it to different block cipher architectures like ARX. Although mixed-integer linear programming (MILP) technique has been widely used recently to evaluate the differential resistance of block ciphers, MILP is still an inefficient technique for clustering differential trails (also known as the differential effect). The branch-and-bound method is still a tool better suited for the task of trail clustering. However, it still requires enhancements before being feasible for block ciphers with large block sizes, especially for a large number of rounds. Motivated by the need for a more efficient branch-and-bound algorithm to search for block cipher differential clusters, we propose a GPU-accelerated branch-and-bound approach. The proposed GPU-accelerated algorithm substantially increases the performance of the differential cluster search. We were able to derive a branch enumeration and evaluation kernel that is 5.95 times faster than its CPU counterpart. Then to showcase the practicality of the proposed approach, it is applied on TRIFLE-BC, a 128-bit block cipher. By utilizing the proposed GPU kernel together the incorporation of a meet-in-the-middle approach, we were able to improve the performance of the algorithm by approximately 60 times the original recursive algorithm based on a 20-round TRIFLE-BC. Also, differential clusters with sizes of approximately 2 million for 43 rounds were constructed, leading to slight improvements to the overall differential probabilities. This result depicts the practicality of the proposed GPU framework in constructing clusters consisting of millions of differential trails, which could be used to improve cryptanalytic findings against other block ciphers in the future.