if (thread_index + 1 == RBF_MAX_THREADS)
    height_segment = height - thread_index * height_segment;
should be changed to:
if (thread_index + 1 == RBF_MAX_THREADS)
    height_segment = height - thread_index * height_segment - 1;
Please note that in my implementation I removed the class member variable that holds the number of threads.
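To make the arithmetic concrete, here is a minimal sketch of how the last thread's segment differs before and after the change. It assumes each thread's base segment is height / RBF_MAX_THREADS with integer division, which is not shown in the snippet above; segment_rows, the minus_one flag, and the value 4 for RBF_MAX_THREADS are hypothetical and only for illustration.

```cpp
// Hypothetical stand-in for the constant used in the snippet above.
constexpr int RBF_MAX_THREADS = 4;

// Rows handled by one thread.
// Assumption: the base segment is height / RBF_MAX_THREADS (integer division).
// The last thread takes the remaining rows, minus one row when minus_one is
// true (the proposed change).
constexpr int segment_rows(int height, int thread_index, bool minus_one) {
    return (thread_index + 1 == RBF_MAX_THREADS)
        ? height - thread_index * (height / RBF_MAX_THREADS) - (minus_one ? 1 : 0)
        : height / RBF_MAX_THREADS;
}

// height = 10, 4 threads: base segment is 10 / 4 = 2 rows.
static_assert(segment_rows(10, 0, false) == 2, "non-last threads are unaffected");
static_assert(segment_rows(10, 3, false) == 4, "original: 10 - 3 * 2");
static_assert(segment_rows(10, 3, true) == 3, "changed: 10 - 3 * 2 - 1");
```

Under these assumptions, with height = 10 the last thread goes from 4 rows to 3 rows, while the other threads keep their 2 rows each.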