I have found out that some apps apparently rely on calls like fseek(f, 0, SEEK_CUR); and even fseek(f, <current_offset>, SEEK_SET); being essentially free, i.e. doing no real work when the file position does not actually change.
In fseek I was able to hack in an early return for the SEEK_CUR case and verify that it helps, but handling SEEK_SET seems to be above my pay grade. I'd assume that comparing offset with stream->__offset and returning early on a match should be enough, but apparently it isn't, as the performance hit was still present.
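
For anyone who wants to reproduce this, here is a minimal sketch of the call pattern in question, using only standard C. The helper name time_noop_seeks and its parameters are made up for illustration and are not taken from any of the affected apps. It repeats a seek that leaves the position unchanged and reports the CPU time spent, so the SEEK_CUR and SEEK_SET variants can be compared before and after a patch.

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not from any affected app): performs `iterations`
 * seeks that do not change the file position and returns the CPU time
 * used in seconds, or -1.0 if ftell or fseek fails.
 *
 * use_seek_set == 0: fseek(f, 0, SEEK_CUR)
 * use_seek_set != 0: fseek(f, <current_offset>, SEEK_SET)
 */
static double time_noop_seeks(FILE *f, long iterations, int use_seek_set)
{
    long pos = ftell(f);            /* current offset, reused for SEEK_SET */
    if (pos < 0)
        return -1.0;

    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        int rc = use_seek_set ? fseek(f, pos, SEEK_SET)
                              : fseek(f, 0, SEEK_CUR);
        if (rc != 0)
            return -1.0;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling it twice on the same open file, e.g. time_noop_seeks(f, 1000000, 0) and time_noop_seeks(f, 1000000, 1), should show whether only the SEEK_SET path is still slow. Note that this only times the seeks themselves; if the real cost comes from the read buffer being thrown away, it may only show up once reads are mixed in between the seeks.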