-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Strip overstriking before applying syntax highlighting to better support man pages #3517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
keith-hall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. If we do decide to merge this, probably we would no longer need the complicated MANPAGER advice in the readme? CC @eth-p as our expert on man pages and the author of #2858
I think it could benefit from some integration tests for content containing overstrike, in binary as text mode and normal mode.
d49d8d8 to
4f7c71a
Compare
|
I have added some integration tests which showed me that a safer approach would be (now implemented) to only strip the overstriking when syntax highlighting will be done (which actually more closely matches my original idea). |
4f7c71a to
27dfb40
Compare
keith-hall
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general the implementation looks clean to me. I have a couple of minor comments, and I still think we should remove the complicated MANPAGER awk script from the readme now we have strip_overstrike enabled by default.
Could we also have a test to prove that --show-all still shows backspace characters as expected, or is it already covered by existing tests? 🙂
| } | ||
| }; | ||
|
|
||
| if self.strip_overstrike && line.contains('\x08') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how necessary the line.contains check is, as the first thing strip_overstrike does is perform a find - essentially double work? Would it be worth benchmarking with vs without this check to see which is really better for performance? Or to consider doing a find here and passing the position of it into strip_overstrike? Maybe an unnecessary micro-optimization, but I'm a bit worried because this will run on every line of input passing through bat...
| ## Bugfixes | ||
|
|
||
| - Strip overstriking before applying syntax highlighting to better support man pages, see #3517 (@akirk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue that it is more of a feature than a bugfix... Maybe doesn't matter too much, but I would be interested to hear other opinions 🙂
| }; | ||
|
|
||
| // Strip overstrike only when we have syntax highlighting (not plain text). | ||
| let strip_overstrike = !is_plain_text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how necessary it is on other inputs apart from the manpages. I guess it shouldn't cause any confusing output when syntax highlighting is enabled, so maybe this approach is fine 👍
Potentially fixes #652, #2479, #2568, #3025, #3053
I noticed that running
git commit --help | bat -plhelpdisplayed text with visible^Hcharacters and doubled letters. I realized that this happens because man pages use something called overstriking where (similar to a typewriter) a character is typed twice over each other.batwas styling the page and passing through the^Hwhich lead to this mixed output.So as a solution I introduced a
strip_overstrike()function to simply remove the backspace formatting to allow syntax highlighters to modify the output undisturbed and applies this stripping in bothSimplePrinterandInteractivePrinter.Cow<str>is used to avoid allocation when no backspaces are present.I am not an overly heavy user of
batbut in my testing of this fix, I tried to verify that:MANPAGER="bat -p -l man" man lsdisplays cleanly.git commit --help | bat -plhelpnow works correctly.--binary=as-textstill preserves raw bytes.I am not sure if there are other scenarios that I missed where stripping the overstriking might be unintentional.
After all, the above renders like this with this patch:
On as side note: I realized that
git commit --help | bat -plhelpdisplays the text not full-width but this is coming fromgit commit --helpwhen it detects that its output is being piped. One workaround is to runMANWIDTH=$(tput cols) git commit --help | bat -plhelp