-
Notifications
You must be signed in to change notification settings - Fork 62
feat: add DWARF debugging support and source mapping infrastructure #943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
84c83e6 to
3a43ba5
Compare
|
If we combine with the function demangling support in #937 we should be in a really nice spot. |
| // debug info: SMAP | ||
| com/dylibso/chicory/$gen/CompiledMachine.java | ||
| WASM | ||
| *S WASM | ||
| *F |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically SMAPs are put in the debug info section of the class file. At least this is what JSPs do. Sadly I don't know of an easy way to get this value back out at runtime. So I'm putting it in the class a 2nd time in an SMAP field. Does anyone one know of an easy way to get it back out? I assume we don't want to require asm in the runtime path. And we need to get it back so we can enhance the exception stack traces.
Given I'm not sure IDEs will be able to deal with our SMAPs in the debug info. Should I omit putting in the debug info? Or should I leave it in for a bit while we test out what IDEs can do with it?
| .withMachineFactory( | ||
| MachineFactoryCompiler.builder(module) | ||
| .withDebugParser(DebugParser::parse) | ||
| .compile()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wish there was a way we could avoid having to repeat options like withDebugParser at the instance and compiler level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like something that might go on the Store? 🤔
|
the dwarf-rust module uses the https://github.com/chirino/wasm-source-map rust module. It's a tight coupling, should we inline the source similar to the wasm-corpus stuff? |
|
so from a cursory look, basically we are now extracting DWARF metadata into SMAP (they live!) and using that? pretty cool |
that's exactly what's happening. |
|
And now we extract the function names from the dwarf info. |
andreaTP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly skimmed through the code and it looks pretty cool.
Is always great to use Chicory to advance Chicory, neat!
The biggest "missing piece" imho is the fact that we rely on the original wasm module, while, to tackle real-world use-cases, we should load debugging information from arbitrary modules.
Usually ppl strip the debugging info for production, but is possible to emit an equivalent module containing those.
I added a few comments around, but I believe that we can try to scope down this PR so that it only contains the APIs changes needed to provide the functionality and we provide the actual debugger in an experimental module or we ship it (initially) from another org (e.g. roastedroot).
| @@ -0,0 +1,2403 @@ | |||
| // class version 55.0 (55) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approval tests are high maintenance as they often cause conflicts and require bulk updates.
Ideally they are intended to be used to have a reference in the codebase that a human can read and reason about, the smallest the better.
In this case the golden master is:
- big
- non human readable for most part
There are a few things that can be done to improve the situation, but the question is:
what are we really testing with this approval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It lets developers see the data formats that are being:
- sent from wasm-source-map to the DebugParser class
- stored int the compiled compiled classes
I don't think there is another place to see those formats.
Additionally we want to make sure that those formats don't change with each run.
If those goals are not important I'll remove the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the goals are important, they just don't fit an approval tests in the current implementation 🙂
Either:
- use a very very small example
- check the presence of specific elements other than the full content
does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take them out. Maybe you can help me figure out how to do it better.
|
|
||
| public class InstanceTest { | ||
|
|
||
| // private InputStream getResource(String wasmResource) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftovers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted.
dwarf-rust/src/main/wasm/README.md
Outdated
| @@ -0,0 +1,3 @@ | |||
| # Attribution | |||
|
|
|||
| wasm-source-map.wasm was built from https://github.com/chirino/wasm-source-map/tree/eb37f99319379950b2e50c21d4a90d657becfd5b | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the repo and the license is BSD 3, is it compatible to include this compiled artifact in an Apache project?
Additionally, in the external repo there is not a lot of code and is mostly tested here, I propose:
- preferred - is it possible to keep the rust library and the Java bindings in an external repo? (e.g. roastedroot)
- alternatively - have some basic testing and build everything here in an
experimentalmodule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the ASF allows BSD and MIT code to back packaged with it's projects, so It's safe.
I like the idea of moving it into experimental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the module need to be renamed and the rust source added?
dwarf-rust/pom.xml
Outdated
| </dependency> | ||
| <dependency> | ||
| <groupId>com.google.code.gson</groupId> | ||
| <artifactId>gson</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason for using Gson here?
We usually, use Jackson as it plays well with the rest of the ecosystem (Quarkus etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no reason, just simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'd stay on the safe side with Jackson if possible(it's a 4 locs change), ideally we should avoid any external dependency when possible (and it might not be easily possible in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed.
| // ********************************************************************* | ||
| // For testing (and as an example of use)... | ||
|
|
||
| public static void main(String[] args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'm not a big fan of leaving main methods around ... is it any useful? Maybe move it to a test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can drop it. it was part of the original source file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it reminds me too much the time spent debugging Hadoop and similar 😅
runtime/src/main/java/com/dylibso/chicory/runtime/internal/smap/Stratum.java
Outdated
Show resolved
Hide resolved
runtime/src/test/java/com/dylibso/chicory/runtime/internal/smap/SmapParserTest.java
Outdated
Show resolved
Hide resolved
| var address = te.getCallStackAddresses()[i]; | ||
| var funcId = te.getCallStackFunctionIds()[i]; | ||
|
|
||
| var functionName = stratum.getFunctionMapping(address - codeSectionAddress); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having a little Markdown table somewhere(even in a comment) that represents those hidden informations(completely made up):
| info | dwarf address | smap address |
|---|---|---|
| function id | .debug_info + byte offset in section | ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add some more docs there.. not sure exactly what your looking for tho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to grasp what is connected to what, e.g. code section instruction number 9 -> smap position XYZ -> stratum info at etc. etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the comment I added help?
|
|
||
| validator.validateSectionType(sectionId); | ||
|
|
||
| var sectionAddress = buffer.position(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already doing it in a few places, but, I think that buffer.position is not a great "source of truth", I think we need a slightly more principled approach.
Additionally the module with the debug information can be separate, and we should account for this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why you think this is a bad source of truth.
buffer.position() seems to give us the current address relative to the start of the wasm module which is exactly the information that I need.
This should work for modules that the debug information has be separated. The DebugParser does not need the code address. it only parses the debug sections and what it produces is a Stratum with addresses that are relative to the code section. So if that code section gets moved or placed in a different module, the Stratum is still valid since the addresses are relative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation, so the code address is needed to calculate the offset of the module used during runtime, and that's why we are adding the information to the wasm data structure, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. At runtime for the interpreter when it maps instruction addresses to source lines.
At compile time for the compiler.
| // Read the example.smap file from test resources | ||
| try (var inputStream = getClass().getResourceAsStream("/example.smap")) { | ||
| assertNotNull(inputStream, "example.smap resource not found"); | ||
| String actual = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I'm banging my head trying to figure out why this fails on windows and then I remember that git on windows checks out text files with different line ending! We always generate with "\n". I think I need to check this in as a binary file.
andreaTP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great improvement!
I think we should agree on a few points before moving on with this:
Stratumis currently in theinternalpackage, and should not spill to the public API, would it be possible to extract a "generic data structure" that can be eventually used byStratum?- in this proposal compiler and interpreter are using different mechanisms to enable debugging, I think we should make an effort to keep the API aligned
- what happens when the dwarf mechanism fails?
- all in all, I'd still prefer to start shipping from an external repository, but this PR should go on with the core changes to enable the feature
| Instance instance = Instance.builder(module) | ||
| .withMachineFactory( | ||
| MachineFactoryCompiler.builder(module) | ||
| .withDebugParser(DebugParser::parse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use a uniform API across compiler and interpreter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love that too but I'm not sure how to accomplish it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of ideas, but maybe I'm not aware of all the invariants, let me know if this point gets stuck and I'll have a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please go for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking into, what do you think about moving the debugParser to be an optional field of WasmModule?
In this way it can be easily applied either in the Instance and in any Machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use target folders for the ephemeral part of the build, I'd cache this file in the root of dwarf-rust-experimental, but I understand this might be more of a personal preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make the build script copy it out to the Java side of the source tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could make the build script copy it out to the Java side of the source tree.
| // column: u32, | ||
| } | ||
|
|
||
| // enum FuncState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean commented code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
| // If we have all the information, add it to our functions map | ||
| if let (Some(func_name), Some(start), Some(end)) = (func_name, low_pc, high_pc) { | ||
| // Filter out synthetic/relocated addresses similar to line processing | ||
| const MAX_REALISTIC_WASM_ADDR: u64 = 0x40000000; // 1GB threshold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? Have you found related issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah.. I've seen some line mappings that are too large, obviously don't map into the WASM module address range, so might a well not include it the line mapping results. What might be better if if we are told what the last address of the code section is so that we can omit all line numbers that are not within the range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to know where are those coming from? Synthetic functions?
Should we just drop them when loading?
| // Strategy 1: Prefer user code over library/dependency code | ||
| if let Some(dir) = directory { | ||
| if dir.contains("/rustc/") || dir.contains("/rust/deps/") { | ||
| score -= 300; // Heavily penalize compiler/dependency paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where those numbers are coming from?
How/when they should be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are just heuristics. I'm assuming a code address can map to multiple source lines. When this happens we need to pick which one of the lines wins.. The good news is I think any of those lines would be valid responses, but I think we should try to pick the one that is the user's source file instead of something like a rust stdlib function that gets inlined. I think we will need more examples and user feedback to get this exactly right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining!
I think we will need more examples and user feedback to get this exactly right.
How do we get the user feedback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug reports? Does not seem great. The best strategy is likely to add more examples and examine each line mapping to see if it is actually the best. But this seems very labor intensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good reason to leave it out of the main repo (just for now).
Anyhow, having a little doc to explain how to test and tune those parameters seems like a feasible first step.
| std::fs::File::open("./src/count_vowels.wasm.json").expect("could not read json file"), | ||
| ) | ||
| .expect("json failed"); | ||
| assert_eq!(actual, expected); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks to be the library to use in Rust:
https://github.com/mitsuhiko/insta
That said count_vowels.wasm.json is too big, there is no chance we can effectively inspect it... we probably need something different than an approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file is not there to be inspected line by line. It's there to demonstrate an example of the output that this module produces. Instead of reading rust code to figure out how to implement the java parser for it, I just look at the example data file it produces to understand it's structure so I can write the jackson bindings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, can we have a super minimal one?
| name = "dwarf-rust" | ||
| version = "0.1.0" | ||
| description = "WebAssembly DWARF parser and location resolver" | ||
| authors = ["Ingvar Stepanyan <rreverser@google.com>", "Hiram Chirino <hiram@hiramchirino.com>"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
fix the name ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, well, not sure where this project setup is coming from 😅
| return this; | ||
| } | ||
|
|
||
| public Builder withDebugParser(Function<WasmModule, Stratum> debugParser) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stratum is in the internal package, it should not spill to the public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can do something about that while keeping the Stratum api internal.
|
squashed and rebased on main |
bcfa35c to
dca257c
Compare
- Add comprehensive line mapping Stratum infrastructure to runtime module - Enhance InterpreterMachine to use parsed Stratums to enhance stack traces. - Enhance compiler to add SMAPs to class files and use to enhance stack traces. - 3rd party modules can be plugged into provide the DebugParser that extracts the Stratum from a WASM module. This enhancement enables developers to get meaningful stack traces with original source line numbers when debugging WebAssembly modules compiled from high-level languages like Rust and Go. Where you would get exceptions that looked like. ``` com.dylibso.chicory.runtime.TrapException: Trapped on unreachable instruction at com.dylibso.chicory.runtime.InterpreterMachine.THROW_UNREACHABLE(InterpreterMachine.java:2212) at com.dylibso.chicory.runtime.InterpreterMachine.eval(InterpreterMachine.java:182) at com.dylibso.chicory.runtime.InterpreterMachine.call(InterpreterMachine.java:100) at com.dylibso.chicory.runtime.InterpreterMachine.CALL(InterpreterMachine.java:1715) ... more of the same … at com.dylibso.chicory.runtime.InterpreterMachine.eval(InterpreterMachine.java:550) at com.dylibso.chicory.runtime.InterpreterMachine.call(InterpreterMachine.java:100) at com.dylibso.chicory.runtime.InterpreterMachine.call(InterpreterMachine.java:65) at com.dylibso.chicory.runtime.Instance$Exports.lambda$function$0(Instance.java:219) at com.dylibso.chicory.testing.MachinesTest.lambda$shouldEmitUnderstandableStackTraces$0(MachinesTest.java:300) at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53) at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35) at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3128) at com.dylibso.chicory.testing.MachinesTest.shouldEmitUnderstandableStackTraces(MachinesTest.java:300) ``` You now get exceptions that look like: ``` com.dylibso.chicory.runtime.TrapException: Trapped on unreachable instruction at chicory interpreter 0x006721: func.94(library/std/src/sys/pal/wasm/../unsupported/common.rs:28) at chicory interpreter 0x005cc6: func.79(library/std/src/panicking.rs:699) at chicory interpreter 0x005c00: func.78(library/std/src/sys/backtrace.rs:168) at chicory interpreter 0x00627d: func.86(library/std/src/panicking.rs:697) at chicory interpreter 0x007e74: func.118(library/core/src/panicking.rs:117) at chicory interpreter 0x007ec8: func.119(library/core/src/panicking.rs:218) at chicory interpreter 0x001cf6: func.30(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/core/src/ub_checks.rs:68) at chicory interpreter 0x003948: func.54(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/core/src/ub_checks.rs:75) at chicory interpreter 0x000d7f: func.11(src/lib.rs:23) at com.dylibso.chicory.runtime.Instance$Exports.lambda$function$0(Instance.java:219) at com.dylibso.chicory.testing.MachinesTest.lambda$shouldEmitUnderstandableStackTraces$0(MachinesTest.java:300) at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53) at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35) at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3128) at com.dylibso.chicory.testing.MachinesTest.shouldEmitUnderstandableStackTraces(MachinesTest.java:300) ``` Signed-off-by: Hiram Chirino <hiram@hiramchirino.com>
… externally. Signed-off-by: Hiram Chirino <hiram@hiramchirino.com>
|
I've removed the dwarf-parser, so that we can just host that externally. |
Better handle enhancing stacks traces occur in both the compiler and The interpreter. Adding the dwarf-parser back for now to aid in testing.
|
Added back the dwarf-parser as it needed to really test things end to end. |
andreaTP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great step forward!
I think this is great progress, but we need to refine a few aspects:
- unless there is a strong counter motivation the API to enable debugging should be the same across
Machines - There is still a lot of code "around", hard-coded in the compiler/interpreter that does a lot of assumptions on Stratum, lines of code etc.
Stratumstill brings behavior + data while, ideally, we should have:- a "data-only" API in Chicory
- injectable behavior completely defined outside
To get things easier I propose that we don't support(for now) "mixed" instances (compiled + interpreter) so that we can focus on getting the APIs right with less code around.
Let me know your thoughts @chirino and thanks again!
| public final class Compiler { | ||
|
|
||
| public static final String DEFAULT_CLASS_NAME = "com.dylibso.chicory.$gen.CompiledMachine"; | ||
| public static final String DEFAULT_CLASS_NAME = "com.dylibso.chicory.$gen%d.CompiledMachine"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final String DEFAULT_CLASS_NAME = "com.dylibso.chicory.$gen%d.CompiledMachine"; | |
| public static final String DEFAULT_CLASS_NAME = "com.dylibso.chicory.$gen.CompiledMachine%d"; |
nit
| debugParser = null; | ||
| } | ||
|
|
||
| getLog().info("Generating AOT classes for " + name + " from " + wasmFile); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated - note to self(or anyone else) - remove AOT in this sentence
compiler/src/main/java/com/dylibso/chicory/compiler/internal/Compiler.java
Show resolved
Hide resolved
| baseName + ".smap", smap.getBytes(java.nio.charset.StandardCharsets.UTF_8)); | ||
|
|
||
| if (smap.length() < 65535) { | ||
| classWriter.visitSource(internalClassName, smap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super interesting!
It should enable IDEs integration, have you tested it? How?
Why the limit on length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory it might. But I need to figure out how to setup a project to do it.
Seems break if it's too big. Found it compiling the quickjs-provider.javy-dynamic.wasm module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can we validate those changes? The limit seems arbitrary ...
Have you attempted to use a small and self-contained module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the value by looking at what was throwing the exception:
| } | ||
|
|
||
| // if we have debug info | ||
| if (!debugContext.inputStratum.isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any performance impact?
Should we make it configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already configurable. If you don't supply a debug parser, the inputStratum is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't supply a debug parser
Build time or runtime?
What I meant for "configurable" is:
a module compiled with debug parser that want to disable it at runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why enable it at build time if your going to disable it a runtime? Compilers don't usually work that way. Either you compiled in debug info and take the runtime hit, or you don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sweet spot would be if we can compile the debugging options in and turn them off by default.
Hard to take a decision without real world data though.
| continue; | ||
| } | ||
|
|
||
| // The following happens when we reach the first interpreter frame.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, we are in the interpreter here ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only some of the frames are interpreter frames. The interpreter frames will be sandwiched between either compiler frames or user code frames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose we get a first iteration that will not work for mixed mode.
It will make it easier to change things around and move forward.
| continue; | ||
| } | ||
|
|
||
| // we eat all the interpreter frames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of rough edges around overall, since the interaction interpreter/compiler seems to cause more work I'd suggest that we start not supporting them.
E.g. we do support only fully compiled code and fully interpreted code.
We can add back support for those cases in a subsequent iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already put int the hard work to get this to work. It would be a shame to drop it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a drop, just add it back at a later stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sad thing is that dropping that would not simplify this code your commenting on here since we still need to avoid replacing user code frames that could be sandwiched between the interpreter frames.
| /** | ||
| * Sorts the stratum data for optimized lookups using a binary search. | ||
| */ | ||
| Stratum optimizeForLookups(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a pretty challenging API to be implemented by a user, do we expect ppl to provide their own implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect them to provide their own. Thats why we supply the Stratum api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we add a public API, it is supposed to be used by users.
Having an API that only matches 1:1 with one specific implementation is a code smell.
| use super::*; | ||
|
|
||
| #[test] | ||
| fn test_count_vowels() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you using this test to extract what pure Rust will be producing?
Please, add at least a comment on how to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.. ok
I might have a better idea on how to do this. Maybe allow configuring the withDebugParser on the interpreter machine too.
Not sure I understand what you mean by this point. Could you point me to some examples?
I tried to make the Stratum interface as narrow as possible. I'm open to suggestions on how to improve.
I could move this to follow up PR. |
Happy to have a look at a proposal!
Yes, for example
Honestly, I have the feeling that continuing to go forth and back on this huge implementation is going to be a long and tedious activity for us two, additionally, the number of comments is starting to hurt in readability. My proposal is to close or convert this PR back to Draft and take out small and consistent pieces one at the time as separate PRs @chirino . |
|
How do you feel about using this config to enable the debug parser for the interpreter? It would look much more like what we are doing for the compiler: |
The problem is the PR requires many different modules to work together in concert and you cant really test or evaluate the suitability of one part without the rest to get the big picture. |
How we did it in the past it to keep a Draft PR open showing the direction, and merging smaller PRs to progressively reduce the diff while we agree on the steps. |
Just tried this.. it won't work to the case where your doing runtime compile with some functions interpreted. As the Compiled code just passes the Instance to create the Interpreter and you don't have debugParser anymore. Setting it on the instance solved that problem. |
|
PR converted to draft. |
Signed-off-by: Hiram Chirino <hiram@hiramchirino.com>
Stratum and LineMapping are now simple immutable data classes. All the SMAP formatting logic has been moved to the Smap class. Builder patters have been added to construct them. Signed-off-by: Hiram Chirino <hiram@hiramchirino.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this file was checked in by mistake, because it's huge and it accounts for like >75% of this PR's line count 😄
| @@ -0,0 +1,157 @@ | |||
| --- | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In progress docs, didn't want them getting published yet.
| import com.dylibso.chicory.wasm.WasmModule; | ||
| import java.util.function.Function; | ||
|
|
||
| public interface DebugParser extends Function<WasmModule, Stratum> {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is it useful to extend Function? Do we ever use this as a Function? otherwise we can just define a plain interface -- in fact, Function<T,U> is the pattern the codebase adopts for factories; I would personally prefer properly named factories in the codebase, but we should probably be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes so we can do a ServiceLoader.load(DebugParser.class) in the ChicoryCompilerGenMojo class.
evacchi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments (mostly nits). There is just one huge file that makes it look like this PR is bigger than it actually is
| Instance instance = Instance.builder(module) | ||
| .withMachineFactory( | ||
| MachineFactoryCompiler.builder(module) | ||
| .withDebugParser(DebugParser::parse) | ||
| .compile()) | ||
| .build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be both on the Instance.builder and the machine factory (I've seen this in the test cases)
speaking of which, what if, as a contract, if the Machine factory supports debug symbols, it just implicitly pulls them off the Instance it's being given, allowing us to avoid being explicit here?
Instance instance = Instance.builder(module)
.withDebugParser(DebugParser::parse)
.withMachineFactory(
MachineFactoryCompiler.builder(module)
// no longer needed: .withDebugParser(DebugParser::parse)
.compile())
.build();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda.. It's only needed if there are some interpreted functions. But for completeness yes.
Letting the Machine factory pull the debug symbols from the instance would mean that we have to defer compiling the .class files until the MachineFactory.appy(Instance ) gets called. That would be a big change to how that works. Not sure if we could continue supporting all the runtime compile use cases we currently do. Like compiling once and using that for multiple instances.
| @@ -0,0 +1,29 @@ | |||
| // Copyright 2019 The Chromium Authors | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sources in the dir originate from https://github.com/bmeurer/wasm-source-map
| // if let FuncState::Start = func_state { | ||
| // func_state = if row.address() == 0 { | ||
| // FuncState::Ignored | ||
| // } else { | ||
| // FuncState::Normal | ||
| // }; | ||
| // } | ||
| // if let FuncState::Ignored = func_state { | ||
| // if row.end_sequence() { | ||
| // func_state = FuncState::Start; | ||
| // } | ||
| // continue; | ||
| // } | ||
| // if row.end_sequence() { | ||
| // func_state = FuncState::Start; | ||
| // } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftovers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah.
| }; | ||
|
|
||
| // It seems we need to add 1 to the line numbers for Rust. | ||
| let is_rust = lang == constants::DW_LANG_Rust.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intended to be Rust-specific? are we going to provider parsers for other languages? how does it work for Go, C etc...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory it is not rust specific. We just need more tests for other languages to verify.
| stack = new MStack(); | ||
| this.callStack = new ArrayDeque<>(); | ||
|
|
||
| CustomSection smapSection = instance.module().customSection("SMAP"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- maybe let's add a few notes about the convention we are exploiting here, I see that this custom section SMAP is being inserted by us in Generator.java;
- this also means that the module is a "meta-module", that is it won't contain any code;
- thus, it doesn't even make a lot of sense for it to be interpreted in the first place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment to this section to clarify:
// The SMAP custom section only gets added to WASM modules that are created by
// the build time compiler (see Generator.java).
// The module is meta-module and will only contain code for functions which
// will be interpreted (typically because they were too large to compile to bytecode).
| .withName("SMAP") | ||
| .withBytes(smap.getBytes(StandardCharsets.UTF_8)) | ||
| .build()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe let's add a comment describing what's going on here.
also, since this is a text file we could also just save it as a resource and load it as such at run-time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this eventually needs to be loaded by the CompilerInterpreterMachine. So it would need to know at least the class name to figure out the right resource name. I was jus trying to avoid changing that interface.
| * as well as to provide a way to retrieve function mappings based on line numbers. | ||
| * </p> | ||
| */ | ||
| public final class Stratum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like the public interface for Stratum is very thin, after all:
boolean isEmpty()
String getFunctionMapping(int)
Line getInputLine(int)
LineMapping getLineMapping(int)
maybe we can externalize further the remaining behavior and keep just an interface with the methods above for the core
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we envision needing more than one implementation of it?
BTW. LineMapping getLineMapping(int) is private.
Signed-off-by: Hiram Chirino <hiram@hiramchirino.com>
This enhancement enables developers to get meaningful stack traces with original source line numbers
when debugging WebAssembly modules compiled from high-level languages like Rust and Go.
Where you would get exceptions that looked like.
You now get exceptions that look like:
and if your using the compiler: