Skip to content

Conversation

stooke
Copy link
Contributor

@stooke stooke commented Apr 24, 2020

This PR is for an initial implementation of debug support for Graal on Windows. At this stage I submit it more for discussion before I go much further down this road.

The patch allows Graal to emit CodeView 4 records into windows .obj files, which the windows linker then moves to PDB files at link time. Visual Studio can then open the executable, breakpoints can be set, and single-stepping, etc. will work.

Variables, and type information are not currently supported, and there is limited support for stack frames traceback.

I work closely with Andrew Dinn @adinn on debug support, so my code makes use of his previous work; I duplicate some of the shared classes from his PR here.

There are some differences in how I approach line information, but these are currently compile-time switches. For more details, please see DEBUGINFO_WINDOWS.md.

This is PR #2245, but properly rebased and flattened.

@olpaw olpaw requested review from adinn, olpaw and pejovica April 27, 2020 11:40
@olpaw olpaw self-assigned this Apr 27, 2020
Copy link
Collaborator

@adinn adinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better to work these details into one common .md file rather than document Windows differences separately.

I like the idea of providing a mode to skip Graal intrinsics but I'm not keen on a few details. First, this really ought to be done uniformly for both Linux and Windows. Secondly, I think it would be better to select the mode by passing a specific value for the GenerateDebugInfo option rather than by configuring a property -- say, 0 means no debug info, 1 debug info except for Graal intrinsics and 2 means debug info for everything including Graal intrinsics.

However, let me also add that I think skipping the Graal intrinsics may look nice on the surface but I don't think it's actually going to be much use to users. Anyone trying to debug a heavily optimized native app during development needs to have their head examined. That should be done on OpenJDK where you can much better follow what the code is doing. Debugging using this capability is really for if/when something goes wrong with a deployed native image that did not turn up when an app was tested (on OpenJDK first and then as a native app). In that failure scenario eliding the Graal code is going to mean a user will not be able to form a clear idea of what the program is doing and why it is going wrong. Errors ni that scenario will come down to how the compiler has translated 1) app code to machine code or 2) JDK runtime code to Graal SubstrateVM code to machine code. In which case it might be better to use GenerateDebugInfo=1 for all info and omit Graal internals for GenerateDebugInfo=2. If we stick with the property flag then the default option should be to generate all info.

Sorry this comment is just with reference to the .md file. I'll continue reviewing the rest of the patch tomorrow.

Copy link
Collaborator

@adinn adinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that is currently missing is log output sowing what is written to each section. I think this is very likely to be needed to help debug problems that inevitably will turn up. I only resolved some of the more tricky bugs I had injected by noting disparities between the log trace and objdump output (or occasionally, in extremis, the bytes found at relevant offsets into the binary). So, I suggest you follow the DWARF generator and log details of what your code thinks is being written and at what location in the section.

If you look at the Dwarf sections implementations they all write their content by implementing abstract method of their parent DwarfSectionImpl:

public abstract void writeContent(DebugContext debugContext);

The DebugContext provides methods to appendi log output to the Graal log file in a subscope of the one used to log image write info. That allows log output to be enabled/disabled using a pattern matching filter on the scope hierarchy (-HLog=pattern1,pattern2). In the DWARF code each debug context is created per section in this method of DwarfObjectImpl

public byte[] getOrDecideContent(Map<ObjectFile.Element, LayoutDecisionMap> alreadyDecided, byte[] contentHint) {
. . .
getOwner().debugContext(debugSectionLogName(), this::writeContent);
. . .

n.b. method debugContext is a method of ObjectFile. Under the image write operation the object file owning the sections holds a handle on the DebugContext with scope Image.write. Method debugContext creates a new DebugContext with a subscope defined by appending the section name supplied as the first argument to the parent scope. It passes that DebugContext to the Consumer supplied as arg 2 (i.e. this::writeContent).

The individual sections can use the DebugContext instance to generate trace within their own hierarchically named scope e.g.

protected void log(DebugContext context, String format, Object... args) {
    if (context.areScopesEnabled()) {
        context.logv(DebugContext.INFO_LEVEL, format, args);
    }
}

@stooke
Copy link
Contributor Author

stooke commented Apr 30, 2020

One thing that is currently missing is log output sowing what is written to each section. I think this is very likely to be needed to help debug problems that inevitably will turn up. I only resolved some of the more tricky bugs I had injected by noting disparities between the log trace and objdump output (or occasionally, in extremis, the bytes found at relevant offsets into the binary). So, I suggest you follow the DWARF generator and log details of what your code thinks is being written and at what location in the section.

I currently do use the DebugContext object (see CV*SectionImpl), but I need to expand the logging to make it clear what's been written (and why). When I moved over to this logging implementation, there were some places DebugContext wasn't conveniently available; I'll revisit that now.

@adinn
Copy link
Collaborator

adinn commented May 1, 2020

Yes, I wired the debugContext through into the methods that do the writing. Adding detailed log dump of the write operations at the start is pretty much guaranteed to pay off. I'm impressed, indeed somewhat amazed, that you got this to work without it.

@stooke
Copy link
Contributor Author

stooke commented May 1, 2020

Yes, I wired the debugContext through into the methods that do the writing. Adding detailed log dump of the write operations at the start is pretty much guaranteed to pay off. I'm impressed, indeed somewhat amazed, that you got this to work without it.

(blushes) Actually, I did pretty much what you did but with different spelling. All my record types and subtypes have toString(), and I had a simplistic logger that showed me what was coming out, and why. Then, I also looked at the binary with a COFF-aware hex viewer, and also compared my output to cl.exe and clang, my own 'coffutil' dumper, 'pdbdump' from Microsoft (etc) to see what was going on.

But, I still can't get clean backtraces some of the time!

@stooke
Copy link
Contributor Author

stooke commented May 1, 2020

Yes, I wired the debugContext through into the methods that do the writing. Adding detailed log dump of the write operations at the start is pretty much guaranteed to pay off. I'm impressed, indeed somewhat amazed, that you got this to work without it.

I've reworked this a bit for a more informative output.

@stooke
Copy link
Contributor Author

stooke commented May 20, 2020

@olpaw , @adinn, I believe I've addressed the issues in this review round. Please let me know if I've missed something.

@stooke stooke closed this May 20, 2020
@stooke stooke reopened this May 20, 2020
@olpaw
Copy link
Member

olpaw commented May 29, 2020

@adinn since @stooke s work is largely based on your work I would be happy if you could give this PR a thorough review before I dive in. Would that be ok with you?

@adinn
Copy link
Collaborator

adinn commented May 29, 2020

Yes, it is very near the top of my todo list :-)

@pejovica
Copy link
Member

@stooke, I finally managed to try out the PR locally, so I'd like to share some initial impressions.

I made a minimal test image on JDK 8 and used WinDbg that comes with Windows SDK 7.1 to test the PR. I have to say that it was really refreshing to see some meaningful symbols for a change. Well done!

The current symbols are already very helpful when going through disassembly, but I wondered if it would be possible to improve the method signatures. That is, would it be possible to replace hashes with something more useful?

I could also see the sources, so that works. I guess we can't avoid some jumping around due to optimizations, but I noticed that in case of substitutions, the original sources are displayed, which was a bit confusing.

Also, I noticed that the time to set the first breakpoint varies greatly depending on whether the inlining is enabled (up to 30+ seconds with our enterprise inliner). Have you seen anything similar?

@stooke stooke closed this Sep 23, 2020
@stooke stooke reopened this Sep 23, 2020
@stooke
Copy link
Contributor Author

stooke commented Oct 2, 2020

@pejovica, I think I have either made changes or responded to your latest comments. Could you please take another look?

@adinn
Copy link
Collaborator

adinn commented Oct 8, 2020

@pejovica @olpaw are there still any outstanding requests for changes to this PR that have not been met? in particular any that cannot be left as follow-up improvements? We would very much like to see /some/ support for debugging on Windows in CE even if it is limited and I believe this PR provides a baseline implementation that works. If possible we would like to integrate it as is modulo any minor changes you still think are critical. Is that possible or do you have reservations about how it works that still need addressing?

Copy link
Member

@pejovica pejovica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stooke @adinn Sorry about the delay. I finished another round and the code as a whole looks good. I'll start the integration once these last few comments are addressed.

* are created while building the symbol section.
*/
for (FileEntry entry : cvDebugInfo.getFiles()) {
addFile(entry);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fine with being defensive, but I don't think we should be adding files just in case. Therefore, I suggest removing this, or you could try to make an assertion that there are no unused files (I think that should be true at least for now), in which case that assertion should be put in CVSymbolSectionImpl.addFileRecord instead of here.

@stooke
Copy link
Contributor Author

stooke commented Oct 13, 2020

@pejovica, thank you very much for your comments. I've redone the code for most of them, but given an explanation for why I would rather avoid refactoring CVLineRecord at this time.

Copy link
Member

@pejovica pejovica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stooke Can you update to the latest master, there are a few minor conflicts. Also, could you clean up all the unused imports so that our style gates wouldn't complain.

@pejovica
Copy link
Member

@stooke Given that PR is basically ready, would it be possible to take care of these few remaining technicalities today / tomorrow? The feature freeze for 20.3 is tomorrow, and it would be nice if we could get this in as well (but no rush, of course).

@stooke
Copy link
Contributor Author

stooke commented Oct 22, 2020

@pejovica, thanks once again. I have merged the head, and made changes as per your review. I will watch the Travis builds to see how they do.

@pejovica
Copy link
Member

I have merged the head, and made changes as per your review. I will watch the Travis builds to see how they do.

Awesome! And now let's see what our CI has to say. 😉

@pejovica
Copy link
Member

@stooke So far so good! We only have three style issues:

----------
1. ERROR in /b/b/e/graal/substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVLineRecord.java (at line 29)
	import com.oracle.objectfile.debugentry.FileEntry;
	       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The import com.oracle.objectfile.debugentry.FileEntry is never used
----------
----------
2. ERROR in /b/b/e/graal/substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeRecord.java (at line 43)
	abstract class CVTypeRecord {
	               ^^^^^^^^^^^^
The type CVTypeRecord should also implement hashCode() since it overrides Object.equals()
----------
----------
3. ERROR in /b/b/e/graal/substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeSectionImpl.java (at line 36)
	import java.util.ArrayList;
	       ^^^^^^^^^^^^^^^^^^^
The import java.util.ArrayList is never used
----------
3 problems (3 errors)

@stooke
Copy link
Contributor Author

stooke commented Oct 22, 2020

@pejovica thanks for the heads up - Travis is still pending. These crept in at the last moment. I have added an abstract hashCode(), to enforce that new type classes will have to implement that method.

Copy link
Member

@pejovica pejovica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stooke I've suggested two more minor changes, but we can merge with or without them, you decide.

@pejovica
Copy link
Member

We have a few more style issues, but everything else passed!

  1. Formatting
3 files were modified
 - src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeRecord.java
Changes:
--- a/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeRecord.java
+++ b/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeRecord.java
@@ -91,7 +91,7 @@
         if (obj == null || obj.getClass() != this.getClass()) {
             return false;
         }
-        return this.type == ((CVTypeRecord)obj).type;
+        return this.type == ((CVTypeRecord) obj).type;
     }
 
     @Override
@@ -175,7 +175,7 @@
             if (!super.equals(obj)) {
                 return false;
             }
-            CVTypeProcedureRecord other = (CVTypeProcedureRecord)obj;
+            CVTypeProcedureRecord other = (CVTypeProcedureRecord) obj;
             return this.returnType == other.returnType && this.argList == other.argList;
         }
     }
@@ -231,7 +231,7 @@
             if (!super.equals(obj)) {
                 return false;
             }
-            CVTypeArglistRecord other = (CVTypeArglistRecord)obj;
+            CVTypeArglistRecord other = (CVTypeArglistRecord) obj;
             return this.args.equals(other.args);
         }
     }

 - src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubrecord.java
Changes:
--- a/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubrecord.java
+++ b/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubrecord.java
@@ -247,8 +247,8 @@
     }
 
     /*
-     * Creating a proc32 record has a side effect: two relocation entries are added to the section relocation
-     * table, they refer back to the global symbol.
+     * Creating a proc32 record has a side effect: two relocation entries are added to the section
+     * relocation table, they refer back to the global symbol.
      */
     public static class CVSymbolGProc32Record extends CVSymbolSubrecord {
 

 - src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubsectionBuilder.java
Changes:
--- a/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubsectionBuilder.java
+++ b/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubsectionBuilder.java
@@ -95,7 +95,8 @@
         /* S_PROC32 add function definition. */
         int functionTypeIndex = addTypeRecords(primaryEntry);
         byte funcFlags = 0;
-        CVSymbolSubrecord.CVSymbolGProc32Record proc32 = new CVSymbolSubrecord.CVSymbolGProc32Record(cvDebugInfo, externalName, debuggerName, 0, 0, 0, primaryRange.getHi() - primaryRange.getLo(), 0, 0,
+        CVSymbolSubrecord.CVSymbolGProc32Record proc32 = new CVSymbolSubrecord.CVSymbolGProc32Record(cvDebugInfo, externalName, debuggerName, 0, 0, 0, primaryRange.getHi() - primaryRange.getLo(), 0,
+                        0,
                         functionTypeIndex, primaryRange.getLo(), (short) 0, funcFlags);
         addToSymbolSubsection(proc32);
 
@@ -146,7 +147,10 @@
 
     private void addLineNumberRecords(PrimaryEntry primaryEntry) {
         CVLineRecord record = lineRecordBuilder.build(primaryEntry);
-        /* If there are no file entries (perhaps for a synthetic function?), we don't add this record. */
+        /*
+         * If there are no file entries (perhaps for a synthetic function?), we don't add this
+         * record.
+         */
         if (!record.isEmpty()) {
             cvDebugInfo.getCVSymbolSection().addRecord(record);
         }
  1. Checkstyle
Checkstyle ends with 4 errors.
/b/b/e/graal/substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeRecord.java:94: 'typecast' is not followed by whitespace.
/b/b/e/graal/substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeRecord.java:98: 'public' modifier out of order with the JLS suggestions.
/b/b/e/graal/substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeRecord.java:178: 'typecast' is not followed by whitespace.
/b/b/e/graal/substratevm/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVTypeRecord.java:234: 'typecast' is not followed by whitespace.

@stooke
Copy link
Contributor Author

stooke commented Oct 23, 2020

@pejovica, thank you again. I have read and incorporated your suggestions and fixed the style issues.

@pejovica
Copy link
Member

Ah, we have another formatting error (should be the last one).

1 files were modified
 - src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubsectionBuilder.java
Changes:
--- a/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubsectionBuilder.java
+++ b/src/com.oracle.objectfile/src/com/oracle/objectfile/pecoff/cv/CVSymbolSubsectionBuilder.java
@@ -96,7 +96,7 @@
         int functionTypeIndex = addTypeRecords(primaryEntry);
         byte funcFlags = 0;
         CVSymbolSubrecord.CVSymbolGProc32Record proc32 = new CVSymbolSubrecord.CVSymbolGProc32Record(cvDebugInfo, externalName, debuggerName, 0, 0, 0, primaryRange.getHi() - primaryRange.getLo(), 0,
-                0, functionTypeIndex, primaryRange.getLo(), (short) 0, funcFlags);
+                        0, functionTypeIndex, primaryRange.getLo(), (short) 0, funcFlags);
         addToSymbolSubsection(proc32);
 
         /* S_FRAMEPROC add frame definitions. */

@stooke
Copy link
Contributor Author

stooke commented Oct 23, 2020

@pejovica, hopefully I got it right this time. My editor has a tendency to autoindent.

@pejovica
Copy link
Member

@stooke Lastly, could you please clean up your commit log by squashing commits where possible (you can use git rebase --interactive origin/master).

@olpaw
Copy link
Member

olpaw commented Oct 23, 2020

@stooke Lastly, could you please clean up your commit log by squashing commits where possible (you can use git rebase --interactive origin/master).

The existing commit messages on this PR do not add much value. @stooke please squash the whole PR into a single commit (starting the message with an upper-case letter).

Copy link
Member

@olpaw olpaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pejovica
Copy link
Member

hopefully I got it right this time. My editor has a tendency to autoindent.

@stooke As for our CI, all is good!

@stooke stooke force-pushed the pr_debug_pecoff_prototype branch from b98d414 to f50a7b9 Compare October 23, 2020 15:57
@pejovica
Copy link
Member

@stooke PR is in the merge queue! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants