Skip to content

Conversation

@quantimnot
Copy link
Contributor

@quantimnot quantimnot commented May 23, 2022

  • Consolidate, de-duplicate and extend package handling
  • Alter how duplicate module names of a package are handled
  • Alter how module names are mangled
  • Fix crash when another package is named 'stdlib' (test case added)
  • Doc what defines a package in the manual

Modules with duplicate names within a package used to be given 'fake'
packages to resolve conflicts. That prevented the ability to discern if
a module belonged to the current project package or a foreign package.
They now have the proper package owner and the names are mangled in a
consistent manner to prevent codegen clashes.

All module names are now mangled the same. Stdlib was treated special
before, but now it is same as any other package. This fixes a crash
when a foreign package is named 'stdlib'.

Module mangling is altered for both file paths and symbols used by the
backends.

Removed an unused module name to package mapping that may have been
intended for IC. The mapping was removed because it wasn't being used
and was complicating the issue of package modules with duplicate names
not having the proper package owner assigned.

PR Reviewers

  • This PR is a requirement for scoping diagnostics to the current project's package vs foreign packages.
  • I made these changes so that I can make styleCheck only consider the project's package instead of foreign packages. See Change styleCheck to ignore foreign packages #19822.
  • The choice of consolidating some of the functionality into a new module was purely an attempt to make things a little more tidy. I think the functionality can be put into ast module without any cyclical imports.
  • The package related procs in modulegraphs couldn't go into packages because of cyclical imports.
  • The package path handling in packagehandling couldn't go into packages because of cyclical imports.
  • Please thoroughly review the module name mangling changes.
    • Will it break HCR somehow?
    • How about IC?
    • How about JS?
    • Should the mangling method be different?
      • Module path mangling is done by packagehandling.mangleModuleName
      • C module mangling is done by cgen.getSomeNameForModule
    • How should this be tested?
  • Writing docs is not a strength of mine. Is the manual section on packages clear and comprehensive?

* Consolidate, de-duplicate and extend package handling
* Alter how duplicate module names of a package are handled
* Alter how module names are mangled
* Fix crash when another package is named 'stdlib' (test case added)
* Doc what defines a package in the manual

Modules with duplicate names within a package used to be given 'fake'
packages to resolve conflicts. That prevented the ability to discern if
a module belonged to the current project package or a foreign package.
They now have the proper package owner and the names are mangled in a
consistent manner to prevent codegen clashes.

All module names are now mangled the same. Stdlib was treated special
before, but now it is same as any other package. This fixes a crash
when a foreign package is named 'stdlib'.

Module mangling is altered for both file paths and symbols used by the
backends.

Removed an unused module name to package mapping that may have been
intended for IC. The mapping was removed because it wasn't being used
and was complicating the issue of package modules with duplicate names
not having the proper package owner assigned.
@quantimnot quantimnot force-pushed the pr_package_awareness branch from 370f0ed to 027e599 Compare May 23, 2022 19:42
@quantimnot quantimnot marked this pull request as draft May 23, 2022 21:01
@quantimnot quantimnot marked this pull request as ready for review May 23, 2022 23:50
@Araq
Copy link
Member

Araq commented May 24, 2022

This is awesome! For name mangling I suggest to think about this algorithm:

proc mangleModuleName(s: string): string =
  result = s & "_0"
  var i = 1
  while fileExists("$nimcache" / result):
    result = s & "_" & $i
    inc i

* Remove `packagehandling.withPackageName` and its uses
* Move module path mangling from `packagehandling` to `modulepaths`
* Move `options.toRodFile` to `ic` to break import cycle
@quantimnot
Copy link
Contributor Author

Using that algorithm doesn't work as a direct replacement of packagehandling.mangleModuleName. I tried it but this is what I found:

  • the nimcache isn't necessarily populated at the time it is called; they ended up all being of the form name_0.nim.c and colliding
  • there could be stale files already in the nimcache that shouldn't be considered
  • there is no way to demangle the names for extccomp.displayProgressCC
  • if I want to examine the generated code, it is difficult to determine which is the module I want to examine if I have a lot with the same module name (I was examining nitter's package, because it has many modules with the same name)

I tried some alternatives, but I can't come up with anything better at the moment. Could that be considered for a future PR?

I just noticed that packagehandling.withPackageName had some dead code, and that it isn't needed anymore with the name mangling. I refactored its uses to just use mangleModuleName and also moved the module mangling procs to the modulepaths module because that seemed more proper.

However, I also just reviewed the contributing guide and read the section on backwards compatibility. Do I need to do the whole deprecation deal with the what I've moved around? I'm not sure of what constitutes the public API of the compiler.

@arnetheduck
Copy link
Contributor

For name mangling I suggest to think about this algorithm

making name mangling depend on what's on the drive already seems like a good way to run into a number of issues such as exploding nimcaches etc as well as bugs when two developers are working on the same project and get different outcomes - what prevents mangling from being deterministic?

@Araq
Copy link
Member

Araq commented May 30, 2022

what prevents mangling from being deterministic?

The fact that eventually you create names that contain the mangled form of /users/arumpf. This is then machine specific and even less "deterministic".

@Araq Araq merged commit d30c641 into nim-lang:devel May 30, 2022
@quantimnot quantimnot deleted the pr_package_awareness branch May 31, 2022 01:52
narimiran pushed a commit that referenced this pull request Jul 18, 2022
* Refactor and doc package handling, module name mangling

* Consolidate, de-duplicate and extend package handling
* Alter how duplicate module names of a package are handled
* Alter how module names are mangled
* Fix crash when another package is named 'stdlib' (test case added)
* Doc what defines a package in the manual

Modules with duplicate names within a package used to be given 'fake'
packages to resolve conflicts. That prevented the ability to discern if
a module belonged to the current project package or a foreign package.
They now have the proper package owner and the names are mangled in a
consistent manner to prevent codegen clashes.

All module names are now mangled the same. Stdlib was treated special
before, but now it is same as any other package. This fixes a crash
when a foreign package is named 'stdlib'.

Module mangling is altered for both file paths and symbols used by the
backends.

Removed an unused module name to package mapping that may have been
intended for IC. The mapping was removed because it wasn't being used
and was complicating the issue of package modules with duplicate names
not having the proper package owner assigned.

* Fix some tests

* Refactor `packagehandling`

* Remove `packagehandling.withPackageName` and its uses
* Move module path mangling from `packagehandling` to `modulepaths`
* Move `options.toRodFile` to `ic` to break import cycle

* Changed import style to match preferred style

Co-authored-by: quantimnot <[email protected]>
(cherry picked from commit d30c641)
@ringabout
Copy link
Member

ringabout commented Jan 18, 2023

Modules with duplicate names within a package used to be given 'fake'
packages to resolve conflicts. That prevented the ability to discern if
a module belonged to the current project package or a foreign package.
They now have the proper package owner and the names are mangled in a
consistent manner to prevent codegen clashes.

@quantimnot Hello, I'm looking into #20139 how does it prevent cases mentioned in that issue?

I'm trying to use toFileName in the hashTypeSym anyway + adds ConfigRef parameters for corresponding functions.

@quantimnot
Copy link
Contributor Author

@ringabout Hi, sorry for the late response. Life has been hectic for a while. Anyway, I've now looked into the issue. The mangling change in this PR is only for handling the paths in the nimcache directory. You could resolve the issue by including the relative paths as you have done in #21274, but as you can see, it is noisy propagating ConfigRef. I commented on your PR.

Araq pushed a commit that referenced this pull request Jul 24, 2024
follow up #19821

dights cannot clash with letters 'Z' and 'O'


For `threading-0.2.0-288108d1dfa34d5ade5ce4d922af51909c83cebf`

Before: 


raiseNilAccess__OOZOOZOnimbleZpkgs50Zthreading4548O50O4845505656494856d49dfa5152d53ade53ce52d575050af5349574857c5651cebfZthreadingZsmartptrs_u4


After:


raiseNilAccess__OOZOOZOnimbleZpkgs2Zthreading450O2O045288108d1dfa34d5ade5ce4d922af51909c83cebfZthreadingZsmartptrs_u4


<del> nimble or something might use `git rev-parse --short HEAD` to
shorten the length of package version names ref
nim-lang/nimble#913 </del>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants