Skip to content

Conversation

@bugadani
Copy link
Contributor

@bugadani bugadani commented Sep 18, 2023

The impl is optimized for the first acquire.

Multi-core lock is now initialized to a value so that we only have to extract a single bit from the PRID register (Xtensa) to create a thread id value. Previous implementation branched on the bit, or incremented the value read after masking, neither of which is necessary.

We no longer count reentries (as lock_api did) because it doesn't matter. We don't have to count locks because the contract of critical_section requires that every acquire be followed by a respective release, nested acquires and releases happening properly paired. This means we have to differentiate the first entry from reentries, but not any two subsequent reentries from each other.

Assembly before (67 + 14 instructions)

42098d74 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E>:
42098d74:	006136        	entry	a1, 48
42098d77:	006520        	rsil	a2, 5
42098d7a:	a0c681        	l32r	a8, 42081094 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x8de8> (2000 <VECTORS_SIZE+0x1c00>)
42098d7d:	03eb90        	rsr.prid	a9
42098d80:	108980        	and	a8, a9, a8
42098d83:	090c      	movi.n	a9, 0
42098d85:	631897        	beq	a8, a9, 42098dec <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x78>
42098d88:	2a0c      	movi.n	a10, 2
42098d8a:	a14881        	l32r	a8, 420812ac <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x9000> (3fccabf0 <_ZN14esp_hal_common21critical_section_impl9multicore14MULTICORE_LOCK17h4d53d46410f8cc00E>)
42098d8d:	08b8      	l32i.n	a11, a8, 0
42098d8f:	631ba7        	beq	a11, a10, 42098df6 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x82>
42098d92:	01a9      	s32i.n	a10, a1, 0
42098d94:	c88b      	addi.n	a12, a8, 8
42098d96:	1b0c      	movi.n	a11, 1
42098d98:	3d0c      	movi.n	a13, 3
42098d9a:	10ddc0        	and	a13, a13, a12
42098d9d:	c0ccd0        	sub	a12, a12, a13
42098da0:	11ddd0        	slli	a13, a13, 3
42098da3:	ffa0e2        	movi	a14, 255
42098da6:	ff7c      	movi.n	a15, -1
42098da8:	000146        	j	42098db1 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x3d>
42098dab:	0020c0        	memw
42098dae:	561726        	beqi	a7, 1, 42098e08 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x94>
42098db1:	401d00        	ssl	a13
42098db4:	a17e00        	sll	a7, a14
42098db7:	3077f0        	xor	a7, a7, a15
42098dba:	0c68      	l32i.n	a6, a12, 0
42098dbc:	103670        	and	a3, a6, a7
42098dbf:	a16900        	sll	a6, a9
42098dc2:	a15b00        	sll	a5, a11
42098dc5:	03ad      	mov.n	a10, a3
42098dc7:	2036a0        	or	a3, a6, a10
42098dca:	2045a0        	or	a4, a5, a10
42098dcd:	130c30        	wsr.scompare1	a3
42098dd0:	00ec42        	s32c1i	a4, a12, 0
42098dd3:	103470        	and	a3, a4, a7
42098dd6:	eb93a7        	bne	a3, a10, 42098dc5 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x51>
42098dd9:	400d00        	ssr	a13
42098ddc:	91a040        	srl	a10, a4
42098ddf:	10aae0        	and	a10, a10, a14
42098de2:	0b7d      	mov.n	a7, a11
42098de4:	c31a97        	beq	a10, a9, 42098dab <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x37>
42098de7:	097d      	mov.n	a7, a9
42098de9:	ffef86        	j	42098dab <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x37>
42098dec:	1a0c      	movi.n	a10, 1
42098dee:	a12f81        	l32r	a8, 420812ac <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x9000> (3fccabf0 <_ZN14esp_hal_common21critical_section_impl9multicore14MULTICORE_LOCK17h4d53d46410f8cc00E>)
42098df1:	08b8      	l32i.n	a11, a8, 0
42098df3:	9b9ba7        	bne	a11, a10, 42098d92 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x1e>
42098df6:	1898      	l32i.n	a9, a8, 4
42098df8:	991b      	addi.n	a9, a9, 1
42098dfa:	0a0c      	movi.n	a10, 0
42098dfc:	0199a7        	bne	a9, a10, 42098e01 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x8d>
42098dff:	1a0c      	movi.n	a10, 1
42098e01:	0d1a26        	beqi	a10, 1, 42098e12 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h6c5abf2dafe9c0f8E+0x9e>
42098e04:	1899      	s32i.n	a9, a8, 4
42098e06:	f01d      	retw.n
42098e08:	0198      	l32i.n	a9, a1, 0
42098e0a:	0899      	s32i.n	a9, a8, 0
42098e0c:	190c      	movi.n	a9, 1
42098e0e:	1899      	s32i.n	a9, a8, 4
42098e10:	f01d      	retw.n
42098e12:	a128a1        	l32r	a10, 420812b4 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x9008> (3c110fab <str.0+0x1b>)
42098e15:	2b2c      	movi.n	a11, 34
42098e17:	a128c1        	l32r	a12, 420812b8 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x900c> (3c111030 <str.0+0xa0>)
42098e1a:	9e2581        	l32r	a8, 420806b0 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x8404> (420abf18 <_ZN4core6option13expect_failed17h75fa18ccb47945faE>)
42098e1d:	0008e0        	callx8	a8
42098e20:	0041f0        	break	1, 15

release:
42099e2c:	004136        	entry	a1, 32
42099e2f:	01b216        	beqz	a2, 42099e4e <_critical_section_1_0_release+0x22>
42099e32:	9d1e81        	l32r	a8, 420812ac <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x9000> (3fccabf0 <_ZN14esp_hal_common21critical_section_impl9multicore14MULTICORE_LOCK17h4d53d46410f8cc00E>)
42099e35:	1898      	l32i.n	a9, a8, 4
42099e37:	990b      	addi.n	a9, a9, -1
42099e39:	1899      	s32i.n	a9, a8, 4
42099e3b:	009956        	bnez	a9, 42099e48 <_critical_section_1_0_release+0x1c>
42099e3e:	090c      	movi.n	a9, 0
42099e40:	0899      	s32i.n	a9, a8, 0
42099e42:	0020c0        	memw
42099e45:	084892        	s8i	a9, a8, 8
42099e48:	13e620        	wsr.ps	a2
42099e4b:	002010        	rsync
42099e4e:	f01d      	retw.n

Assembly after (25 + 9 instructions)

4209a418 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h3018f23500f1cd95E>:
4209a418:	004136        	entry	a1, 32
4209a41b:	006580        	rsil	a8, 5
4209a41e:	9f2e91        	l32r	a9, 420820d8 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x8e3c> (2000 <VECTORS_SIZE+0x1c00>)
4209a421:	03eba0        	rsr.prid	a10
4209a424:	109a90        	and	a9, a10, a9
4209a427:	1a0c      	movi.n	a10, 1
4209a429:	9fb1b1        	l32r	a11, 420822f0 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x9054> (3fc8dd14 <_ZN14esp_hal_common21critical_section_impl9multicore14MULTICORE_LOCK17h14a72e78d20d2ea3E>)
4209a42c:	130ca0        	wsr.scompare1	a10
4209a42f:	09cd      	mov.n	a12, a9
4209a431:	00ebc2        	s32c1i	a12, a11, 0
4209a434:	0020c0        	memw
4209a437:	1a1c26        	beqi	a12, 1, 4209a455 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h3018f23500f1cd95E+0x3d>
4209a43a:	0bc8      	l32i.n	a12, a11, 0
4209a43c:	079c97        	bne	a12, a9, 4209a447 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h3018f23500f1cd95E+0x2f>
4209a43f:	9d6191        	l32r	a9, 420819c4 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x8728> (80000000 <_rtc_dummy_end+0x1ff02000>)
4209a442:	202890        	or	a2, a8, a9
4209a445:	f01d      	retw.n
4209a447:	130ca0        	wsr.scompare1	a10
4209a44a:	09cd      	mov.n	a12, a9
4209a44c:	00ebc2        	s32c1i	a12, a11, 0
4209a44f:	0020c0        	memw
4209a452:	f11c66        	bnei	a12, 1, 4209a447 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17h3018f23500f1cd95E+0x2f>
4209a455:	9dde91        	l32r	a9, 42081bd0 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x8934> (7fffffff <_rtc_dummy_end+0x1ff01fff>)
4209a458:	102890        	and	a2, a8, a9
4209a45b:	f01d      	retw.n

4209b30c <_critical_section_1_0_release>:
4209b30c:	004136        	entry	a1, 32
4209b30f:	00f296        	bltz	a2, 4209b322 <_critical_section_1_0_release+0x16>
4209b312:	0020c0        	memw
4209b315:	9bf681        	l32r	a8, 420822f0 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x9054> (3fc8dd14 <_ZN14esp_hal_common21critical_section_impl9multicore14MULTICORE_LOCK17h14a72e78d20d2ea3E>)
4209b318:	190c      	movi.n	a9, 1
4209b31a:	0899      	s32i.n	a9, a8, 0
4209b31c:	13e620        	wsr.ps	a2
4209b31f:	002010        	rsync
4209b322:	f01d      	retw.n

Assembly if I'm allowed to assume that reserved bits read as 0 (23 + 9 instructions)

4209a414 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17ha1e6efeb756876d9E>:
4209a414:	004136        	entry	a1, 32
4209a417:	006520        	rsil	a2, 5
4209a41a:	9f3081        	l32r	a8, 420820dc <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x8e3c> (2000 <VECTORS_SIZE+0x1c00>)
4209a41d:	03eb90        	rsr.prid	a9
4209a420:	108980        	and	a8, a9, a8
4209a423:	190c      	movi.n	a9, 1
4209a425:	9fb3a1        	l32r	a10, 420822f4 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x9054> (3fc8dd14 <_ZN14esp_hal_common21critical_section_impl9multicore14MULTICORE_LOCK17h6910932810295dc5E>)
4209a428:	130c90        	wsr.scompare1	a9
4209a42b:	08bd      	mov.n	a11, a8
4209a42d:	00eab2        	s32c1i	a11, a10, 0
4209a430:	0020c0        	memw
4209a433:	1b1b26        	beqi	a11, 1, 4209a452 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17ha1e6efeb756876d9E+0x3e>
4209a436:	0ab8      	l32i.n	a11, a10, 0
4209a438:	089b87        	bne	a11, a8, 4209a444 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17ha1e6efeb756876d9E+0x30>
4209a43b:	9d6381        	l32r	a8, 420819c8 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x8728> (80000000 <_rtc_dummy_end+0x1ff02000>)
4209a43e:	202280        	or	a2, a2, a8
4209a441:	000346        	j	4209a452 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17ha1e6efeb756876d9E+0x3e>
4209a444:	130c90        	wsr.scompare1	a9
4209a447:	08bd      	mov.n	a11, a8
4209a449:	00eab2        	s32c1i	a11, a10, 0
4209a44c:	0020c0        	memw
4209a44f:	f11b66        	bnei	a11, 1, 4209a444 <_ZN14esp_hal_common21critical_section_impl6xtensa107_$LT$impl$u20$critical_section..Impl$u20$for$u20$esp_hal_common..critical_section_impl..CriticalSection$GT$7acquire17ha1e6efeb756876d9E+0x30>
4209a452:	f01d      	retw.n

4209b2d8 <_critical_section_1_0_release>:
4209b2d8:	004136        	entry	a1, 32
4209b2db:	00f296        	bltz	a2, 4209b2ee <_critical_section_1_0_release+0x16>
4209b2de:	0020c0        	memw
4209b2e1:	9c0481        	l32r	a8, 420822f4 <_ZN4p25610arithmetic10projective15ProjectivePoint3add17h832b07f5873c491aE+0x9054> (3fc8dd14 <_ZN14esp_hal_common21critical_section_impl9multicore14MULTICORE_LOCK17h6910932810295dc5E>)
4209b2e4:	190c      	movi.n	a9, 1
4209b2e6:	0899      	s32i.n	a9, a8, 0
4209b2e8:	13e620        	wsr.ps	a2
4209b2eb:	002010        	rsync
4209b2ee:	f01d      	retw.n

@bugadani bugadani force-pushed the lock branch 5 times, most recently from 12d9855 to 049166c Compare September 18, 2023 19:20
@bugadani bugadani force-pushed the lock branch 8 times, most recently from e6c7bcc to 0ac8876 Compare September 19, 2023 18:46
@bugadani
Copy link
Contributor Author

bugadani commented Sep 20, 2023

Looks like there are either a few relocations too many in the direct boot LDs, or a few too few. Hard to do without actually knowing anything about linker scripts, but the massaging done in aab8e77 doesn't change the output of nm at least for the ram example, so I assume it's not incorrect to do, and removes a bunch of weird address math, but I've only done it for the S3 here.

@bugadani bugadani marked this pull request as ready for review September 20, 2023 15:13
@bugadani bugadani requested a review from MabezDev September 20, 2023 16:22
Copy link
Member

@MabezDev MabezDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@MabezDev MabezDev merged commit 996ec49 into esp-rs:main Sep 21, 2023
@bugadani bugadani deleted the lock branch September 21, 2023 15:00
SergioGasquez pushed a commit to SergioGasquez/esp-hal that referenced this pull request Sep 22, 2023
* Optimize multicore critical section impl

* Assert reserved bits, explain bit choices, remove redundant checks

* Don't assume the bit reads as 0

* Simplify code generated for thread_id()

* Use non-0 value for unlocked

* Optimise release

* Assume reserved bits read as 0

* Add changelog entry

* Clean up warning

* Fix direct boot ld
@bugadani bugadani mentioned this pull request Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants