
(no subject)

HTM is interesting, but what about hardware support for locks? Since the main performance problem with locking seems to be cache misses under contention, I wonder what would happen if a multicore chip devoted some of its gates to a reasonable number (32K?) of shared 1-bit registers, along with instructions for read, write and test-and-set (TAS), so that locks could be implemented without the overhead of memory latency. The OS would need to allocate a register to each lock at initialisation, or fall back to a traditional memory-based lock if it runs out of registers. But then many structures could be locked at coarser granularity, so there should actually be significantly fewer locks.

Has anyone done this? If not, why not? The circuitry would be essentially identical to that which maintains the cache lines.
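
To make the idea concrete, here is a rough sketch in C of what the OS side might look like. No such hardware exists as far as I know, so the register bank is only emulated with C11 atomics in ordinary memory; the names `lock_regs`, `hybrid_lock`, `hybrid_lock_init`, `hybrid_trylock` and friends are all made up for illustration. On the imagined chip, the atomic exchange on a register would be the dedicated TAS instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_LOCK_REGS 32768 /* the "32K?" figure; purely a guess */

/* Software stand-in for the hypothetical on-chip bank of 1-bit registers.
   Static atomics are zero-initialised, so every bit starts out clear. */
static atomic_bool lock_regs[NUM_LOCK_REGS];
static atomic_int next_free_reg; /* next unallocated register index */

typedef struct {
    int reg;              /* register index, or -1 if none was available */
    atomic_bool fallback; /* traditional memory-based lock bit */
} hybrid_lock;

/* Allocate a register at initialisation; fall back to memory if none left. */
void hybrid_lock_init(hybrid_lock *l) {
    int r = atomic_load(&next_free_reg);
    /* Claim index r unless the bank is exhausted; a failed CAS reloads r. */
    while (r < NUM_LOCK_REGS &&
           !atomic_compare_exchange_weak(&next_free_reg, &r, r + 1)) {
    }
    l->reg = (r < NUM_LOCK_REGS) ? r : -1;
    atomic_init(&l->fallback, false);
}

/* Pick the bit that backs this lock: a register or the in-memory fallback. */
static atomic_bool *lock_bit(hybrid_lock *l) {
    assert(l->reg < NUM_LOCK_REGS);
    return (l->reg >= 0) ? &lock_regs[l->reg] : &l->fallback;
}

/* Test-and-set: returns true if the caller obtained the lock. */
bool hybrid_trylock(hybrid_lock *l) {
    return !atomic_exchange_explicit(lock_bit(l), true, memory_order_acquire);
}

/* Spin until the test-and-set succeeds. */
void hybrid_lock_acquire(hybrid_lock *l) {
    while (!hybrid_trylock(l)) {
        /* busy-wait; a real implementation would back off or sleep */
    }
}

/* Plain write of 0 releases the lock. */
void hybrid_unlock(hybrid_lock *l) {
    atomic_store_explicit(lock_bit(l), false, memory_order_release);
}
```

The emulation still goes through the cache hierarchy, of course, so it says nothing about performance. It only shows the allocation-with-fallback scheme and the three operations (read, write, TAS) the registers would need to support.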
