2.1 Data Management for a Large Shared NUCA Cache
- Static NUCA (S-NUCA): long average access time
- Dynamic NUCA (D-NUCA): somewhat lower access time, but more complexity and more power
- Local region (near the CPU): holds private data
- Center region: holds shared data
- Block placement: effectively random, determined by the tag bits
- Problem: locating a block
- Solution: partial multicast -> if it misses, multicast to the remaining banks
- Fast, but heavy network and bank load
- Another solution: partial tags (Kim et al.); see the sketch after this list
- High storage overhead for the partial-tag store
- But the block is probed, and hits, only in the right bank
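The partial-tag idea can be illustrated with a small sketch. This is not Kim et al.'s exact design; the bank count, partial-tag width, and class names below are made up for illustration. The point is that a cheap per-bank filter lets a lookup probe only the bank(s) whose partial tag matches, instead of multicasting to the whole bank set.

```python
# Minimal sketch (not the paper's exact design): locating a block in a
# D-NUCA bank set using partial tags. Bank count and tag width are made up.

PARTIAL_BITS = 6          # low-order tag bits kept in the partial-tag store
NUM_BANKS    = 8          # banks in one bank set (hypothetical)

class Bank:
    def __init__(self):
        self.blocks = {}              # full_tag -> data

    def probe(self, full_tag):
        return self.blocks.get(full_tag)

class BankSet:
    def __init__(self):
        self.banks = [Bank() for _ in range(NUM_BANKS)]
        # per-bank set of partial tags, mirroring that bank's contents
        self.partial = [set() for _ in range(NUM_BANKS)]

    def insert(self, bank_id, full_tag, data):
        self.banks[bank_id].blocks[full_tag] = data
        self.partial[bank_id].add(full_tag & ((1 << PARTIAL_BITS) - 1))

    def lookup(self, full_tag):
        """Probe only banks whose partial tag matches; skip the rest."""
        ptag = full_tag & ((1 << PARTIAL_BITS) - 1)
        candidates = [i for i in range(NUM_BANKS) if ptag in self.partial[i]]
        for i in candidates:              # usually zero or one banks
            data = self.banks[i].probe(full_tag)
            if data is not None:
                return i, data            # hit only in the right bank
        return None                       # no bank holds the block -> L2 miss
```

Because the partial tag keeps only a few bits, more than one bank can match (a false positive), but a block that is actually present always matches in its own bank, which is why it "hits only in the right bank".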
- Completely shared: sharing degree (SD) = 16, coherence is needed only at the L1s
- Not shared at all: SD = 1, coherence is maintained across every private L2
- Varying SD gives better performance for different applications (a mapping sketch follows this list)
- As SD increases, the hit rate increases
- But an SD of 2 or 4 is often better overall
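A tiny address-mapping sketch may make the sharing-degree idea concrete. The slice count, the 64 B line size, and the interleaving function below are my assumptions, not from the source; the only point is how SD partitions the slices into sharing domains.

```python
# Minimal sketch, assuming 16 L2 slices and a power-of-two sharing degree (SD).
# With SD = 16 all cores share one L2 domain (coherence only at L1); with
# SD = 1 each core keeps a private slice (coherence across all 16 L2s).

NUM_SLICES = 16

def home_slice(core_id: int, block_addr: int, sd: int) -> int:
    """Return the L2 slice that holds block_addr for a core in an SD-way domain."""
    assert NUM_SLICES % sd == 0
    group      = core_id // sd                    # which sharing domain the core is in
    base_slice = group * sd                       # first slice of that domain
    # address-interleave blocks across the SD slices of the domain (64 B lines assumed)
    return base_slice + (block_addr // 64) % sd

# Example: with SD = 4, core 5 belongs to domain 1 (slices 4..7).
print(home_slice(core_id=5, block_addr=0x1000, sd=4))   # -> a slice in 4..7
```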
- Hybrid shared/private LLC (sketch below)
- The ways of each set are distributed across slices
- Some ways are private while others are shared
- Blocks accessed in the shared ways are moved into the private ways
- Blocks evicted from the private ways are moved into the shared ways, which act as a victim cache
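Below is a minimal sketch of one set of such a hybrid LLC, assuming (my numbers) four private and four shared ways with LRU in each portion. It shows the two movements described above: promotion on a shared-way hit, and demotion of the private victim into the shared ways.

```python
# Minimal sketch (hypothetical parameters): one set of a hybrid LLC where some
# ways are private to the local core and the rest are shared. A hit in the
# shared ways promotes the block to the private ways; the evicted private
# block is demoted to the shared ways, which therefore act as a victim cache.

from collections import OrderedDict

class HybridSet:
    def __init__(self, private_ways=4, shared_ways=4):
        self.private = OrderedDict()   # tag -> data, LRU order (private ways)
        self.shared  = OrderedDict()   # tag -> data, LRU order (shared ways)
        self.pw, self.sw = private_ways, shared_ways

    def access(self, tag):
        if tag in self.private:                   # fast private-way hit
            self.private.move_to_end(tag)
            return self.private[tag]
        if tag in self.shared:                    # shared-way hit: promote
            data = self.shared.pop(tag)
            self._insert_private(tag, data)
            return data
        return None                               # LLC miss

    def fill(self, tag, data):
        self._insert_private(tag, data)

    def _insert_private(self, tag, data):
        if len(self.private) >= self.pw:          # evict private LRU ...
            vtag, vdata = self.private.popitem(last=False)
            self._insert_shared(vtag, vdata)      # ... and demote it to shared
        self.private[tag] = data

    def _insert_shared(self, tag, data):
        if len(self.shared) >= self.sw:
            self.shared.popitem(last=False)       # shared LRU leaves the LLC
        self.shared[tag] = data
```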
- NuRAPID: Non-Uniform access with Replacement And Placement usIng Distance associativity
- Tag lookup happens in a centralized structure near the CPU
- The access then goes directly to the bank containing the block
- Accessed blocks are swapped with not-recently-touched blocks in closer banks
- So the blocks near the core are usually the hot ones
- Reduces traffic between banks, since there is no multi-bank search
- Each tag entry needs a data pointer to locate the block's data frame (a sketch follows the NuRAPID notes)
- Replacement policy
- May need to track LRU independently within each bank
- Random selection is actually good enough
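The following sketch shows the decoupling these NuRAPID notes describe: a centralized tag array with a forward data pointer per block, data frames that remember their owning tag (a reverse pointer), and a swap that promotes an accessed block one bank closer to the core. The structure and names are my own simplification, not the paper's microarchitecture, and replacement is reduced to "swap with whatever sits in the same frame of the closer bank".

```python
# Minimal NuRAPID-style sketch (names and structure are mine): a centralized
# tag array near the CPU holds a forward pointer (bank, frame) per block, and
# each data frame holds its owning tag as a reverse pointer so a distance
# swap can patch both directions.

class NuRapidSketch:
    def __init__(self, num_banks=4, frames_per_bank=4):
        self.tag = {}                     # tag -> [bank, frame]  (forward pointer)
        # data[bank][frame] = [owning_tag, payload]  (owning_tag = reverse pointer)
        self.data = [[[None, None] for _ in range(frames_per_bank)]
                     for _ in range(num_banks)]

    def install(self, tag, bank, frame, payload):
        self.tag[tag] = [bank, frame]
        self.data[bank][frame] = [tag, payload]

    def access(self, tag):
        """One tag lookup near the CPU, then go straight to the right bank."""
        if tag not in self.tag:
            return None
        bank, frame = self.tag[tag]
        payload = self.data[bank][frame][1]
        if bank > 0:                      # data sits in a farther bank:
            self._swap_toward_core(tag)   # promote it one bank closer
        return payload

    def _swap_toward_core(self, tag):
        """Swap the accessed block with whatever occupies the closer bank's frame."""
        src_bank, src_frame = self.tag[tag]
        dst_bank, dst_frame = src_bank - 1, src_frame
        victim_tag = self.data[dst_bank][dst_frame][0]
        # exchange the two data frames
        self.data[src_bank][src_frame], self.data[dst_bank][dst_frame] = \
            self.data[dst_bank][dst_frame], self.data[src_bank][src_frame]
        # patch the forward pointers using the owning tags (reverse pointers)
        self.tag[tag] = [dst_bank, dst_frame]
        if victim_tag is not None:
            self.tag[victim_tag] = [src_bank, src_frame]
```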
- CMP-NuRAPID
- Organized as private L2 caches
- Decouples the tag arrays from the data arrays
- Tag arrays are private while the data array is shared
- Each core maintains its own private tag array
- When multiple cores read/write the same block through their private L2 tag arrays, coherence misses can result
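A last sketch for CMP-NuRAPID, as I read these notes: per-core private tag arrays whose entries point into one shared data array. All class and method names are hypothetical, and real replacement and coherence machinery is omitted; write() only shows why a write by one core forces other cores' tag entries to be dropped, i.e. the coherence misses mentioned above.

```python
# Minimal sketch of the organization described in these notes (not the paper's
# exact protocol): each core has a private tag array pointing into a shared
# data array, so two cores caching the same block share one data copy.

class CmpNuRapidSketch:
    def __init__(self, num_cores=4, num_frames=8):
        self.tags = [dict() for _ in range(num_cores)]  # per core: tag -> frame
        self.data = [None] * num_frames                 # shared data array
        self.next_free = 0                              # naive allocator (no real replacement)

    def fill(self, core, tag, payload):
        """On a miss: reuse another core's frame if the block is already cached."""
        for t in self.tags:
            if tag in t:                     # another core already maps it:
                self.tags[core][tag] = t[tag]   # just add a pointer, no new copy
                return t[tag]
        frame = self.next_free % len(self.data)
        self.next_free += 1
        self.data[frame] = payload
        self.tags[core][tag] = frame
        return frame

    def write(self, core, tag, payload):
        """A write invalidates other cores' tag entries for this block."""
        frame = self.tags[core][tag]
        self.data[frame] = payload
        for c, t in enumerate(self.tags):
            if c != core and tag in t:
                del t[tag]                   # they take a coherence miss next time
```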