Monday, April 29, 2013

Note_Book_Multi-Core_CacheHierarchies-Ch2

2.1 Data Management for a Large Shared NUCA Cache

  • NUCA
    • Static: long access time
    • Dynamic: slightly lower access time, but increased complexity and more power
  • local region(of CPU) : private data
  • center region : shared data
  • Block placement: random (based on tag)
  • Problem: locating a block
  • Solution: multicast the lookup to a subset of the banks; on a miss, multicast to the remaining banks
    • Fast, but puts a heavy load on the network and the banks
  • Another solution: partial tags (Kim et al.)
  • Problems with the above:
    • High overhead of the partial tags
    • A block still hits only in the right bank
  • Sharing Degree (SD)
    • Completely shared: SD = 16, coherence only among the L1s
    • Not shared: SD = 1, coherence among all the private L2s
    • Varying SD gives better performance for different applications
    • As SD increases, the hit rate increases
    • An SD of 2 or 4 is often better
  • Hybrid shared/private LLC
    • The ways of a set are distributed across slices
    • Some ways are private while others are shared
    • Blocks accessed in the shared ways move to the private ways
    • Evicted blocks move to the shared ways, which act as a victim cache
  • NuRAPID: Non-Uniform access with Replacement And Placement usIng Distance associativity
    • Tag lookup in a centralized structure near the CPU
      • The request is then directed to the bank that contains the block
    • Accessed blocks are swapped with blocks that have not been touched recently
      • Blocks near the core are therefore often hot
    • Reduces inter-bank traffic
    • Each tag entry needs a data pointer to record where its block is stored
    • Replacement policy
      • May need to track LRU independently among banks
      • Random selection is actually good enough
  • CMP-NuRAPID
    • Organized as a private L2
    • Decouples the tag and data arrays
    • Tag arrays are private while the data array is shared
    • Each core maintains its own private tag array
      • May lower the capacity
    • Multiple cores reading/writing the same block in private L2s may lead to coherence misses
    • L2
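
A rough sketch of the NuRAPID idea noted above: one centralized tag lookup, a data pointer per tag entry, and a swap toward the core on a far hit using random victim selection. This is a toy model written for these notes, not the design from the original paper. The class name `NuRapidSketch`, its parameters, and the fill policy (first empty frame, otherwise evict a random block from the farthest group) are all assumptions made for illustration.

```python
import random


class NuRapidSketch:
    """Toy model of a centralized tag array with pointers into distance groups."""

    def __init__(self, num_groups=4, frames_per_group=2, seed=0):
        # frames[g][f] is the address held in frame f of distance group g
        # (group 0 is assumed closest to the core), or None if empty.
        self.frames = [[None] * frames_per_group for _ in range(num_groups)]
        # Tag array: address -> (group, frame). This plays the "data pointer" role.
        self.tags = {}
        self.rng = random.Random(seed)

    def _place(self, addr, group, frame):
        self.frames[group][frame] = addr
        self.tags[addr] = (group, frame)

    def _fill(self, addr):
        # Assumed fill policy: take the first empty frame, nearest group first.
        for g, group in enumerate(self.frames):
            for f, held in enumerate(group):
                if held is None:
                    self._place(addr, g, f)
                    return
        # Cache full: evict a randomly chosen block from the farthest group.
        g = len(self.frames) - 1
        f = self.rng.randrange(len(self.frames[g]))
        del self.tags[self.frames[g][f]]
        self._place(addr, g, f)

    def access(self, addr):
        """Return the distance group that served the hit, or None on a miss."""
        loc = self.tags.get(addr)  # single tag lookup, no search across banks
        if loc is None:
            self._fill(addr)
            return None
        group, frame = loc
        if group > 0:
            # Far hit: swap with a randomly chosen frame in the nearest group.
            near = self.rng.randrange(len(self.frames[0]))
            victim = self.frames[0][near]
            self._place(addr, 0, near)
            if victim is None:
                self.frames[group][frame] = None
            else:
                self._place(victim, group, frame)
        return group


cache = NuRapidSketch(num_groups=2, frames_per_group=1)
cache.access("A")          # miss, A fills the near group
cache.access("B")          # miss, B fills the far group
far_hit = cache.access("B")   # hit in the far group, B swaps with A
near_hit = cache.access("B")  # B is now served from the near group
```

The tag dictionary is the only structure consulted on a lookup, which is the point of decoupling tags from data: a block can move between distance groups by updating its pointer instead of being searched for bank by bank.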

       
