cache entry replacement algorithm

I have a software project that creates a series of fingerprint (hash) values from objects of varying size. The larger the object size, of course, the more expensive the computation of the hash. The hashes are used for comparative purposes.

I now wish to cache hash values in order to improve performance of subsequent comparisons. For any given entry in the cache, I have the following metrics available:

  • hit count
  • last modification date/time
  • size of object hashed

So on to my question. Given the need to constrain the size of the cache (limit it to a specific number of entries), what is a well-balanced approach to replacing cache items?

Clearly, larger objects are more expensive to hash so they need to be kept around for as long as possible. However, I want to avoid a situation where populating the cache with a large quantity of large objects will prevent future (smaller) items from being cached.

So, based upon the metrics available to me (see above), I'm looking for a good general-purpose "formula" for expiring (removing) cache entries when the cache becomes full.

All thoughts, comments are appreciated.

-------------Problems Reply------------

You need to think about the nature of the objects. Think about the probability of the objects to be called again soon. And remove the least likely object.

This is very specific to the software you're using and the nature of the objects.
If they are used continuously in the program they will probably abide to the Locality of reference principle. So you should use LRU (Least recently used) algorithm.

If objects with higher hit count are more likely to be called again, then use that (and remove the lowest).

Take a look at Cache Algorithms

In principle, you need to calculate:

min(p*cost)

p = probability to be called again.
cost = The cost of caching that object again.

Assuming the ability to record when an entry was last accessed, I'd go with a "Cost" for each entry, where you at any time remove the least expensive entry.

Cost = Size * N - TimeSinceLastUse * M

Presuming you completely remove entries from the cache (and not keep old hitcount data around) I'd avoid using hitcount as a metric, you'd end up with an entry that has a high hitcount because it's been there for a long time, and it'll be there even longer because it has a high hitcount.

I typically use a strict least recently used (LRU) scheme for discarding things from the cache, unless it's hugely more expensive to reconstruct some items. LRU has the benefit of being trivially simple to implement, and it works really well for a wide range of applications. It also keeps the most frequently used items in the cache.

In essence, I create a linked list that's also indexed by a dictionary. When a client wants an item, I look it up in the dictionary. If it's found, I unlink the node from the linked list and move it to the head of the list. If the item isn't in the cache, I construct it (load it from disk, or whatever), put it at the head of the list, insert it into the dictionary, and then remove the item that's at the tail of the list.

Might want to try a multilevel style of cache. Dedicate a certain percentage of the cache to Expensive to create items and a portion to easy to create but more heavily accessed items. You can then use different strategies for maintaining the expensive cache than you would the less expensive one.

The algorithm could consider the cost of reproducing a missing element. That way you would keep the most valuable items in the cache.

Category:c# Views:1 Time:2011-03-11

Related post

  • Page replacement algorithm simulation in Java 2010-05-31

    Is there utility program for Page replacement algorithm simulation in Java? --------------Solutions------------- No. Java abstracts away the concrete memory management, so there should seldom be a need for this. Edit: Think some more seconds. No, the

  • Is there any way to avoid duplicate page cache entries while allowing for case-insensitive URLs? 2009-01-13

    If I have a page with outputcaching (let's call it Employees.aspx) which accepts one parameter (through querystring) called Company like this: http://www.example.com/Employees.aspx?Company=Google How can I avoid duplicate page cache entries for diffe

  • How to detect and debug stale cache entries? 2009-01-29

    I am using memcached with PHP trying for heavy caching to avoid db-reads. I am invalidating the cache on an update (application invalidates). But, stale cache data is becoming a big issue. Mostly, this is due to bug in invalidation (invalidates wrong

  • How to clear APC cache entries? 2009-05-26

    I need to clear all APC cache entries when I deploy a new version of the site. APC.php has a button for clearing all opcode caches, but I don't see buttons for clearing all User Entries, or all System Entries, or all Per-Directory Entries. Is it poss

  • java.lang.RuntimeException: ERROR: Failed to recover corrupt cache entry 2010-07-16

    I just had this error message from one of my users. (IE8, Java 1.6.20 ). It is from an applet which receives instructions from Javascript and executes certain processes on the client. RangeError java.lang.RuntimeException: ERROR: Failed to recover co

  • Should I remove a cache entry which I didn't put there? 2010-09-12

    Precondition: There's a web application that leverages ASP.NET security model. There's also an Active Directory (AD) integration component. It provides AD users and roles as if those are application's own users and roles. The relations like "is in ro

  • Programatically clear a single DNS cache entry in Windows and Linux in C++ 2010-11-05

    I am wondering if there is a way to programmatically clear a single DNS cache entry in both Windows and Linux. Or if there is some other way to force a gethostbyname call to not use the local cache. Clearing the entire cache would not be ideal. Thank

  • Problems with: A soft-locked cache entry was expired by the underlying Ehcache 2011-01-02

    I'm getting following warning and I have no idea what to do about it. There is about 80000 entries that write this warning into catalina.out log file in tomcat every time the bannedIPs are getting updated: WARNING: Cache package.BannedIP Key package.

  • What does the cryptic GC cache entry mean 2011-01-11

    Time to time, I receive this strange warning message. It is usually gone on page reload. What does that mean. I googled but to no avail. Warning: include(): GC cache entry '/.../...class.php' (dev=2049 ino=37120489) was on gc-list for 3840 seconds in

  • A soft-locked cache entry was expired by the underlying Ehcache 2011-04-15

    Hibernate 3.3.x, ehcache 2.2.x The following error occurs, when I try to publish a lots of users in a single go. Any idea on why this would happen and how to rectify this? Is there a way to disable this cache prior to bulk loading of users, if so how

  • page replacement algorithm 2011-05-04

    What is the name for the page replacement algorithm for 2.6 linux kernel? --------------Solutions------------- Linux calls it the "Page Frame Reclaiming Algorithm". In my limited understanding, it's basically LRU with a bias towards non-dirty pages.

  • How to make a cache entry in websphere not to expire 2011-06-29

    I have created a cache in the web sphere which will be shared by multiple applications and i wanted to make one entry in the created cache to not to expire. How can i make it ? Thanks and Regards, Sunny. --------------Solutions------------- Are you d

  • Cache entry not in use 2012-02-24

    We have a web application using the following technologies: JSF 2.0, EJB 3.1, JPA 2.0, JBoss AS 7.1 Final Sometimes we get the following exception out of nowhere: 09:46:29,664 ERROR [org.jboss.ejb3.invocation] (http-10.99.0.10-10.99.0.10-8080-14) JBA

  • Can we obtain LRU (least recently used) page replacement algorithm in O(1)? 2012-04-11

    Can we obtain LRU (least recently used) page replacement algorithm in O(1) (i.e constant time)? Please give the algorithm if possible. --------------Solutions------------- A doubly-linked list can implement a LRU queue with O(1) operations. Used node

  • PHP Apc : how to view cache entries in APC 2012-04-14

    I have php 5.3.6-13ubuntu3.6. I installed apc by: $ sudo apt-get install php-apc ( though i found many other methods later). I want to see my cache entries and other info. I googled and found finding and placing apc.php file in /var/www/ will do what

  • What is the page replacement algorithm used in windows7? 2012-01-26

    I am looking for the page replacement algorithm used in Windows 7? is it still FIFO or LRU? --------------Solutions------------- Hello Abdullah, The issue you have posted would be better suited on the Microsoft TechNet support forum. I would suggest

  • Which Type of Page Replacement Algorithms the Windows 7 is use ?? 2014-09-07

    Hi everybody, Every year we heard about something new that MICROSOFT manufacturing it, and every year we learned something new in MICROSOFT products. Since we have now the newest version of windows which is Windows 7, and as I am a computer science s

  • String Find/Replace Algorithm 2009-10-30

    I would like to be able to search a string for various words, when I find one, i want to split the string at that point into 3 parts (left, match, right), the matched text would be excluded, and the process would continue with the new string left+rig

  • Mongrel CACHE log entries, specifically CACHE entries for SQL statements 2010-08-09

    In the process of looking at my logs from Mongrel, I found some SQL statements that I wanted to optimize. While looking into these, I noticed that these entries sometimes have CACHE in front of them, e.g.: CACHE (0.0ms) SELECT * FROM `customers` WHER

Copyright (C) dskims.com, All Rights Reserved.

processed in 0.083 (s). 11 q(s)