Bob Balaban's Blog


    Bob Balaban


    Geek-o-Terica 14: How Using "LimitEntries" Can Mess Up Db Replication (Unintended Side-Effects)

    Bob Balaban  February 18 2011 03:25:00 PM
    Greetings, Geeks!

    This one is going to get uber-geeky real fast, so hang onto your gaming consoles.

    We need a little background, but I'll keep it brief. Most of you probably know that whenever a document is saved (written to disk in the NSF), 2 list items automatically get updated: $UpdatedBy is a list of the names of the people who modified (and, initially, created) the document. The $Revisions item is a list of date/time values indicating the times at which the updates occurred. So, these 2 lists are "parallel", in that the last name in $UpdatedBy can be assumed to have modified the document at the last date/time in $Revisions, and so on, back into history, as far as it goes.
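
    To make the "parallel lists" idea concrete, here is a toy model in Python. It is only a sketch: the dictionary-based document and the helper names (record_save, edit_history) are made up for illustration and are not part of any Notes API. It also takes the simplified view described above, one new entry in each list per save.

```python
from datetime import datetime

def record_save(doc: dict, editor: str, when: datetime) -> None:
    """Append one entry to each bookkeeping list so they stay parallel."""
    doc.setdefault("$UpdatedBy", []).append(editor)
    doc.setdefault("$Revisions", []).append(when)

def edit_history(doc: dict) -> list:
    """Pair each editor with the save time at the same list position."""
    return list(zip(doc.get("$UpdatedBy", []), doc.get("$Revisions", [])))

doc = {}
record_save(doc, "CN=Alice/O=Acme", datetime(2011, 2, 1, 9, 0))
record_save(doc, "CN=Bob/O=Acme", datetime(2011, 2, 3, 14, 30))

# The last name lines up with the last date/time, and so on back into history.
history = edit_history(doc)
```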

    Somewhere back in the V4/V5 days, people started to notice that these lists could, over time, get pretty big. A new Distinguished Name (10-30 characters) plus a new date/time value (8 bytes) each time someone hits Ctrl-S in a document can start to add up. Even small documents could end up bloated by "bookkeeping data". So someone invented two new properties in the Application (née Database) Properties box, on the propeller-hat tab: "Limit Entries in $UpdatedBy fields" and "Limit Entries in $Revisions fields". By default, these settings have a value of zero, meaning (as usual), "no limit". But, should you desire, you can set these properties to non-zero values to keep the size of your documents from growing forever. Of course, it's nice if you set the 2 properties to the same value, to keep the 2 lists in "sync", but you're not forced to (is that recommendation even documented? I didn't check).
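
    Here is a rough Python sketch of what such a limit does to the two lists. The trim_to_limit helper is hypothetical, and it assumes that the oldest entries are the ones dropped, which I have not verified against the real implementation.

```python
def trim_to_limit(entries: list, limit: int) -> list:
    """Keep only the newest `limit` entries; a limit of 0 means "no limit"."""
    if limit <= 0:
        return list(entries)
    return entries[-limit:]

updated_by = ["Alice", "Bob", "Alice", "Carol"]
revisions = ["t1", "t2", "t3", "t4"]   # stand-ins for date/time values

# Same limit on both lists: they stay parallel.
in_sync = (trim_to_limit(updated_by, 2), trim_to_limit(revisions, 2))

# Different limits: the lists end up with different lengths.
out_of_sync = (trim_to_limit(updated_by, 2), trim_to_limit(revisions, 3))
```

    With mismatched limits you can still pair entries counting from the end, but the extra $Revisions entries no longer have a name to go with them.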

    Ok, so far so good. But, you ask, what does this have to do with replication? Aha! Well, first just a bit more background.

    Before Notes v4, there was only 1 replication granularity: full document. The replication logic matched up the documents in 2 NSF replicas by their UNIDs (Universal IDs). If one document's "last-modified" time is later than the other's, then (keeping it simple here by skipping some gritty details) it "wins". That worked fine for quite a while, but it meant that the replicator had to copy the entire document from one NSF to another when one copy of the document had been modified, and the other hadn't. If the document contained 10 MB of attachment data, well, tough, the replicator copied all 10 MB, because it had no way of knowing whether the attachment itself had been modified.
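
    A minimal Python sketch of that whole-document logic, for the one-way case only. The Doc class and replicate_whole_documents function are invented for illustration, and the "gritty details" (conflicts, deletions and so on) are left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Doc:
    unid: str
    last_modified: datetime
    items: dict = field(default_factory=dict)

def replicate_whole_documents(source: dict, target: dict) -> list:
    """Copy every document whose source copy is newer; return the copied UNIDs."""
    copied = []
    for unid, src in source.items():
        dst = target.get(unid)
        if dst is None or src.last_modified > dst.last_modified:
            # All-or-nothing: every item travels, changed or not.
            target[unid] = Doc(unid, src.last_modified, dict(src.items))
            copied.append(unid)
    return copied

source = {"A1": Doc("A1", datetime(2011, 2, 2), {"Subject": "v2", "Attachment": "10 MB blob"})}
target = {"A1": Doc("A1", datetime(2011, 2, 1), {"Subject": "v1", "Attachment": "10 MB blob"})}
copied = replicate_whole_documents(source, target)
```

    Only the Subject changed here, but the attachment gets copied along with it anyway.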

    Enter "Field Level Replication", somewhere in the V4 timeframe. (By the way, you may -- or may not -- be surprised to know that "replication" was never patented by Lotus or Iris. However, Field Level Replication was patented in the early 1990s). This is an optimization applied to the normal replication logic that allows for much greater efficiency, but it required a change to the on-disk representation (ODS) of documents. The key innovation in FLR was that a new C API entry point was created that could tell a caller (for example, the Replicator) the date/time of last modification of a single DATA ITEM in a given document. This new entry point allowed the replicator to enhance its normal logic to say something like: "Ok, DocA and DocB are the same (same UNID). DocA has a later last-modified date. But I may not have to copy the WHOLE document to DocB's NSF. I can match up the pair-o'-docs item by item, and only update DocB with the items that are modified!"
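
    The item-by-item matching might look roughly like this in Python. This is a sketch, not the real replicator: each item is modeled as a (value, last-modified) pair, and replicate_changed_items is a made-up name standing in for logic built on the item-level entry point.

```python
from datetime import datetime

def replicate_changed_items(src_items: dict, dst_items: dict) -> list:
    """Copy only items that are missing or older in the target; return their names."""
    copied = []
    for name, (value, modified) in src_items.items():
        existing = dst_items.get(name)
        if existing is None or modified > existing[1]:
            dst_items[name] = (value, modified)
            copied.append(name)
    return copied

doc_a = {
    "Subject": ("v2", datetime(2011, 2, 2)),
    "Attachment": ("10 MB blob", datetime(2011, 1, 1)),
}
doc_b = {
    "Subject": ("v1", datetime(2011, 2, 1)),
    "Attachment": ("10 MB blob", datetime(2011, 1, 1)),
}
copied = replicate_changed_items(doc_a, doc_b)
```

    Only "Subject" gets copied; the attachment stays put.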

    Clearly better when you've got a big replication with lots of changes. But, how does the item-last-modified API actually work? You certainly don't want to store a full 8 bytes of date/time data for each item in a document, that's an enormous use of space. Instead, someone figured out that they could use only 3 or 4 bits to store an INDEX into the list of document last-modified dates for each item. My recollection is (I could be wrong about this) that the number is the count from the END of the $Revisions list. So, a "1" probably means "last entry in $Revisions", etc. Cool, huh? Clever! You get huge new functionality by spending only a few bits per data item.
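
    As a sketch of the index trick, here is a toy lookup in Python. It follows my recollection above (count from the end, 1 = newest) and assumes a 4-bit field; item_last_modified is an illustrative name, not the actual C API entry point.

```python
from datetime import datetime

INDEX_BITS = 4                      # assumed width; could just as well be 3
MAX_INDEX = (1 << INDEX_BITS) - 1   # largest index a 4-bit field can hold

def item_last_modified(revisions: list, index_from_end: int) -> datetime:
    """Resolve an item's small index to a full date/time (1 = newest entry)."""
    if not 1 <= index_from_end <= min(MAX_INDEX, len(revisions)):
        raise IndexError("index does not point into $Revisions")
    return revisions[-index_from_end]

revisions = [datetime(2011, 1, 1), datetime(2011, 1, 5), datetime(2011, 1, 9)]
subject_modified = item_last_modified(revisions, 1)   # changed in the latest save
body_modified = item_last_modified(revisions, 3)      # untouched since the first save
```

    Each item carries half a byte instead of 8 bytes, and the full date/time values are stored only once, in $Revisions.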

    BUT! What if someone's set the db property to "Limit Entries in $Revisions fields"?? What if (extreme case here) there are only 2 entries allowed, but items in some document have been modified 3 or 4 times in one replica, and not at all in another? Aye, there's the rub.

    The answer is, you lose field-level replication for that document, and any others like it. The "get me the item last-modified time" logic has to account for that, of course, and there's really only one thing it can reasonably do. If the last-mod "index" attached to an item is bigger than the length of the $Revisions list, it has to fall back to document last-modified time, because the last-modified time of the document is guaranteed to be greater than, or equal to, the last-modified time of any item in that document.
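
    In Python, the fallback could be sketched like this. The function name is hypothetical, the index is assumed to count from the end of $Revisions (1 = newest), and the document's last-modified time is simplified to be the newest $Revisions entry.

```python
from datetime import datetime

def item_time_or_fallback(revisions, index_from_end, doc_last_modified):
    """Return the item's own time if the index is still valid, else the document's."""
    if 1 <= index_from_end <= len(revisions):
        return revisions[-index_from_end]
    # The entry this index pointed at has been trimmed away, so the only
    # safe answer is "as new as the document itself".
    return doc_last_modified

full = [datetime(2011, 1, 1), datetime(2011, 1, 5), datetime(2011, 1, 9)]
trimmed = full[-2:]        # a limit of 2 entries drops the oldest one
doc_time = trimmed[-1]     # simplification: document time = newest revision

precise = item_time_or_fallback(full, 3, doc_time)      # real item time survives
fallback = item_time_or_fallback(trimmed, 3, doc_time)  # index 3 no longer fits
```

    With the full list the item reports January 1; with the trimmed list it reports January 9, the document's time, so the replicator has to treat it as changed.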

    So, there you have it: an unintended consequence of limiting the amount of $Revisions space used in a document is that you make replication run slower (in some situations). "Correctness" is not violated (important point!): you never lose data as a result. It's just that in some cases replication will take longer. How much longer? It depends on your data. So, as always, it's a trade-off, space vs. time.

    Hope this helps, Geek ya later!

    (Need expert application development architecture/coding help?  Want me to help you invent directory services based on RDBMS? Need some Cloud-fu or some web services? Contact me at: bbalaban,
    Follow me on Twitter @LooseleafLLC
    This article © Copyright 2011 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.