Bob Balaban's Blog

     

    Geek-o-terica 10: "Autoupdating" in views

    Bob Balaban  April 4 2010 09:51:51 AM
    Greetings, Geeks!

    I've seen some stuff flying about recently about whether one should, or should not, disable automatic updates when iterating through the documents in a view using LotusScript or Java back-end classes, and why. Some of the points made greatly surprised me with how wrong they were, so I thought I'd give you my take on the issue.

    Background
    At the level of the Notes C API (where everything that really matters really happens), what is a view? It's a data structure that represents a snapshot of the view's index. Of course, view indexes are changing all the time, some are more volatile than others, but (in the worst case) a view can be subject to change any time a single document is created or updated in the NSF. That doesn't necessarily mean that the view gets re-indexed every time that happens, but it does mean that a "view" living in an NSF on a server, as a data structure that is shared ("used") by potentially many clients at the same time, can change out from under you while you are using it.

    So, what about the NotesView object (View in Java)? It is essentially a wrapper for a view index data structure (called an HCOLLECTION, for you C API geeks). Most people use this class primarily for navigating over documents: GetFirst/NextDocument, GetNext/PreviousSibling, etc. Virtually all of the navigation methods in the NotesView class translate down to a single C API entry point, NIFReadEntries (the "NIF" prefix, common to many API calls having to do with views, stands for "Notes Indexing Facility"). The job of this complex call is to take the provided "starting point", or "position", in the view (represented by the "current" document, for example), figure out how to navigate relative to that location (next, previous, first child, whatever), and find the document in the view's index corresponding to that new location, given the view's current nesting, categorization, and so on.

    This potentially complex navigation is done based on the "snapshot" of the view index currently held by the "user" (LotusScript/Java program, Notes client, remote C API program...). Now, suppose that my Java program is navigating through the documents in a view, and while it's doing that, some other program, or user, modifies the view from another client. Perhaps a new document was created in the database, and that document matches my view's selection formula. In fact, perhaps that document just happens to sort to the top of the view, so that all of the other documents in the view have a new "position". Perhaps my own program, inside the navigation loop, makes a change to a document that I see, and re-saves that document. If that change affects the value of a column in the view, and if that column is sorted or categorized, then my current document will actually "move" to a new position in the view.
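To make the "new position" idea concrete, here is a minimal sketch in Python. It is a toy model only, not the Notes API: the `build_index` and `positions` helpers, the note IDs, and the sort values are all made up for illustration. It shows how one new document that sorts to the top shifts the position of every other document in a sorted view.

```python
def build_index(docs):
    """Return note IDs ordered by sort-column value: a toy view index."""
    return sorted(docs, key=lambda note_id: docs[note_id])


def positions(index):
    """Map each note ID to its 1-based position in the index."""
    return {note_id: pos for pos, note_id in enumerate(index, start=1)}


# Hypothetical documents: note ID -> value of the sorted column.
docs = {"N1": "banana", "N2": "cherry", "N3": "date"}
before = positions(build_index(docs))

# Another user creates a document that sorts to the top of the view.
docs["N4"] = "apple"
after = positions(build_index(docs))

print(before)
print(after)
```

In this model every pre-existing document ends up one position lower after the insert, which is exactly the situation in which a cached position stops meaning what it used to.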

    The problem is this: when the documents in the view change position while my program is between navigational events in the view, how do I define the "next" document? Is it the document that would-have-been-next before the view changed? Or is it the document that is actually-next given what might be a new location for my "current" document? It's a puzzle!

    One good thing about the NIFReadEntries call is that when you invoke it to navigate elsewhere in the view, it tells you whether the view has been modified since the last time you called it. It can detect when your current "snapshot" (HCOLLECTION) of the view gets out of sync with the "real" view index on the server. Note that there is NO WAY to prevent the view from being changed: if someone or something somewhere modifies the view contents, it just changes out from under you.

    However, the implementation of the NotesView/View class gives you a choice about what to do in this situation, in the form of the "AutoUpdate" property. This property is On by default. Here's the logic it follows, for example, in the GetNextDocument() call:
    1. Find the position of the "current" document (it's cached in the document object during navigation)
    2. Invoke NIFReadEntries with the current position and the various navigational options set to get the "next" document
    3. Read the NOTEID and position of the next document (if there is one) returned by the call
    4. Examine the "dirty" flag returned by NIFReadEntries to detect whether the view has changed since our last call to it
    5. If the view has changed, and if the view object "autoupdate" flag is set, then
    6. Update our HCOLLECTION against the view index (this might, or might not force a re-indexing operation on the view)
    7. Re-compute the position of the "current" document in the updated index (it might, or might not have moved)
    8. Call NIFReadEntries again to navigate

    If the "AutoUpdate" flag is off, then steps 6, 7, and 8 are skipped.
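The steps above can be sketched as a small Python model. This is an illustration under stated assumptions, not the real implementation: `ToyView`, `read_next`, and `get_next` are hypothetical names, the "dirty" check is a plain list comparison, and the snapshot is modeled as a full copy of the index (a simplification that the comments below debate).

```python
class ToyView:
    """Toy stand-in for a view: 'live' is the server index, 'snapshot' is our copy."""

    def __init__(self, note_ids, auto_update=True):
        self.live = list(note_ids)       # the "real" index, changed by others
        self.snapshot = list(note_ids)   # our copy (plays the HCOLLECTION role)
        self.auto_update = auto_update

    def read_next(self, position):
        """Play the role of NIFReadEntries: return (note_id, new_position, dirty)."""
        dirty = self.snapshot != self.live
        nxt = position + 1
        note_id = self.snapshot[nxt] if nxt < len(self.snapshot) else None
        return note_id, nxt, dirty

    def get_next(self, current_id, cached_position):
        # Steps 1-4: navigate from the cached position, then look at the dirty flag.
        note_id, position, dirty = self.read_next(cached_position)
        if dirty and self.auto_update:
            # Step 6: bring our snapshot up to date with the live index.
            self.snapshot = list(self.live)
            # Step 7: re-find the current document; it may have moved.
            # (A document that left the view entirely is not handled in this sketch.)
            cached_position = self.snapshot.index(current_id)
            # Step 8: navigate again from the recomputed position.
            note_id, position, _ = self.read_next(cached_position)
        return note_id, position


on = ToyView(["A", "B", "C"], auto_update=True)
off = ToyView(["A", "B", "C"], auto_update=False)
for view in (on, off):
    view.live.insert(0, "Z")     # someone adds a document that sorts to the top

result_on = on.get_next("A", 0)    # refreshes, re-finds "A", then steps
result_off = off.get_next("A", 0)  # keeps using the old snapshot
print(result_on, result_off)
```

In this run both calls return document "B", but with the flag on the reported position reflects the refreshed index, while with the flag off the walker keeps its old numbering and never learns about "Z".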

    This is a powerful feature, though frankly, I nearly always turn it off when navigating views. From a programmatic point of view, the danger of having it on is that you can easily either miss documents in a view, or visit the same document multiple times, if you're not careful.
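Here is a hedged sketch of how that can go wrong. It is a toy Python model, not Notes code: the `walk_with_autoupdate` function and the numeric sort keys are invented, and "auto-update" is modeled as re-sorting after every step and re-finding the current document. The loop body changes the sort key of the document it is visiting, once so that the document moves down the view and once so that it moves up.

```python
def walk_with_autoupdate(docs, mutate):
    """Walk a toy sorted view, re-finding the current doc after each change."""
    def index():
        return sorted(docs, key=lambda note_id: docs[note_id])

    visited = []
    order = index()
    current = order[0] if order else None
    while current is not None:
        visited.append(current)
        mutate(docs, current)        # the loop body may change the sort key
        order = index()              # model of auto-update: refresh the index
        pos = order.index(current)   # re-find the current document
        current = order[pos + 1] if pos + 1 < len(order) else None
    return visited


def move_a_down(docs, current):
    # Saving "A" with a larger key moves it below "C".
    if current == "A" and docs["A"] == 1:
        docs["A"] = 3.5


def move_c_up(docs, current):
    # Saving "C" with a smaller key moves it to the top.
    if current == "C" and docs["C"] == 3:
        docs["C"] = 0.5


skipped_walk = walk_with_autoupdate({"A": 1, "B": 2, "C": 3, "D": 4}, move_a_down)
revisit_walk = walk_with_autoupdate({"A": 1, "B": 2, "C": 3, "D": 4}, move_c_up)
print(skipped_walk)
print(revisit_walk)
```

In the first walk "B" and "C" are never visited, because the current document jumped past them. In the second walk "A" and "B" are visited twice, because the current document jumped back above them.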

    As this post is already pretty long, I'll do a Part Deux posting exploring the ins and outs of AutoUpdate soon.

    Geek ya later!

    (Need expert application development architecture/coding help? Contact me at: bbalaban, gmail.com)
    Follow me on Twitter @LooseleafLLC
    This article © Copyright 2010 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.
    Comments

    1 Jens Peters  4/4/2010 5:40:12 AM  Geek-o-terica 10: Autoupdating in views

    Thanks for this! I never understood if and when I have to change AutoUpdate back to the default after I read a view. Your post sounds like this flag is always set back to the default when I recycle a view and refetch it.

    2 Fred Janssen  4/4/2010 5:40:31 AM  Geek-o-terica 10: Autoupdating in views

    Thanks for the info Bob!

    3 Bob Balaban  4/4/2010 12:48:59 PM  Geek-o-terica 10: Autoupdating in views

    @Jens - Yes, whenever you get a new View object instance, AutoUpdate will be False

    4 Erik Brooks  4/4/2010 4:26:22 PM  Geek-o-terica 10: Autoupdating in views

    @1/@3 - The default for AutoUpdate is True, so when you get a new handle on a view it will already be set. You'll need to set it to False each time you obtain the handle, if that's the behavior you want.

    There are obviously specific cases where either True or False is the clear choice, but I argue that AutoUpdate=True is generally the more desirable behavior in the case of a standard view walk (e.g. an agent that walks a view performing processing on docs), which is probably 95% of the cases where people care. Here's why:

    With AutoUpdate = True you will only run the risk of skipping documents (or re-processing them) if you are actively modifying them in such a manner that you are effectively moving them around within the view index as you go.

    With AutoUpdate = False you run the above risk (though in slightly different ways) and ALSO the added risk of skipping/reprocessing documents because they've been moved around by *other* documents getting moved/added/removed in the view.

    Thankfully Bob your algorithm is in-line with my latest talks with Lotus Support, meaning that the info they gave me a year ago about AutoUpdate=False causing all other threads to go into a semaphore lock was wrong. While it stinks that there's not an atomic way to get a true snapshot-in-time of a full view (well, there's @DbColumn and @DbLookup, but there's obvious limitations there) I'd much rather be missing that capability than knowing that using AutoUpdate=False would cause threads to spin.

    Oh, and for those curious, HCOLLECTION is technically a (H)andle to a collection. The C API is full of Hxxxxxx things which are all handles to various objects. I definitely recommend that every serious N/D developer scan the toolkit documentation at some point. It can come in very handy when debugging NSD crash stacks, optimizing high-performance algorithms (especially if you crack open notes.jar, but don't do that) and generally helping you to understand various low-level things in Notes.

    At the very least you will realize that Ben Langhinrichs has the patience of a saint to have made the progress that he has with rich text and his Midas LSX toolkit. :-)

    5 Bob Balaban  4/4/2010 11:49:00 PM  Geek-o-terica 10: Autoupdating in views

    @4 - Thanks for the comments, Erik. I think AutoUpdate=false IS the "static" snapshot of the view that you are looking for. It says, effectively, "Don't update my version of the view index, no matter what happens in the underlying 'real' view". I plan to go into more detail about this in my next Geek-o-terica post

    6 Erik Brooks  4/5/2010 1:10:35 AM  Geek-o-terica 10: Autoupdating in views

    @5 - "I think AutoUpdate=false IS the "static" snapshot of the view that you are looking for"

    I don't think this is quite accurate. AutoUpdate=False simply means that your cache of the current document's position will be used to determine the "next" document, regardless of the view being dirty or not. It's not a static version of the entire view index unless I'm seriously misunderstanding things.

    N/D, when reading a view, reads in pages. I.e., if you have a 300MB view index it doesn't read the entire thing into memory, only the page(s) needed as you work. This is why a simple NotesViewNavigator walk of entries (the entries, not the documents) will cause occasional blips of the lightning bolt for network traffic -- it's pulling down page after page as you need them.

    You can try it yourself by simply opening a large (5000+ doc) view in your Notes Client, going to the top and clicking on the down-arrow to scroll. Blip, blip, blip... those are pages being loaded as-needed.

    Now picture a scenario where, as you scrolled, some doc somewhere in the view got changed. You'd get the "Refresh" circular-arrow thing in the upper-left corner of the view.

    AutoUpdate=True will effectively click the refresh button whenever it needs to prior to getting the next document. It then re-finds the "current" document (it better still be in the view!) and then proceeds. If documents were removed from the view below your current position, no problem, you won't process them since the refresh will make sure they get handled accordingly. If documents are added above your current position, again no problem -- as they "bump" your original farther down in the view the refresh will cause your code to re-find the document's new position and keep going from there.

    AutoUpdate=False ignores the refresh button, it simply uses the current document's last-known position from the page of the view that was cached and proceeds. This position may now be completely wrong if documents before that position have been inserted or removed. And things get really unpredictable when a new page is needed, because it certainly didn't have a cache of *that*.
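The difference between the two behaviors as described in this comment can be sketched with a toy Python model. Everything here is hypothetical (the function names, the list standing in for the index), and it models only this comment's reading of AutoUpdate, which other participants in the thread dispute: one navigator trusts its cached position, the other re-finds the current document first.

```python
def _step(live_index, position):
    """Return (note_id, position) for the entry after 'position', or (None, position + 1)."""
    nxt = position + 1
    return (live_index[nxt], nxt) if nxt < len(live_index) else (None, nxt)


def next_by_stale_position(live_index, cached_position):
    """AutoUpdate=False as described above: trust the cached position."""
    return _step(live_index, cached_position)


def next_by_refind(live_index, current_id):
    """AutoUpdate=True as described above: re-find the current doc, then step."""
    return _step(live_index, live_index.index(current_id))


live = ["A", "B", "C", "D"]
current_id, cached_position = "B", 1      # we are sitting on "B"

inserted_above = ["X"] + live             # someone adds a doc above us
removed_above = live[1:]                  # or someone removes "A" instead

stale_after_insert = next_by_stale_position(inserted_above, cached_position)
refind_after_insert = next_by_refind(inserted_above, current_id)
stale_after_remove = next_by_stale_position(removed_above, cached_position)
refind_after_remove = next_by_refind(removed_above, current_id)
print(stale_after_insert, refind_after_insert)
print(stale_after_remove, refind_after_remove)
```

With the stale position, an insert above makes "next" land on "B" again, and a removal above makes it jump straight to "D", skipping "C". Re-finding the current document yields "C" in both cases.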

    That scenario of needing a new page is where things get really screwy with AutoUpdate=False. If you've got a 300MB view index then even with AutoUpdate=False you aren't going to get a snapshot of the entire index but only a snapshot of the page you were working with. So on the pull of the next page you're always getting the most-up-to-date data for that page even if you were working with out-of-date data on the prior. I don't remember what the default N/D page size is off the top of my head - 64K perhaps?
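A rough way to picture the paging scenario is the following toy Python model. It assumes, as this comment does, that each page is copied from the live index only when it is first needed. The `walk_in_pages` function, the page size of 3, and the note IDs are made up for illustration and say nothing about real Notes page sizes.

```python
def walk_in_pages(live_index, page_size, change_after_first_page):
    """Walk a toy index page by page; each page is copied when first needed."""
    seen = []
    start = 0
    while start < len(live_index):
        # "Fetching" a page copies the index as it looks right now.
        page = list(live_index[start:start + page_size])
        seen.extend(page)
        if start == 0:
            change_after_first_page(live_index)   # another user changes the view
        start += page_size
    return seen


def insert_at_top(index):
    index.insert(0, "X")


seen = walk_in_pages(["A", "B", "C", "D", "E", "F"], 3, insert_at_top)
print(seen)
```

The first page is out of date by the time the second one is fetched, so the walk sees "C" twice and never sees the new document "X": a mix of old and new data rather than a snapshot of the whole index.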

    Again, all this applies to working with documents. With the NotesViewNavigator class if you're working with category rows then you should always use AutoUpdate=False because AutoUpdate=True has a really hard time finding category rows that have moved (unless you want to trap the error it throws and implement your own logic to re-find the "current" entry.)

    Definitely looking forward to your upcoming posts, keep `em coming!

    7 Chris Hart  4/5/2010 10:21:12 AM  Geek-o-terica 10: Autoupdating in views

    Bob, thanks for the info. This level of detail is really interesting, and I'm looking forward to your next entry.

    @6 - Erik, I've always thought of AutoUpdate=False as providing a static snapshot of the entire view index. Have you seen and/or can you reproduce a case where document navigation is "unpredictable" with AutoUpdate=False?

    8 Giulio  4/5/2010 12:35:21 PM  Geek-o-terica 10: Autoupdating in views

    Bob, what a great performance tip. Can you give us any idea of the performance hit for views with 10,000 documents where auto-update is on vs. off? I'd never considered that autoupdate would impact read operations of views in this way.

    Good to know if you need to process big views during quiet times, say nightly agents, or where the risk of updated documents during processing in the view is not an issue.

    9 Erik Brooks  4/5/2010 12:52:39 PM  Geek-o-terica 10: Autoupdating in views

    @7 - Think about what would be required to provide a static snapshot of the entire view index. There would either need to be:

    (A) some historical tracking of what was/wasn't in the view at the time the handle was obtained. This would be very complex algorithm-wise and cause massive server I/O activity and CPU as the view was processed.

    (B) a "snapshot" of the entire view index temporarily spun off for the thread that requested AutoUpdate=False. This would mean that there would be gobs of memory consumed as soon as your AutoUpdate=False line was executed (got a 300MB view index? here comes your 300MB snapshot...) which would be a no-go for most server-side code, especially XPages and web agents. In the case of client-run code the sheer amount of network traffic from the server doesn't lend itself well to that solution either.

    (C) A semaphore lock placed on all threads wanting to update/refresh the view in any way until the thread that set AutoUpdate=False either released its view handle or set AutoUpdate back to True on the view.

    The documentation on this stuff is not all that great, hence why I opened a ticket with Lotus Support about a year ago and was told that the answer was (C). Apparently *their* documentation isn't that good either. Of course, the core view update code is, what - 15+ years old? It's no surprise there's not many people around who truly know how it works.

    But I'd always suspected that (C) wasn't the case, and based on the discussions I've had with IBM about the recent GDBK() problems the prior-version algorithms work as Bob described. The current (regression-bugged) algorithms work the same way but error after 10 refresh attempts. The new fix will work similarly but is slightly different - I'll discuss that more as things progress with my dealings with Lotus Support.

    10 Jens Bruntt  4/6/2010 4:45:24 AM  Semaphore locks

    @9 About 1½ years ago we had a lengthy discussion with Lotus support about semaphore locking of views.

    This did not specifically have to do with server side views accessed using lotusscript or java, but was generally about accessing server side views.

    We ended up with a concept of contention around views: If I have some code that accesses a tainted view (a view in need of an update), be it just browsing the view through the client or by accessing it using LotusScript or Java, the server process I am accessing for the resource may decide to do the update.

    If the server process does decide to update the view, my code will wait for the update to finish before it can complete.

    What will also happen is that any other piece of code (or a client browsing the view) that accesses the same view while the indexing that I caused to happen is carried out will also be put on hold until the indexing is done.

    There are a number of worker threads on the Domino server that can handle requests from clients or agents. Each worker can handle multiple requests. But every time a worker has to put a request on hold because the answer has to wait for indexing to complete, the open request occupies one of a limited number of concurrent worker threads.

    If the application has a tendency to use a few views that get updated very often, you risk ending up in a situation where all the worker threads on the server are filled up with open requests for the same semaphore-locked view.

    What will happen is that the server seems to be heavily loaded - from a Notes Client point of view. But when looking at the server load indicators, it's doing next to no work (I/O, CPU and network use are looking like no users are on line). One process of course IS working - the process that is indexing the view.

    When the view is done indexing, the users will suddenly see that "the server started working again".
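The contention pattern described in this comment can be pictured with a small discrete simulation in Python. It is a deliberately crude sketch with invented names and numbers (`simulate_workers`, three workers, integer clock ticks) and makes no claim about how Domino actually schedules requests: a request for the view being indexed parks a worker until indexing ends, and once all workers are parked even unrelated requests stall.

```python
def simulate_workers(worker_count, arrivals, indexing_done_at):
    """Return (time, free_workers, stalled) for each arriving request.

    arrivals: list of (time, wants_locked_view) pairs in time order.
    """
    blocked = 0        # workers parked waiting for the view index
    timeline = []
    for time, wants_locked_view in arrivals:
        if time >= indexing_done_at:
            blocked = 0                      # indexing finished: parked requests complete
        stalled = blocked == worker_count    # no free worker, so even unrelated work waits
        if not stalled and wants_locked_view and time < indexing_done_at:
            blocked += 1                     # this request parks a worker
        timeline.append((time, worker_count - blocked, stalled))
    return timeline


arrivals = [(0, True), (1, True), (2, True), (3, False), (10, False)]
timeline = simulate_workers(worker_count=3, arrivals=arrivals, indexing_done_at=5)
for row in timeline:
    print(row)
```

In this run the request at time 3 has nothing to do with the locked view, yet it stalls because every worker is waiting on the index. Once indexing is over, all workers are free again, which matches the "server started working again" effect.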

    11 Erik Brooks  4/6/2010 9:49:01 AM  Geek-o-terica 10: Autoupdating in views

    @10 - You're absolutely correct. If an index update is underway then everything else wanting something from that view will spin until the indexing is done.

    The only possible way to circumvent this may be with AutoUpdate=False enabled, in which case that thread may, under limited circumstances, be able to use its cache of the current page instead. Though that may obviously be out-of-date info.

    12 Morten Clausen  4/11/2010 8:06:25 PM  Geek-o-terica 10: Autoupdating in views

    @10 and 11 (I'm a colleague of Jens): I've just one thing to add. When we determined that this was the case we changed all code to use AutoUpdate = False and it had quite an impact on performance (this in an application with 500.000 documents/database and a thousand quite active users spread across 10 databases). This was weighed against the risk of presenting slightly off reports (the operation most at risk in this application) and we quickly decided that the cost of a few support calls when it eventually did fail did not outweigh the cost of the entire server flatlining - one of the easiest design decisions we've ever had to make. :-)

    The funny thing is, we've never had any support calls so the risk for us is obviously pretty low. Either our code doesn't fail or the users don't notice (or don't care) when it does. YMMV.