Bob Balaban's Blog

     

    Bob Balaban

     

    Geek-O-Terica 15: Easy conversion of Notes documents to MIME format (Part 1)

    Bob Balaban  March 21 2011 05:00:00 AM
    Greetings, Geeks!

    MIME is a data format that has become central to the transmission of email over the Internet. The nice thing about it is that it's a standard, and everyone uses it for mail interchange. The Domino server converts incoming MIME-formatted messages into Notes documents, and outgoing Notes email documents into MIME format. As of (I think) Notes v6, you can specify that you want incoming MIME messages to remain in their native format within the NSF database. In these cases, the Notes client will automatically convert any document that resides in MIME format to "regular" Notes rich text document format when you open it in the client UI.

    What does MIME format look like? It can get complicated, but the easiest way to think of it is as a sectioned box, with each compartment within the box holding data in a specified format. So, plain text is just a plain-text section. "Rich text" is usually represented as HTML, with embedded "pointers" to other sections containing embedded objects, such as images or attachments. Other data, such as file attachments, reside in their own sections, with their own "headers" specifying size, encoding (e.g., base-64) and format (e.g., JPEG or GIF). The different sections in the MIME file are separated by "boundaries", essentially a unique string. There's much more to it, of course, but these are the basics.
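
    To make the "sectioned box" picture concrete, here is a minimal sketch in plain Java (no Notes classes involved) that assembles a two-section message by hand: one plain-text section and one base-64 attachment section, separated by a boundary string. The class name, method name, boundary value and file name are all made up for illustration, and real messages usually carry many more headers.

```java
import java.util.Base64;

// Illustrative only: every name in this class is hypothetical.
class MimeLayoutSketch {

    // Assemble a multipart/mixed message with a text section and an attachment section.
    static String buildMessage(String boundary, String text, String fileName, byte[] attachment) {
        String crlf = "\r\n";
        StringBuilder sb = new StringBuilder();

        // Top-level headers: they announce the multipart layout and the boundary string
        sb.append("MIME-Version: 1.0").append(crlf);
        sb.append("Content-Type: multipart/mixed; boundary=\"").append(boundary).append("\"").append(crlf);
        sb.append(crlf);

        // Section 1: plain text, with its own small header block
        sb.append("--").append(boundary).append(crlf);
        sb.append("Content-Type: text/plain; charset=UTF-8").append(crlf);
        sb.append(crlf);
        sb.append(text).append(crlf);

        // Section 2: binary data, carried as base-64 text
        sb.append("--").append(boundary).append(crlf);
        sb.append("Content-Type: application/octet-stream; name=\"").append(fileName).append("\"").append(crlf);
        sb.append("Content-Transfer-Encoding: base64").append(crlf);
        sb.append(crlf);
        sb.append(Base64.getMimeEncoder().encodeToString(attachment)).append(crlf);

        // Closing delimiter: the boundary followed by two extra dashes
        sb.append("--").append(boundary).append("--").append(crlf);
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] fakeImage = new byte[] {(byte) 0x89, 0x50, 0x4E, 0x47};
        System.out.print(buildMessage("=_sketch_boundary_001", "Hello, Geeks!", "pic.png", fakeImage));
    }
}
```

    Running main() prints the whole message, so you can see where each section's headers stop and its content begins.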

    What if you want to write a script (LotusScript, Java, or other) to programmatically convert Notes documents to MIME format? If you're using a recent (v8.5.x) version of Notes or Domino, then most of the work is done for you with new methods in the back-end classes. This functionality is, of course, based on entry points in the Notes C API. If you're not using LotusScript or Java, you can accomplish MIME translation with a C or C++ program using these entry points, or, even easier, use the COM classes. I'll discuss how to do MIME conversion using the C and COM APIs in Part Deux of this post.

    Here's a basic Java program showing the essential techniques for conversion. I'm leaving out all the surrounding code for acquiring Document objects, preparing FileOutputStreams, and so on. There is only one slightly tricky thing about this program, which we'll get to after this first section.

    Session session = NotesFactory.createSession();
    // turn off automatic mime conversion on document open
    // if doc is already in MIME, leave it so
    session.setConvertMIME(false);
    Document doc = .... // get document somewhere
    // kill any $KeepPrivate items
    doc.removeItem("$KeepPrivate");
    doc.convertToMIME(lotus.domino.Document.CVT_RT_TO_HTML, 0);   // note: Designer doc has wrong spelling
    WriteOutputMIME(doc, outDir);   // outDir: a java.io.File for the target directory, prepared elsewhere

    So far, so good. We suppress automatic conversion on document open, to save work (if the document is already in MIME format, we can skip the convert step and just write it out). If the "$KeepPrivate" item is present in the document, conversion will fail, so we remove that. Then all we do is call the Document.convertToMIME() method, specifying that we want rich text converted to HTML.

    After the convert call, the document in memory is now a sequence of items containing MIME headers, and a (possibly multi-part) body representing the rich text body of the original document, plus any attachments it contains. We can (almost) proceed to iterate over these items and write them out to disk (or wherever).

    I say "almost", though, because there's a glitch in the conversion code deep underneath the C API layer: it does not automatically convert attachment contents to a base-64 encoding (even though the API documentation says it will) - it leaves them in binary format, which cannot be written to disk. So, we have to look for those items, and force them to be converted to base-64 text. This next section of Java code for the WriteOutputMIME() function shows how to do that. Again, I've pared the code down to the essential bits:

    private void WriteOutputMIME(Document doc, File outDir)
          throws Exception
    {
          File outFile = null;
          MIMEEntity mE = null;
          MIMEEntity mChild = null;
          String contenttype = null;
          String headers = null;
          String content = null;
          String preamble = null;
          int encoding;
          FileWriter output = null;
          String noteid = doc.getNoteID();
          int index;
         
          // access document as mime parts
          mE = doc.getMIMEEntity("Body");
          outFile = new File(outDir, noteid + ".eml");
          output = new FileWriter(outFile);
         
          try {
                  contenttype = mE.getContentType();
                  headers = mE.getHeaders();
                  encoding = mE.getEncoding();
                 
                  // message envelope. If no MIME-version header, add one
                  index = headers.indexOf("MIME-Version:");
                  if (index < 0)
                          output.write("MIME-Version: 1.0\n");
                  output.write(headers);
                 
                  // for multipart, usually no main-msg content
                  content = mE.getContentAsText();
                  if (content != null && content.trim().length() > 0)
                          {
                          output.write(content);
                          output.write("\n");
                          }

                  // For multipart, examine each child entity,
                  // re-code to base64 if necessary
                  if (contenttype.startsWith("multipart"))
                          {
                          preamble = mE.getPreamble();
                          mChild = mE.getFirstChildEntity();
                          while (mChild != null)
                                  {
                                  headers = mChild.getHeaders();
                                  encoding = mChild.getEncoding();
                                 
                                  // convert binary parts to base-64
                                  if (encoding == MIMEEntity.ENC_IDENTITY_BINARY)
                                          {
                                          mChild.encodeContent(MIMEEntity.ENC_BASE64);
                                          headers = mChild.getHeaders(); // get again, because changed
                                          }
                                 
                                  preamble = mChild.getPreamble();
                                  content = mChild.getBoundaryStart();
                                  output.write(content);
                                  if (!content.endsWith("\n"))
                                          output.write("\n");
                                  output.write(headers);
                                  output.write("\n");
                                 
                                  content = mChild.getContentAsText();
                                  if (content != null && content.length() > 0)
                                          output.write(content);
                                  output.write(mChild.getBoundaryEnd());
                                 
                                  mChild = mChild.getNextSibling();
                                  } // end while
                          } // end multipart
                 
                  // end of main envelope
                  output.write(mE.getBoundaryEnd());
                  }
          finally {
                          if (output != null)
                                  output.close();
                          }
         
    } // end WriteOutputMIME
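
    If you want to see what the base-64 re-coding step produces without a Notes document at hand, the following sketch uses only the JDK's java.util.Base64 class. It is not the Notes encodeContent() call, just an illustration of the same idea: arbitrary bytes go in, and text that is safe to hand to a Writer comes out. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Illustrative only: a stand-alone look at MIME-style base-64 text.
class Base64PartSketch {

    // Encode raw bytes as MIME base-64: lines of at most 76 characters, separated by CRLF
    static String toBase64Text(byte[] raw) {
        return Base64.getMimeEncoder().encodeToString(raw);
    }

    // Decode MIME base-64 text back into the original bytes (line breaks are ignored)
    static byte[] fromBase64Text(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Fake attachment content, including byte values that are not printable text
        byte[] raw = new byte[100];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) (i * 37);
        }

        String encoded = toBase64Text(raw);
        System.out.println(encoded);
        System.out.println("round trip ok: " + Arrays.equals(raw, fromBase64Text(encoded)));
    }
}
```

    The encoded form is larger than the binary original (four characters for every three bytes), which is the price of getting content that survives being treated as text.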

    So, a little tricky, but not too bad. You have to get the boundaries right, as well as the line breaks. Otherwise, it's really just copying stuff out to disk. Remember that the message has some overall headers (mE.getHeaders()), and each child entity has its own header section as well, describing what's in that chunk of data. When we re-code an entity from binary format to base-64 format, we need to re-read the entity headers, because they'll reflect that change.
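
    Since boundaries and line breaks are the easiest things to get wrong, a quick sanity check on the text you wrote out can help. The sketch below is plain Java with hypothetical names: it pulls the first boundary parameter out of the message, counts the opening delimiter lines, and complains if the closing delimiter is missing. It assumes a single, non-nested multipart message and does no real MIME parsing, so treat it as a smoke test only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a rough structural check, not a MIME parser.
class BoundaryCheckSketch {

    // Return the first boundary parameter found in the message text, or null if there is none
    static String findBoundary(String message) {
        Matcher m = Pattern.compile("boundary=\"?([^\";\\r\\n]+)\"?", Pattern.CASE_INSENSITIVE)
                .matcher(message);
        return m.find() ? m.group(1) : null;
    }

    // Count the "--boundary" lines, and insist on a final "--boundary--" line
    static int countParts(String message) {
        String boundary = findBoundary(message);
        if (boundary == null) {
            return 0; // not a multipart message, nothing to count
        }
        int parts = 0;
        boolean closed = false;
        for (String line : message.split("\\r?\\n")) {
            if (line.equals("--" + boundary)) {
                parts++;
            } else if (line.equals("--" + boundary + "--")) {
                closed = true;
            }
        }
        if (!closed) {
            throw new IllegalStateException("no closing boundary line found");
        }
        return parts;
    }

    public static void main(String[] args) {
        String sample = "MIME-Version: 1.0\r\n"
                + "Content-Type: multipart/mixed; boundary=\"B1\"\r\n\r\n"
                + "--B1\r\nContent-Type: text/plain\r\n\r\nhi\r\n"
                + "--B1--\r\n";
        System.out.println("parts found: " + countParts(sample));
    }
}
```

    A delimiter that is glued onto the end of the previous line (a missing line break) will not be counted, which is exactly the kind of mistake this is meant to catch.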

    A final comment about HTML conversion: it theoretically existed back in R5 (you know what they say about "in theory"...), but it didn't start working well for real until v7.03. And it has been improving ever since, so the later the version of the product you have, the better off you'll be.

    In my next blog post (part deux), I'll show you how to adapt this basic code for the Notes COM classes, where (for some stupid reason) the Document.convertToMIME() function does not exist. We will not be thwarted! I'll show you how to use the Notes C API from a C-sharp program to do the MIME conversion.

    Happy coding! Geek ya later!

    (Need expert application development architecture/coding help?  Want me to help you invent directory services based on RDBMS? Need some Cloud-fu or some web services? Contact me at: bbalaban, gmail.com)
    Follow me on Twitter @LooseleafLLC
    This article ©Copyright 2011 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.