EPUB plugin

From AbiWiki

Revision as of 20:22, 26 August 2011 by DaleDe


Overview

EPUB is an open standard for electronic publications. It describes the structure and contents of an electronic publication and declares the set of features that a reading system must implement to render any conforming document.

EPUB 2.0.1

The current version of the EPUB standard is 2.0.1 (EPUB 3 is a work in progress). It consists of three specifications:

  • Open Publication Structure (OPS)
  • Open Packaging Format (OPF)
  • Open Container Format (OCF)

You can find more details on the official EPUB website (http://idpf.org/epub/201). In brief, OPS defines how the content of a publication is stored (HTML/XHTML and CSS) and which file types a reading system must always support (the so-called "Core Media Types"); OPF defines which files are mandatory and describes their contents; OCF defines how all these files are packaged into a single ZIP container.
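To make the OCF part concrete: a reading system locates the OPF package through the file META-INF/container.xml inside the ZIP container. The sketch below builds that file as a string. It is a standalone illustration, not AbiWord's own container-writing code; the function name buildContainerXml is hypothetical, and the OPF path passed in (for example OEBPS/book.opf) is only an example.

```cpp
#include <cassert>
#include <string>

// Builds META-INF/container.xml, the OCF file that tells a reading system
// where the OPF package document lives inside the ZIP container.
// opfPath is relative to the container root; this sketch assumes it
// contains no characters that would need XML escaping.
std::string buildContainerXml(const std::string &opfPath)
{
    std::string xml;
    xml += "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    xml += "<container version=\"1.0\" "
           "xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\">\n";
    xml += "  <rootfiles>\n";
    xml += "    <rootfile full-path=\"" + opfPath + "\" "
           "media-type=\"application/oebps-package+xml\"/>\n";
    xml += "  </rootfiles>\n";
    xml += "</container>\n";
    return xml;
}
```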

EPUB 3

The EPUB 3 standard is the newest version of the EPUB specification and currently has the status of an IDPF Proposed Specification. This new generation of the standard supports features such as synchronized audio narration (called Media Overlays), embedding MathML in documents, and using JavaScript and HTML5 to create really nice-looking electronic publications. AbiWord is one of the first programs that allow creating such documents (of course, they won't contain media overlays, but who knows, maybe in the near future AbiWord will allow users to embed video and audio in documents).

Using HTML exporter to create another plugin on top of it

In this section I'll describe how to create a custom plugin on top of the HTML exporter without any changes to the HTML exporter itself. To illustrate the process I'll refer to existing code: the EPUB2 export code.

Creating OPS documents

To create the OPS part of an EPUB 2.0.1 document (which is HTML/XHTML) we'll use the HTML exporter. It generates all needed files in a temporary location, and we then use them to create the package. All we need to do is tell the HTML plugin where to place the files. Let's look at IE_Exp_EPUB::_writeDocument() to understand the export process in general:


UT_Error IE_Exp_EPUB::_writeDocument()
{
    UT_Error errOptions = doOptions();
    if (errOptions == UT_SAVE_CANCELLED) // see Bug 10840
    {
        return UT_SAVE_CANCELLED;
    }
    else if (errOptions != UT_OK)
    {
        return UT_ERROR;
    }

    m_root = gsf_outfile_zip_new(getFp(), NULL);
    if (m_root == NULL)
    {
        UT_DEBUGMSG(("ZIP output is null\n"));
        return UT_ERROR;
    }
    m_oebps = gsf_outfile_new_child(m_root, "OEBPS", TRUE);
    if (m_oebps == NULL)
    {
        UT_DEBUGMSG(("Can't create oebps output object\n"));
        return UT_ERROR;
    }

    // The mimetype file must be the first file in the archive
    // and must be stored uncompressed (compression-level 0)
    GsfOutput *mimetype = gsf_outfile_new_child_full(m_root, "mimetype", FALSE,
            "compression-level", 0, NULL);
    if (mimetype == NULL)
    {
        UT_DEBUGMSG(("Can't create mimetype output object\n"));
        return UT_ERROR;
    }
    gsf_output_write(mimetype, strlen(EPUB_MIMETYPE),
            (const guint8*) EPUB_MIMETYPE);
    gsf_output_close(mimetype);

    // Create a temporary directory to which the HTML plugin
    // will export our document
    char *tmpDirUri = UT_go_filename_to_uri(g_get_tmp_dir());
    m_baseTempDir = tmpDirUri;
    g_free(tmpDirUri); // the returned URI is newly allocated
    m_baseTempDir += G_DIR_SEPARATOR_S;
    // Use the document UUID to get a unique directory name
    m_baseTempDir += getDoc()->getDocUUIDString();
    // Delete any temporary data left from a previous export of this
    // document so that stale files don't end up in the container
    UT_go_file_remove(m_baseTempDir.utf8_str(), NULL);
    // Directories need the execute bit to be traversable, hence 0755
    UT_go_directory_create(m_baseTempDir.utf8_str(), 0755, NULL);

    if (writeContainer() != UT_OK)
    {
        UT_DEBUGMSG(("Failed to write container\n"));
        return UT_ERROR;
    }
    if (writeStructure() != UT_OK)
    {
        UT_DEBUGMSG(("Failed to write document structure\n"));
        return UT_ERROR;
    }
    if (writeNavigation() != UT_OK)
    {
        UT_DEBUGMSG(("Failed to write navigation\n"));
        return UT_ERROR;
    }
    if (package() != UT_OK)
    {
        UT_DEBUGMSG(("Failed to package document\n"));
        return UT_ERROR;
    }
    gsf_output_close(m_oebps);
    gsf_output_close(GSF_OUTPUT(m_root));

    // Once the job is done, delete the temporary files
    UT_go_file_remove(m_baseTempDir.utf8_str(), NULL);
    return UT_OK;
}
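The mimetype handling above follows an OCF rule: the mimetype file must be the first entry in the ZIP archive, stored without compression, and contain exactly the bytes application/epub+zip with no trailing newline. That is why it is created with compression-level 0 and written with strlen() as its length. The sketch below checks that rule against a simplified, hypothetical model of the archive entries (ZipEntry is not a libgsf type), and it assumes EPUB_MIMETYPE expands to application/epub+zip.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model of a ZIP entry: only what the OCF mimetype rule cares about.
struct ZipEntry {
    std::string name;
    bool compressed;
    std::string content;
};

// Returns true if "mimetype" is the first entry, is stored without
// compression, and contains exactly "application/epub+zip"
// (no trailing newline or whitespace).
bool hasValidMimetypeEntry(const std::vector<ZipEntry> &entries)
{
    if (entries.empty())
        return false;
    const ZipEntry &first = entries.front();
    return first.name == "mimetype"
        && !first.compressed
        && first.content == "application/epub+zip";
}
```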

Let's look at the IE_Exp_EPUB::writeStructure() method. It checks which version of EPUB is used, and if it's EPUB2, IE_Exp_EPUB::EPUB2_writeStructure() is called.

The temporary directories are created with the following code:

   m_oebpsDir = m_baseTempDir + G_DIR_SEPARATOR_S;
   m_oebpsDir += "OEBPS";
   // Directories need the execute bit to be traversable, hence 0755
   UT_go_directory_create(m_oebpsDir.utf8_str(), 0755, NULL);
   UT_UTF8String indexPath = m_oebpsDir + G_DIR_SEPARATOR_S;
   indexPath += "index.xhtml";
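The same directory layout can be expressed with standard C++17 std::filesystem instead of the goffice-derived UT_go_* helpers. In this sketch the class TempDirGuard and the function prepareOebpsDir are hypothetical names. The guard removes the temporary tree on every exit path, whereas _writeDocument() above removes it only after a successful export.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Removes the temporary export directory when it goes out of scope,
// so cleanup happens on error paths as well as on success.
class TempDirGuard {
public:
    explicit TempDirGuard(fs::path dir) : m_dir(std::move(dir)) {}
    ~TempDirGuard()
    {
        std::error_code ec;        // failures during cleanup are ignored
        fs::remove_all(m_dir, ec);
    }
    TempDirGuard(const TempDirGuard &) = delete;
    TempDirGuard &operator=(const TempDirGuard &) = delete;
    const fs::path &path() const { return m_dir; }

private:
    fs::path m_dir;
};

// Creates <base>/OEBPS and returns the path of the index.xhtml file the
// HTML exporter should write. Returns an empty path on failure.
fs::path prepareOebpsDir(const fs::path &base)
{
    fs::path oebps = base / "OEBPS";
    std::error_code ec;
    fs::create_directories(oebps, ec);  // default permissions, minus the umask
    if (ec)
        return {};
    return oebps / "index.xhtml";
}
```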

The next step is the export process itself:

   // Exporting document to XHTML using HTML export plugin
   char *szIndexPath = g_strdup(indexPath.utf8_str());
   m_pie = new IE_Exp_HTML(getDoc());
   m_pie->suppressDialog(true);
   m_pie->setProps(
           "embed-css:no;html4:no;use-awml:no;declare-xml:yes;mathml-render-png:yes;split-document:yes;add-identifiers:yes;");
   UT_Error err = m_pie->writeFile(szIndexPath);
   g_free(szIndexPath);
   if (err != UT_OK)
       return err;
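The option string given to setProps() is a list of key:value pairs separated by semicolons, here with a trailing semicolon. The sketch below shows how such a string breaks down into individual settings. The parseProps function is hypothetical and is not the parser used inside IE_Exp_HTML.

```cpp
#include <cassert>
#include <map>
#include <string>

// Splits a string such as "embed-css:no;html4:no;declare-xml:yes;" into
// key/value pairs. Empty segments (e.g. after the trailing ';') are skipped;
// a segment without ':' is stored with an empty value.
std::map<std::string, std::string> parseProps(const std::string &props)
{
    std::map<std::string, std::string> result;
    std::string::size_type start = 0;
    while (start < props.size()) {
        std::string::size_type end = props.find(';', start);
        if (end == std::string::npos)
            end = props.size();
        std::string segment = props.substr(start, end - start);
        if (!segment.empty()) {
            std::string::size_type colon = segment.find(':');
            if (colon == std::string::npos)
                result[segment] = "";
            else
                result[segment.substr(0, colon)] = segment.substr(colon + 1);
        }
        start = end + 1;
    }
    return result;
}
```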

All settings needed for the export are passed to the HTML exporter through the IE_Exp_HTML::setProps call. You can find descriptions of all available options in the IE_Exp_HTML.cpp source file. Because we know exactly the structure of the exported document (.xhtml files in the OEBPS directory, and all files referenced from the XHTML in the index.xhtml_files directory), we can create the .opf file that lists all resources used in the EPUB document.
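Since the layout is known in advance, the manifest entries can be derived from the list of exported files alone. The sketch below maps file extensions to a subset of the EPUB core media types and emits one OPF item element per file. The function names, the item-N id scheme, and the choice to skip unrecognized files are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Maps a file name to its OPF manifest media type. Only a subset of the
// EPUB core media types is covered; unknown extensions return "".
std::string mediaTypeFor(const std::string &file)
{
    std::string::size_type dot = file.rfind('.');
    if (dot == std::string::npos)
        return "";
    std::string ext = file.substr(dot + 1);
    if (ext == "xhtml") return "application/xhtml+xml";
    if (ext == "css")   return "text/css";
    if (ext == "png")   return "image/png";
    if (ext == "jpg" || ext == "jpeg") return "image/jpeg";
    if (ext == "gif")   return "image/gif";
    if (ext == "svg")   return "image/svg+xml";
    if (ext == "ncx")   return "application/x-dtbncx+xml";
    return "";
}

// Builds <item> elements for the OPF <manifest> from paths relative to OEBPS.
std::string buildManifestItems(const std::vector<std::string> &files)
{
    std::string out;
    int n = 0;
    for (const std::string &f : files) {
        std::string type = mediaTypeFor(f);
        if (type.empty())
            continue;  // skip files we cannot classify
        out += "<item id=\"item-" + std::to_string(++n) + "\" href=\"" + f +
               "\" media-type=\"" + type + "\"/>\n";
    }
    return out;
}
```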

Changing the default behaviour of the HTML exporter for use in another plugin

While the HTML exporter already contained all the functionality needed for EPUB2 export, including XHTML support, EPUB 3 requires XHTML5 and MathML support, so we need to redefine the default behaviour of the HTML exporter plugin. But don't be afraid: it's easy and fun.

Creating custom document writer

Though XHTML5 differs from XHTML, we can still use IE_Exp_HTML_DocumentWriter as the base class for our new document writer. We just need to make a few changes to it.

Overriding methods

First, we need to add the required namespace and profile attributes to the html tag:

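void IE_Exp_EPUB_EPUB3Writer::openDocument()
{
   m_pTagWriter->openTag("html", false, false);
   m_pTagWriter->addAttribute("xmlns", "http://www.w3.org/1999/xhtml");
   // Bind the epub prefix so that epub:type attributes are well-formed XHTML
   m_pTagWriter->addAttribute("xmlns:epub", "http://www.idpf.org/2007/ops");
   m_pTagWriter->addAttribute("profile", EPUB3_CONTENT_PROFILE);
}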
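To see what the openTag/addAttribute calls turn into, here is a tiny stand-in tag writer. MiniTagWriter and writeEpub3Root are hypothetical and much simpler than AbiWord's tag writer (no inline/empty flags, no escaping); they only model the idea that attributes are collected until the start tag is closed. The sketch binds the epub prefix to http://www.idpf.org/2007/ops, which is what makes epub:type attributes valid in an XHTML document, and it leaves out the profile attribute because the value of EPUB3_CONTENT_PROFILE is not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for a tag writer: attributes are appended while the
// start tag is still open; any later write closes the start tag first.
class MiniTagWriter {
public:
    void openTag(const std::string &name)
    {
        flushStartTag();
        m_out += "<" + name;
        m_stack.push_back(name);
        m_startTagOpen = true;
    }
    void addAttribute(const std::string &name, const std::string &value)
    {
        if (m_startTagOpen)  // ignored once the start tag is closed
            m_out += " " + name + "=\"" + value + "\"";
    }
    void writeData(const std::string &text)
    {
        flushStartTag();
        m_out += text;  // no escaping in this sketch
    }
    void closeTag()
    {
        flushStartTag();
        if (m_stack.empty())
            return;
        m_out += "</" + m_stack.back() + ">";
        m_stack.pop_back();
    }
    const std::string &str() const { return m_out; }

private:
    void flushStartTag()
    {
        if (m_startTagOpen) {
            m_out += ">";
            m_startTagOpen = false;
        }
    }
    std::string m_out;
    std::vector<std::string> m_stack;
    bool m_startTagOpen = false;
};

// Mirrors openDocument(): open <html> and declare both namespaces.
void writeEpub3Root(MiniTagWriter &w)
{
    w.openTag("html");
    w.addAttribute("xmlns", "http://www.w3.org/1999/xhtml");
    w.addAttribute("xmlns:epub", "http://www.idpf.org/2007/ops");
}
```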

After that we can use the special epub:type attribute to give hints to the reading system, telling it which elements are annotations, footnotes, endnotes, etc. We also need to redefine the annotation generation code:

void IE_Exp_EPUB_EPUB3Writer::openAnnotation()
{
   m_pTagWriter->openTag("a", true);
   m_pTagWriter->addAttribute("href",
           UT_UTF8String_sprintf("#annotation-%d", m_iAnnotationCount + 1).utf8_str());
   m_pTagWriter->addAttribute("epub:type", "annoref");
}
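The link written by openAnnotation() only works if its target matches the id that insertAnnotations() assigns to the corresponding annotation: "#annotation-" plus m_iAnnotationCount + 1 on one side, "annotation-" plus i + 1 on the other. Assuming m_iAnnotationCount holds the number of annotations written so far, both sides can share one helper, as in this sketch (the helper names are hypothetical).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Id of the i-th annotation (0-based index, 1-based in the id).
std::string annotationTargetId(std::size_t zeroBasedIndex)
{
    return "annotation-" + std::to_string(zeroBasedIndex + 1);
}

// Href of the link to the next annotation, given how many were written so far.
std::string annotationHref(std::size_t annotationsWrittenSoFar)
{
    return "#" + annotationTargetId(annotationsWrittenSoFar);
}
```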

Next comes the table of contents insertion code:

void IE_Exp_EPUB_EPUB3Writer::insertTOC(const gchar *title,
        const std::vector<UT_UTF8String> &items,
        const std::vector<UT_UTF8String> &itemUriList)
{
   // Intentionally empty: the EPUB navigation file replaces in-document TOCs
}

Yes, everything is OK: it's empty. We ignore TOC insertions because the EPUB specification defines a single global navigation file, so there is no need to create another TOC inside the document. Only a few changes remain:

void IE_Exp_EPUB_EPUB3Writer::insertEndnotes(
        const std::vector<UT_UTF8String> &endnotes)
{
    if (endnotes.size() == 0)
        return;

    m_pTagWriter->openTag("aside");
    m_pTagWriter->addAttribute("epub:type", "rearnotes");

    for (size_t i = 0; i < endnotes.size(); i++)
    {
        m_pTagWriter->openTag("section");
        // m_pTagWriter->addAttribute("class", "endnote_anchor");
        m_pTagWriter->addAttribute("id", UT_UTF8String_sprintf("endnote-%d",
            m_iEndnoteAnchorCount + 1).utf8_str());
        m_pTagWriter->addAttribute("epub:type", "rearnote");
        m_pTagWriter->writeData(endnotes.at(i).utf8_str());
        m_pTagWriter->closeTag();
        m_iEndnoteAnchorCount++;
    }

    m_pTagWriter->closeTag();
}
void IE_Exp_EPUB_EPUB3Writer::insertFootnotes(
        const std::vector<UT_UTF8String> &footnotes)
{
    if (footnotes.size() == 0)
        return;

    m_pTagWriter->openTag("aside");
    m_pTagWriter->addAttribute("epub:type", "footnotes");

    for (size_t i = 0; i < footnotes.size(); i++)
    {
        m_pTagWriter->openTag("section");
        // m_pTagWriter->addAttribute("class", "footnote_anchor");
        // %d expects an int, so convert the size_t index explicitly
        m_pTagWriter->addAttribute("id", UT_UTF8String_sprintf("footnote-%d",
            static_cast<int>(i + 1)).utf8_str());
        m_pTagWriter->addAttribute("epub:type", "footnote");
        m_pTagWriter->writeData(footnotes.at(i).utf8_str());
        m_pTagWriter->closeTag();
    }

    m_pTagWriter->closeTag();
}
void IE_Exp_EPUB_EPUB3Writer::insertAnnotations(
        const std::vector<UT_UTF8String> &titles,
        const std::vector<UT_UTF8String> &authors,
        const std::vector<UT_UTF8String> &annotations)
{
    m_pTagWriter->openTag("section");
    m_pTagWriter->addAttribute("epub:type", "annotations");

    for (size_t i = 0; i < annotations.size(); i++)
    {
        UT_UTF8String title = titles.at(i);
        UT_UTF8String author = authors.at(i);
        UT_UTF8String annotation = annotations.at(i);

        m_pTagWriter->openTag("section");
        // m_pTagWriter->addAttribute("class", "annotation");
        m_pTagWriter->addAttribute("epub:type", "annotation");
        // %d expects an int, so convert the size_t index explicitly
        m_pTagWriter->addAttribute("id", UT_UTF8String_sprintf("annotation-%d",
            static_cast<int>(i + 1)).utf8_str());
        if (title.length())
        {
            m_pTagWriter->openTag("h4");
            m_pTagWriter->writeData(title.utf8_str());
            m_pTagWriter->closeTag();
        }

        /*if (author.length())
        {
            m_pTagWriter->openTag("span");
            m_pTagWriter->addAttribute("class", "annotation-author");
            m_pTagWriter->writeData(author.utf8_str());
            m_pTagWriter->closeTag();
            m_pTagWriter->openTag("br", false, true);
            m_pTagWriter->closeTag();
        }*/

        if (annotation.length())
        {
            m_pTagWriter->openTag("blockquote");
            // m_pTagWriter->addAttribute("class", "annotation-content");
            m_pTagWriter->writeData(annotation.utf8_str());
            m_pTagWriter->closeTag();
        }

        m_pTagWriter->closeTag();
    }

    m_pTagWriter->closeTag();
}

Take a look at the m_pTagWriter->openTag calls in these methods. Where IE_Exp_HTML_DocumentWriter uses "div", they use the new HTML5 elements "section", "aside", etc.
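For reference, this is the shape of the markup that insertFootnotes() produces, rebuilt here as a plain string so the structure is easy to see. It is a sketch only: footnotesMarkup is a hypothetical helper, the real tag writer may add whitespace between elements, and note text is inserted without escaping.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Builds <aside epub:type="footnotes"> with one <section> per note,
// using the same "footnote-N" ids (1-based) as insertFootnotes().
std::string footnotesMarkup(const std::vector<std::string> &notes)
{
    if (notes.empty())
        return "";
    std::string out = "<aside epub:type=\"footnotes\">";
    for (std::size_t i = 0; i < notes.size(); ++i) {
        out += "<section id=\"footnote-" + std::to_string(i + 1) +
               "\" epub:type=\"footnote\">" + notes[i] + "</section>";
    }
    out += "</aside>";
    return out;
}
```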

Telling IE_Exp_HTML about new document writer

To tell IE_Exp_HTML to use our new document writer, we create a utility class derived from IE_Exp_HTML_WriterFactory and set it as the writer factory of the IE_Exp_HTML instance. Our custom writer factory is very simple: it just creates a new instance of IE_Exp_EPUB_EPUB3Writer:

return new IE_Exp_EPUB_EPUB3Writer(pOutputWriter);

Now it's time to pass the factory to the HTML exporter plugin:

UT_Error IE_Exp_EPUB::EPUB3_writeStructure()
{
    m_oebpsDir = m_baseTempDir + G_DIR_SEPARATOR_S;
    m_oebpsDir += "OEBPS";
    // Directories need the execute bit to be traversable, hence 0755
    UT_go_directory_create(m_oebpsDir.utf8_str(), 0755, NULL);

    UT_UTF8String indexPath = m_oebpsDir + G_DIR_SEPARATOR_S;
    indexPath += "index.xhtml";

    // Exporting document to XHTML using HTML export plugin
    char *szIndexPath = g_strdup(indexPath.utf8_str());
    IE_Exp_HTML_WriterFactory *pWriterFactory = new IE_Exp_EPUB_EPUB3WriterFactory();
    m_pie = new IE_Exp_HTML(getDoc());
    m_pie->setWriterFactory(pWriterFactory);
    m_pie->suppressDialog(true);
    m_pie->setProps(
        "embed-css:no;html4:no;use-awml:no;declare-xml:yes;add-identifiers:yes;");

    m_pie->set_SplitDocument(m_exp_opt.bSplitDocument);
    m_pie->set_MathMLRenderPNG(m_exp_opt.bRenderMathMLToPNG);
    UT_Error err = m_pie->writeFile(szIndexPath);
    g_free(szIndexPath);
    // The factory is freed here, only after writing has finished
    DELETEP(pWriterFactory);
    return err;
}
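The writer-factory arrangement can be modeled with a few self-contained classes. The names below (DocumentWriter, Epub3Writer, WriterFactory, Epub3WriterFactory, HtmlExporter, createWriter) are hypothetical and only mirror the roles of the AbiWord classes; unlike the real factory, createWriter() takes no output writer argument. In this model the exporter keeps a non-owning pointer to the factory, so the factory must stay alive until the export has finished.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Base writer: produces plain HTML output by default.
class DocumentWriter {
public:
    virtual ~DocumentWriter() = default;
    virtual std::string openDocument() const { return "<html>"; }
};

// Derived writer overriding only what EPUB 3 needs to change.
class Epub3Writer : public DocumentWriter {
public:
    std::string openDocument() const override
    {
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\""
               " xmlns:epub=\"http://www.idpf.org/2007/ops\">";
    }
};

class WriterFactory {
public:
    virtual ~WriterFactory() = default;
    virtual std::unique_ptr<DocumentWriter> createWriter() const = 0;
};

class Epub3WriterFactory : public WriterFactory {
public:
    std::unique_ptr<DocumentWriter> createWriter() const override
    {
        return std::make_unique<Epub3Writer>();
    }
};

// The exporter only knows the abstract factory and falls back to the
// default writer when no factory has been set.
class HtmlExporter {
public:
    void setWriterFactory(const WriterFactory *factory) { m_factory = factory; }
    std::string write() const
    {
        std::unique_ptr<DocumentWriter> writer =
            m_factory ? m_factory->createWriter()
                      : std::make_unique<DocumentWriter>();
        return writer->openDocument();
    }

private:
    const WriterFactory *m_factory = nullptr;  // not owned
};
```

Allocating the factory on the stack of the calling function would give the same lifetime guarantee as deleting it after writeFile(), without a manual delete.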