How to Host Your Book as a Static Website on Amazon S3 Using Bookdown

Introduction

Have you ever wanted to transform your Word document into a beautifully formatted book and share it with the world as a website? Well… I wanted to publish my professional thesis and make it publicly accessible as a website and not just a PDF document.

With Bookdown and Amazon S3, you can do just that! This guide will walk you through the entire process: converting your Word document to an RMarkdown file, creating a Bookdown project, and finally deploying it to an Amazon S3 bucket. I will use my thesis as an example.

Prerequisites

Before we get started, make sure you have the following tools installed and configured: R and RStudio, Pandoc, the AWS CLI, and Terraform. You will also need an AWS account.

What is Bookdown?

Bookdown is an R package that facilitates the creation of books and long-form documents with R Markdown. It allows you to write content in Markdown and then compile it into various formats such as HTML, PDF, and ePub. Bookdown is particularly useful for creating technical documentation, reports, and academic publications.

It provides features such as multi-page HTML output, numbering and cross-referencing of figures, tables, sections, and equations, and support for parts and appendices. It also incorporates the GitBook style to create elegant and appealing HTML book pages. I have already used it to publish this academic paper.

Since Bookdown generates HTML files, we can host them on any static website hosting service, but let's do it with Amazon S3 for the sake of learning. Let's dive in!

How to publish a Word document as a website using Bookdown

Step 1: Convert Word Document to RMarkdown

The first step in our journey is to convert your Word document into an RMarkdown file. This will serve as the foundation for your Bookdown project.

  1. Install Pandoc: Pandoc is a powerful tool for converting documents from one format to another. It is often bundled with RStudio, but you can install it separately if needed. Visit the Pandoc installation page for detailed instructions.

  2. Convert the Document: Open a terminal and run the following command to convert your Word document (document.docx) to an RMarkdown file (document.Rmd):

     pandoc document.docx -o document.Rmd
    

In my case, I wanted to split the document into multiple Markdown files, one per chapter. So I first converted it to a single Markdown file:

pandoc -f docx -t markdown --atx-headers --extract-media="." -o thesis.md /path/to/thesis.docx

Then I split this file by chapter with awk, starting a new file at each Markdown level-one heading:

awk '/^# /{ if (out) close(out); out = substr($0, 3) ".Rmd" } out { print > out }' thesis.md
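To see what the split does before running it on a real document, here is a minimal sketch on a throwaway file. The sample content, the chapter titles, and the scratch directory are made up for illustration. This variant closes each chapter file before opening the next and skips any lines that come before the first heading.

```shell
# Work in a scratch directory so nothing real gets overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny stand-in for thesis.md (hypothetical content)
cat > thesis.md <<'EOF'
# Introduction
Some opening text.

# Methods
How the work was done.
EOF

# Start a new output file at every level-one heading;
# lines before the first heading are skipped
awk '/^# /{ if (out) close(out); out = substr($0, 3) ".Rmd" }
     out { print > out }' thesis.md

# List the result: one .Rmd file per chapter, named after its heading
ls
```

Because the file name comes straight from the heading text, a title containing a slash would need to be cleaned up first.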

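That is why the headings must be written in ATX style (# Title): awk looks for lines starting with "# ". With pandoc older than 2.11.2, this requires the --atx-headers option. From 2.11.2 on, ATX headings are the default and the option is deprecated in favor of --markdown-headings=atx.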
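If the same conversion has to run on machines with different pandoc versions, the flag can be picked from the version string. The helper below is only a sketch: atx_flag is a hypothetical name, and it assumes a sort that supports -V (GNU coreutils).

```shell
# Print the pandoc option that forces ATX ("# Title") headings
# for a given pandoc version string, e.g. "2.9.2" or "3.1.2"
atx_flag() {
  # sort -V orders version strings; if 2.11.2 comes out first (or ties),
  # the given version is 2.11.2 or newer
  oldest=$(printf '%s\n%s\n' "2.11.2" "$1" | sort -V | head -n 1)
  if [ "$oldest" = "2.11.2" ]; then
    echo "--markdown-headings=atx"
  else
    echo "--atx-headers"
  fi
}

atx_flag "2.9.2"   # prints --atx-headers
atx_flag "3.1.2"   # prints --markdown-headings=atx
```

The installed version can be read with pandoc --version | awk 'NR==1{print $2}' and passed to the helper.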

Step 2: Create a Bookdown Project

Now that you have your RMarkdown files, it's time to create a Bookdown project. This will allow you to compile your content into a beautifully formatted website book.

You can use my template to easily create your book and version it on GitHub, or follow the steps below.

  1. Install Bookdown: Open RStudio and install the Bookdown package by running the following command:

     install.packages("bookdown")
    
  2. Create a New Bookdown Project: In RStudio, create a new Bookdown project:

    • Go to File -> New Project -> New Directory -> Book Project using bookdown.

    • Follow the prompts to set up your project.

  3. Add Your RMarkdown File: Replace the default index.Rmd or add your document.Rmd to the project directory.

  4. Build the Book: In RStudio, use the Build tab to render your book, or run the following command:

     bookdown::render_book("index.Rmd", "bookdown::gitbook")
    

By default, the book is generated in the _book directory. If you use my template, it will be in the docs directory.
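One thing to watch when the chapters were split by title: Bookdown merges the .Rmd files in file name order unless _bookdown.yml lists them explicitly under rmd_files. The sketch below prints such a list from the level-one headings of the original Markdown file, in document order. The function name rmd_files_from and the sample titles are hypothetical, and it assumes an index.Rmd that should come first and chapter files named after their headings.

```shell
# Print an rmd_files list for _bookdown.yml from the level-one
# headings of a Markdown file, in document order
rmd_files_from() {
  echo 'rmd_files:'
  echo '  - "index.Rmd"'
  # One entry per "# Title" line, turned into "Title.Rmd"
  sed -n 's/^# \(.*\)$/  - "\1.Rmd"/p' "$1"
}

# Demo on a throwaway file (hypothetical chapter titles)
sample=$(mktemp)
printf '# Introduction\ntext\n\n# Methods\ntext\n' > "$sample"
rmd_files_from "$sample"
```

The output can be pasted into the project's existing _bookdown.yml. Titles containing double quotes would need escaping by hand.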

Step 3: Deploy to Amazon S3

With your book ready, the final step is to deploy it as a static website on Amazon S3. This will make your book accessible to anyone with an internet connection and a browser. I prefer to use Terraform to manage my cloud resources, so let's do it using the following steps.

  1. Configure your AWS credentials with:

     aws configure
    

    For security reasons, use the access keys of a dedicated IAM user with limited permissions rather than your root account.

  2. Set Up Terraform Configuration: Create a main.tf file to configure the AWS provider, and update the region to suit your needs:

     terraform {
       required_providers {
         aws = {
           source  = "hashicorp/aws"
           version = "~> 5.0"
         }
       }
     }
    
     provider "aws" {
       region = "eu-west-3"
     }
    
  3. Create S3 Bucket and Resources: Create a resource.tf file to define the S3 bucket and its resources:

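     resource "aws_s3_bucket" "thesis" {
       bucket = "web-bucket-thesis"
     }

     # New buckets block public bucket policies by default;
     # allow them so the policy below can be attached
     resource "aws_s3_bucket_public_access_block" "thesis" {
       bucket = aws_s3_bucket.thesis.id

       block_public_acls       = true
       ignore_public_acls      = true
       block_public_policy     = false
       restrict_public_buckets = false
     }

     locals {
       # Content types for common file extensions in a gitbook build
       mime_types = {
         html = "text/html"
         css  = "text/css"
         js   = "application/javascript"
         json = "application/json"
         png  = "image/png"
         jpg  = "image/jpeg"
         svg  = "image/svg+xml"
       }
     }

     resource "aws_s3_object" "thesis_files" {
       for_each = fileset(var.thesis_folder, "**/*")

       bucket = aws_s3_bucket.thesis.id
       key    = each.value
       source = "${var.thesis_folder}/${each.value}"
       # Re-upload a file when its content changes
       etag = filemd5("${var.thesis_folder}/${each.value}")
       # Without a content type, browsers download pages instead of rendering them
       content_type = lookup(local.mime_types, lower(reverse(split(".", each.value))[0]), "application/octet-stream")
       # No object ACL: public read access comes from the bucket policy below
     }

     resource "aws_s3_bucket_website_configuration" "thesis" {
       bucket = aws_s3_bucket.thesis.id

       index_document {
         suffix = "index.html"
       }

       error_document {
         key = "404.html"
       }
     }

     resource "aws_s3_bucket_policy" "public_read_access" {
       bucket = aws_s3_bucket.thesis.id
       # Wait until public policies are allowed on the bucket
       depends_on = [aws_s3_bucket_public_access_block.thesis]
       policy = <<EOF
     {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Principal": "*",
           "Action": [ "s3:GetObject" ],
           "Resource": [
             "${aws_s3_bucket.thesis.arn}/*"
           ]
         }
       ]
     }
     EOF
     }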
    
  4. Update variables

    Create variables.tf to define a variable for your book directory

     variable "thesis_folder" {
         description = "Path to the document html files"
     }
    

    Set this variable by creating a file named terraform.tfvars and assigning it the real path to your book

     thesis_folder = "/path/to/your/book"
    
  5. Deploy with Terraform: Initialize, plan and apply the Terraform configuration:

     terraform init
     terraform plan
     terraform apply
    
  6. Get the output from terraform

    To get the link to your book, you can add a Terraform output in a file named, for example, outputs.tf

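     output "thesis_website_endpoint" {
       # Read the endpoint from the website configuration; the attribute
       # on aws_s3_bucket is deprecated in version 5 of the AWS provider
       value = aws_s3_bucket_website_configuration.thesis.website_endpoint
     }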
    
  7. (Bonus) Upload Bookdown Output to S3: Use the AWS CLI to sync your Bookdown output directory (_book) to the S3 bucket:

     # at the root of your project
     # no --acl flag: the bucket policy already grants public read,
     # and new buckets reject object ACLs by default
     aws s3 sync _book s3://web-bucket-thesis
    

    Replace the bucket name and path with your own.
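Once the files are uploaded, it is worth checking that the pages are served with an HTML content type, since a page served as binary/octet-stream gets downloaded instead of displayed. The helper below is a sketch that extracts the Content-Type value from HTTP response headers. The name content_type_of is hypothetical, and the headers in the demo are canned rather than taken from a real response.

```shell
# Read HTTP response headers on stdin and print the Content-Type value
content_type_of() {
  # Strip carriage returns, then match the header name case-insensitively
  tr -d '\r' | awk -F': *' 'tolower($1) == "content-type" { print $2; exit }'
}

# Canned headers standing in for a real response
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 1234\r\n\r\n' \
  | content_type_of
```

Against the deployed site, the headers could come from curl -sI "http://$(terraform output -raw thesis_website_endpoint)/index.html", piped into content_type_of. This assumes the output name defined in step 6.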

Be aware that

Amazon S3 website endpoints do not support HTTPS or access points. If you want to use HTTPS, you can use Amazon CloudFront to serve a static website hosted on Amazon S3.

If you need it, you can ask me to write a guide on that part.

Conclusion

Congratulations! You have successfully converted a Word document to an RMarkdown file, created a static website using Bookdown, and deployed it to an Amazon S3 bucket. Your book is now live and accessible to the world. Enjoy sharing your knowledge and expertise with a global audience!

Plot twist

In the end, I decided to host my thesis on GitHub Pages because of the benefits it offers, such as:

  • Free Hosting: GitHub Pages provides free hosting for static websites, which is ideal for Bookdown projects.

  • Easy Deployment: You can deploy your site directly from your GitHub repository on a specific branch or a specific subdirectory, making it easy to update and manage.

  • Version Control: Since your project is hosted on GitHub, you can take advantage of Git's version control features to track changes and collaborate with others.

  • Custom Domains: GitHub Pages allows you to use custom domains, giving your site a professional appearance.
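For the subdirectory option, a minimal sketch of the project-side setup could look like this. It assumes GitHub Pages is configured to serve the docs folder of the repository, and it uses a scratch directory as a stand-in for the project root. The .nojekyll file tells GitHub Pages to serve the generated files as they are, without running them through Jekyll.

```shell
# Scratch directory standing in for the root of the Bookdown project
project=$(mktemp -d)
cd "$project" || exit 1

# Tell Bookdown to write the site to docs/ instead of _book/
echo 'output_dir: "docs"' >> _bookdown.yml

# An empty .nojekyll file makes GitHub Pages skip Jekyll processing
mkdir -p docs
touch docs/.nojekyll

# Show the marker file
ls -A docs
```

After rendering the book again, committing the docs directory and pushing it is enough to update the site.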

You can see how it is set up by checking this repository.


Read the thesis