How to Host Your Book as a Static Website on Amazon S3 Using Bookdown
Introduction
Have you ever wanted to transform your Word document into a beautifully formatted book and share it with the world as a website? Well… I wanted to publish my professional thesis and make it publicly accessible as a website and not just a PDF document.
With Bookdown and Amazon S3, you can do just that! This guide walks you through the entire process: converting your Word document to RMarkdown, creating a Bookdown project, and finally deploying it to an Amazon S3 bucket. I will use my thesis as an example.
Prerequisites
Before we get started, make sure you have the following tools installed and configured:
R and RStudio: For creating and managing your Bookdown project.
- install it from this documentation page
AWS CLI: To interact with Amazon S3.
- Installation guide here
Terraform: For setting up and managing your AWS infrastructure.
Pandoc: To convert your Word document to an RMarkdown file.
- Install it from this guide
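Before going further, you can quickly confirm that the command-line tools are on your PATH. The following is a minimal sketch. The function name check_tools is hypothetical. RStudio itself is a GUI, so the script only checks Rscript, which is installed with R.

```shell
#!/bin/sh
# Report which of the given commands are available on the PATH.
# check_tools is a hypothetical helper, not part of any of these tools.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Non-zero status when at least one tool is missing.
  [ "$missing" -eq 0 ]
}

# RStudio is a GUI, so check Rscript (shipped with R) instead.
check_tools Rscript pandoc aws terraform || echo "Install the missing tools before continuing."
```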
What is Bookdown?
Bookdown is an R package that facilitates the creation of books and long-form documents with R Markdown. It allows you to write content in Markdown and then compile it into various formats such as HTML, PDF, and ePub. Bookdown is particularly useful for creating technical documentation, reports, and academic publications.
It provides features such as multi-page HTML output; numbering and cross-referencing of figures, tables, sections, and equations; parts and appendices; and the GitBook style, which produces elegant and appealing HTML book pages. I have already used it to publish this academic paper.
Since Bookdown generates plain HTML files, any static website host will do, but let's use Amazon S3 for the sake of learning. Let's dive in!
How to publish a Word document as a website using Bookdown
Step 1: Convert Word Document to RMarkdown
The first step in our journey is to convert your Word document into an RMarkdown file. This will serve as the foundation for your Bookdown project.
Install Pandoc: Pandoc is a powerful tool for converting documents from one format to another. It is often bundled with RStudio, but you can install it separately if needed. Visit the Pandoc installation page for detailed instructions.
Convert the Document: Open a terminal and run the following command to convert your Word document (`document.docx`) to an RMarkdown file (`document.Rmd`):
```shell
pandoc -f docx -t markdown -o document.Rmd document.docx
```
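In my case, I wanted to split the document into several Markdown files, one per chapter. So I first converted it to a single Markdown file and extracted the embedded images into the current directory:
```shell
pandoc -f docx -t markdown --atx-headers --extract-media="." -o thesis.md /path/to/thesis.docx
```
Then I split this file by chapter with `awk`, using the level-one Markdown headings (`# `) as split points:
```shell
awk '/^# /{h=substr($0,3);} {print > (h ".Rmd")}' thesis.md
```
This is why the `--atx-headers` option is needed with pandoc versions older than 2.11.2. Those versions write level-one and level-two headings in setext style (text underlined with `===` or `---`), and the `/^# /` pattern would not match them. Since pandoc 2.11.2, ATX headings are the default, and `--atx-headers` is deprecated in favor of `--markdown-headings=atx`.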
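The command above has one limitation. Bookdown merges chapter files in filename order, and index.Rmd always comes first. Files named after their headings therefore end up in alphabetical order rather than document order.

Below is a minimal sketch of a variant that numbers the files instead. It makes two assumptions: the input uses ATX headings, and no line inside a code chunk starts with "# " (an R comment at the start of a line would be mistaken for a chapter). The function name split_chapters is hypothetical.

```shell
#!/bin/sh
# Split a pandoc Markdown file into numbered .Rmd chapter files.
# Assumes ATX headings ("# Title") and no "# " lines inside code chunks.
# split_chapters is a hypothetical helper name.
split_chapters() {
  awk '
    /^# / {
      if (out != "") close(out)                # keep only one output file open
      n++
      title = substr($0, 3)
      sub(/[ \t]*\{[^}]*\}[ \t]*$/, "", title)  # drop attributes like {#intro}
      slug = tolower(title)
      gsub(/[^a-z0-9]+/, "-", slug)            # non-alphanumerics (incl. accents) become "-"
      gsub(/^-+|-+$/, "", slug)
      out = sprintf("%02d-%s.Rmd", n, slug)
    }
    out == "" { out = "00-preamble.Rmd" }      # text before the first heading
    { print > out }
  ' "$1"
}
```

Usage: `split_chapters thesis.md` produces files such as 01-introduction.Rmd and 02-state-of-the-art.Rmd. Anything that lands in 00-preamble.Rmd (title page, abstract) probably belongs in index.Rmd; move it there and delete the file.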
Step 2: Create a Bookdown Project
Now that you have your RMarkdown files, it's time to create a Bookdown project. This will allow you to compile your content into a beautifully formatted website book.
You can use my template to easily create your book and version it on GitHub, or follow the next steps.
Install Bookdown: Open RStudio and install the Bookdown package by running the following command:
```r
install.packages("bookdown")
```
Create a New Bookdown Project: In RStudio, go to File -> New Project -> New Directory -> Book Project using bookdown, then follow the prompts to set up your project.
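Add Your RMarkdown Files: Copy your `.Rmd` files into the project directory, replacing the demo chapters. If you replace `index.Rmd`, keep its YAML header, which holds the book's metadata (title, author, `site: bookdown::bookdown_site`).
Build the Book: In RStudio, use the Build tab to render your book, or run the following command:
```r
bookdown::render_book("index.Rmd", "bookdown::gitbook")
```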
By default, the book is generated in the `_book` directory. If you use my template, it will be in the `docs` directory.
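If your chapter files are numbered (for example 01-introduction.Rmd), you can also pin their order explicitly. Bookdown reads an optional `_bookdown.yml` file. Its `rmd_files` field lists the files to merge, and its `output_dir` field sets where the book is written.

Below is a minimal sketch that prints such a file from the numbered chapters in the current directory. It makes three assumptions: the function name bookdown_yml is hypothetical, `book_filename: "thesis"` is a placeholder you should adjust, and chapters match the `NN-*.Rmd` pattern.

```shell
#!/bin/sh
# Print a _bookdown.yml listing index.Rmd first, then the numbered
# chapter files (NN-*.Rmd) in filename order.
# bookdown_yml is a hypothetical helper; book_filename is a placeholder.
bookdown_yml() {
  out_dir="${1:-_book}"            # "_book" is Bookdown's default output folder
  echo 'book_filename: "thesis"'
  echo "output_dir: \"$out_dir\""
  echo 'rmd_files:'
  echo '  - "index.Rmd"'
  for f in [0-9][0-9]-*.Rmd; do
    [ -e "$f" ] || continue        # the glob matched nothing
    printf '  - "%s"\n' "$f"
  done
}
```

Usage: `bookdown_yml docs > _bookdown.yml`, then `Rscript -e 'bookdown::render_book("index.Rmd", "bookdown::gitbook")'` to build from the terminal. If your project already has a `_bookdown.yml`, copy the `rmd_files` list into it rather than overwriting it.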
Step 3: Deploy to Amazon S3
With your book ready, the final step is to deploy it as a static website on Amazon S3. This will make your book accessible to anyone with an internet connection and a browser. I prefer to use Terraform to manage my cloud resources, so let's do it with the following steps.
Connect to your AWS account with `aws configure`. For security reasons, use the access keys of an IAM user rather than those of your root account.
Set Up Terraform Configuration: Create a `main.tf` file to configure the AWS provider, and update the region to suit your needs:
```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "eu-west-3"
}
```
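Create S3 Bucket and Resources: Create a `resource.tf` file to define the S3 bucket and its resources. Bucket names are shared across all AWS accounts, so replace `web-bucket-thesis` with a name of your own:
```hcl
resource "aws_s3_bucket" "thesis" {
  bucket = "web-bucket-thesis" # must be globally unique
}

# New buckets block public bucket policies by default; allow the
# public-read policy below while keeping ACLs blocked.
resource "aws_s3_bucket_public_access_block" "thesis" {
  bucket                  = aws_s3_bucket.thesis.id
  block_public_acls       = true
  ignore_public_acls      = true
  block_public_policy     = false
  restrict_public_buckets = false
}

locals {
  # Content-Type per file extension, so browsers render the pages.
  mime_types = {
    html  = "text/html"
    css   = "text/css"
    js    = "application/javascript"
    json  = "application/json"
    png   = "image/png"
    jpg   = "image/jpeg"
    jpeg  = "image/jpeg"
    svg   = "image/svg+xml"
    woff  = "font/woff"
    woff2 = "font/woff2"
  }
}

resource "aws_s3_object" "thesis_files" {
  for_each = fileset(var.thesis_folder, "**/*")
  bucket   = aws_s3_bucket.thesis.id
  key      = each.value
  source   = "${var.thesis_folder}/${each.value}"
  # Re-upload a file whenever its content changes.
  etag = filemd5("${var.thesis_folder}/${each.value}")
  content_type = lookup(
    local.mime_types,
    lower(element(reverse(split(".", each.value)), 0)),
    "application/octet-stream"
  )
}

resource "aws_s3_bucket_website_configuration" "thesis" {
  bucket = aws_s3_bucket.thesis.id

  index_document {
    suffix = "index.html"
  }

  error_document {
    key = "404.html"
  }
}

resource "aws_s3_bucket_policy" "public_read_access" {
  bucket     = aws_s3_bucket.thesis.id
  depends_on = [aws_s3_bucket_public_access_block.thesis]

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = "*"
        Action    = ["s3:GetObject"]
        Resource  = ["${aws_s3_bucket.thesis.arn}/*"]
      }
    ]
  })
}
```
A few details matter here:

- Public access: since April 2023, new buckets have Block Public Access enabled and object ACLs disabled by default. A `public-read` ACL on each object would therefore fail. Public access comes from the bucket policy instead, and the public access block must allow that policy.
- Content type: S3 serves each object with the `Content-Type` set at upload time. Without it, browsers may download the HTML pages instead of displaying them.
- Policy version: the `Version` field is the version of the IAM policy language, `2012-10-17`, not a date of your choice.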
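Before running Terraform, you can catch an invalid bucket name early. The following is a minimal sketch that checks only a subset of the S3 naming rules:

- 3 to 63 characters
- only lowercase letters, digits, dots and hyphens
- starts and ends with a letter or digit
- no two adjacent dots
- not formatted like an IPv4 address

It does not cover every rule, and it cannot tell you whether the name is already taken; only creating the bucket can. The function name valid_bucket_name is hypothetical.

```shell
#!/bin/sh
# Check a candidate bucket name against a SUBSET of the S3 naming rules.
# Returns 0 if the name passes these checks, 1 otherwise.
# valid_bucket_name is a hypothetical helper name.
valid_bucket_name() {
  name="$1"
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
  case "$name" in
    *[!a-z0-9.-]*) return 1 ;;           # lowercase letters, digits, dots, hyphens only
    [!a-z0-9]* | *[!a-z0-9]) return 1 ;; # must start and end with a letter or digit
    *..*) return 1 ;;                    # no two adjacent periods
  esac
  # Must not be formatted like an IPv4 address (e.g. 192.168.5.4).
  if printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
    return 1
  fi
  return 0
}

# Example: replace with your own bucket name.
if valid_bucket_name "web-bucket-thesis"; then
  echo "name passes the basic checks"
else
  echo "invalid bucket name"
fi
```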
Update variables
Create `variables.tf` to define a variable for your book directory:
```hcl
variable "thesis_folder" {
  description = "Path to the directory containing the book's HTML files"
  type        = string
}
```
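Set this variable by creating a file named `terraform.tfvars` and assigning it a real value. Use the folder that contains `index.html` (`_book`, or `docs` if you use my template):
```hcl
thesis_folder = "/path/to/your/book"
```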
Deploy with Terraform: Initialize, plan and apply the Terraform configuration:
```shell
terraform init
terraform plan
terraform apply
```
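Get the output from Terraform
To get the link to your book, add a Terraform output, for example in a file named `outputs.tf`. The endpoint is exported by the website configuration resource:
```hcl
output "thesis_website_endpoint" {
  value = aws_s3_bucket_website_configuration.thesis.website_endpoint
}
```
Terraform prints the endpoint at the end of `terraform apply`. It is a host name, so open it with `http://` in front.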
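(Bonus) Upload Bookdown Output to S3: Instead of uploading the files through Terraform, you can use the AWS CLI to sync your Bookdown output directory (`_book`, or `docs` with my template) to the S3 bucket:
```shell
# at the root of your project
aws s3 sync _book s3://web-bucket-thesis
```
Update the bucket name and path with yours. There is no need for `--acl public-read`: the bucket policy already grants public read access. Also, new buckets reject ACLs because ACLs are disabled by default.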
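The upload can be wrapped in a small script. The following is a minimal sketch that makes two assumptions: the built book is in `docs` (my template) or `_book` (Bookdown's default), and the AWS CLI is already configured. The function names book_dir and deploy_book are hypothetical. The `--delete` flag of `aws s3 sync` removes objects that no longer exist locally, so renamed chapters do not linger in the bucket.

```shell
#!/bin/sh
# Pick the Bookdown output folder: docs/ (template) or _book/ (default).
# Prints the folder name, or fails if neither contains an index.html.
book_dir() {
  for d in docs _book; do
    if [ -f "$d/index.html" ]; then
      echo "$d"
      return 0
    fi
  done
  echo "no built book found; render it first" >&2
  return 1
}

# Upload the book; $1 is your bucket name.
deploy_book() {
  bucket="$1"
  dir=$(book_dir) || return 1
  # --delete removes remote files that no longer exist locally.
  aws s3 sync "$dir" "s3://$bucket" --delete
}

# Usage, from the project root:
#   deploy_book web-bucket-thesis
```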
Be aware that Amazon S3 website endpoints do not support HTTPS or access points. If you want HTTPS, you can use Amazon CloudFront to serve a static website hosted on Amazon S3. If you need it, you can ask me to write a guide on that part.
Conclusion
Congratulations! You have successfully converted a Word document to an RMarkdown file, created a static website using Bookdown, and deployed it to an Amazon S3 bucket. Your book is now live and accessible to the world. Enjoy sharing your knowledge and expertise with a global audience!
Plot twist
Finally, I decided to host my thesis on GitHub Pages because of the benefits it offers, such as:
Free Hosting: GitHub Pages provides free hosting for static websites, which is ideal for Bookdown projects.
Easy Deployment: You can deploy your site directly from your GitHub repository on a specific branch or a specific subdirectory, making it easy to update and manage.
Version Control: Since your project is hosted on GitHub, you can take advantage of Git's version control features to track changes and collaborate with others.
Custom Domains: GitHub Pages allows you to use custom domains, giving your site a professional appearance.
You can find it in this repository.