Posts Tagged aws

A First Look at Google Cloud Datastore

Note: This is a continuation of my post on exploring Google Cloud Platform. There is also a mirror of it on

At first glance, Datastore is equivalent to DynamoDB. I personally think that Datastore is better compared to SimpleDB. Unfortunately, SimpleDB isn’t accepting new customers/isn’t being deployed to new regions, so it’s not a good comparison.

That said, the fundamental idea behind Datastore is the same (Hosted NoSQL database), but how it’s implemented is very different.

App Engine Datastore vs Cloud Datastore?

At one point, Cloud Datastore was (is?) part of App Engine, but it’s since been split out. Presumably as part of this legacy, Datastore appears limited to the regions that App Engine is in, which unfortunately isn’t all of Google’s regions.

Additionally, an App Engine account is created for Cloud Datastore. It’s required if you use the Datastore SDKs as well. Why this dependency exists, and why it’s exposed are open questions.

Even more confusing is that the docs for App Engine Datastore list DB Datastore as superseded, but then link to docs about Cloud Datastore. App Engine also mentions a NDB Client Library, which as far as I can tell wraps the actual Cloud Datastore service, but is specific to App Engine. There is also at least one more article that treats Cloud Datastore and the DB/NDB libraries as separate things.

The only thing I can suggest is check the URL, make sure the docs you’re reading start with

1. SQL-like semantics (transactions!)
2. More granular breakdowns for multi-tenancy: namespaces/’Kind’/’ancestor path’ (Google says a Kind is functionally equivalent to a table). I’m not sure about the usefulness of the namespace/kind distinction, but it’s an extra way to get multi-tenancy and is ignored by default, so meh.
3. Per request pricing! DynamoDB is charged at what you’re expected to use, not what you actually use. Given AWS’s obsessive focus on “pay what you use”, Dynamo’s provisioned read/write units are odd
4. Automatic indexes for every property enables arbitrary querying, not AWS’s you must define any indexes you want
5. A dashboard that allows SQL-like queries to be run (but only SELECT queries)

1. Nothing like DynamoDB streams (which are awesome for replication/async actions that are implicitly triggered off a data change)
2. Dynamo has 25x the storage on the free tier compared to Datastore (25GB vs 1GB)
3. Dynamo offers more total read/write operations per day – good if you have a consistent request rate, bad if you have bursts
4. Index (created by default, you have to opt out) data storage seems to be charged for
5. Creating a custom index requires the use of the gcloud CLI tool. There is no mention of any other method in the index documentation.
6. If you have a query that involves filtering on more than one property, you might run into a situation that isn’t covered by the built-in indexes or is otherwise impacted by one of a decently long list of query restrictions.

While you could get away with doing a scan + filter combination in Dynamo, GQL will reject you with a "Your Datastore does not have the composite index (developer-supplied) required for this query." error. (My usecase was select * from kind where property1 < value order by property2.)

I haven’t found a way to get Datastore to scan and filter server side, so I have to iterate over everything and throw away data that I don’t want – after retrieving it.


A bit more about the price, because the pricing models of the two products are really different.

Dynamo’s pricing model makes sense if you’re doing a fairly consistent number of requests per second. Dynamo attempts to support bursting, but they do it by having a 300 sec bucket of * provisioned but unused read/write capacity, and bursting out of that. When you exhaust the bucket, requests are denied.

So if you’re trying to save money, and drop your read/write units to 1, and you do something request heavy, you’re going to have a bad time unless you increase the units before running your operation. Dynamo’s new auto scaling feature takes some time to kick in as well (the scale up alarms take 5 minutes to kick in – the CloudWatch alarm is set on ConsumedWriteCapacityUnits > NN for 5 minutes).

In contrast, Datastore’s charge-per-request model fits dramatically varying traffic patterns better, mainly because you’re not paying for capacity that sits unused.

If you’re doing any sort of table scanning in Dynamo to find elements by properties, or you have indexes on single properties, chances are Datastore will work better for you by virtue of the built-in-by-default indexes. You can get the same functionality out of Dynamo, but it’s harder to set up, and functions as (and is charged as) a separate table.

If you have composite (multi-property) indexes, that’s a bit more complicated. Datastore does a far better job of hiding the index complexity (once it’s set up) and actually using the indexes. But the setup process is hit or miss, requiring you to know in advance things like sort orders.

If you’re not doing anything fancy, and just accessing everything directly by key, Dynamo is better for small scale stuff by virtue of the massively greater free storage space (25GB vs 1GB).

, , ,

No Comments

A First Look at Google Cloud Storage

Note: This is a continuation of my post on exploring Google Cloud Platform. There is also a mirror of it on

Storage price wise, S3 and GCS are mostly comparable, with a note that GCS bandwidth is more expensive, and you don’t really get a choice of what rate you want to pay.

GCS Single Region is pretty much directly equivalent to standard S3, and Nearline is equivalent to S3 Infrequent Access, complete with per GB retrieval fees. Storage for both classes is cheaper than the respective S3 classes. Operation fees are exactly the same though.

There’s two other storage classes though:

Multi-region Buckets

One major feature is multi-region buckets for $0.006/GB more. Presumably built to avoid incidents like us-~~tirefire~~-east-1 falling over and your buckets disappearing, it’s a definite point if you need high availability (and are willing to trust that Google has proper HA).

Assuming single region stores 3 copies, I speculate that for the price, multi-region stores 2 copies in each region for a total of 4 copies. That (suspected) single additional copy would be why multi-region buckets are only a third more expensive.


Coldline is a bit more interesting. As the equivalent of Glacier, it has one big advantage – quick retrieval times (on the order of a second), much better than Glacier. The downside is the fixed retrieval costs – $0.05/GB. Just comparing Coldline to Nearline means you shouldn’t retrieve anything more than once every 2 years(!) if you want to save money compared to Nearline. (Nearline is $0.01/GB, Coldline is $0.007/GB, so you save $0.003/GB/month going for Coldline.)

In comparison, Glacier has a number of retrieval speeds and corresponding retrieval pricing. Even Glacier’s expedited requests (the most expensive option) cost 40% less than Coldline. (Interestingly, the bulk retrieval option works out to about 25% of retrieving from S3 IA, but takes in the order of 8-12 hours.)

Considering Glacier costs almost 50% less than Coldline, I really question the utility of Coldline. With the restore costs it only makes sense if you’re storing archival data that doesn’t need to be accessed for around 2 years. I think Glacier has a much better handle on the expected use cases here.

I can see Nearline being used (and people having the expectation of immediate access), but the retrieval price of Coldline with no way to change that makes me very leery of using it.

, , ,

No Comments

A First Look at Google Compute Engine

Note: This is a continuation of my post on exploring Google Cloud Platform. There is also a mirror of it on

Compute Engine is one of the more important services for me. While I’d love to have all my stuff on managed services like Lambda/Cloud Functions, it’s not possible. The fundamentals of GCE are pretty similar to EC2, but there’s a few interesting features that would be interesting to see EC2 adopt (if ever).

1. The GCE console supports SSHing to an instance within your browser. There’s also automatic auto SSH key adding, something which had annoyed people (me) using EC2. You can still have a master SSH keypair, but you can also add keypairs for individual users.
2. Linux OSes (Debian, CentOS, etc) are natively supported, not “support for Amazon Linux, and maybe other distros”. This is pretty much required for some of the stuff that GCE is offering. The auto-key pair adding? Done by a daemon waiting for instructions. Said daemon has been added to the OS for you.
3. Custom machines types (mix and match memory and CPU cores). I’d love to see EC2 adopt this, but it’s not going to happen any time soon.
4. Sustained usage discounts: No reserved instances required for discounts
5. Premptible instances have a 24 hour limit, unlike EC2’s spot block of max 6 hours. Having a fixed price means you don’t have to worry about bill shock. Tradeoff is that there’s no spot instance equivalent, where if your bid is high enough, the instance will practically never be terminated.
6. Live migration for maintenance events! It’s coming to EC2 (it’s in the Xen mainline), just a question of when.

1. No security groups equivalent
2. Firewall rules applied to the entire VPC, or specific labelled instances (I suppose you could bludgeon this into security groups).
3. No comparison graphing. Are they depending on external providers (ie Datadog) like they do for sending email? (Yes, the creators of GMail recommend using another party to send email…)
4. The bandwidth out prices. Google might have a super awesome network, with private backhaul to endpoints that are close… but that is more expensive than AWS.

Also, it’s subjective, but I find that the GCE console UI is a lot cleaner than EC2’s.

, ,

No Comments

A First Look at Google Cloud Platform

This got really long, so I broke it up into parts. Feel free to skip to the parts you’re interested in.
* Setting up, Organization differences & IAM/Authentication (this post) ( mirror)
* Google Compute Engine ( mirror)
* Google Cloud Storage ( mirror)
* Google Cloud Datastore ( mirror)

I have a bunch of experience with AWS (Disclaimer: I worked there). My AWS account dates back to 2010, but I’ve only really started using AWS heavily in the last 2 years.

I’ve been speccing out and performing cost estimations for a new project recently, and with the introduction of the GCP free tier and the $300 of credit, I decided to look into some of the services GCP offers to see how it compares to AWS.

Google has a comparison between AWS and GCP, which is useful but pretty dry. I decided to just dive in and experiment – that $300 of credit means I’m pretty safe!

Registering for GCP

It was a matter of going to the GCP console and logging in with my Google account. I had to sign up for the free trial and provide my credit card details, but that was it. Compared to the AWS signup process, this was a lot simpler.

However, it’s simple because Google has effectively split the account verification steps – I used my gmail account, which was already verified. A side effect of this is created resources are associated with this account by default. An AWS account is trivial to transfer – update the email address and be done with it. My Google account? Less easily transferred, but that brings me to the first major difference.

AWS Accounts vs GCP Projects

Google doesn’t really expound on the account/project difference, merely saying this in their comparison:

Cloud Platform groups your service usage by project rather than by account. In this model, you can create multiple, wholly separate projects under the same account. In an organizational setting, this model can be advantageous, allowing you to create project spaces for separate divisions or groups within your company.

In practice, it’s an entirely different way of handling resources. If you wanted to run something in its own isolated silo in AWS, you would generally create an entirely separate account, and use consolidated billing/AWS Organizations (which is a whole other set of problems). In GCP, each project is its own little silo, with no communication between projects by default.

After getting used to AWS (and using AWS Organizations to handle the account-per-project), this is a very different way of thinking. To me, there are two main benefits.

The first is that switching ownership of resources is incredibly simple – assign a new project owner, and remove the existing owner, and it’s done. What’s most impressive is that (as far as I can tell), the transfer will be done without interrupting anything currently running. Compute Engine instances will continue to run, Cloud Storage buckets don’t need to have contents copied out, the bucket deleted, then recreated in the new account, hoping that no one else steals the bucket name in the meantime.

The second benefit is that segmentation of projects is far easier. You don’t have to have the equivalent of an AWS account per project if you want separation for security.

A nerfed IAM?

The downside of separation by project is that GCP seems not to have an active equivalent of AWS IAM’s ability to restrict access to individual resources. The GCP documentation explicitly calls this out:

Permissions are project-wide, which means that developers in a project can modify and deploy the application and the other Cloud Platform services and resources associated with that project.

I am conflicted over this situation. Best practice says that accounts/roles should have the fewest permissions possible. I try to lock down my IAM policies to specific resources wherever possible. For example, a user can only interact with a single SQS queue because I restrict the attached IAM policy by queue name.

On GCP, it’s all or nothing within a project. I have to allow access to all PubSub topics if I want to allow access to one.

Now, what actually happens is that people can and do liberally use * in their IAM policies in AWS, so Google’s just making it really easy to get up and running.

But the fact that the restrictions aren’t available by default are worrying, especially for large companies that do have to capability to manage IAM policies (and not operate accounts per service).

I think Google’s realised this, and is extending IAM (still in Alpha) to allow permissions to be defined on individual resources where supported (eg PubSub, Datastore). It looks like it’s possible to use the IAM API to define custom roles, but I haven’t successfully done so. I just ended up using project isolation, which works, but feels bad.


GCP has a greater variety of ways to authenticate with their APIs compared to AWS.

  • Compute Engine/App Engine work with IAM and get credentials, much like EC2’s instance roles. These are limited to individual projects.
  • Developers using the gcloud CLI can authenticate using OAuth2, and switch between projects.
  • Non-interactive systems outside GCP use a service account that’s tied to a specific project.

Using the SDK requires creating a service account, which generates a JSON file (or PKCS12, but let’s ignore that). The easiest way is to use an environment variable GOOGLE_APPLICATION_CREDENTIALS to set the location of the file when using the SDK, and let the SDK handle everything.

You can define the file location in code, like Boto. (And presumably other AWS SDKs, I’ve only really used the Python version.)

You can also go full OAuth, provide a list of scopes and walk through the OAuth process, but … no.

Compared to AWS’s simple “IAM users get a Access Key & Secret Key”, it seems rather overcomplicated. Thankfully reasonable defaults are used.

Amusingly, GCP isn’t as agnostic as AWS is – references to G Suite sometimes appear. Not everyone is designing applications to be run on G Suite, so I have a feeling it’s just old documentation.

, ,

No Comments

Exploring AWS Organizations

When AWS Organizations went GA, I was really happy. While I’ve had my personal AWS account for a while, I have a bunch of sites that aren’t personal in nature, and I wanted to spin them off into another account with the intention of having enough isolation that I could apply Terraform to those accounts freely.

The Good Parts

Inviting an existing account was painless

I performed this with the other account open in an incognito session. It was a simple matter to send the invitation from one account, and the offer showed up in under a minute in the Organizations tab in the other account.

The entire process took under 5 minutes.

Cross Account Access was set up automatically

I’ve spent a bunch of time setting up role switching. I really appreciate the automatic setup. (Though see Nit #2.)

Creating accounts is super simple

It’s 3 fields on the UI, only 2 are compulsory. The vast majority of information on the billing panel (Address, etc) is automatically copied from the root account.

SCPs are powerful

I blocked Route 53 as an experiment. After I applied the SCP, I can’t do anything in the R53 console:

Update: I had a nit about SCP policy inheritance being confusing – it was actually a misunderstanding about how policies are applied. I applied a restrictive policy to the root OU, expecting that it would result in default deny for accounts, but attaching a separate policy would override the restrictive policy. Unfortunately, the restrictive policy blocked me from being able to do anything, even on accounts with a FullAWSAcess policy attached.

This is a security feature – child nodes can’t be allowed to set a more permissive policy than what a (grand)parent specifies. So even if the child has the FullAWSAccess policy applied (like I had), it’s ineffective because my root said “Only have access to read-only billing details”, and the intersection of allowed permissions was “read-only access to billing data”.

As such, the root policy that should be something that either allows or denies something that you want applied to all accounts in your organization (except the master account). In practice I think people are going to end up leaving the FullAWSAccess policy on the Root OU, and apply more selective policies to nodes.

The Less Good Parts/Nits

The process wasn’t as pain free as hoped, here’s some rough edges I found, pretty much all issues I found on first use, and avoided once I knew about them:

1. The progress of account creation through the console isn’t clear.

Specifically, after clicking the “Create” button on the Add Account page, I get redirected back to the account list, and there’s a new row that just says “Creating…”.

There doesn’t appear to be an automatic refresh, so I manually refreshed. Now the “creating” message has disappeared, and a new account with an empty name and an account number appears in the account list.

Refresh a few more times, and the name eventually appears. I’m guessing the name field is only populated after the account is set up on the backend.

AWS: It’d be nice if the newly created account was somehow marked as “creation in progress”.

2. It is unclear how to get access to the newly created account.

AWS Orgs is probably aimed at a more experienced subset of AWS users, so I’d imagine they’d know how to do this. But it would be nice to explicitly mention that you have to role-switch to the named role that was created as part of the account. Something like “To access the new account, you need to switch role from your master account, using the account number and the name of the IAM role you provided when the account was created.”

You (also) don’t appear to be able to role switch to it as soon as the account number is listed in the account list. You have to wait until the account name shows up in the account list before you can role switch to the new account.

AWS: Also, the flow of having to go out of the Org page back to the AWS console to set up the switching seems easily optimized – a “Switch Role” button next to each account would be helpful.

3. What’s with all the emails?

I’m unsure if this will be fixed – right now it appears that Orgs is hooking onto the normal creation process. This means for every account you create you get “Welcome to Amazon Web Services” and “Your AWS Account is Ready – Get Started Now” emails sent to the new account’s email address.

4. The Auto-Created Role

First off – The role created by Organizations uses an inline AdministratorAccess policy, not the AWS managed one. Considering it has the same name (and actual Policy details) as the managed policy, I’m not sure why an inline policy was used. For what it’s worth, I tried swapping it out on my test account, and there were no adverse effects.

Secondly, what happens if I create an account, but don’t specify a role name? I’m guessing an account gets created without the cross account access setup, and I have to do the forgot-password dance on the email address I created the account with to get access. I’ve verified that you can do the forget password dance to set a password for the root account..

5. You can’t delete an account

I jumped in head first and created a (badly named) testing account before realizing this: Deleting accounts is (currently) not possible. The feature might be coming though! The “Remove Account” button is misleading – it will happily allow you to attempt to remove a created account, only for you to get an error.

The Organizations backend clearly knows which account is a created account (the status column shows “Created” vs “Joined”), so the “Remove Account” button could be disabled when a created account is selected. This is already done for the Master Account, so it might be simple to extend to created accounts. Or relabel the button “Remove Invited Account” to be super clear.

As a side note, I might be able to close a created account through the Billing panel (there’s a “Close Account” checkbox), but I’m not sure what would happen to my master account – and I’m kind of attached to it, so I’m not going to risk it.

Related, I can’t delete an account that failed to be created.

I managed to do this accidentally – see nit #1. I thought the initial creation had failed, and I tried to create the account again. The second request got accepted by Organizations, only to eventually error out because I reused the account email address.

6. A grab bag of other stuff

Three small things I encountered that didn’t really deserve their own section.

  • Orgs is keeping (caching?) their own copy of the Account Name. I renamed an account in the AWS Console, but the change didn’t replicate back to the Org account list. So there might be name drift if you rename accounts (there appears to be no way to rename an account through Organizations.)
  • It is unclear what happens if I update the details in the master account (eg the address). I’m guessing it’s not replicated to the other accounts.
  • The default VPC in my master account is The default VPC in a created account is – but for every created account. I’m not sure if this is accidental or intended, but the result is I can peer the default VPC in my root account with a default VPC in one of the created accounts. If it is intended, would it be possible to look at current VPCs and select a CIDR that isn’t in use?

7. SCPs are confusing

My original point here was the result of misunderstanding how the policies are applied. Only thing I’d suggest is add an “Effective Policy” listing under the SCP entry for each node.

SCP wasn’t enabled by default, despite my selecting Enable All Features in Settings. There’s a second step I missed: you have to enable SCPs on the root OU, not the master account – something which is documented, but I completely missed.

Tip: After navigating to the “Organize Accounts” tab, click “Home” to see the top level OU. The list of your acccounts and OUs is not the top level of your organization.

Also, having an SCP of FullAWSAccess appears to be required on the root OU. I removed it (after creating a stub policy, because at least one policy is required to be attached to the root), and promptly lost the ability to do anything in a cross-account role, even though those accounts still had the FullAWSAccess policy attached to each account individually:

To be clear, default allow is the sane default – the intention here is probably “You should be putting accounts in OUs and black/whitelisting on that”. But being unable to use role switching after the SCP on the root OU was weird.

I’m assuming there’s a combination of resources and permissions that I can add to an SCP that would allow the cross account stuff and nothing more. It’d be nice if AWS provided this as a reference.

That said

It could have gone far worse. The invite process was smooth for existing accounts, and Orgs is doing what is says it will, with some exceptions on the UI side of things.

I’m very sure they’ll iterate and improve it. I’m personally hoping there’s going to be a way to move resources between accounts. Being able to move S3 buckets to another account alone would be amazing.


1 Comment

Quick and Dirty Shoestring Startup Infra

At the University of Waterloo, we have a Final Year Design Project/Capstone project. My group is working on a conference management suite called Calligre. We’ve been approaching it as kind of a startup – we presented a pitch at a competition and won! While sorting out admin details with the judges after, they were oddly impressed that we had email forwarding for all the group members at our domain. Apparently it’s pretty unique.

In the interest of documenting everything I did both for myself, and other people to refer to, I decided to write down everything that I did.

Note that we’re students, so we get a bunch of discounts, most notably the Github Student Pack. If you’re a student, go get it.


  1. Purchase a domain. NameSilo is my go-to domain purchaser because they have free WHOIS protection, and some of the cheapest prices I’ve seen.
    Alternatives to NameSilo include NameCheap and Gandi. Of the two, I prefer Gandi, since they don’t have weird fees, but Namecheap periodically has really good promos on that drop a price significantly.
  2. Use a proper DNS server. Sign up for the CloudFlare free plan – not on your personal account, but a completely new one. CloudFlare doesn’t have account sharing for the free plan yet, so Kevin and I are just using LastPass to securely share the password to the account. For bonus points, hook CloudFlare up to Terraform and use Terraform to manage DNS settings.
    Alternatives include DNSimple (2 years free in the GitHub Student Pack!) and AWS Route 53.
  3. Sign up for Mailgun – they allow you to send 10000 messages/month for free. However, if you sign up with a partner (eg Google), they’ll bump that limit for you. We’re sitting at 30000 emails/month, though we needed to provide a credit card to verify that we were real people.
    Follow the setup instructions to verify your domain, but also follow the instructions to recieve inbound email. This allows you to setup routes.
    Alternatives include Mailjet (25k/month through Google), and SendGrid (15k emails/month through the Github Student Pack) – though SendGrid doesn’t appear to do email forwarding, they will happily take incoming emails and post them to a webhook
  4. Once you have domains verified and email setup, activate email forwarding. Mailgun calls this “Routes”. We created an address for each member of the team, as well as contact/admin aliases that forward to people as appropriate. I recommend keeping this list short – you’ll be managing it manually.


  1. We currently have a basic landing page. This is still in active development, so we use GitHub Pages in conjunction with a custom domain setup until it’s done. This will eventually be moved to S3/another static site host. For now though, it’s free.
  2. Sign up for a new AWS account.
  3. Register for AWS Educate using the link in the Github Student Pack. This gets you $50 worth of credit (base $35 + extra $15). Good news for uWaterloo people: I’ve asked to get uWaterloo added as a member institution to AWS Educate, so we should be getting an additional $65 of credit when that happens.
    Note that if you just need straight hosting, sign up for Digital Ocean as well – Student Pack has a code for $50/$100 of credit!


  1. In AWS, create an IAM User Account for each user. I also recommend using groups to assign permissions to users, so you don’t need to duplicate permissions for every single user. I just gave everyone in our team full admin access, we’ll see if that’s still a good idea 6 months down the road…
  2. Change the org alias so people don’t need to know the 12 digit account ID to login.
  3. Enable IAM access to the billing information, so you don’t have to flip between the root account & your personal IAM account whenever you want to check how much you’ve spent.
  4. Enable 2 factor auth on the root account at the very least. Let another person in the team add the account to their Google Authenticator/whatever, so you’re not screwed if you have your phone stolen/otherwise lose it.

More stuff as and when I think about it.


No Comments

Notes from various AWS Investigations

  • AWS CloudWatch Logs storage charge == S3 storage charge. Possibly less, since the logs are gziped level 6 first.
  • CW Logs makes more sense than using AWS Elasticsearch at small scale – prices start at 1.8c an hour + EBS charges vs 50c/GB of log ingestion + storage
  • For pure log storage & bulk retrival, S3 makes far more sense than either ElasticSearch or CloudWatch Logs. B2 is ~20% of S3 though, so they make even more sense.

  • DynamoDB streams are for watching what happens to a table, and they rotate every ~24 hours, so you’d get charged on a rolling basis, and can’t delete individual events. I’m assuming events don’t disappear once you’ve processed them.

  • Cert Manager is in more zones! But only makes a difference if you hang stuff in front of an ELB. Certs for CloudFront have to still go through us-east-1.
  • API Gateway has direct integration with DynamoDB, doing an end run around Lambda functions that just insert & retrieve records ( Amusingly, models continue to not be used. (I still don’t understand what Models are supposed to do/enforce)
  • DynamoDB cross-region replication is weird. You spin up an EC2 instance that handles it for you. I wonder if the DynamoDB team will work on managed replication…
  • DynamoDB is stupid cheap, and it makes sense for me to migrate the vast majority of my DB centric stuff to it.
  • CloudFront has a weird “$0.000 per request – HTTP or HTTPS under the global monthly free tier” for requests, and I’m not sure why. My account is long out of the free tier.

No Comments

Using Amazon S3 + CloudFront + Certificate Manager to get seamless static HTTPS support

TL;DR: This post documents the process I took to get S3 to return redirect requests over HTTP + HTTPS to a given domain.

I’m trying to trim down the number of domains and subdomains that I host on my server, since I’m trying a new policy of moving servers every few months in an attempt to make sure I automate everything I can.

One of the things that I’ve done was to start consolidating static files under a single subdomain, and use 301 redirects in nginx to point to the new location. Thanks to Ansible, rolling out the redirect config is a matter of adding a new domain + target pair to a .yml file and running it against a server.

But it’d still be nice if I didn’t have as many moving parts – which meant that I looked at ways to get an external provider to host this for me. I decided to try getting S3’s static website hosting a try to see if it supported everything I wanted it to do. In this case, I want to return either a redirect to a fixed URL, or a redirect to a different domain, but same filename. Essentially, my nginx redirect configs are either return 301$request_uri or return 301

For the purposes of this post, let’s assume I have the domain that redirects to my Twitter profile.

Creating the S3 Bucket

I knew that S3 could do static site hosting – but the docs seem to indicate that while redirecting to a different domain is possible, it will still use the same path to the file. So this would work for the different domain, same file name case, but not the fixed URL case.

I found my solution in an example of redirecting on an HTTP 404 error – but it could be adjusted to redirect all the time by removing the Condition elements.

So let’s create the bucket with the AWS CLI: aws s3api create-bucket --bucket tw-kyle-io --create-bucket-configuration LocationConstraint=us-west-2

Then I had to apply the redirection rules. The CLI uses JSON format to specify the redirect rules, so to make things simple, I dumped the config into a file:

  "IndexDocument": {
    "Suffix": "redirect.html"
  "RoutingRules": [
      "Redirect": {
        "HostName": "",
        "Protocol": "https",
        "ReplaceKeyWith": "lightweavr"

and then called it through the CLI: aws s3api put-bucket-website --bucket tw-kyle-io --website-configuration file://website.json. Note that the use of IndexDocument is misleading: I never actually created a file in S3, but the S3 API requires that something be specified for that key.

People who have created a static website in S3 might be saying that I used a bad bucket name, because now I can’t use CNAMEs with the bucket. Well, I have the useful experience of writing this after discovering that HTTPS requests to the bucket don’t work because S3 doesn’t support HTTPS to the website endpoint. CloudFront doesn’t have restrictions on bucket naming, especially where we’re going to use the website endpoint so we can use redirections. (Hat tip to a StackOverflow question and another one as well.)

Amazon Certificate Manager

So I ended up using CloudFront. But first, I had to create a certificate with ACM. Because I’m creating a subdomain cert, I had to use the CLI – the console only allows you to create a * or cert.

aws acm request-certificate --domain-name --domain-validation-options,

I had to approve the certificate creation, which required waiting for the email. If you try to create you get an error about the cert not existing: A client error (InvalidViewerCertificate) occurred when calling the CreateDistribution operation: The specified SSL certificate doesn't exist, isn't valid, or doesn't include a valid certificate chain.

CloudFront Magic

Then, it was time to configure the CloudFront distribution. Which isn’t trivial, there’s a bunch of options, and it wouldn’t surprise me if I screwed something up. To get the options below, I created a distribution through the AWS console, then compared that against a generated template (aws cloudfront create-distribution --generate-cli-skeleton).

I put the following in a file named cf.json

    "DistributionConfig": {
        "CallerReference": "tw-kyle-io-20160402",
        "Aliases": {
            "Quantity": 1,
            "Items": [
        "DefaultRootObject": "",
        "Origins": {
            "Quantity": 1,
            "Items": [
                    "DomainName": "",
                    "Id": "",
                    "CustomOriginConfig": {
                        "HTTPPort": 80,
                        "HTTPSPort": 443,
                        "OriginProtocolPolicy": "http-only",
                        "OriginSslProtocols": {
                            "Quantity": 1,
                            "Items": [
        "DefaultCacheBehavior": {
            "TargetOriginId": "",
            "ForwardedValues": {
                "QueryString": false,
                "Cookies": {
                    "Forward": "none"
                "Headers": {
                    "Quantity": 0
            "TrustedSigners": {
                "Enabled": false,
                "Quantity": 0
            "ViewerProtocolPolicy": "allow-all",
            "MinTTL": 86400,
            "AllowedMethods": {
                "Quantity": 2,
                "Items": [
                "CachedMethods": {
                    "Quantity": 2,
                    "Items": [
            "SmoothStreaming": false,
            "DefaultTTL": 86400,
            "MaxTTL": 31536000,
            "Compress": true
        "Comment": "Redirect for",
        "Logging": {
            "Enabled": false,
            "IncludeCookies": false,
            "Bucket": "",
            "Prefix": ""
        "PriceClass": "PriceClass_100",
        "Enabled": true,
        "ViewerCertificate": {
            "ACMCertificateArn": "arn:aws:acm:us-east-1:111122223333:certificate/3f1f4661-f01b-4eef-ae0d-123412341234",
            "SSLSupportMethod": "sni-only",
            "MinimumProtocolVersion": "TLSv1",
            "Certificate": "arn:aws:acm:us-east-1:111122223333:certificate/3f1f4661-f01b-4eef-ae0d-123412341234",
            "CertificateSource": "acm"

and then ran the command aws cloudfront create-distribution --cli-input-json file://cf.json.

Testing it

The first thing to test was simply curlling the newly created distribution – after waiting ages for it to actually be created. (I assume CloudFront is just continually working through a queue of distribution changes to each of the 40+ POPs, so it’s actually quite awesome.)

[[email protected] ~]$ curl -v
* Rebuilt URL to:
< HTTP/1.1 301 Moved Permanently
< Location:
< Server: AmazonS3
< X-Cache: Miss from cloudfront
* Connection #0 to host left intact
[[email protected] ~]$ curl -v
* Rebuilt URL to:
< HTTP/1.1 301 Moved Permanently
< Location:
< Server: AmazonS3
< Age: 11
< X-Cache: Hit from cloudfront

Would you look at that, both the HTTP & HTTPS connections worked!

So I changed the CNAME in CloudFlare to point to the new distribution, and tried again, this time with

[[email protected] ~]$ curl -v
* Rebuilt URL to:
< HTTP/1.1 301 Moved Permanently
< Location:
< Age: 577
< X-Cache: Hit from cloudfront
< Via: 1.1 (CloudFront)
< X-Amz-Cf-Id: kl8eZMDzH2BB7T3owENtjFkS2xtfcwqoOsZ4-SNxLY8LMdupbXrp9Q==
< X-Content-Type-Options: nosniff
< Server: cloudflare-nginx
< CF-RAY: 28d9413b3943302a-YYZ

Because I use CloudFlare’s caching layer, parts of the response are overwritten by CloudFlare (notably the Server section, among others). But we can still see that the request ultimately hit the CloudFront frontend. The request was still redirected, so everything looks good.

In Closing

The main thing I didn’t like about this setup? The obtuse documentation. I had far better grasp of everything working through the console, then applying that to the CLI. But there’s still tiny things.

  1. I hit the ancient ‘Conflicting Conditional Operation’ issue about S3 buckets being quick to be deleted from the console, but not actually deleted in the backend. So when I mistakenly thought that --region us-west-2 would be enough to get the S3 bucket to be created in us-west-2, it took ~1 hour to become useable again.

  2. Passing --region us-west-2 to aws s3api create-bucket will create a S3 bucket in the default region, us-east-1. You have to specifically pass --create-bucket-configuration LocationConstraint=us-west-2 to get it created in a specific region.

  3. There’s aws s3 and aws s3api. Why aren’t the s3api operations merged into s3? No other service has an api suffix.

  4. Having to trial and error to find out what parameters are required and what aren’t for different operations. The CloudFront create-distribution command was the worst offender of the commands I ran, just with the sheer number of parameters. I’m hoping the documentation improves before it comes out of beta.

  5. Weird UI bugs/features. Main one I noticed was the ACM certificate selector not being selectable unless a cert exists… and the refresh button is included in that. So the very first time I created a CloudFront distribution, I needed to refresh the page to be able to select the cert, losing my settings.

Now for the important question: Now that I’ve set it up, will I keep on using it? The simple answer is that I’m not sure. It’s nice that it’s offloaded to another provider that keeps it going without my intervention, and that after setting it up it won’t change. But at the same time, it’s another monthly expense. I’ve already put money into a server, and the extra load of a redirection config is negligible thanks to Ansible. There is some overhead with creating & managing Let’s Encrypt certs, particularly with the rate-limiting, and these redirects are entire subdomains, but I just set up an Ansible Let’s Encrypt playbook to run weekly, and the certs will be kept up to date.

It’s going to come down to how much I get billed for this.

No Comments

Cheaping out on EC2 – using Spot Instances

Amazon markets Spot Instances as a way to reduce the price you pay for instances. So, continuing my efforts to reduce expenses on EC2, I looked into using spot instances. Spot instances are essentially just like normal instances. You can create your own AMIs, where you essentially create an image and tell Amazon to create instances based on that image, or use an existing AMI.

If you want to create an AMI, get a starting image, and customize it as necessary. I started with the Fedora 17 image. In an attempt to reduce the cost, I resized the disk from 10GB to 2GB, installed vim, less, screen and rsync, which oddly aren’t in the default Fedora install.

I then had to package it as a new AMI – this created an EBS snapshot, so I’m happy that I resized the disk. It’s a bit annoying that you’re going to be paying for an EBS snapshot AND the active EBS volumes, but in virtually all cases, the cost of the EBS snapshot won’t exceed the amount saved by using spot instances. If you have a bigger snapshot, it’ll cost more of course, but then you’d likely be using a more expensive EC2 instance, so the cost should balance out in the long run.

As for actually using the spot instance, I had my AMI set up to automatically start an IRC bot, so I used this for timing. The IRC bot came online ~7 minutes after I submitted the request to start the spot instance, so there’s a bit of lead time, but not too much. Because of the lead time, the instance won’t appear in the instance list for a while.

And an extra tip: Don’t be like me and not realise the spot instance actually started, and leave it running for two months racking up charges, only to be notified by Amazon that you now owe them money after your credit runs out. (Thankfully, they waived the charges as a one time thing.)

So now by default I set an expiry time of a day on all my spot instance requests if I know I’m only going to have them up for a few hours.

And one thing to look at if you require access to your data and can get by with using a pre-created image is using instance stores and mounting EBS volumes with the API. I didn’t try it because apparently, the t1.micro size that I’m using doesn’t support instance stores. Of course, this only really makes sense if you don’t want to pay the cost of having the spot instance run off an EBS volume. For a large scale operation, could be worth it.

, , ,

No Comments

Cheaping out on EC2 – Downsizing existing EBS volumes

I’m moving a VM from my own server to EC2. I’ve got $50 worth of credit from a Redhat event, so I would like to make use of that. (That, and my AWS account existed before the free tier was introduced, so I’m not eligible for that.) That said, I want to make that credit last as long as possible. There were two parts to this – resizing the EBS volumes so I don’t get dinged for more than I need.

Amazon says $0.10 per GB-month of provisioned storage – so I’d be paying ~80 cents/month for storage that I’m not using. Admittedly, it isn’t that much, but that’s where I looked first.

Read the rest of this entry »

, , ,