A First Look at Google Cloud Datastore

Note: This is a continuation of my post on exploring Google Cloud Platform. There is also a mirror of it on dev.to.


At first glance, Datastore is the equivalent of DynamoDB. I personally think Datastore is better compared to SimpleDB, but SimpleDB isn't accepting new customers and isn't being deployed to new regions, so it's not a useful comparison anymore.

That said, the fundamental idea behind Datastore is the same (Hosted NoSQL database), but how it’s implemented is very different.

App Engine Datastore vs Cloud Datastore?

At one point, Cloud Datastore was (is?) part of App Engine, but it’s since been split out. Presumably as part of this legacy, Datastore appears limited to the regions that App Engine is in, which unfortunately isn’t all of Google’s regions.

Additionally, an App Engine account is created for you when you use Cloud Datastore, and it's required if you use the Datastore SDKs. Why this dependency exists, and why it's exposed, are open questions.

Even more confusing is that the docs for App Engine Datastore list DB Datastore as superseded, but then link to docs about Cloud Datastore. App Engine also mentions a NDB Client Library, which as far as I can tell wraps the actual Cloud Datastore service, but is specific to App Engine. There is also at least one more article that treats Cloud Datastore and the DB/NDB libraries as separate things.

The only thing I can suggest is to check the URL and make sure the docs you're reading start with https://cloud.google.com/datastore/.

Pros:
1. SQL-like semantics (transactions!)
2. More granular breakdowns for multi-tenancy: namespaces/'Kind'/'ancestor path' (Google says a Kind is functionally equivalent to a table). I'm not sure how useful the namespace/kind distinction is, but it's an extra way to get multi-tenancy and is ignored by default, so meh. (A quick sketch of how these pieces fit together follows this list.)
3. Per-request pricing! DynamoDB charges for what you expect to use (provisioned capacity), not what you actually use. Given AWS's obsessive focus on "pay what you use", Dynamo's provisioned read/write units are odd.
4. Automatic indexes on every property enable arbitrary querying, unlike AWS's "you must define any indexes you want" approach
5. A dashboard that allows SQL-like queries to be run (but only SELECT queries)
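
To make the namespace/Kind/ancestor path layering concrete, here's a minimal sketch using the Python client library (google-cloud-datastore). The project ID, namespace, kinds and values are all made up for illustration:

from google.cloud import datastore

# A namespace scopes everything below it; a different namespace is a separate silo.
client = datastore.Client(project='my-project', namespace='tenant-a')

# An ancestor path: this Order entity lives "under" a Customer entity.
parent_key = client.key('Customer', 'alice')
order_key = client.key('Order', 1001, parent=parent_key)

order = datastore.Entity(key=order_key)
order['total'] = 42
client.put(order)

# Ancestor queries are scoped to a single entity group (and are strongly consistent).
query = client.query(kind='Order', ancestor=parent_key)
print(list(query.fetch()))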

Cons:
1. Nothing like DynamoDB streams (which are awesome for replication/async actions that are implicitly triggered off a data change)
2. Dynamo has 25x the storage on the free tier compared to Datastore (25GB vs 1GB)
3. Dynamo offers more total read/write operations per day – good if you have a consistent request rate, bad if you have bursts
4. Index data storage seems to be charged for (indexes are created by default; you have to opt out)
5. Creating a custom index requires the use of the gcloud CLI tool. There is no mention of any other method in the index documentation.
6. If you have a query that involves filtering on more than one property, you might run into a situation that isn’t covered by the built-in indexes or is otherwise impacted by one of a decently long list of query restrictions.

While you could get away with doing a scan + filter combination in Dynamo, GQL will reject you with a "Your Datastore does not have the composite index (developer-supplied) required for this query." error. (My use case was select * from kind where property1 < value order by property2.)

I haven’t found a way to get Datastore to scan and filter server side, so I have to iterate over everything and throw away data that I don’t want – after retrieving it.
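
For reference, the composite index such a query wants is declared in an index.yaml file and deployed with the gcloud CLI. A sketch using the placeholder kind/property names from the query above (the exact property ordering you need depends on the query restrictions mentioned earlier):

indexes:
- kind: kind
  properties:
  - name: property1
  - name: property2

Then upload it with gcloud datastore create-indexes index.yaml (at least, that was the subcommand when I looked) and wait for the index to finish building before re-running the query.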

Pricing

A bit more about the price, because the pricing models of the two products are really different.

Dynamo's pricing model makes sense if you're doing a fairly consistent number of requests per second. Dynamo attempts to support bursting, but it does so by banking up to 300 seconds' worth of provisioned-but-unused read/write capacity and bursting out of that bucket. When you exhaust the bucket, requests are denied.

So if you're trying to save money and drop your read/write units to 1, then do something request-heavy, you're going to have a bad time unless you increase the units before running your operation. Dynamo's new auto scaling feature also takes a while to react – the scale-up alarms take 5 minutes to fire, since the CloudWatch alarm is set on ConsumedWriteCapacityUnits > NN for 5 minutes.

In contrast, Datastore’s charge-per-request model fits dramatically varying traffic patterns better, mainly because you’re not paying for capacity that sits unused.

If you’re doing any sort of table scanning in Dynamo to find elements by properties, or you have indexes on single properties, chances are Datastore will work better for you by virtue of the built-in-by-default indexes. You can get the same functionality out of Dynamo, but it’s harder to set up, and functions as (and is charged as) a separate table.

If you have composite (multi-property) indexes, that’s a bit more complicated. Datastore does a far better job of hiding the index complexity (once it’s set up) and actually using the indexes. But the setup process is hit or miss, requiring you to know in advance things like sort orders.

If you’re not doing anything fancy, and just accessing everything directly by key, Dynamo is better for small scale stuff by virtue of the massively greater free storage space (25GB vs 1GB).


A First Look at Google Cloud Storage

Note: This is a continuation of my post on exploring Google Cloud Platform. There is also a mirror of it on dev.to.


Storage-price-wise, S3 and GCS are mostly comparable, with the caveat that GCS bandwidth is more expensive, and you don't really get a choice of what rate you want to pay.

GCS Single Region is pretty much directly equivalent to standard S3, and Nearline is equivalent to S3 Infrequent Access, complete with per GB retrieval fees. Storage for both classes is cheaper than the respective S3 classes. Operation fees are exactly the same though.

There are two other storage classes though:

Multi-region Buckets

One major feature is multi-region buckets for $0.006/GB more. Presumably built to avoid incidents like us-~~tirefire~~-east-1 falling over and your buckets disappearing, it's a definite plus if you need high availability (and are willing to trust that Google has proper HA).

Assuming single region stores 3 copies, I speculate that for the price, multi-region stores 2 copies in each region for a total of 4 copies. That (suspected) single additional copy would be why multi-region buckets are only a third more expensive.
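
For what it's worth, regional vs multi-region is just a choice you make at bucket creation time. A rough sketch with gsutil (bucket names are placeholders):

gsutil mb -l us-central1 gs://example-regional-bucket
gsutil mb -l us gs://example-multi-region-bucket

Nearline and Coldline are selected the same way, via the -c flag (e.g. gsutil mb -c nearline ...).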

Coldline

Coldline is a bit more interesting. As the equivalent of Glacier, it has one big advantage – quick retrieval times (on the order of a second), much better than Glacier. The downside is the fixed retrieval cost of $0.05/GB. Comparing Coldline to Nearline alone, you shouldn't retrieve anything more than once every ~2 years(!) if you want to come out ahead of Nearline. (Nearline is $0.01/GB, Coldline is $0.007/GB, so you save $0.003/GB/month by going with Coldline.)

In comparison, Glacier has a number of retrieval speeds and corresponding retrieval pricing. Even Glacier's expedited requests (the most expensive option) cost 40% less than Coldline. (Interestingly, the bulk retrieval option works out to about 25% of the cost of retrieving from S3 IA, but takes on the order of 8-12 hours.)

Considering Glacier costs almost 50% less than Coldline, I really question the utility of Coldline. With the restore costs it only makes sense if you’re storing archival data that doesn’t need to be accessed for around 2 years. I think Glacier has a much better handle on the expected use cases here.

I can see Nearline being used (and people having the expectation of immediate access), but the retrieval price of Coldline with no way to change that makes me very leery of using it.


A First Look at Google Compute Engine

Note: This is a continuation of my post on exploring Google Cloud Platform. There is also a mirror of it on dev.to.


Compute Engine is one of the more important services for me. While I'd love to have all my stuff on managed services like Lambda/Cloud Functions, it's not possible. The fundamentals of GCE are pretty similar to EC2, but there are a few features that it would be interesting to see EC2 adopt (if ever).

Pros:
1. The GCE console supports SSHing to an instance from within your browser. There's also automatic SSH key adding, something which has annoyed people (me) using EC2. You can still have a master SSH keypair, but you can also add keypairs for individual users.
2. Linux OSes (Debian, CentOS, etc) are natively supported, not “support for Amazon Linux, and maybe other distros”. This is pretty much required for some of the stuff that GCE is offering. The auto-key pair adding? Done by a daemon waiting for instructions. Said daemon has been added to the OS for you.
3. Custom machine types (mix and match memory and CPU cores). I'd love to see EC2 adopt this, but it's not going to happen any time soon. (See the gcloud sketch after this list.)
4. Sustained usage discounts: No reserved instances required for discounts
5. Preemptible instances have a 24 hour limit, unlike EC2's spot blocks, which max out at 6 hours. Having a fixed price means you don't have to worry about bill shock. The tradeoff is that there's no spot instance equivalent, where if your bid is high enough, the instance will practically never be terminated.
6. Live migration for maintenance events! It’s coming to EC2 (it’s in the Xen mainline), just a question of when.
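
As an example of the custom machine types, the shape is just flags on instance creation – roughly like this (the name, zone and sizes are placeholders, and I haven't checked the flag spelling against every gcloud release):

gcloud compute instances create example-instance \
    --zone us-central1-a \
    --custom-cpu 2 \
    --custom-memory 5GB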

Cons:
1. No security groups equivalent
2. Firewall rules are applied to the entire VPC, or to specific labelled instances (I suppose you could bludgeon this into something like security groups).
3. No comparison graphing. Are they depending on external providers (e.g. Datadog), like they do for sending email? (Yes, the creators of GMail recommend using another party to send email…)
4. The bandwidth out prices. Google might have a super awesome network, with private backhaul to endpoints that are close… but that is more expensive than AWS.

Also, it’s subjective, but I find that the GCE console UI is a lot cleaner than EC2’s.


A First Look at Google Cloud Platform

This got really long, so I broke it up into parts. Feel free to skip to the parts you’re interested in.
* Setting up, Organization differences & IAM/Authentication (this post) (dev.to mirror)
* Google Compute Engine (dev.to mirror)
* Google Cloud Storage (dev.to mirror)
* Google Cloud Datastore (dev.to mirror)


I have a bunch of experience with AWS (Disclaimer: I worked there). My AWS account dates back to 2010, but I’ve only really started using AWS heavily in the last 2 years.

I’ve been speccing out and performing cost estimations for a new project recently, and with the introduction of the GCP free tier and the $300 of credit, I decided to look into some of the services GCP offers to see how it compares to AWS.

Google has a comparison between AWS and GCP, which is useful but pretty dry. I decided to just dive in and experiment – that $300 of credit means I’m pretty safe!

Registering for GCP

It was a matter of going to the GCP console and logging in with my Google account. I had to sign up for the free trial and provide my credit card details, but that was it. Compared to the AWS signup process, this was a lot simpler.

However, it's simple because Google has effectively split out the account verification steps – I used my gmail account, which was already verified. A side effect of this is that created resources are associated with this account by default. An AWS account is trivial to transfer – update the email address and be done with it. My Google account? Less easily transferred, which brings me to the first major difference.

AWS Accounts vs GCP Projects

Google doesn’t really expound on the account/project difference, merely saying this in their comparison:

Cloud Platform groups your service usage by project rather than by account. In this model, you can create multiple, wholly separate projects under the same account. In an organizational setting, this model can be advantageous, allowing you to create project spaces for separate divisions or groups within your company.

In practice, it’s an entirely different way of handling resources. If you wanted to run something in its own isolated silo in AWS, you would generally create an entirely separate account, and use consolidated billing/AWS Organizations (which is a whole other set of problems). In GCP, each project is its own little silo, with no communication between projects by default.

After getting used to AWS (and using AWS Organizations to handle the account-per-project), this is a very different way of thinking. To me, there are two main benefits.

The first is that switching ownership of resources is incredibly simple – assign a new project owner, remove the existing owner, and it's done. What's most impressive is that (as far as I can tell) the transfer happens without interrupting anything currently running. Compute Engine instances continue to run, and Cloud Storage buckets don't need to have their contents copied out, the bucket deleted, and then recreated in the new account while you hope no one steals the bucket name in the meantime.

The second benefit is that segmentation of projects is far easier. You don’t have to have the equivalent of an AWS account per project if you want separation for security.

A nerfed IAM?

The downside of separation by project is that GCP doesn't seem to have an equivalent of AWS IAM's ability to restrict access to individual resources. The GCP documentation explicitly calls this out:

Permissions are project-wide, which means that developers in a project can modify and deploy the application and the other Cloud Platform services and resources associated with that project.

I am conflicted over this situation. Best practice says that accounts/roles should have the fewest permissions possible. I try to lock down my IAM policies to specific resources wherever possible. For example, a user can only interact with a single SQS queue because I restrict the attached IAM policy by queue name.
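
For illustration, the kind of AWS policy I'm talking about looks roughly like this (the account ID, region and queue name are placeholders):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sqs:SendMessage", "sqs:GetQueueUrl"],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:my-single-queue"
    }
  ]
}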

On GCP, it’s all or nothing within a project. I have to allow access to all PubSub topics if I want to allow access to one.

Now, what actually happens is that people can and do liberally use * in their IAM policies in AWS, so Google’s just making it really easy to get up and running.

But the fact that the restrictions aren't available by default is worrying, especially for large companies that do have the capability to manage IAM policies (and not operate accounts per service).

I think Google’s realised this, and is extending IAM (still in Alpha) to allow permissions to be defined on individual resources where supported (eg PubSub, Datastore). It looks like it’s possible to use the IAM API to define custom roles, but I haven’t successfully done so. I just ended up using project isolation, which works, but feels bad.

Authentication

GCP has a greater variety of ways to authenticate with their APIs compared to AWS.

  • Compute Engine/App Engine work with IAM and get credentials, much like EC2’s instance roles. These are limited to individual projects.
  • Developers using the gcloud CLI can authenticate using OAuth2, and switch between projects.
  • Non-interactive systems outside GCP use a service account that’s tied to a specific project.

Using the SDK requires creating a service account, which generates a JSON file (or PKCS12, but let’s ignore that). The easiest way is to use an environment variable GOOGLE_APPLICATION_CREDENTIALS to set the location of the file when using the SDK, and let the SDK handle everything.

You can define the file location in code, like Boto. (And presumably other AWS SDKs, I’ve only really used the Python version.)
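
With the Python client libraries, both approaches look something like this (the bucket name and key path are placeholders, and I've only tried a couple of the libraries, so treat it as a sketch):

# Option 1: let the SDK find the key via the environment:
#   export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
from google.cloud import storage

client = storage.Client()  # picks up the env var automatically
print([b.name for b in client.list_buckets()])

# Option 2: point at the key file explicitly in code
client = storage.Client.from_service_account_json('/path/to/service-account.json')
print(client.get_bucket('my-example-bucket').name)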

You can also go full OAuth, provide a list of scopes and walk through the OAuth process, but … no.

Compared to AWS's simple "IAM users get an Access Key & Secret Key", it seems rather overcomplicated. Thankfully reasonable defaults are used.

Amusingly, GCP isn’t as agnostic as AWS is – references to G Suite sometimes appear. Not everyone is designing applications to be run on G Suite, so I have a feeling it’s just old documentation.


Moving to the US

I’m moving to the US, and as part of that I’m wrapping up a host of legal things. I have some of this documented for my own benefit, and some friends are talking about similar things, so I decided to just write it all down.

Disclaimer first: I am not a lawyer, do not take this as legal advice, your mileage may vary, I’m talking about my position, not yours, etc etc

Immigration

As a Canadian citizen, I’ll be entering the US on a TN visa. TN is granted at the border, or you can mail in the I-129 form.

I renewed my passport early (more than 12 months before it would normally expire), citing the 3 year period of the TN visa as the reason in the passport application.

My company is handling the TN visa application, I’ll know more once the paperwork goes through.

Taxes in Canada

The CRA has an excellent section on emigrating from Canada.

If you want to be entirely thorough, the CRA has a form to determine what you should file as – emigrant/deemed resident. I don't meet any of the deemed resident requirements, so I'll be considered an emigrant.

I will be filing Form T-1161 along with my tax return for 2017, because becoming an emigrant means that my property undergoes a deemed disposition. Essentially, the CRA will consider all my property to be sold and repurchased at market value, so any capital gains can be taxed.

Note: One of my friends brought up that Canadian deemed residency includes a “183 day rule”, which says that you’re a resident of Canada if you’ve been in Canada for 183 days or more in a calendar year. He was concerned because it could mean that you file as a tax resident of both Canada and the US if you start your job late enough.

In my (see disclaimer) reading of it, it only matters if you’re either entering Canada, or visiting Canada. Since you’re leaving Canada, you’re going to be taxed on all your income prior to leaving Canada, and any income from Canada after you leave. Essentially, the 183 day rule doesn’t apply in this case.

If you want to be perfectly clear on your tax status, ask the CRA what they think of your situation by filing Form NR73.

Taxes in the US

I’m probably going to pay someone to do my taxes, at least for the first year when I move. I believe I would be considered a Dual Status Resident Alien.

The IRS does not like TFSAs (or RESPs, but recent grads are unlikely to have those). I’ve moved everything that I had in a TFSA into a RRSP, which the IRS likes a lot better.

Keeping my Canadian accounts means that additional forms (I know of Form 8938 and the FBAR form with the US Treasury) will need to be filed for future tax years, which is always fun.

As a side note, California taxes Canadian RRSPs, so it’s a good thing I’m going to Seattle.


Exploring AWS Organizations

When AWS Organizations went GA, I was really happy. While I’ve had my personal AWS account for a while, I have a bunch of sites that aren’t personal in nature, and I wanted to spin them off into another account with the intention of having enough isolation that I could apply Terraform to those accounts freely.

The Good Parts

Inviting an existing account was painless

I performed this with the other account open in an incognito session. It was a simple matter to send the invitation from one account, and the offer showed up in under a minute in the Organizations tab in the other account.

The entire process took under 5 minutes.

Cross Account Access was set up automatically

I’ve spent a bunch of time setting up role switching. I really appreciate the automatic setup. (Though see Nit #2.)

Creating accounts is super simple

It’s 3 fields on the UI, only 2 are compulsory. The vast majority of information on the billing panel (Address, etc) is automatically copied from the root account.

SCPs are powerful

I blocked Route 53 as an experiment. After I applied the SCP, I can't do anything in the R53 console.
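
The SCP itself is just IAM-style policy JSON. Reconstructed from memory (so treat it as a sketch), blocking Route 53 looks something like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "route53:*",
      "Resource": "*"
    }
  ]
}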

Update: I had a nit about SCP policy inheritance being confusing – it was actually a misunderstanding on my part about how policies are applied. I applied a restrictive policy to the root OU, expecting that it would result in default deny for accounts, but that attaching a separate policy to an account would override the restrictive policy. Unfortunately, the restrictive policy blocked me from being able to do anything, even on accounts with a FullAWSAccess policy attached.

This is a security feature – child nodes can’t be allowed to set a more permissive policy than what a (grand)parent specifies. So even if the child has the FullAWSAccess policy applied (like I had), it’s ineffective because my root said “Only have access to read-only billing details”, and the intersection of allowed permissions was “read-only access to billing data”.

As such, the root policy should be something that either allows or denies what you want applied to all accounts in your organization (except the master account). In practice I think people are going to end up leaving the FullAWSAccess policy on the Root OU and applying more selective policies to nodes.

The Less Good Parts/Nits

The process wasn't as pain-free as hoped. Here are some rough edges I found – pretty much all of them hit on first use and avoided once I knew about them:

1. The progress of account creation through the console isn’t clear.

Specifically, after clicking the “Create” button on the Add Account page, I get redirected back to the account list, and there’s a new row that just says “Creating…”.

There doesn’t appear to be an automatic refresh, so I manually refreshed. Now the “creating” message has disappeared, and a new account with an empty name and an account number appears in the account list.

Refresh a few more times, and the name eventually appears. I’m guessing the name field is only populated after the account is set up on the backend.

AWS: It’d be nice if the newly created account was somehow marked as “creation in progress”.

2. It is unclear how to get access to the newly created account.

AWS Orgs is probably aimed at a more experienced subset of AWS users, so I’d imagine they’d know how to do this. But it would be nice to explicitly mention that you have to role-switch to the named role that was created as part of the account. Something like “To access the new account, you need to switch role from your master account, using the account number and the name of the IAM role you provided when the account was created.”

You also don't appear to be able to role switch to the account as soon as its number is listed. You have to wait until the account name shows up in the account list before you can role switch to the new account.

AWS: Also, the flow of having to go out of the Org page back to the AWS console to set up the switching seems easily optimized – a “Switch Role” button next to each account would be helpful.

3. What’s with all the emails?

I’m unsure if this will be fixed – right now it appears that Orgs is hooking onto the normal creation process. This means for every account you create you get “Welcome to Amazon Web Services” and “Your AWS Account is Ready – Get Started Now” emails sent to the new account’s email address.

4. The Auto-Created Role

First off – The role created by Organizations uses an inline AdministratorAccess policy, not the AWS managed one. Considering it has the same name (and actual Policy details) as the managed policy, I’m not sure why an inline policy was used. For what it’s worth, I tried swapping it out on my test account, and there were no adverse effects.
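
For context, both the inline policy and the managed AdministratorAccess policy boil down to the same allow-everything document (from memory, so double-check before relying on it):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}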

Secondly, what happens if I create an account, but don't specify a role name? I'm guessing an account gets created without the cross account access setup, and I have to do the forgot-password dance on the email address I created the account with to get access. I've verified that you can do the forgot-password dance to set a password for the root account.

5. You can’t delete an account

I jumped in head first and created a (badly named) testing account before realizing this: Deleting accounts is (currently) not possible. The feature might be coming though! The “Remove Account” button is misleading – it will happily allow you to attempt to remove a created account, only for you to get an error.

The Organizations backend clearly knows which account is a created account (the status column shows “Created” vs “Joined”), so the “Remove Account” button could be disabled when a created account is selected. This is already done for the Master Account, so it might be simple to extend to created accounts. Or relabel the button “Remove Invited Account” to be super clear.

As a side note, I might be able to close a created account through the Billing panel (there’s a “Close Account” checkbox), but I’m not sure what would happen to my master account – and I’m kind of attached to it, so I’m not going to risk it.

Related, I can’t delete an account that failed to be created.

I managed to do this accidentally – see nit #1. I thought the initial creation had failed, and I tried to create the account again. The second request got accepted by Organizations, only to eventually error out because I reused the account email address.

6. A grab bag of other stuff

Three small things I encountered that didn’t really deserve their own section.

  • Orgs is keeping (caching?) their own copy of the Account Name. I renamed an account in the AWS Console, but the change didn’t replicate back to the Org account list. So there might be name drift if you rename accounts (there appears to be no way to rename an account through Organizations.)
  • It is unclear what happens if I update the details in the master account (eg the address). I’m guessing it’s not replicated to the other accounts.
  • The default VPC in my master account is 172.30.0.0/16. The default VPC in a created account is 172.31.0.0/16 – but for every created account. I’m not sure if this is accidental or intended, but the result is I can peer the default VPC in my root account with a default VPC in one of the created accounts. If it is intended, would it be possible to look at current VPCs and select a CIDR that isn’t in use?

7. SCPs are confusing

My original point here was the result of misunderstanding how the policies are applied. Only thing I’d suggest is add an “Effective Policy” listing under the SCP entry for each node.

SCP wasn’t enabled by default, despite my selecting Enable All Features in Settings. There’s a second step I missed: you have to enable SCPs on the root OU, not the master account – something which is documented, but I completely missed.

Tip: After navigating to the "Organize Accounts" tab, click "Home" to see the top level OU. The list of your accounts and OUs is not the top level of your organization.

Also, having an SCP of FullAWSAccess appears to be required on the root OU. I removed it (after creating a stub policy, because at least one policy is required to be attached to the root), and promptly lost the ability to do anything in a cross-account role, even though those accounts still had the FullAWSAccess policy attached to each account individually.

To be clear, default allow is the sane default – the intention here is probably "You should be putting accounts in OUs and black/whitelisting on that". But being unable to use role switching after removing the SCP on the root OU was weird.

I’m assuming there’s a combination of resources and permissions that I can add to an SCP that would allow the cross account stuff and nothing more. It’d be nice if AWS provided this as a reference.

That said

It could have gone far worse. The invite process was smooth for existing accounts, and Orgs is doing what it says it will, with some exceptions on the UI side of things.

I’m very sure they’ll iterate and improve it. I’m personally hoping there’s going to be a way to move resources between accounts. Being able to move S3 buckets to another account alone would be amazing.


Terraform import with AWS profiles other than default

I’ll come back and clean this up, but for now:

Undocumented: It will use the default AWS profile – it will pull in your shared credentials, and use the default values if specified.

As per the code, use AWS_PROFILE=<name> terraform import aws_db_instance.default <id> to import using an AWS profile that isn't default.
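
Alternatively, the AWS provider block accepts a profile argument, so something like this should avoid the environment variable entirely (untested sketch; region and profile name are placeholders):

provider "aws" {
  region  = "us-west-2"
  profile = "not-default"
}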

Separate note: an ELB holds onto subnets that are slated to be destroyed. Combined with the new subnet having the same CIDR, Terraform can't create the new subnet because the CIDR is in use, and can't update the ELB because the new subnet isn't created yet.


Printing the table structure of a SQLite Database in Android

I’m doing some Android app development, and as part of it, I hit some issues with the database.

My first plan was to download the database and open it in SQLite, but having to re-establish my ADB debug session each time I downloaded the file got annoying, so I decided to write a short snippet which dumps the name & CREATE TABLE schema of each table to the log:

// YourDbHelper is assumed to be your SQLiteOpenHelper instance
SQLiteDatabase db = YourDbHelper.getWritableDatabase();
Log.d(this.getClass().getSimpleName(), db.getPath());
// sqlite_master stores the name and CREATE TABLE statement of every table
Cursor c = db.rawQuery("SELECT type, name, sql, tbl_name FROM sqlite_master", new String[]{});
c.moveToFirst();
while (!c.isAfterLast()){
        int count = c.getColumnCount();
        Log.d(this.getClass().getSimpleName(), Integer.toString(count));
        // Arrays.toString (java.util.Arrays) prints the column names readably;
        // calling toString() on the array directly only prints its hash code
        Log.d(this.getClass().getSimpleName(), Arrays.toString(c.getColumnNames()));
        for (int i = 0; i < count; i++){
                Log.d(this.getClass().getSimpleName(), c.getColumnName(i));
                // String.valueOf avoids Log.d choking on NULL column values
                Log.d(this.getClass().getSimpleName(), String.valueOf(c.getString(i)));
        }
        c.moveToNext();
}
c.close();

This gives decently printed output for a quick and dirty script:

07-16 23:44:53.795/io.kyle.dev D/ProjectListActivity: name
07-16 23:44:53.795/io.kyle.dev D/ProjectListActivity: projects
07-16 23:44:53.795/io.kyle.dev D/ProjectListActivity: sql
07-16 23:44:53.795/io.kyle.dev D/ProjectListActivity: CREATE TABLE `projects` ( `_id` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, `name` TEXT, `avatar` TEXT, `short_description` TEXT, `description` TEXT)


Quick and Dirty Shoestring Startup Infra

At the University of Waterloo, we have a Final Year Design Project/Capstone project. My group is working on a conference management suite called Calligre. We've been approaching it as kind of a startup – we presented a pitch at a competition and won! While we were sorting out admin details with the judges afterwards, they were oddly impressed that we had email forwarding for all the group members at our domain. Apparently it's pretty unique.

In the interest of documenting everything I did both for myself, and other people to refer to, I decided to write down everything that I did.

Note that we’re students, so we get a bunch of discounts, most notably the Github Student Pack. If you’re a student, go get it.

Domain

  1. Purchase a domain. NameSilo is my go-to domain purchaser because they have free WHOIS protection, and some of the cheapest prices I’ve seen.
    Alternatives to NameSilo include NameCheap and Gandi. Of the two, I prefer Gandi, since they don't have weird fees, but NameCheap periodically has really good promos that drop a price significantly.
  2. Use a proper DNS server. Sign up for the CloudFlare free plan – not on your personal account, but a completely new one. CloudFlare doesn’t have account sharing for the free plan yet, so Kevin and I are just using LastPass to securely share the password to the account. For bonus points, hook CloudFlare up to Terraform and use Terraform to manage DNS settings.
    Alternatives include DNSimple (2 years free in the GitHub Student Pack!) and AWS Route 53.
  3. Sign up for Mailgun – they allow you to send 10000 messages/month for free. However, if you sign up with a partner (eg Google), they’ll bump that limit for you. We’re sitting at 30000 emails/month, though we needed to provide a credit card to verify that we were real people.
    Follow the setup instructions to verify your domain, but also follow the instructions to receive inbound email. This allows you to set up routes.
    Alternatives include Mailjet (25k/month through Google), and SendGrid (15k emails/month through the Github Student Pack) – though SendGrid doesn’t appear to do email forwarding, they will happily take incoming emails and post them to a webhook
  4. Once you have your domain verified and email set up, activate email forwarding. Mailgun calls this "Routes". We created an address for each member of the team, as well as contact/admin aliases that forward to people as appropriate. I recommend keeping this list short – you'll be managing it manually. (A sketch of creating a route through the API follows this list.)
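
Routes can be created in the Mailgun dashboard, but the routes API works too – roughly like this, per their docs (the API key and addresses are placeholders):

curl -s --user 'api:YOUR_API_KEY' https://api.mailgun.net/v3/routes \
    -F priority=0 \
    -F description='Forward hello@ to a personal inbox' \
    -F expression='match_recipient("hello@example.com")' \
    -F action='forward("someone@example.net")' \
    -F action='stop()'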

Hosting

  1. We currently have a basic landing page. This is still in active development, so we use GitHub Pages in conjunction with a custom domain setup until it’s done. This will eventually be moved to S3/another static site host. For now though, it’s free.
  2. Sign up for a new AWS account.
  3. Register for AWS Educate using the link in the Github Student Pack. This gets you $50 worth of credit (base $35 + extra $15). Good news for uWaterloo people: I’ve asked to get uWaterloo added as a member institution to AWS Educate, so we should be getting an additional $65 of credit when that happens.
    Note that if you just need straight hosting, sign up for Digital Ocean as well – Student Pack has a code for $50/$100 of credit!

AWS

  1. In AWS, create an IAM User Account for each user. I also recommend using groups to assign permissions to users, so you don’t need to duplicate permissions for every single user. I just gave everyone in our team full admin access, we’ll see if that’s still a good idea 6 months down the road…
  2. Change the account alias so people don't need to know the 12 digit account ID to log in. (See the command sketch after this list.)
  3. Enable IAM access to the billing information, so you don’t have to flip between the root account & your personal IAM account whenever you want to check how much you’ve spent.
  4. Enable 2 factor auth on the root account at the very least. Let another person in the team add the account to their Google Authenticator/whatever, so you’re not screwed if you have your phone stolen/otherwise lose it.
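
Setting the alias can be done from the IAM console or the CLI – something like the command below (the alias is a placeholder; it becomes part of the sign-in URL, e.g. https://your-alias.signin.aws.amazon.com/console):

aws iam create-account-alias --account-alias your-alias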

More stuff as and when I think about it.


Notes from various AWS Investigations

  • AWS CloudWatch Logs storage charge == S3 storage charge. Possibly less, since the logs are gzipped at level 6 first.
  • CW Logs makes more sense than using AWS Elasticsearch at small scale – Elasticsearch prices start at 1.8c an hour + EBS charges, vs CW Logs' 50c/GB of log ingestion + storage
  • For pure log storage & bulk retrieval, S3 makes far more sense than either Elasticsearch or CloudWatch Logs. B2 is ~20% of S3's price though, so it makes even more sense.

  • DynamoDB streams are for watching what happens to a table, and they rotate every ~24 hours, so you’d get charged on a rolling basis, and can’t delete individual events. I’m assuming events don’t disappear once you’ve processed them.

  • Cert Manager is in more zones! But it only makes a difference if you're hanging stuff off an ELB. Certs for CloudFront still have to go through us-east-1.
  • API Gateway has direct integration with DynamoDB, doing an end run around Lambda functions that just insert & retrieve records (aws.amazon.com/blogs/compute/using-amazon-api-gateway-as-a-proxy-for-dynamodb/). Amusingly, models continue to not be used. (I still don't understand what Models are supposed to do/enforce.)
  • DynamoDB cross-region replication is weird. You spin up an EC2 instance that handles it for you. I wonder if the DynamoDB team will work on managed replication…
  • DynamoDB is stupid cheap, and it makes sense for me to migrate the vast majority of my DB centric stuff to it.
  • CloudFront has a weird “$0.000 per request – HTTP or HTTPS under the global monthly free tier” for requests, and I’m not sure why. My account is long out of the free tier.
