Archive for category Sysadmin

Terraform import with AWS profiles other than default

I’ll come back and clean this up, but for now:

Undocumented: terraform import uses the default AWS profile. It pulls in your shared credentials and uses the default values if they are specified.

As per the code, use AWS_PROFILE=<name> terraform import aws_db_instance.default <id> to import using an AWS profile that isn’t the default.

A separate gotcha: an ELB holds onto subnets that are about to be destroyed. If the replacement subnet has the same CIDR, you end up deadlocked: the new subnet can’t be created because the CIDR is still in use, and the ELB can’t be updated because the new subnet doesn’t exist yet.

Quick and Dirty Shoestring Startup Infra

At the University of Waterloo, we have a Final Year Design Project/Capstone project. My group is working on a conference management suite called Calligre. We’ve been approaching it as kind of a startup – we presented a pitch at a competition and won! While sorting out admin details with the judges after, they were oddly impressed that we had email forwarding for all the group members at our domain. Apparently it’s pretty unique.

To document the setup, both for myself and for other people to refer to, I decided to write down everything that I did.

Note that we’re students, so we get a bunch of discounts, most notably the GitHub Student Pack. If you’re a student, go get it.

Domain

  1. Purchase a domain. NameSilo is my go-to domain purchaser because they have free WHOIS protection, and some of the cheapest prices I’ve seen.
    Alternatives to NameSilo include Namecheap and Gandi. Of the two, I prefer Gandi, since they don’t have weird fees, but Namecheap periodically runs really good promos that drop prices significantly.
  2. Use a proper DNS server. Sign up for the CloudFlare free plan – not on your personal account, but a completely new one. CloudFlare doesn’t have account sharing for the free plan yet, so Kevin and I are just using LastPass to securely share the password to the account. For bonus points, hook CloudFlare up to Terraform and use Terraform to manage DNS settings.
    Alternatives include DNSimple (2 years free in the GitHub Student Pack!) and AWS Route 53.
  3. Sign up for Mailgun – they allow you to send 10000 messages/month for free. However, if you sign up with a partner (eg Google), they’ll bump that limit for you. We’re sitting at 30000 emails/month, though we needed to provide a credit card to verify that we were real people.
    Follow the setup instructions to verify your domain, but also follow the instructions to receive inbound email. This allows you to set up routes.
    Alternatives include Mailjet (25k/month through Google), and SendGrid (15k emails/month through the GitHub Student Pack). SendGrid doesn’t appear to do email forwarding, though they will happily take incoming emails and post them to a webhook.
  4. Once you have domains verified and email set up, activate email forwarding. Mailgun calls this “Routes”. We created an address for each member of the team, as well as contact/admin aliases that forward to people as appropriate. I recommend keeping this list short, since you’ll be managing it manually.

Hosting

  1. We currently have a basic landing page. This is still in active development, so we use GitHub Pages in conjunction with a custom domain setup until it’s done. This will eventually be moved to S3/another static site host. For now though, it’s free.
  2. Sign up for a new AWS account.
  3. Register for AWS Educate using the link in the GitHub Student Pack. This gets you $50 worth of credit (base $35 + extra $15). Good news for uWaterloo people: I’ve asked to get uWaterloo added as a member institution to AWS Educate, so we should be getting an additional $65 of credit when that happens.
    Note that if you just need straight hosting, sign up for DigitalOcean as well. The Student Pack has a code for $50/$100 of credit!

AWS

  1. In AWS, create an IAM user for each person. I also recommend using groups to assign permissions, so you don’t need to duplicate permissions for every single user. I just gave everyone in our team full admin access; we’ll see if that’s still a good idea 6 months down the road…
  2. Set an account alias so people don’t need to know the 12-digit account ID to log in.
  3. Enable IAM access to the billing information, so you don’t have to flip between the root account & your personal IAM account whenever you want to check how much you’ve spent.
  4. Enable 2 factor auth on the root account at the very least. Let another person in the team add the account to their Google Authenticator/whatever, so you’re not screwed if you have your phone stolen/otherwise lose it.

More stuff as and when I think about it.

Notes from various AWS Investigations

  • AWS CloudWatch Logs storage charge == S3 storage charge. Possibly less, since the logs are gzipped (level 6) first.
  • CW Logs makes more sense than AWS Elasticsearch at small scale: Elasticsearch prices start at 1.8c an hour + EBS charges, vs 50c/GB of log ingestion + storage for CW Logs.
  • For pure log storage & bulk retrieval, S3 makes far more sense than either Elasticsearch or CloudWatch Logs. B2 is ~20% of the price of S3 though, so it makes even more sense.

  • DynamoDB streams are for watching what happens to a table, and they rotate every ~24 hours, so you’d get charged on a rolling basis, and can’t delete individual events. I’m assuming events don’t disappear once you’ve processed them.

  • Cert Manager is in more regions! But it only makes a difference if you hang stuff in front of an ELB. Certs for CloudFront still have to go through us-east-1.
  • API Gateway has direct integration with DynamoDB, doing an end run around Lambda functions that just insert & retrieve records (aws.amazon.com/blogs/compute/using-amazon-api-gateway-as-a-proxy-for-dynamodb/) Amusingly, models continue to not be used. (I still don’t understand what Models are supposed to do/enforce)
  • DynamoDB cross-region replication is weird. You spin up an EC2 instance that handles it for you. I wonder if the DynamoDB team will work on managed replication…
  • DynamoDB is stupid cheap, and it makes sense for me to migrate the vast majority of my DB centric stuff to it.
  • CloudFront has a weird “$0.000 per request – HTTP or HTTPS under the global monthly free tier” for requests, and I’m not sure why. My account is long out of the free tier.
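
The CloudWatch Logs versus Elasticsearch comparison above comes down to a break-even log volume. Here is a rough sketch of that arithmetic in Python, using only the two prices quoted in the notes. The 720-hour month, and ignoring EBS and storage charges on both sides, are simplifying assumptions of mine, so treat the result as a ballpark figure rather than a quote.

```python
# Rough break-even between an always-on Elasticsearch instance and
# CloudWatch Logs ingestion, using the prices quoted in the notes above.
# Assumptions (mine, not AWS's): a 720-hour month, and EBS/storage
# charges are ignored on both sides.

ES_DOLLARS_PER_HOUR = 0.018   # "1.8c an hour"
CW_DOLLARS_PER_GB = 0.50      # "50c/GB of log ingestion"
HOURS_PER_MONTH = 720         # assumed 30-day month


def es_monthly_cost() -> float:
    """Fixed monthly cost of the cheapest Elasticsearch instance."""
    return ES_DOLLARS_PER_HOUR * HOURS_PER_MONTH


def cw_monthly_cost(gb_ingested: float) -> float:
    """CloudWatch Logs ingestion cost for one month of logs."""
    return CW_DOLLARS_PER_GB * gb_ingested


def break_even_gb() -> float:
    """Monthly log volume at which both options cost the same."""
    return es_monthly_cost() / CW_DOLLARS_PER_GB


if __name__ == "__main__":
    print(f"Elasticsearch: ${es_monthly_cost():.2f}/month")
    print(f"Break-even:    {break_even_gb():.2f} GB/month")
```

Under those assumptions, anything below roughly 26 GB of logs a month comes out cheaper in CloudWatch Logs.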

Improving my OpenVPN Ansible Playbook

I had a working OpenVPN configuration. But it wasn’t the best it could be. The manpage for OpenVPN 2.3 (community.openvpn.net/openvpn/wiki/Openvpn23ManPage) was used to find particularly interesting options.

For most of the changes I had to find examples and more information through Googling, though blog.g3rt.nl/openvpn-security-tips.html is of particular note for popping up very often.

Improving TLS Security

  1. Added auth SHA256 so MACs on the individual packets are done with SHA256 instead of SHA1.

  2. Added tls-version-min 1.2 to drop SSL3 + TLS v1.0 support. This breaks older clients (2.3.2+ is required), but those updated versions have been out for a while.

  3. Restricted the tls-ciphers allowed to a subset of Mozilla’s modern cipher list + DHE for older clients. ECDSA support is included for when ECDSA keys can be used. I’m uncertain of the usefulness of the ECDHE ciphers, as both my client and server support it, but the RSA cipher that’s 3rd in the list is still used. Continuing to investigate this.

The last 2 changes are gated by the openvpn_use_modern_tls variable, which defaults to true.

  1. New keys are 2048 bit by default, downgraded from 4096 bit. This is based on Mozilla’s SSL guidance, combined with the expectation of being able to use ECDSA keys in a later revision of this playbook.

  2. As part of the move to 2048 bit keys, the 4096 bit DH parameters are no longer distributed. They were originally distributed since generating them took ~75 minutes, but the new 2048 bit parameters take considerably less time to generate.

Adding Cert Validations

OpenVPN has at least two kinds of certificate validation available: (Extended) Key Usage checks, and certificate content validation.

EKU

Previously only the client verified that the server cert had the correct usage; now the verification is bi-directional.
OpenVPN, more about EKU checks: 1 & 2

Certificate content

Added the ability to verify the common name that is part of each certificate. This required changing the common names that each certificate is generated with, which means that the ability to wipe out the existing keys was added as well.

The server verifies client names by looking at the common name prefix using verify-x509-name ... name-prefix, while the client checks the exact name provided by the server.

Again, both these changes are gated by a variable (openvpn_verify_cn). Because this requires rather large client changes, it is off by default.

Wiping out & reinstalling

Added the ability to wipe out & reinstall OpenVPN. Currently it leaves firewall rules behind, but other than that everything is removed.

Use ansible-playbook -v openvpn.yml --extra-vars="openvpn_uninstall=true" --tags uninstall to just run the uninstall portion.

Connect over IPv6

Previously, you had to explicitly use udp6 or tcp6 to use IPv6. OpenVPN isn’t dual stacked if you use plain udp/tcp. The client chooses which stack to use in that case, so if the server has an AAAA record and your device has a functional IPv6 connection, you can’t connect to the OpenVPN server.

Since this playbook is only on Linux, which supports IPv4 connections on IPv6 sockets, the server config is now IPv6 by default (github.com/OpenVPN/openvpn/blob/master/README.IPv6#L50), by means of using {{openvpn_proto}}6.

Hat tip to T-Mobile for revealing this problem with my setup.

To-do

  1. Add revoked cert check

  2. Generate elliptic curve keys instead of RSA keys. However, as noted above, ECDHE ciphers don’t appear to be supported, so I’m not sure if OpenVPN will support EC keys.

  3. Add IPv6 within tunnel support (Possibly waiting for OpenVPN 2.4.0, since major changes are happening there)

This SO question seems to be my exact situation.

Both this SO question and another source are possibly related as well.

Tried splitting the assigned /64 subnet with:

ip -6 addr del 2607:5600:ae:ae::42/64 dev venet0
ip -6 addr add 2607:5600:ae:ae::42/65 dev venet0
  4. Investigate using openssl ca instead of openssl x509. The next version of easyrsa uses ca.
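
To make the /64 split from the IPv6 to-do item concrete, here is a small sketch using Python’s standard ipaddress module. It only does the subnet arithmetic for the address in the ip -6 commands above; the helper name half_containing is mine, and nothing here touches the interface itself.

```python
import ipaddress

# The address and prefix are the ones from the ip -6 commands above.
assigned = ipaddress.ip_interface("2607:5600:ae:ae::42/64")

# Extending the prefix by one bit yields exactly two /65 networks.
lower, upper = assigned.network.subnets(prefixlen_diff=1)


def half_containing(addr):
    """Return whichever /65 half of the /64 contains addr."""
    return lower if addr in lower else upper


if __name__ == "__main__":
    print("lower half:", lower)
    print("upper half:", upper)
    print("host address lives in:", half_containing(assigned.ip))
```

Re-adding the host address as a /65 keeps it in the lower half. Presumably that leaves the upper half free for the tunnel, but the notes above don’t say how that part turned out.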

Using the Ansible Slurp module

I recently discovered the slurp module within Ansible when I was attempting to find new modules in Ansible 2.0. It is particularly interesting for me since I’ve been doing a bunch of stuff involving the contents of files on remote nodes for my OpenVPN playbook. So I decided to try using it in one of my latest playbooks and see how much better it is than doing command: cat <file>.

Using it

My use case for slurp was checking if a newly bootstrapped host was Fedora 22, and upgrading it to Fedora 23 if it was. The problem in this case was that recent versions of Fedora don’t come with Python 2, so we can’t use gather_facts to find the version of Fedora (and need to install Python 2 before we do anything).

The suggested method was to install python using the raw command, and then run the setup module to make the facts available.

But I was going to reboot the node right after the install in any case, and I didn’t feel like running the full setup module, which made this a perfect place to try the slurp module.

Using it is simple – there’s only one parameter: src, the file you want to get the contents of.

Similarly, using the results is also simple, with one exception: The content of the file is base64 encoded, so it must be decoded before use. Thankfully, Ansible/Jinja2 provides the b64decode filter to easily get the contents into a usable form.
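
If the base64 step feels opaque, here is a plain-Python sketch of what the b64decode filter is doing. The fedora dictionary is a hypothetical stand-in for a registered slurp result (real results carry more keys), and the helper names are mine.

```python
import base64

# Hypothetical stand-in for the registered result of a slurp task.
# Real results carry more keys; "content" is the one that matters here.
fedora = {
    "content": base64.b64encode(b"Fedora release 22 (Twenty Two)\n").decode("ascii"),
    "encoding": "base64",
}


def slurped_text(result):
    """Decode the base64 payload of a slurp-style result into text."""
    return base64.b64decode(result["content"]).decode("utf-8")


def is_fedora_22(result):
    """Python equivalent of: "Fedora release 22" in fedora.content|b64decode"""
    return "Fedora release 22" in slurped_text(result)


if __name__ == "__main__":
    print(repr(fedora["content"]))   # what slurp hands back
    print(repr(slurped_text(fedora)))  # what the when: condition tests
```

Testing against the raw content without decoding would never match, since the encoded string contains no readable release name.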

My final playbook ended up looking something like this:

gather_facts: no
tasks:
    - name: install packages for ansible support
      raw: dnf -y -e0 -d0 install python python-dnf
    - name: Check for Fedora 22
      slurp:
        src: /etc/fedora-release
      register: fedora
    - name: Upgrade to Fedora 23
      command: dnf -y -e0 -d0 --releasever 23 distro-sync
      when: '"Fedora release 22" in fedora.content|b64decode'

Functionally, it’s pretty much identical to the old style of using command: cat <file>, register, and when: xyz in cmd.stdout to get & use the contents of files. All of those elements are still there, renamed at most, and register is used unmodified.

The fact that I’m using a dedicated module for it though makes my playbook look a lot more Ansible-ish, which is something I like. (And the fact I don’t need to have a changed_when entry is a strong plus for code cleanliness.)

Backing up & restoring Jenkins

I’m moving my Jenkins instance to a new server, which means backing up & restoring it.

Backup

The nice thing about it is that it’s almost entirely self-contained in /var/lib/jenkins, which means I really only have 1 directory to back up.

I’m using duply to back the folder up – but it’s 1.9GB in size. So to save space & bandwidth, I’m going to exclude certain files. This is the content of my /etc/duply/jenkins/exclude file:

**/*.rpm
**/plugins/*/
**/plugins/*.jpi
**/plugins/*.bak
**/workspace
**/.jenkins/war

The main thing I’m excluding is the build artifacts – because I’m building RPMs, the SRPMS are rather large (nginx-pagespeed SRPMs weigh in at 110+MB), so I exclude all files ending in .rpm.

Next, I’m excluding most of the stuff in the plugin folder. My reasoning behind this is that the plugins themselves are downloadable. However, Jenkins disables plugins/pins plugins to the currently installed version by creating empty files of the form <plugin>.jpi.disabled/<plugin>.jpi.pinned. I want these settings to carry over between versions. Unfortunately trying + **/plugins/*.jpi.pinned showed that everything else got removed from the backup. I’m assuming this is due to the use of an inclusive rule, so the default include got changed to default exclude.

In any case, I end up explicitly excluding things I don’t care about, which is probably good if something that I might need ever ends up in the plugins folder.

I also exclude workspace because everything can be recreated by building from specific git commits if need be. The job information is logged in jobs/, so I can easily find past commits even though the workspace itself no longer exists.

Finally, I also exclude the jenkins war folder. I believe that this is an unpacked version of the .war file that gets installed to /usr/lib/jenkins. It seems to get created when I start jenkins itself.

With just these 6 excludes, I’ve dropped the backup archive size down to <5 MB, which is a big win.

Normally I’d just take a live backup while Jenkins is running, but in this case where I’m moving servers, I completely shut down Jenkins first, before taking a final backup with duply jenkins full.

Restoring

For the restore, I first installed Jenkins using a Jenkins playbook from Ansible Galaxy. It’s fairly barebones, but it works well – and I don’t need to spend time developing my own playbook. I also installed duply, and I manually installed an updated version of duplicity for CentOS 7 from koji to get the latest fixes.

Once I got duply set back up, I restored all the files to a new folder with duply jenkins restore /root/jenkins. I restored it to a separate folder because duply appears to remove the entire destination folder if it exists, and I wanted to merge the two folders.

After the restore was complete, I ran rsync -rtl --remove-source-files /root/jenkins/ /var/lib/jenkins to merge the restored data into the newly installed Jenkins instance.

At this point, everything should have worked, except I was unable to log in. After spending some time fruitlessly searching Google, I ran chown -R jenkins:jenkins /var/lib/jenkins, as the rsync didn’t preserve the file owner when it created the new files. Luckily enough, that fixed the problem, and I could now log in.

I then spent a few hours working all this into an Ansible playbook so future moves are much easier.

Ansible: Using register with with_items

The motivation for this came from trying to implement a command that depended on whether or not a previous command succeeded.

In this case, I was trying to make the creation of duply profiles idempotent. Duply will exit with an error if you attempt to create a profile that already exists, and I didn’t want that to interrupt my playbook.

My first thought was “stat the folder & check if it exists”, which got me this:

- name: Check if profile exists
  stat: >
    path={{duply_path}}/{{item.key}}/
  with_dict: "{{folders}}"
  register: profile
- name: Create duply profiles
  command: duply {{item.0}} create
  with_together:
    - "{{folders}}"
    - profile.results
  when: item.1.stat.isdir is not defined

I semi-abused the with_together operator to link both my dict of folders & the results of the stat on each of the folders.

Turns out each element in the result from the stat call also contains a copy of the item it was called with, which meant my code could be reduced to

- name: Check if profile exists
  stat: >
    path={{duply_path}}/{{item.key}}/
  with_dict: "{{folders}}"
  register: profile
- name: Create duply profiles
  command: duply {{item.item.key}} create
  with_items: profile.results
  when: item.stat.isdir is not defined

The key difference is that I used {{item.item}} to get access to the original item that was passed to the stat task.
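
For anyone who finds item.item hard to picture, here is a plain-Python model of the shape of the registered variable. The folder names and values are made up for illustration, and profiles_to_create is a hypothetical helper that mirrors the when: condition; it is not how Ansible evaluates it internally.

```python
# Plain-Python model of what "register: profile" holds after a looped
# stat task. The folder names and values are made up for illustration.
profile = {
    "results": [
        # Profile directory exists: stat reports isdir.
        {"item": {"key": "photos", "value": {}},
         "stat": {"exists": True, "isdir": True}},
        # Profile directory missing: stat only reports exists=False.
        {"item": {"key": "mail", "value": {}},
         "stat": {"exists": False}},
    ]
}


def profiles_to_create(registered):
    """Mirror 'when: item.stat.isdir is not defined' over profile.results."""
    return [
        result["item"]["key"]  # item.item.key in the playbook
        for result in registered["results"]
        if "isdir" not in result["stat"]
    ]


if __name__ == "__main__":
    for name in profiles_to_create(profile):
        print(f"duply {name} create")
```

Each entry in results carries both the module output and the original loop item, which is why a second loop over profile.results needs no with_together.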

Of course, it turns out that I don’t need the extra stat call, since I can just use the creates parameter to the command task, making my code even shorter:

- name: Create duply profiles
  command: duply {{item.key}} create creates={{duply_path}}/{{item.key}}
  with_dict: "{{folders}}"

I ended up dropping the whole register bit, but I wanted to note this down for the future, because it’s fairly interesting behaviour.

Upgrading to Fedora 23 on OpenVZ

TL;DR: Run dnf --releasever 23 distro-sync instead of dnf system-upgrade on OpenVZ systems

I run Fedora on my servers almost exclusively. This means I usually fall behind in upgrading to the latest release, leading me to wonder why I don’t just go with the latest version of CentOS.

Then I have lovely cases where CentOS gets horribly outdated, and I remember why I like Fedora with its latest and greatest. (Yes I do like shiny things, thank you very much)

My servers are mostly OpenVZ based, for the simple fact that OpenVZ powered VPSes are rather cheap for what you get, especially where I don’t need high performance. There is just one bad thing about being OpenVZ based: I have no control over the kernel/boot sequence. The vast majority of the time, this isn’t an issue. Sadly, using dnf system-upgrade is one of the times when it is an issue.

Fedora 22 brought in a new way to upgrade your system – dnf system-upgrade. I’ve used it on my laptop, it’s pretty good compared to fedup and past solutions. However, the one thing that rarely failed me in the past was using the yum distro-sync functionality. (The only time I’ve had an issue with it was when the upgrade was stopped midway, but that’s another story.)

Let’s not Encrypt on CentOS5

TL;DR – Let’s Encrypt requires a newer version of OpenSSL than CentOS 5 has installed. Unless you want to mess around with compiling OpenSSL yourself, don’t try it.

When your friend will upgrade his CentOS 5 system "someday"

Let’s Encrypt ALL THE THINGS

Got my first domain using a cert from Let’s Encrypt in under 10 minutes, including setting up Let’s Encrypt itself. Yes, this is rather game changing.

Now to write Ansible playbooks around it, and figure out how to get it working for proxied domains automatically.
