Archive for category Programming

Printing the table structure of a SQLite Database in Android

I’m doing some Android app development, and as part of it, I hit some issues with the database.

My first plan was to download the database and open it in SQLite, but having to re-establish my ADB debug session each time I downoaded the file got annoying, so I decided to write a short snippet which dumps the name & CREATE TABLE schema of each table to the log:

SQLiteDatabase db = YourDbHelper.getWritableDatabase();
Log.d(this.getClass().getSimpleName(), db.getPath());
Cursor c = db.rawQuery("SELECT type, name, sql, tbl_name FROM sqlite_master", new String[]{});
c.moveToFirst();
while (!c.isAfterLast()){
        int count = c.getColumnCount();
        Log.d(this.getClass().getSimpleName(), Integer.toString(count));
        Log.d(this.getClass().getSimpleName(), c.getColumnNames().toString());
        for (int i = 0; i < count; i++){
                Log.d(this.getClass().getSimpleName(), c.getColumnName(i));
                Log.d(this.getClass().getSimpleName(), c.getString(i));
        }
        c.moveToNext();
}

This gives decently printed output for a quick and dirty script:

07-16 23:44:53.795/io.kyle.dev D/ProjectListActivity: name
07-16 23:44:53.795/io.kyle.dev D/ProjectListActivity: projects
07-16 23:44:53.795/io.kyle.dev D/ProjectListActivity: sql
07-16 23:44:53.795/io.kyle.dev D/ProjectListActivity: CREATE TABLE `projects` ( `_id` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, `name` TEXT, `avatar` TEXT, `short_description` TEXT, `description` TEXT)

No Comments

Notes from various AWS Investigations

  • AWS CloudWatch Logs storage charge == S3 storage charge. Possibly less, since the logs are gziped level 6 first.
  • CW Logs makes more sense than using AWS Elasticsearch at small scale – prices start at 1.8c an hour + EBS charges vs 50c/GB of log ingestion + storage
  • For pure log storage & bulk retrival, S3 makes far more sense than either ElasticSearch or CloudWatch Logs. B2 is ~20% of S3 though, so they make even more sense.

  • DynamoDB streams are for watching what happens to a table, and they rotate every ~24 hours, so you’d get charged on a rolling basis, and can’t delete individual events. I’m assuming events don’t disappear once you’ve processed them.

  • Cert Manager is in more zones! But only makes a difference if you hang stuff in front of an ELB. Certs for CloudFront have to still go through us-east-1.
  • API Gateway has direct integration with DynamoDB, doing an end run around Lambda functions that just insert & retrieve records (aws.amazon.com/blogs/compute/using-amazon-api-gateway-as-a-proxy-for-dynamodb/) Amusingly, models continue to not be used. (I still don’t understand what Models are supposed to do/enforce)
  • DynamoDB cross-region replication is weird. You spin up an EC2 instance that handles it for you. I wonder if the DynamoDB team will work on managed replication…
  • DynamoDB is stupid cheap, and it makes sense for me to migrate the vast majority of my DB centric stuff to it.
  • CloudFront has a weird “$0.000 per request – HTTP or HTTPS under the global monthly free tier” for requests, and I’m not sure why. My account is long out of the free tier.

No Comments

Checking a SSL certificate’s expiry date with Python

Before I found the --keep-until-expiring option in the Let’s Encrypt command line client, I was thinking I’d have to parse the cert, extract the expiry date, then check it against the current date before returning True or False.

Thankfully I found the much easier option, but I decided to post the code I wrote to read the date just in case I need something like it in the future. Read the rest of this entry »

,

1 Comment

Cloudflare & Python’s urllib

TL;DR: Trying to diagnose why my copr builds were abruptly failing, I found an interesting thing: Cloudflare’s Browser Integrity Check apparently doesn’t like Python’s urllib sending requests.

The symptoms in Copr were weird: builds would try importing, and then fail with no log output. To me, trying to diagnose the problem made no sense – I could download the file just fine through Chrome, Firefox and wget, so I thought the issue was with copr. I was all set to file a bug with them, when I decided to look at what other people were building and see if URL importing worked for them.

Surprise surprise, it did. This clearly meant it was isolated to me and my server, so I held off on the bug filing until I could find out more. On my stuff things accessing Jenkins worked fine, so I tried other tools. Hurl.it said there were no issues accessing Jenkins, while RequestBin said… not much at all, I couldn’t get it to work with copr’s URL detection – copr wants the path to end with src.rpm, requestbin uses the path to determine which bin to route stuff to.

Next I looked at Jenkins behind an Nginx proxy. I found an extra Nginx option needed to be set to allow Jenkins CSRF to work properly, which was a plus. (I also found something on serving static files through nginx instead of jenkins, which is now part of my todo list.) I ran through steps on DO’s setting up Jenkins behind nginx tutorial just in case I had missed something, everything looked fine.

Without logs, I couldn’t really trace down what was happening, so I looked at using Github to host the SRPMs, since Github clearly worked. First thing I realized was that keeping 3 branches in an attempt to reduce the number of repos I spawned was a bad idea, and I should fix that. (and maybe follow the git flow strategy…) GitHub has a releases API, so I’ll be spending some time in the near future with it (also, streaming uploads for the 100MB+ pagespeed releases…) Or try the github-release script that someone wrote. Either way, I’d had enough for the night, and stopped.

Today I picked it back up, and took another look. Still thinking it was my server, I tried to isolate which component was failing by serving the src.rpm file on a different domain. Instead of downloading the file and sticking it in a folder, I symlinked a folder to the Jenkins’ workspace directory, and submitted a build. Surprisingly, it worked, so I took a look at my nginx logfile, which revealed that the user agent of what downloaded it was Python-urllib/1.17. That looked like something from the Python stdlib that I could try to replicate the issue with. Started up ipython, googled for python urrlib downfile, and found the urlretrieve function. Tried it against the alternate src.rpm URL, and it worked. Tried it against Jenkins directly, and it failed.

Now I could replicate the symptoms! Next step was to google “Cloudflare blocking urllib”, which suggested that the (lack of) headers was tripping Cloudflare’s Browser Integrity Check. Toggled it off in Cloudflare, and sure enough I could now get the src.rpm file through urllib.

Deciding I now had enough info to file a bug, I went to the Copr codebase to find the relevant line. Downloaded the latest release, unzipped it to grep the code, found the file and line, went to grab the line from the Git repo – and couldn’t find the line. Turns out between last week and this week, the bug was fixed, just not pushed to Copr.

So for now I’ll be regenerating all the missed versions of nginx with the integrity check disabled. Fingers crossed there’s nothing else wrong with Copr in the meantime…

 

 

, , ,

No Comments

Path to building Nginx Mainline RPMs for Fedora & CentOS

Or: How I spent an afternoon doing a deep dive into the RPM spec and solving a problem for myself

tl;dr – Nginx Mainline packages are being built for Fedora & CentOS at copr.fedoraproject.org/coprs/kyl191/nginx-mainline/


My webserver’s running nginx 1.4.7, a version that hasn’t gotten non-bugfix attention since March 2013, according to the changelog. Oddly enough for a Fedora package, the version in koji is the stable branch – something that makes sense for CentOS/EPEL since that’s a long term support release, but not for Fedora. Doubly annoying was the fact that the packages for Fedora 20 (because Fedora 21 isn’t supported on OpenVZ yet) was one step behind the official stable – 1.4 instead of 1.6.
If I wanted mainline on Fedora, I was going to have to build it from source.
Read the rest of this entry »

, ,

1 Comment

Musings on the Mythical Man-Month Chapter 2

tl;dr: Scheduling tasks is hard

  1. We assume everything will go well, but we’re actually crap at estimating
  2. We confuse progress with effort
  3. Because we’re crap at estimating, managers are also crap at estimating
  4. Progress is poorly monitored
  5. If we fall behind, natural reaction is to add more people

Overly optimistic:

Three stages of creation: Idea, implementation, interaction

Ideas are easy, implementation is harder (and interaction is from the end user). But our ideas are flawed.

We approach a task as a monolithic chunk, but in reality it’s many small pieces.

If we use probability and say that we have a 5% chance of issues, we would budget 5% because it’s one monolithic thing.

But the real situation is that each of the small tasks has a 5% probability of being delayed. Thus, 0.05^n

Oh, and our ideas being flawed? Yeah… virtually certainty that the 5% will be used.

Progress vs effort:

Wherein the man-month fallacy is discussed. It comes down to:

  1. Adding people works only if they’re completely independent and autonomous. No interaction means assign a smaller portion, which equates to being done faster
  2. If you can’t split it up tasks, it’s going to take a fixed amount of time. Example in this case is childbearing – you can’t shard the child across multiple mothers. (“Unpartitionable task”)
  3. Partitionable w/ some communication overhead – pretty much the best you can do in Software. You incur a penalty when adding new people (training time!) and a communication overhead (making sure everyone knows what is happening)
  4. Partitionable w/ complex interactions – Significantly greater communication overhead

Estimation issues:

Testing is usually ignored/the common victim of schedule slippage. But it’s frequently the time most needed because you’re finding & fixing issues with your ideas.

Recommended time division is 1/3 planning, 1/6 coding, 1/4 unit tests, 1/4 integration tests & live tests (I modified the naming)
Without checkins, if the schedule slips, people only know towards the end of the schedule, when the product is almost due. This is bad because a) people are preparing for the new thing, and b) the business has invested on getting code out that day (purchasing & spinning up new servers, etc)

Better estimates:

When a schedule slips, we can either extend the schedule, or force stuff to be done to the original timeframe (crunch time!) Like an omelette, devs could increase intensity, but that rarely works out well

It’s common to schedule to an end-user’s desired date, rather than going on historical figures.

Managers need to push back against schedules done this way, going instead for at least somewhat data-based hunches instead of wishes

Fixing a schedule:

Going back to progress vs effort.

You have a delayed job. You could add more people, but that rarely turns out well. Overhead of training new people & communication takes it’s toll, and you end up taking more time than if you just stuck with the original team

Recommended thing is to reschedule; and add sufficient time to make sure that testing can be done.

Comes down to the maximum number of people depends on the number of independent subtasks. You will need time, but you can get away with fewer people.


The central idea, that one can’t substitute people for months is pretty true. I’ve read things that say it takes around 3 months to get fully integrated into a job, and I’ve found that to be largely true for me. (It’s one of my goals for future co-op terms to try and get that lowered.)

The concept of partitioning tasks makes sense, and again it comes back to services. If services are small, 1 or 2 person teams could easily take care of things, and with minimal overhead. When teams start spiraling larger, you have to be better at breaking things down into small tasks, so you can assign them easily, and integrate them (hopefully) easily. It seems a bit random, but source control helps a lot here.

Estimation is tricky for me, and will continue to be, since it only comes from experience – various ‘rules of hand’ that I’ve heard include take the time that you’ll think you’ll need, double it, then double it again.

But it’s a knock-on effect – I estimate badly, tell my manager, he has bad data, so he estimates wrongly as well… I’ve heard stories that managers pad estimates. That makes sense, especially for new people. I know estimates of my tasks have been wildly off. Things that I thought would take days end up taking a morning. Other things like changing a URL expose a recently broken dependency, and then you have to fix that entire thing… yeah. 5 minute fix became afternoon+. One thing which I’ll try to start doing is noting down how ling I expect things to take me, and then compare it at the end to see whether or not I was accurate. Right now it’s a very handwavey “Oh I took longer/shorter than I wanted, hopefully I’ll remember that next time!”

Which, sadly, I usually don’t.

No Comments

Musings on The Mythical Man-Month Chapter 1

Summary of the chapter:

Growing a program

A standalone product, running on the dev’s environment, is cheap.

It gets expensive if:

  1. You make it generic, such that it can be extended by other people. This means you have to document it, testing it (unit tests!), and maintain it.
  2. You make it part of a larger system. For example, cross-platformness. You have to put effort into system integration.

Each of those tasks takes ~3x effort of creating the original program. Therefore, creating a final product takes about ~9x the effort. Suddenly, it doesn’t look simple anymore.

Why Software

  1. Sheer joy of making things. Especially things that you make yourself.
  2. Joy of making things for other people
  3. Fascination at how everything works together
  4. Joy of always learning
  5. Joy at working in an insanely flexible medium – a creative person, but the product of the creativity has a purpose. (Unlike poetry, for example)

In summary, programming is fun because it scratches an itch to design and make something, and that itch is surprisingly common among people.

Why not Software

  1. You have to be perfect. People introduce bugs. You are a person. Therefore you aren’t perfect, and a paradox occurs, which resolves in the program being less than perfect.
  2. Other people tend to dictate the function/objective of the program – leaving the writer with authority insufficient for his responsibility. In other words, you can’t order people around, even though you need stuff from them. Particularly infrastructure people, given programs that aren’t necessarily well working and they’re expected to make them run.
  3. Designing things is fun. Bug fixing isn’t. (This is the actual work part.) Particularly where each successive bug tends to take longer to find & isolate than the last one.
  4. And when you’re done, frequently what you’ve made is ready to be superseded by a new better program. Luckily, that shiny new thing is usually also in gestation, so your program gets put into service. Also, it’s natural – tech is always moving on, unlike your specs, which are generally frozen at a fixed point. The challenge is finding solutions to problems within budget and on time.

So I got ahold of the much talked about Mythical Man-Month book of essays on Software Engineering… and I’ve decided to read an essay a night, and muse about it, after writing a summary of the chapter (read: taking notes about the book, so I’m not just passively reading).

I agree with pretty much everything – and I’ll cover points in order of where they appear in the essay.

Growing a Program: The extra work done in getting systems integrated is pretty accurate. I think that’s driving a lot of the move towards offering everything as services instead of one monolithic thing. Moving to using a service means a lot of stuff is abstracted away for you – you can ignore the internal workings (more so than using a library, which you have to keep track of) in the hope that stuff works as advertised. So you save some time on the integration side of things by reducing the amount of surface area you have to integrate with.

However, the fleshing out of the program – writing tests and documenting everything, is harder to avoid. A lot of the boilerplate stuff is automated away by IDEs (auto generating test stubs, for example), but there’s still work that needs to be done to make stuff into a proper dependable system – and that’s really the stuff that’s separating the small scale, internal software from public scale.

Admittedly, that’s a bit of a tautology. But I think a lot of the growing is just forced by not wanting to keep on fixing bugs in the same code. By having a test against it, you know whether or not at the very least, the expected behaviour occurs.

Why software: I chose software over hardware in Uni because it’s so much more flexible (#5). I like making things (#1), especially those things which help people (#2). I do a mental happy dance every time someone posts a nice comment on Lightroom Plugin page on deviantArt. Though the happiness of understanding how things fit together (#3) is more of “Ha! Got <complicated API> to actually work!” And #4 is more frantically Googling so as to not look like an idiot to the rest of my team.

Why not Software: Uh… yeah. #1 & #3 – damn bugs. See the 6+2 stages of debugging. Sadly true, especially the moment of realisation, followed by how did that ever work. But fixing bugs is satisfying, particularly a new bug that you’ve never seen before. #2 – That’s, well, the nature of work when you’re not at the top. The authority/responsibility trade off is real. I like to think I’ve worked around it at Twitter by following Twitter’s “Bias towards action” guideline – I have submitted fixes for other projects, gotten reviews and submitted code. Much more efficient than filing a bug and saying “BTW, you’re blocking me right now.” And #4 – That goes along with the learning new stuff thing. Also, it’s probably a good thing that a new version will come along soon – you get closer to what the user wants by iterating. If you’ve stopping iterating, the product is either a) perfect, or b) cost/benefit analysis says it’s not worth updating, run it in pure maintenance mode.

Or c) you just really don’t care about it anymore. Which is really just a variation on b.

No Comments

Django + Nginx resources

For SE Hack Day:

michal.karzynski.pl/blog/2013/06/09/django-nginx-gunicorn-virtualenv-supervisor/ looks the best (along with michal.karzynski.pl/blog/2013/07/14/using-redis-as-django-session-store-and-cache-backend/)

wiki.nginx.org/DjangoFastCGI and https://code.djangoproject.com/wiki/DjangoAndNginx are Django + FastCGI

adambard.com/blog/start-to-finish-serving-mysql-backed-django-w/

blog.richard.do/index.php/2013/04/setting-up-nginx-django-uwsgi-a-tutorial-that-actually-works/

serverfault.com/questions/370525/nginxdjango-serving-static-files/370573#370573

stackoverflow.com/questions/17511466/deploying-django-on-nginx

www.digitalocean.com/community/articles/how-to-install-and-configure-django-with-postgres-nginx-and-gunicorn because postgres

No Comments

‘Solving’ SQL injection in Java

So during the summer I worked on a large enterprisey Java program. (Singleton pattern ahoy!)

One of the annoying things (besides massive code duplication) was it used database queries that naively appended user input (particularly search queries) onto selects.

And from my web background, I knew that SQL injection makes wiping the table trivial. Or even dropping the database.

So I wanted to convert as much stuff to a prepared statement as possible.

I started off with this:

Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = stmt.executeQuery("select * from view where date = ' "+ inputBox.toString() + " ' ");
if (rs.first()) {

Trial and error get me this:

PreparedStatement test = conn.prepareStatement("select * from viewCalendar where date = ?");
test.setDate(1, new java.sql.Date(inputBox.toString())); 
ResultSet rs = test.executeQuery(); if (rs.next()) {

The hardest part was replacing the functionality of rs.first() – the rest of the code requires a ResultSet that’s started at the first row, but the ResultSet returned by the prepared statement wasn’t. But the Java API docs had the solution – next() “moves the cursor forward one row from its current position. A ResultSet cursor is initially positioned before the first row; the first call to the method next makes the first row the current row.”

Doing this also makes the code cleaner – instead of having

if (rs.first()){
  do {
    //bunch of stuff
  } while (rs.next());
}

The equivalent becomes

while (rs.next()){
    // Bunch of stuff
}

which I consider a whole lot cleaner & easier to read.

So I got my fix – my naive selects were now using preparedStatements (also, possibly JDBC/MSSQL execution path caching?), hence the solved bit in the title.

However, searches are still an issue – due to the way the multiple criteria used resulted in a variable number of parameters, and PreparedStatement not supporting variable numbers of parameters, I didn’t see an alternative to assembling a string, even if I modified the logic to insert nulls – because that would just cause the database to return no rows, because the columns don’t have NULL in the rows that I would want.

Hence the quotes around solved. 🙁

===

And a bonus: When assembling a variable parameter where string, don’t use

if (where.isEmpty()) where = string;
else where += " or " + string;

For one or two parameters it’s ok. But for 20+ parameters, you’re going to have a stupid amount of if/else blocks. I used an ArrayList (variable length!) of type string, and then had a method called buildWhere(ArrayList<String>) that builds a string parameter by parameter with ORs in the appropriate places.

,

No Comments

I broke… Java?

A... C error. In Java.  Huh.

A… C error. In Java.
Huh.

Something from my summer job. I found it horribly amusing.

===

And then I fixed it.

,

No Comments