Archive for April, 2014

Fixing a mangled NTFS partition: success

A follow-up from

Almost two years on,

Correcting errors in the Master File Table (MFT) mirror.
Correcting errors in the master file table's (MFT) BITMAP attribute.
Correcting errors in the Volume Bitmap.
Windows has made corrections to the file system.

My drive is back! And seemingly OK! I’m celebrating by setting up a Python script to recursively run through my current photo backup and the drive and compare file checksums.
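A minimal sketch of what such a checksum comparison could look like. This is an illustration and not the script I actually ran; the function names (`file_sha256`, `checksum_tree`, `compare_trees`) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os
import sys


def file_sha256(path, chunk_size=1 << 20):
    """Hash one file in chunks so large photos don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Walk root recursively and map each relative path to its checksum."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = file_sha256(full)
    return sums


def compare_trees(backup_root, drive_root):
    """Return (missing, extra, changed) relative paths, each sorted."""
    backup = checksum_tree(backup_root)
    drive = checksum_tree(drive_root)
    missing = sorted(set(backup) - set(drive))   # in backup, not on drive
    extra = sorted(set(drive) - set(backup))     # on drive, not in backup
    changed = sorted(
        path for path in set(backup) & set(drive) if backup[path] != drive[path]
    )
    return missing, extra, changed


if __name__ == "__main__":
    if len(sys.argv) == 3:
        missing, extra, changed = compare_trees(sys.argv[1], sys.argv[2])
        for label, paths in (("missing", missing), ("extra", extra), ("changed", changed)):
            for path in paths:
                print(label, path)
    else:
        print("usage: compare.py BACKUP_DIR DRIVE_DIR")
```

Files are matched by their path relative to each root, so this only works if the backup and the drive share the same folder layout.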

How I did it

The key thing was to isolate the drive and not use it. If I hadn’t done that, it would have been utterly unrecoverable.

I also remembered the layout of the drive: Exactly 500GB on the first partition, and the rest of the disk as an NTFS partition. The first partition was a member of a RAID5 set, which had LVM set up on top of it.

I had used GParted to extend the NTFS partition backwards, to get the extra 500GB. However, this failed for… some reason. I’m not too clear.

TestDisk wasn’t successful – It identified the LVM partition, then promptly skipped everything between the LVM header and where the LVM header said the partition ended. Which meant it skipped to the middle of the drive, since the RAID5 set was 1TB in size. And thus TestDisk refused to restore the partition, because it doesn’t make sense that a 2 TB drive has 2 partitions which take up more than that.

The harddisk (2000 GB / 1863 GiB) seems too small! (< 3463 GB / 3226 GiB)
     Linux LVM                0  65  2 130541 139 30 2097145856
     LVM2, 1073 GB / 999 GiB
     HPFS - NTFS          65270 246  1 243201  13 12 2858446848
     NTFS found using backup sector, blocksize=4096, 1463 GB / 1363 GiB

Having tried a bunch of methods in the 1.5 years+ and failing each time, I decided to finally go all the way and wipe out the (screwed up) partition table and recreate it. I didn’t know the original commands run to create the partition setup, so I ended up booting into a Fedora 13 Live image, and doing the ‘Install to Hard Disk’ option, selecting the disk as the install target. I was worried because it wouldn’t allow me to create a partition without also formatting it (Kind of makes sense…), so I terminated the install process after seeing “Formatting /dev/sdd1 as ext4” – in other words, the first partition was being formatted.

I then turned to fdisk to create the partition, selecting the defaults which should have extended the partition to the end of the disk. However, there was some disagreement on what constituted the end of the disk, leaving me with ~2MB of unallocated space. When I created the partition in Windows, it went all the way to the end of the disk. What this meant is that I ended up with a sector count mismatch (along the lines of “Error: size boot_sector 2858446848 > partition 2858446017”).

So I had a semi-working drive, just with a number screwed up. And what edits numbers on a drive? A hex editor, that’s what. So it was off to edit the NTFS boot sector, and the MBR. I had correct looking numbers from TestDisk’s analysis, so I plugged those in, and since I had the hex editor opened, I wiped out the LVM header at the same time.
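The two numbers that disagreed can also be read out without changing anything. Below is a read-only sketch, not what I did at the time (I used a hex editor). It assumes the standard layouts: four 16-byte MBR partition entries starting at offset 0x1BE, and the NTFS total sector count as a little-endian 64-bit value at offset 0x28 of the boot sector. The function names are made up for the example.

```python
import struct

SECTOR = 512  # bytes per sector (assumed)


def mbr_partitions(mbr):
    """Return (slot, type, start_lba, sector_count) for each used MBR entry."""
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    found = []
    for slot in range(4):
        offset = 0x1BE + 16 * slot
        part_type = mbr[offset + 4]
        # Start LBA and sector count are two little-endian 32-bit integers.
        start_lba, sector_count = struct.unpack_from("<II", mbr, offset + 8)
        if part_type != 0:
            found.append((slot + 1, part_type, start_lba, sector_count))
    return found


def ntfs_total_sectors(boot_sector):
    """Read the total sector count recorded in an NTFS boot sector."""
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("OEM ID is not NTFS")
    (total,) = struct.unpack_from("<Q", boot_sector, 0x28)
    return total


def sector_overrun(partition_sectors, boot_sector):
    """How many sectors the file system claims beyond the partition (0 if none)."""
    return max(0, ntfs_total_sectors(boot_sector) - partition_sectors)
```

To use it, pass in the first 512 bytes of a copy of the disk as the MBR, and the 512 bytes at the partition's start LBA as the boot sector. Working on an image rather than the live device means a mistake costs nothing.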

Turned out wiping the LVM header was an excellent idea, because TestDisk then found the NTFS boot sector, and allowed me to restore it:

     HPFS - NTFS          65270 246  1 243201  13 13 2858446849
     NTFS, blocksize=4096, 1463 GB / 1363 GiB

After that, the disk still wouldn’t mount in Windows, but chkdsk at least picked it up. After letting chkdsk run overnight, I got my drive from August 2012 back, with (as far as I can tell) no data loss whatsoever.

That’s worth an awwyeah.


No Comments

Musings on the Mythical Man-Month Chapter 2

tl;dr: Scheduling tasks is hard

  1. We assume everything will go well, but we’re actually crap at estimating
  2. We confuse progress with effort
  3. Because we’re crap at estimating, managers are also crap at estimating
  4. Progress is poorly monitored
  5. If we fall behind, natural reaction is to add more people

Overly optimistic:

Three stages of creation: Idea, implementation, interaction

Ideas are easy, implementation is harder (and interaction is from the end user). But our ideas are flawed.

We approach a task as a monolithic chunk, but in reality it’s many small pieces.

If we use probability and say that we have a 5% chance of issues, we would budget 5% because it’s one monolithic thing.

But the real situation is that each of the small tasks has a 5% probability of being delayed. Thus, with n tasks, the chance that at least one of them slips is 1 - 0.95^n, which climbs quickly as n grows.

Oh, and our ideas being flawed? Yeah… virtual certainty that the 5% will be used.
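To put numbers on that, here is a quick sketch. It assumes the small tasks are independent and each has the same 5% chance of slipping, which is a simplification of mine and not something the book states in this form. Under that assumption, the chance that at least one of n tasks slips is 1 - (1 - p)^n.

```python
def chance_of_any_delay(p, n):
    """Probability that at least one of n independent tasks is delayed,
    when each task is delayed with probability p."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 50):
        print(f"{n:3d} tasks: {chance_of_any_delay(0.05, n):.1%} chance of a slip")
```

With these assumptions, a project of 14 such tasks is already more likely than not to hit at least one delay.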

Progress vs effort:

Wherein the man-month fallacy is discussed. It comes down to:

  1. Adding people works only if they’re completely independent and autonomous. No interaction means assign a smaller portion, which equates to being done faster
  2. If you can’t split up the task, it’s going to take a fixed amount of time. Example in this case is childbearing – you can’t shard the child across multiple mothers. (“Unpartitionable task”)
  3. Partitionable w/ some communication overhead – pretty much the best you can do in Software. You incur a penalty when adding new people (training time!) and a communication overhead (making sure everyone knows what is happening)
  4. Partitionable w/ complex interactions – Significantly greater communication overhead
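A rough way to see why the communication overhead bites: if every person on the team has to coordinate with every other person, the number of pairwise channels is n(n-1)/2, so it grows much faster than the headcount. The sketch below assumes that worst case, where everyone talks to everyone; real teams usually sit somewhere below it.

```python
def communication_channels(people):
    """Number of distinct pairs in a team where everyone coordinates with everyone."""
    return people * (people - 1) // 2


if __name__ == "__main__":
    for size in (2, 3, 5, 10, 20):
        print(f"{size:2d} people -> {communication_channels(size):3d} channels")
```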

Estimation issues:

Testing is usually ignored/the common victim of schedule slippage. But it’s frequently the time most needed because you’re finding & fixing issues with your ideas.

Recommended time division is 1/3 planning, 1/6 coding, 1/4 unit tests, 1/4 integration tests & live tests (I modified the naming)

Without check-ins, if the schedule slips, people only find out towards the end of the schedule, when the product is almost due. This is bad because a) people are preparing for the new thing, and b) the business has invested in getting code out that day (purchasing & spinning up new servers, etc)
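The 1/3, 1/6, 1/4, 1/4 split above is easy to turn into actual days. A small sketch follows; the phase names use my renaming from above, and `divide_schedule` is just a name I picked for the example.

```python
from fractions import Fraction

# Share of the total schedule per phase, using the renamed phases from above.
SPLIT = {
    "planning": Fraction(1, 3),
    "coding": Fraction(1, 6),
    "unit tests": Fraction(1, 4),
    "integration & live tests": Fraction(1, 4),
}


def divide_schedule(total_days):
    """Split a total schedule into per-phase durations using SPLIT."""
    return {phase: total_days * share for phase, share in SPLIT.items()}


if __name__ == "__main__":
    for phase, days in divide_schedule(60).items():
        print(f"{phase}: {float(days):.1f} days")
```

For a 60-day schedule that works out to 20 days of planning, 10 of coding, and 15 for each of the two testing phases, so half the schedule is testing.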

Better estimates:

When a schedule slips, we can either extend the schedule, or force stuff to be done to the original timeframe (crunch time!). Like a cook turning up the heat on an omelette, devs could increase intensity, but that rarely works out well.

It’s common to schedule to an end-user’s desired date, rather than going on historical figures.

Managers need to push back against schedules done this way, going for at least somewhat data-based hunches rather than wishes.

Fixing a schedule:

Going back to progress vs effort.

You have a delayed job. You could add more people, but that rarely turns out well. The overhead of training new people & communication takes its toll, and you end up taking more time than if you just stuck with the original team.

Recommended thing is to reschedule, and add sufficient time to make sure that testing can be done.

It comes down to this: the maximum number of people depends on the number of independent subtasks. You will need time, but you can get away with fewer people.

The central idea, that one can’t substitute people for months, is pretty true. I’ve read things that say it takes around 3 months to get fully integrated into a job, and I’ve found that to be largely true for me. (It’s one of my goals for future co-op terms to try and get that lowered.)

The concept of partitioning tasks makes sense, and again it comes back to services. If services are small, 1 or 2 person teams could easily take care of things, and with minimal overhead. When teams start spiraling larger, you have to be better at breaking things down into small tasks, so you can assign them easily, and integrate them (hopefully) easily. It seems a bit random, but source control helps a lot here.

Estimation is tricky for me, and will continue to be, since it only comes from experience – various ‘rules of thumb’ that I’ve heard include take the time that you think you’ll need, double it, then double it again.

But it’s a knock-on effect – I estimate badly, tell my manager, he has bad data, so he estimates wrongly as well… I’ve heard stories that managers pad estimates. That makes sense, especially for new people. I know estimates of my tasks have been wildly off. Things that I thought would take days end up taking a morning. Other things like changing a URL expose a recently broken dependency, and then you have to fix that entire thing… yeah. 5 minute fix became afternoon+. One thing which I’ll try to start doing is noting down how long I expect things to take me, and then comparing it at the end to see whether or not I was accurate. Right now it’s a very handwavey “Oh I took longer/shorter than I wanted, hopefully I’ll remember that next time!”

Which, sadly, I usually don’t.
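Something as small as the sketch below would probably be enough to make the habit stick. It is hypothetical: the `Task` record and `overrun_ratio` helper are names invented for this example, and hours are an arbitrary choice of unit.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    estimated_hours: float
    actual_hours: float


def overrun_ratio(tasks):
    """Total actual time divided by total estimated time.

    Above 1.0 means I tend to underestimate; below 1.0 means I overestimate.
    """
    estimated = sum(task.estimated_hours for task in tasks)
    actual = sum(task.actual_hours for task in tasks)
    return actual / estimated


def worst_misses(tasks, count=3):
    """Tasks whose actual time was furthest from the estimate, largest gap first."""
    ranked = sorted(
        tasks, key=lambda t: abs(t.actual_hours - t.estimated_hours), reverse=True
    )
    return ranked[:count]
```

Multiplying the next estimate by the ratio from past tasks is a data-based version of the "double it, then double it again" rule.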


Musings on The Mythical Man-Month Chapter 1

Summary of the chapter:

Growing a program

A standalone product, running on the dev’s environment, is cheap.

It gets expensive if:

  1. You make it generic, such that it can be extended by other people. This means you have to document it, test it (unit tests!), and maintain it.
  2. You make it part of a larger system. For example, cross-platformness. You have to put effort into system integration.

Each of those tasks takes ~3x the effort of creating the original program. Doing both multiplies, so creating a final product takes about 9x the effort. Suddenly, it doesn’t look simple anymore.

Why Software

  1. Sheer joy of making things. Especially things that you make yourself.
  2. Joy of making things for other people
  3. Fascination at how everything works together
  4. Joy of always learning
  5. Joy at working in an insanely flexible medium – you get to be a creative person, but the product of the creativity has a purpose. (Unlike poetry, for example)

In summary, programming is fun because it scratches an itch to design and make something, and that itch is surprisingly common among people.

Why not Software

  1. You have to be perfect. People introduce bugs. You are a person. Therefore you aren’t perfect, and a paradox occurs, which resolves in the program being less than perfect.
  2. Other people tend to dictate the function/objective of the program – leaving the writer with authority insufficient for his responsibility. In other words, you can’t order people around, even though you need stuff from them. This is particularly true for infrastructure people, who are given programs that don’t necessarily work well and are expected to make them run.
  3. Designing things is fun. Bug fixing isn’t. (This is the actual work part.) Particularly where each successive bug tends to take longer to find & isolate than the last one.
  4. And when you’re done, frequently what you’ve made is ready to be superseded by a new better program. Luckily, that shiny new thing is usually also in gestation, so your program gets put into service. Also, it’s natural – tech is always moving on, unlike your specs, which are generally frozen at a fixed point. The challenge is finding solutions to problems within budget and on time.

So I got ahold of the much talked about Mythical Man-Month book of essays on Software Engineering… and I’ve decided to read an essay a night, and muse about it, after writing a summary of the chapter (read: taking notes about the book, so I’m not just passively reading).

I agree with pretty much everything – and I’ll cover points in order of where they appear in the essay.

Growing a Program: The extra work done in getting systems integrated is pretty accurate. I think that’s driving a lot of the move towards offering everything as services instead of one monolithic thing. Moving to using a service means a lot of stuff is abstracted away for you – you can ignore the internal workings (more so than using a library, which you have to keep track of) in the hope that stuff works as advertised. So you save some time on the integration side of things by reducing the amount of surface area you have to integrate with.

However, the fleshing out of the program – writing tests and documenting everything, is harder to avoid. A lot of the boilerplate stuff is automated away by IDEs (auto generating test stubs, for example), but there’s still work that needs to be done to make stuff into a proper dependable system – and that’s really the stuff that’s separating the small scale, internal software from public scale.

Admittedly, that’s a bit of a tautology. But I think a lot of the growing is just forced by not wanting to keep on fixing bugs in the same code. By having a test against it, you know whether or not at the very least, the expected behaviour occurs.

Why software: I chose software over hardware in Uni because it’s so much more flexible (#5). I like making things (#1), especially those things which help people (#2). I do a mental happy dance every time someone posts a nice comment on the Lightroom Plugin page on deviantArt. Though the happiness of understanding how things fit together (#3) is more of “Ha! Got <complicated API> to actually work!” And #4 is more frantically Googling so as to not look like an idiot to the rest of my team.

Why not Software: Uh… yeah. #1 & #3 – damn bugs. See the 6+2 stages of debugging. Sadly true, especially the moment of realisation, followed by how did that ever work. But fixing bugs is satisfying, particularly a new bug that you’ve never seen before. #2 – That’s, well, the nature of work when you’re not at the top. The authority/responsibility trade off is real. I like to think I’ve worked around it at Twitter by following Twitter’s “Bias towards action” guideline – I have submitted fixes for other projects, gotten reviews and submitted code. Much more efficient than filing a bug and saying “BTW, you’re blocking me right now.” And #4 – That goes along with the learning new stuff thing. Also, it’s probably a good thing that a new version will come along soon – you get closer to what the user wants by iterating. If you’ve stopped iterating, the product is either a) perfect, or b) cost/benefit analysis says it’s not worth updating, so you run it in pure maintenance mode.

Or c) you just really don’t care about it anymore. Which is really just a variation on b.
