Dr. Lawlor's Code, Robots, & Things

April 26, 2016

Terrain Rendering in 3D

Filed under: Graphics, Linux, Programming, Random Thoughts — Dr. Lawlor @ 4:49 pm

Back in 2003 I worked on a terrain simplification algorithm, a modification of Lindstrom’s method, to allow interactive exploration of big terrains in 3D.

A binary distribution is available for Windows (.zip) or Linux (.tar.gz); the binaries should work out of the box and let you view your own binary DEM and JPEG texture.  I made an attempt to include the source in there, but since I built both binaries using my custom build system, getting it to compile is likely to take quite a bit more work, and possibly some missing custom libraries.

September 15, 2015

Rescuing Data from a Banished Daemon

Filed under: Linux, Programming — Dr. Lawlor @ 9:12 pm

Last week we managed to accidentally delete the entire directory housing our custom database server (superstar), including the on-disk backups and historical backups of its robot info database.  Not good.

Amazingly, ps showed the server was still running, and poking around we could even see the in-memory copy of the database was still fine, but it was trapped inside this banished daemon.

# ps
...
no_priv 5149 0.1 0.0 14932 1640 ? S 18:04 0:08 ./superstar

Checking /proc/5149 shows the working directory and exe have been deleted, which makes a lot of tools break:

# ls -al /proc/5149
...
lrwxrwxrwx 1 no_priv no_priv 0 Sep 11 18:09 cwd -> /home/itest/git3/superstar (deleted)
-r-------- 1 no_priv no_priv 0 Sep 11 18:20 environ
lrwxrwxrwx 1 no_priv no_priv 0 Sep 11 18:09 exe -> /home/itest/git3/superstar/superstar (deleted)

We tried for a while to recreate or change the cwd, so we could tickle the daemon into backing up its database somewhere accessible.  Relinking didn’t seem to change this directory, although possibly inode-level reconstruction of the old directory could have worked.

But I could still copy off the running executable for analysis:

# cp /proc/5149/exe /tmp/necrodaemon
# nm /tmp/necrodaemon | grep back
000000000061ca40 b _ZL15backup_filename

This is the in-memory address of the string holding the filename where the daemon tries to back up its in-memory database.  Currently it reads “db.bak”, a relative path, which is a problem since the process’s current working directory is gone.

Now just attach to the daemon in gdb, and we can see and change what’s inside it.

# gdb -p 5149

Of course, the daemon wasn’t compiled with debug symbols, so we can’t call methods or even functions.  So step one is verifying there’s anything actually at that “backup_filename” memory address: it’s a std::string, which has a single pointer to a separate refcounted object that actually holds the bytes of the string.  We then overwrote the bytes of this string to turn the relative path into an absolute path, “/a.bak”, a file which we’d previously created and given the correct permissions.

(gdb) p *(long *)0x61ca40
$1 = 27717720
(gdb) p *(char **)0x61ca40
$7 = 0x1a6f058 "db.bak"
(gdb) set {char}0x1a6f058 = '/'
(gdb) p *(char **)0x61ca40
$8 = 0x1a6f058 "/b.bak"
(gdb) set {char}0x1a6f059 = 'a'
(gdb) p *(char **)0x61ca40
$9 = 0x1a6f058 "/a.bak"
(gdb) detach
Detaching from program: , process 5149
(gdb) quit

As soon as the backup ran, the daemon wrote its in-memory database to the newly created file “/a.bak”, and our data was saved.  Daemon necromancy to the rescue!

August 12, 2015

Parrot “Rolling Spider” UAV Hacking: Dumping the Filesystem

Filed under: Hardware, Linux, Programming, Rolling Spider — Dr. Lawlor @ 11:53 pm

I just got a new tiny UAV, the Parrot “Rolling Spider” ($80), which is very fun to fly via Bluetooth with my phone.  But it’s also a Linux-based computer, so it’s also fun to hack!

The easiest way to get a root shell is to just plug it in via the USB cable: the UAV not only shows up as a removable USB drive, it also shows up as a network device (at least as of the 1.99.2 firmware version).  This means you can immediately get a root shell with:

telnet 192.168.2.1

That was easy!  Now to dump the filesystem to netcat on port 1234, run this on the UAV.  (The [^p] avoids /proc, which has infinite recursive root links; the [^y] avoids /sys, which has files that change in size.)

tar cpf - [^p][^y]* | nc -l -p 1234

To get the filesystem as a file on your desktop computer, now just:

nc 192.168.2.1 1234 > rootfs.tar

This has *everything*, from:

drwxr-xr-x root/root 0 1969-12-31 14:00 bin/
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/getopt -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/dd -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/cp -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/df -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/ip -> busybox
-rwxrwxr-x root/root 35 1969-12-31 14:00 bin/kk
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/ln -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/ls -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/mv -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/ps -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/rm -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/sh -> busybox
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/vi -> busybox
-rwxrwxr-x root/root 305 1969-12-31 14:00 bin/blink_led_orangeleft.sh
lrwxrwxrwx root/root 0 1969-12-31 14:00 bin/ash -> busybox

through:

drwxr-xr-x root/root 0 1969-12-31 14:00 usr/share/avahi/
-rw-r--r-- root/root 560 1969-12-31 14:00 usr/share/avahi/avahi-service.dtd
-rw-r--r-- root/root 5104 1969-12-31 14:00 usr/share/avahi/service-types
drwxr-xr-x root/root 0 1969-12-31 14:00 var/
lrwxrwxrwx root/root 0 1969-12-31 14:00 var/log -> /tmp/
-rw-rw-r-- root/root 7 1969-12-31 14:00 version.txt
drwxr-xr-x root/root 0 1969-12-31 14:00 www/
-rw-rw-r-- root/root 485 1969-12-31 14:00 www/index.html

For example, now I can see the contents of the control shell scripts:

$ cat bin/set_led_greenleft.sh 
#!/bin/sh


# temp behaviour : red light right on
gpio 33 -d ho 1
# temp behaviour : red light left off
gpio 30 -d ho 0

#green light right off
gpio 31 -d ho 0

#green light left on
gpio 32 -d ho 1

I can also see the details of how the code was built:

$ file bin/busybox
busybox: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.16, stripped

Of course, eventually I’ll want to permanently modify this filesystem, by re-flashing the UAV with a reverse engineered PLF firmware file, which is similar to the Parrot AR Drone PLF format.  I’m nearly there with “plftool -e raw -i rollingspider_update.plf -o .”, but each resulting file has the filename prepended in some sort of fixed-length binary header.

Stay tuned!

March 16, 2014

Arduino millis() jumps by 2 every 43 milliseconds

Filed under: Arduino, Programming — Dr. Lawlor @ 1:07 am

I’ve been doing some software-modulated infrared light detection work, and for the signal to be sent correctly I need millisecond-accurate timings.  I realize professionals do this sort of thing with interrupts, but I hate debugging interrupt code, and I’m always afraid there’s some rare timing glitch that will kill the entire project at the worst possible moment.

I was using Arduino’s millis() function as my time base, and I kept getting weird screwy timing every 43 milliseconds.  I assumed it was my code, which does a bunch of other stuff like serial communication, but the glitch didn’t change regardless of what I did.  Turns out, the Arduino Uno’s timer overflows at a power-of-two rate, every 1.024 milliseconds, so after 1/0.024 ≈ 41.7 overflows (about 42.7 milliseconds) it’s a full millisecond off.  Arduino’s wiring.c fixes this by keeping track of the fractional milliseconds, and adding a sort of “leap millisecond” every 43 milliseconds to keep things in sync.  This results in millis() instantly jumping up by more than one.

Here’s an example Arduino Uno sketch that demonstrates these timing jumps, and shows two fixes.

/* Demonstrate timing gaps in the millis() function */
void setup(void) {
 Serial.begin(57600); // to report the horror
}

extern "C" volatile unsigned long timer0_overflow_count; // from wiring.c

unsigned long last=0; // last value of millis()
void loop(void) {
 unsigned long next;
 do { // watch the timer until it changes
   next=millis(); // jumps by 2 every 43 ms
   //Two possible fixes:
   // next=micros()/1000; // slow, but works
   // cli(); next=timer0_overflow_count; sei(); // faster, but 1024 microseconds/tick
 } while (next==last);

 int delta=next-last;
 if (delta!=1) {
   Serial.print(delta);
   Serial.print(" jump at ");
   Serial.println(next);
 }

 if (last%10000==0) {
   Serial.print("Still running at ");
   Serial.println(last);
 }
 last=next;
}

On my Arduino Uno, the millis() version of this reports regular timing glitches:

Still running at 0
2 jump at 43
2 jump at 86
2 jump at 128
2 jump at 171
2 jump at 214
2 jump at 256
2 jump at 299
2 jump at 342
...

The easy way to avoid these timing jumps is never to use millis() at all!  It works to just use micros()/1000, although looking at wiring.c’s implementation of delay(), you need to be a bit careful about micros() overflowing, which happens roughly every 70 minutes.  You can also directly read the raw timer0_overflow_count as shown above, although the actual rate this increments depends on the Arduino clock rate, and you need to wrap the read in cli/sei to avoid jumps of +-256, since the interrupt handler updates the bytes of that multi-byte value non-atomically.  Neither fix has leap milliseconds, so both result in dependable timing.

January 11, 2014

CPU Performance Counters with “likwid”

Filed under: C++11, Programming — Dr. Lawlor @ 12:01 am

I’m having fun playing with a little command line performance counter toolkit with the silly name of “likwid”, which stands for “Like I knew what I’m doing!”  If you haven’t used performance counters while programming, they can very quickly tell you surprising things about your code’s performance, like the fact that you’re getting clobbered by cache misses, not floating point arithmetic.

To get the toolkit running on a recent Ubuntu linux machine (tested on 13.10):

(more…)

April 29, 2013

Chrome JavaScript InvalidStateError: DOM Exception 11

Filed under: Programming — Dr. Lawlor @ 2:35 pm

There’s an annoyingly cryptic error message in Chrome when your JavaScript code does an XMLHttpRequest incorrectly: “Uncaught Error: InvalidStateError: DOM Exception 11”.  This error shows up when you access the request’s “status” inside onreadystatechange before hitting readyState 4.  Firefox doesn’t complain about this, and Chrome only seems to care if you’re doing it inside onreadystatechange.

For example, this works fine:

var req=new XMLHttpRequest();
req.onreadystatechange=function()
{
  if (req.readyState==4 && req.status==200)
  {
    document.getElementById("div1").innerHTML=req.responseText;
  }
};
req.open("GET","/",true);
req.send();

But interchanging the order of the req.readyState and req.status checks results in “Uncaught Error: InvalidStateError: DOM Exception 11”.

FYI!

December 30, 2012

C++11 Performance Analysis

Filed under: C++11, Programming — Dr. Lawlor @ 1:29 pm

My favorite new language is C++11, also known as C++0x.  Contrary to popular belief, compiled languages aren’t dead, and in fact they’re more important than ever: the two big problems with the popular dynamically typed, interpreted languages (JavaScript, Perl, Python, PHP, Ruby, Scheme, etc.) are (1) speed and (2) robustness.  The speed problem has been getting better as people build dynamic translation engines, but the robustness problem is harder to fix.  In interpreted languages, developers don’t get compile errors; instead your *users* get runtime errors.  That is really the wrong time and place to find out you’ve passed two arguments to a function taking three, or a string where you should have a number or object.

Anyway, C++11 has lots of cool new features, mostly aiming for more expressive code.  As a performance-oriented programmer, I always feel the need to understand the cross-platform delivered performance of any new feature before I use it very much.  So I worked my way through Alex Sinyakov’s excellent slideshow summarizing many of the new features, and built quite a few performance tests.  The platforms I tested are:

  • g++ 4.7.2, on Linux.  Generally, g++ had faster allocations, and has a little more of the language implemented.  Note that 4.7.0, from this March, is missing support for delegated constructors and literal operators, but these are supported in the most recent 4.7.2.
  • clang 3.3, on Linux. Generally this performed almost identically to g++, possibly since it’s using the same libstdc++ headers.  You want the latest version from SVN, since the 3.0 that ships with Ubuntu 12.10 doesn’t support initializer lists, seems to segfault on various simple lambdas, and doesn’t link with libstdc++.
  • Microsoft Visual Studio 2012, Release Mode, on Windows 8.  Generally, VS was better at automatically inlining functions.  But it’s missing initializer lists, delegated constructors, raw string literals, variadic templates, and a few other pieces.  Error messages for templated code are totally useless, not even giving you the types being instantiated.

I am really impressed at how much of C++11 is already working, and working the same way on these three modern compilers.  It’s also amazing how many cool language features have *zero* runtime performance cost: they’re inlined and optimized away at compile time.  The performance of most pieces of code seems to boil down to one question: is there memory allocation?  If so, the function is really slow, taking tens of nanoseconds on every platform.  If not, the function is fast, taking only a couple of nanoseconds.  This means std::string and std::vector are *much* slower to create than character buffers and arrays, although accessing them is the same speed.

  • auto, nullptr, <type_traits>, static_assert, and even <tuple> work on both platforms, and have zero runtime performance cost.  Range-based for is exactly as fast as a loop across integer indices.  for_each is exactly as fast as a loop between iterators.  Returning a tuple from an actual (non-inlined) function call is only a fraction of a nanosecond more expensive than returning a native type, and exactly the same cost as returning a custom struct.
  • std::function is weirdly slow to create (10-20ns), and costs 1-2 extra function call overheads (4-6ns) to execute.  By contrast, lambdas and std::bind are entirely inlined and cost-free if you store the result into an “auto” variable, but slow if you store it into a std::function.
  • unique_ptr is nearly the same cost as a bare pointer (30-60ns), but much easier to make correct.  shared_ptr costs over twice as much (60-140ns) if you use new, probably because there’s a separate memory allocation for the reference count.  Chris Hartman pointed out “make_shared” not only reduces code duplication, it can merge the refcount allocation, and it is indeed almost as fast as unique_ptr.
  • <random> makes it cheap to create random number engines and distributions, and accessing them is about as fast (10-20ns) as calling the old rand.

I’ve only begun to benchmark the containers, but there are a few interesting results.  Ordered from fastest to slowest access times, reported as write time per integer for a new container with 100 integers:

  • Bare [] arrays are the fastest of all, 0.3-0.4ns
  • <array> is pretty good, 0.4-0.5ns
  • <vector> is OK, 2.7ns if you’ve pre-reserved, 8-13ns if you just push_back and let it reallocate and copy.  In either case writes are far slower than any type of array, which I blame mostly on the dynamic memory allocation required by <vector> compared to the fixed-size stack allocation of <array>.
  • <deque> is decent on gcc at 4ns, slow on Visual at 30ns(!) per element.  I’ve heard gcc does block allocation to make deque faster.
  • <list> and <forward_list> are 30-90ns per element, and clearly do one allocation per element.
  • <map> and <unordered_map> are 60-130ns per element.  Currently <unordered_map> isn’t very compelling, at least for small maps up to a few thousand elements.  In my cursory tests, writes are about the same speed as the old <map>.  Unordered reads are 3x faster than <map> reads under gcc, but nearly 2x slower under Visual.

The fact that storing an integer to a <map> is 200-fold slower (20000% slower!) than storing the same integer to an array[] seems pretty surprising to me.  Too bad the STL allocator interface isn’t easy to improve. Maybe this is decade-old news to everybody else!  Note that reads aren’t nearly as bad, probably because there’s no allocation doing a read.

See the detailed numerical results and source code here.  All of these were run on my Ivy Bridge i7-3632QM laptop (quad core, 2.2-3.1GHz), though spot tests on my Sandy Bridge i5-2400 desktop ran at roughly the same speed.
