Programming
Multiple VCS Updates and Cleanups
I spend a lot of time updating repositories from a variety of version control systems, and I hate having to do that all by hand. I’d rather go up into a top-level directory, say update-all, and let a script work out what to do, no matter which repos are there.
I do it with a function defined within my bash profile/rc scripts, and it covers git, bzr, svn, bk, and cvs. The trick is to identify what type of repository each directory holds. I do this, lazily, by looping over each repository type in turn rather than inspecting each directory, but I’ve found this method to be more reliable.
update-all ()
{
for file in `ls -d */.svn 2>/dev/null`;
do
realdir=`echo $file|cut -d/ -f1`;
echo Updating in $realdir;
( cd $realdir;
svn update );
done;
for file in `ls -d */.bzr 2>/dev/null`;
do
realdir=`echo $file|cut -d/ -f1`;
echo Updating in $realdir;
( cd $realdir;
bzr pull );
done;
for file in `ls -d */.git 2>/dev/null`;
do
realdir=`echo $file|cut -d/ -f1`;
echo Updating in $realdir;
( cd $realdir;
git pull );
done;
for file in `ls -d */CVS 2>/dev/null`;
do
realdir=`echo $file|cut -d/ -f1`;
echo Updating in $realdir;
( cd $realdir;
cvs up );
done;
for file in `ls -d */BitKeeper 2>/dev/null`;
do
realdir=`echo $file|cut -d/ -f1`;
echo Updating in $realdir;
( cd $realdir;
bk pull );
done;
unset realdir
}
That’s it - a quick way to update any directory of repos.
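The five loops differ only in the marker directory and the command to run, so the same idea can be written as a single loop over marker/command pairs. The following is a minimal sketch, not a replacement for the function above: the name update_all_compact is made up for illustration, and it assumes bash (for local) and that each command takes no arguments beyond those shown.

```shell
# Sketch: one loop over "marker:command" pairs instead of one loop per VCS.
# update_all_compact is a hypothetical name, not part of the original profile.
update_all_compact ()
{
    local pair marker cmd dir
    for pair in ".svn:svn update" ".bzr:bzr pull" ".git:git pull" \
                "CVS:cvs up" "BitKeeper:bk pull"
    do
        marker=${pair%%:*}    # text before the first colon
        cmd=${pair#*:}        # text after the first colon
        for dir in */"$marker"
        do
            [ -d "$dir" ] || continue    # the glob matched nothing
            dir=${dir%/*}                # drop the marker, keep the repo directory
            echo "Updating in $dir"
            ( cd "$dir" && $cmd )        # subshell, so the caller's directory is untouched
        done
    done
}
```

Using cd "$dir" && $cmd means the update is skipped if the directory cannot be entered, rather than running in the wrong place.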
Feeding Query Analyzer from DTrace
One of the new features in the new release of MySQL Enterprise Monitor is Query Analyzer. As the name suggests, the Query Analyzer provides information about the queries that are running on your server, including response times and row and byte statistics. The information provided is great, and it doesn’t take very long to see from the query data supplied that there are places where you could improve a query, or even reduce the number of queries that you submit.
The system works by using the functionality of the MySQL Proxy to monitor the queries being executed and then provide that information up to the MySQL Enterprise Service Manager so that the information can be displayed within the Query Analyzer page. To get the queries monitored, you have to send the queries through the agent which both monitors their execution and sends the information on up to the Manager, along with all the other data being monitored.
The team, though, have been a bit clever and opened up the system to allow information to be sent to the Manager using a REST interface. This means that any system capable of providing information that you want to monitor can send it up to the Manager. Of course, you can’t just send anything, as the Manager needs to know how to handle it, but it shows the flexibility of the design and the potential for the future.
So how does this help us?
Well, one of the new features in MySQL 6.0 that I’ve been working on (with Mikael Ronstrom and Alexey Kopytov) is DTrace probes. We’ve added a bunch of static DTrace probes into MySQL 6.0 (the full set will appear in MySQL 6.0.8, I think) designed to let you monitor the execution of queries within the server. The probes will allow you to see top-level information, such as overall execution time, and also to go deeper so that you can get information about individual row operations, whether the query used the query cache, and whether it used a filesort operation.
I haven’t finished the DTrace probes documentation yet, but I have been demonstrating the probes at conferences and talks (including my MySQL on OpenSolaris university session this week). Trust me, you’ll be pleased. I’ve got a separate blog post detailing some of the specifics in the works at the moment.
There’s an obvious synergy here. Why don’t we take data extracted using DTrace and feed that up to the Enterprise Manager?
To do this, there are two parts to the process: the DTrace probes, and the script that passes that information up to the Manager in a suitable format.
The D script is quite straightforward: we initialize the structures, populate the core information that we need (query string, bytes, rows and the time), and then use the remainder of the probes to finalize that information. Let’s have a look at the script and then go through the detail:
#!/usr/sbin/dtrace -s
#pragma D option quiet
mysql*:::query-start
{
self->query = copyinstr(arg0);
self->db = copyinstr(arg2);
self->rows = 0;
self->querystart = timestamp;
self->bytes = 0;
}
mysql*:::select-done
{
self->rows = arg1;
}
mysql*:::insert-done
{
self->rows = arg1;
}
mysql*:::update-done
{
self->rows = arg2;
}
mysql*:::multi-delete-done
{
self->rows = arg1;
}
mysql*:::delete-done
{
self->rows = arg1;
}
mysql*:::multi-update-done
{
self->rows = arg2;
}
mysql*:::net-write-start
{
self->bytes = self->bytes + arg0;
}
mysql*:::query-done
/self->query != NULL/
{
printf("%s:%s:%d:%d:%d\n",
self->query,
self->db,
((timestamp - self->querystart)/1000),
self->rows,self->bytes);
}
First, we set a pragma to quieten down the output so that the DTrace script only reports what we explicitly write out:
#pragma D option quiet
In DTrace, the individual execution points are called probes, and probes are triggered each time that point in the code is reached. To specify the probes we want to watch for, you use a special format, provider:module:function:name, that identifies the probe by the name of the provider (the application), the module, the function, and the probe, each separated by a colon. Fields left empty match anything, so we can specify just the provider and probe name, like mysql*:::query-start.
It should also be noted that probes are often provided in pairs at the start and end of an operation, so you can identify the start and end of a query by looking for the query-start and query-done probes.
The DTrace probes in the server are set-up in a sort of nested structure, going deeper into the query process as needed. Although not at the very top of the execution cycle, the start of the main query processing is identified by the query-start probe. Each time a query is submitted to MySQL, this probe will get triggered, so for us, it is the start of the process. The probe has a number of arguments, but for our purposes we only need the first (arg0), which contains the full query string, and the third (arg2) which contains the name of the database that the query was executed against.
We also initialize the row and byte counts, and record the time when the query was executed using the built-in timestamp value. All of this information is placed into the special self structure, which holds thread-local variables and so can be used to share information between the individual probes that fire on the same thread during execution.
mysql*:::query-start
{
self->query = copyinstr(arg0);
self->db = copyinstr(arg2);
self->rows = 0;
self->querystart = timestamp;
self->bytes = 0;
}
To get the counts of the number of rows, we can’t get the information from the query-done probe. This is because different operations actually provide different levels of information. For example, the select-done and insert-done probes just provide a count of the rows. But the update-done probe provides information both about the number of rows that matched the original WHERE clause, and the count of the number of rows actually modified.
To record the number of the rows modified by the query, we therefore need to pull out each piece of information individually:
mysql*:::select-done
{
self->rows = arg1;
}
mysql*:::insert-done
{
self->rows = arg1;
}
mysql*:::update-done
{
self->rows = arg2;
}
mysql*:::multi-delete-done
{
self->rows = arg1;
}
mysql*:::delete-done
{
self->rows = arg1;
}
mysql*:::multi-update-done
{
self->rows = arg2;
}
For the bytes retrieved by each query, the information is a bit more difficult to identify. I’m going to cheat a bit and use the bytes sent by mysqld during a net write to the client. There is a limitation here I’ve skipped, which is that we could report data sent to any client, since I haven’t bothered to track connection IDs. I could do this, but it would make the script a little more complicated. Since the net-write-start might be called multiple times for a long query, we calculate a cumulative byte count.
mysql*:::net-write-start
{
self->bytes = self->bytes + arg0;
}
That’s all of the information collection; now we just need to print out the information when the query completes. We do this by writing out a colon-separated list of the information that we’ve collected. One additional point here, though, is that to calculate the duration of the query, you subtract the timestamp recorded when query-start was called from the current timestamp.
Timestamp information is recorded in nanoseconds (yes, you read that right, nanoseconds), so we divide it by a thousand to get it in microseconds, which is what the Enterprise Manager will expect.
mysql*:::query-done
/self->query != NULL/
{
printf("%s:%s:%d:%d:%d\n",
self->query,
self->db,
((timestamp - self->querystart)/1000),
self->rows,self->bytes);
}
If you run this script on its own (against a MySQL running on Solaris/OpenSolaris, with probes, of course), then you’ll get output like this:
SELECT DATABASE()::391:1:44
show databases:test:947:2:84
show tables:test:2018:3:74
select * from t limit 5:test:595:5:51
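One thing to watch with this format is that the query text itself may contain colons (a time literal, for example), in which case a plain split on the colon would cut the line in the wrong place. Because the last four fields are always the database, time, rows, and bytes, a reader can peel the fields off from the right instead. The sketch below shows the idea in bash; the function and variable names are invented for illustration, and it assumes the database name itself contains no colon.

```shell
# Split one line of the D script output into its five fields.
# Fields are taken from the right, so colons inside the query text survive.
# parse_probe_line and the q_* variables are hypothetical names.
parse_probe_line ()
{
    local rest=$1
    q_bytes=${rest##*:}; rest=${rest%:*}
    q_rows=${rest##*:};  rest=${rest%:*}
    q_usec=${rest##*:};  rest=${rest%:*}
    q_db=${rest##*:};    rest=${rest%:*}
    q_text=$rest         # whatever is left is the query itself
}
```

For the first line of the sample output, this leaves q_text as SELECT DATABASE() and q_db empty, since no database was selected.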
To provide the information up to the Enterprise Manager we cannot use D scripts. Instead, a wrapper around the D script will read the raw information produced and then pass that up to the Enterprise Manager.
Before we look at that process, it is worth looking at the REST API that has been built in to v2 of the Enterprise Monitor. The interface is available through the standard URL for the Enterprise service, typically your hostname and port 18080 if you’ve used the default settings. Therefore we can access the interface using the URL http://nautilus:18080/v2/rest/, assuming our host is nautilus.
From the base URL, you can get or put information about the different entries in the repository, using the path in the URL to signify what it is we are looking for. Information about instances is under instance, with the provider as mysql and the MySQL server as server. Put together, the base URL for a server is http://nautilus:18080/v2/rest/instance/mysql/server/.
The last fragment of information we need is the UUID. All objects within the repository have a unique ID, and these are split at different levels. For example, an agent has a UUID, and so does the server it is monitoring. In our example, we want the UUID of the MySQL server, which is conveniently stored within the server itself in the mysql.inventory table.
Finally, we need the username and password of the agent user. Through the REST API we use basic HTTP authentication, to make the process easy.
Putting all of this together, we can get the core information about an instance using wget:
$ wget -qO mysql.server --http-user=agent --http-password=password \
'http://nautilus:18080/v2/rest/instance/mysql/server/2b86b277-fb2b-492d-b946-3a2acaec0869'
If we now look at the output file, mysql.server:
{
"name": "2b86b277-fb2b-492d-b946-3a2acaec0869",
"parent": "/instance/os/Host/ssh:{88:e1:fc:6d:99:69:e4:5f:b4:0a:ec:5a:09:c0:6a:24}",
"values": {
"blackout": "false",
"displayname": null,
"registration-complete": "true",
"repl.groupName": null,
"server.connected": 1,
"server.last_error": null,
"server.reachable": 1,
"transport": "a3113263-4993-4890-8235-cadef9617c4b",
"visible.displayname": "bear:3306"
}
}
I won’t go into detail about what is here; most of it should be self-explanatory. However, there are a few things of note. First, the information is in JSON format. This makes it easy to read and, more importantly, to create.
Second, note the notation. The item is identified by its name, and also by its parent. This is an important construct because it helps relate the different elements to each other. In this case, the MySQL server is associated with a physical host (/instance/os/Host) and the individual host is identified by an SSH key, which is one of the alternative UUID formats supported by the Enterprise Server to identify individual entities.
When submitting information, we need to flip the process around. We don’t use a GET request to obtain the information, we use a PUT to send up a JSON packet containing the information we want. The URL for sending the information depends on what we are uploading. The main element for the statements used for Query Analyzer is the statementsummary.
The URL for this is http://nautilus:18080/v2/rest/instance/mysql/statementsummary/. For the identifier at the end of the URL, you use a period-separated list that includes the UUID of the MySQL server, the name of the MySQL database the SQL statement relates to, and an MD5 hash of the SQL statement text.
For the actual packet, we use the following format, taken here from the Perl script:
{
"name": "$server_uuid.$quanbase->{dbname}.$md5",
"parent": "/instance/mysql/server/$server_uuid",
"values" : {
"count": "$quanbase->{count}",
"text": "$quanbase->{query}",
"query_type": "$quanbase->{qtype}",
"text_hash": "$md5",
"max_exec_time": "$quanbase->{max_exec_time}",
"min_exec_time": "$quanbase->{min_exec_time}",
"exec_time": "$quanbase->{exec_time}",
"rows": "$quanbase->{rows}",
"max_rows": "$quanbase->{max_rows}",
"min_rows": "$quanbase->{min_rows}",
"database": "$quanbase->{dbname}",
"bytes": "$quanbase->{bytes}",
"max_bytes": "$quanbase->{max_bytes}",
"min_bytes": "$quanbase->{min_bytes}",
}
}
Most of this should be self-explanatory. Remember that this is a statement summary, which means that we can send up information about multiple invocations of the same statement in one packet. Thus, within the statementsummary packet we have information about the count of invocations of the statement, execution, row and byte counts and maximum/minimum of each of them, and then the core information like the actual query text, database name, and query type (SELECT, INSERT, etc).
Once again, note the name and parent. Here the name is the same tuple as used in the URL, the UUID of the MySQL server, the database, and the hash of the query. This is used as the identifier for this query within the repository and allows us to uniquely identify the query, and the query execution on this server. The parent is the location of, and UUID of, the MySQL server.
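To make the identifier concrete, here is a small shell sketch that assembles the statementsummary URL from the three parts described above. It is only an illustration: statement_url is a made-up name, the host is the example host used in this post, and it assumes GNU md5sum is available on the PATH.

```shell
# Build the statementsummary URL for one statement.
# statement_url is a hypothetical helper; assumes GNU md5sum is installed.
statement_url ()
{
    local uuid=$1 db=$2 query=$3 md5
    # printf '%s' avoids hashing a trailing newline along with the query text
    md5=$(printf '%s' "$query" | md5sum | cut -d' ' -f1)
    printf 'http://nautilus:18080/v2/rest/instance/mysql/statementsummary/%s.%s.%s\n' \
        "$uuid" "$db" "$md5"
}
```

The hash should be taken over the same text that is sent in the packet, otherwise the name in the URL and the text_hash value will not agree.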
Now, the Perl script that collates the information from our D script has to do two things: first, read the raw output that we create with the D script, and second, supply this as a PUT request to the Enterprise Server.
Dealing with the latter part first, I’ve used Perl and the LWP (libwww-perl) module to construct a suitable request object with the HTTP authorization attached:
my $header = HTTP::Headers->new;
$header->content_type('text/text');
$header->authorization_basic('agent','password');
my $res = LWP::UserAgent->new();
Once we’ve constructed a packet, sending it is a case of specifying the URL, the header, and the content:
$header->content_length(length $bio);
my $req = HTTP::Request->new(PUT => $url, $header, $bio);
$res->request($req);
The bulk of the rest of the script is devoted to reading the information from the D script output, and assembling the packet and min/max values per query.
Within the Query Analyzer, the SQL statements are normalized, or canonicalized, so that variables are replaced with a question mark. This ensures that we are tracking the query and not the individual values. The significance here is that we want to compare the raw SQL statements, of which there may only be a few hundred in a typical application, not each individual query with its WHERE and other clauses.
Hence, the statement:
SELECT photoid,title from media_photos where photoid > 23785 limit 15
Would be normalized to:
SELECT photoid,title from media_photos where photoid > ? limit ?
For the Perl script, I do just one type of normalization, removing the value from a LIMIT clause.
#!/usr/bin/perl
use Data::Dumper;
use LWP;
use HTTP::Request;
use Digest::MD5 qw/md5_hex/;
my $server_uuid = '2b86b277-fb2b-492d-b946-3a2acaec0869';
my $header = HTTP::Headers->new;
$header->content_type('text/text');
$header->authorization_basic('agent','password');
my $res = LWP::UserAgent->new();
my $interval = shift || 20;
print "Sending queries every $interval statement(s)\n";
open(DTRACE,"./merlin.d|") or die "Couldn't open DTRACE\n";
my $counter = 1;
my $querybase = {};
while(<DTRACE>)
{
    chomp;
    my ($origquery,$dbname,$time,$rows,$bytes) = split m{:};
    my $query = $origquery;
    # Normalize: replace the LIMIT value with a placeholder
    $query =~ s/limit \d+/limit ?/g;
    # Check for an existing entry before the assignments below create one
    my $seen = exists($querybase->{$query});
    $querybase->{$query}->{dbname} = $dbname;
    $querybase->{$query}->{query} = $query;
    $querybase->{$query}->{count}++;
    $querybase->{$query}->{rows} += $rows;
    $querybase->{$query}->{bytes} += $bytes;
    $querybase->{$query}->{exec_time} += $time;
    if ($seen)
    {
        $querybase->{$query}->{max_rows} = $rows if ($rows > $querybase->{$query}->{max_rows});
        $querybase->{$query}->{min_rows} = $rows if ($rows < $querybase->{$query}->{min_rows});
        $querybase->{$query}->{max_bytes} = $bytes if ($bytes > $querybase->{$query}->{max_bytes});
        $querybase->{$query}->{min_bytes} = $bytes if ($bytes < $querybase->{$query}->{min_bytes});
        $querybase->{$query}->{max_exec_time} = $time if ($time > $querybase->{$query}->{max_exec_time});
        $querybase->{$query}->{min_exec_time} = $time if ($time < $querybase->{$query}->{min_exec_time});
    }
    else
    {
        # First sighting of this query: seed the min/max values
        $querybase->{$query}->{max_rows} = $rows;
        $querybase->{$query}->{min_rows} = $rows;
        $querybase->{$query}->{max_bytes} = $bytes;
        $querybase->{$query}->{min_bytes} = $bytes;
        $querybase->{$query}->{max_exec_time} = $time;
        $querybase->{$query}->{min_exec_time} = $time;
    }
    if (($counter % $interval) == 0)
    {
        print STDERR "Writing quan packets ($counter queries sent)\n";
        foreach my $query (keys %{$querybase})
        {
            send_quandata($querybase->{$query});
            delete($querybase->{$query});
        }
    }
    $counter++;
}
sub send_quandata
{
    my ($quanbase) = @_;
    my $urlbase = 'http://nautilus:18080/v2/rest/instance/mysql/statementsummary/%s.%s.%s';
    my $md5 = md5_hex($quanbase->{query});
    my $url = sprintf($urlbase,$server_uuid,$quanbase->{dbname},$md5);
    my $bio = <<EOF;
{
"name": "$server_uuid.$quanbase->{dbname}.$md5",
"parent": "/instance/mysql/server/$server_uuid",
"values" : {
"count": "$quanbase->{count}",
"text": "$quanbase->{query}",
"query_type": "$quanbase->{qtype}",
"text_hash": "$md5",
"max_exec_time": "$quanbase->{max_exec_time}",
"min_exec_time": "$quanbase->{min_exec_time}",
"exec_time": "$quanbase->{exec_time}",
"rows": "$quanbase->{rows}",
"max_rows": "$quanbase->{max_rows}",
"min_rows": "$quanbase->{min_rows}",
"database": "$quanbase->{dbname}",
"bytes": "$quanbase->{bytes}",
"max_bytes": "$quanbase->{max_bytes}",
"min_bytes": "$quanbase->{min_bytes}",
}
}
EOF
    $header->content_length(length $bio);
    my $req = HTTP::Request->new(PUT => $url, $header, $bio);
    $res->request($req);
}
The basic structure is:
- Open the DTrace script
- Read a line
- Add that to the temporary list of queries I know about, adding stats
- When I’ve read N queries, send up the stats about each query as a JSON packet to the Enterprise Manager
- Repeat
Depending on how busy your server is, you may want to adjust the interval when the stats data is uploaded. The default is every 20 queries, but when running on a really busy server, or when running benchmarks, you might want to up that to prevent the script spending too much time sending fairly small packets of stats up.
If you run the script, it should just work in the background:
$ ./dtrace_merlin.pl
Sending queries every 20 statement(s)
Writing quan packets (20 queries sent)
Writing quan packets (40 queries sent)
Writing quan packets (60 queries sent)
That’s it!
I set this up and then sent some random queries to the server. The following graphic shows the query data only from the DTrace sourced information.
There are some limitations to the current script. I don’t do full normalization, for example, and I don’t send the detailed information about individual statements up at the moment. There is also an EXPLAIN packet that you can send that contains the output from an EXPLAIN on a long running query. I could do that by opening a connection to the server and picking out the information.
But what I’d really like to do is use the DTrace-based output to show the detail of each part of the query process and the EXPLAIN output. I’m sure I can work on that with the Enterprise team.
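On the normalization limitation, a slightly broader rule than the LIMIT-only one is to replace every quoted string and every bare number with a question mark. The following is a rough sketch of that idea using sed, not the normalization that Query Analyzer itself performs; normalize_query is a made-up name, and the rule deliberately ignores escaped quotes, comments, decimal points, and hex literals.

```shell
# Replace quoted strings and bare numbers with "?" so that statements
# differing only in their literal values collapse to one summary entry.
# normalize_query is a hypothetical helper, and the rules are approximate.
normalize_query ()
{
    printf '%s\n' "$1" |
        sed -E -e "s/'[^']*'/?/g" \
               -e 's/(^|[^[:alnum:]_])[0-9]+/\1?/g'
}
```

The second expression only matches digits that do not follow a letter, digit, or underscore, so identifiers such as t1 are left alone. Applied to the example statement above, it gives the same normalized form with both the photoid value and the LIMIT value replaced.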
Compiling MySQL Workbench on Gentoo
The Workbench team have just announced the release of Workbench for Linux, including binary packages and source packages with instructions on how to build.
I’m a Gentoo Linux user, so I prefer building from source. You’ll need to emerge the following packages (note the USE requirement) as part of the source build process:
# USE="svg" emerge libzip libxml2 libsigc++ libglade libgtksourceviewmm media-libs/glut mysql lua ossp-uuid libpcre libgnome gtk+ pango cairo
Depending on your config and platform, you may need to bypass some package masking by adding the packages to your /etc/portage/package.keywords file.
Then download and install the ctemplate library from its Google Code page. The current Gentoo version is 0.90, and you really should install the 0.91 version.
With the required packages and libraries in place, download the Workbench sources and then build:
# cd mysql-workbench-5.1.4alpha
# ./autogen.sh
# make
# make install
That should build and install MySQL Workbench for you.
Just to confirm, here’s a screenshot of the built Workbench running on Gentoo Linux and displaying to my Mac OS X-based desktop.

How to analyze memory leaks on Windows
We use valgrind to find memory leaks in MySQL on Linux. The tool is a convenient, and often enlightening, way of finding out where the real and potential problems are located.
On Windows, you don’t have valgrind, but Microsoft do provide a free native debugging tool, called the user-mode dump heap (UMDH) tool. This performs a similar function to valgrind to determine memory leaks.
Vladislav Vaintroub, who works on the Falcon team and is one of our resident Windows experts, provides the following how-to for using UMDH:
- Download and install Debugging Tools for Windows from here: MS Debugging Tools. Install the 64-bit version if you’re on 64-bit Windows and the 32-bit version otherwise.
- Change the PATH environment variable to include the bin directory of the Debugging Tools. On my system, I added C:\Program Files\Debugging Tools for Windows 64-bit to the PATH.
- Instruct the OS to collect allocation stacks for mysqld with
gflags -i mysqld.exe +ust
On Vista and later, this should be done in an “elevated” command prompt, as it requires admin privileges.
Now collect the leak information. The mode of operation is this: take the heap snapshot once, and after some load take it once again. Compare the snapshots and output the leak info.
- Preparation: set up the debug symbol path. In the command prompt window, do
set _NT_SYMBOL_PATH= srv*C:\websymbols*http://msdl.microsoft.com/download/symbols;G:\bzr\mysql-6.0\sql\Debug
Adjust the second path component for your needs; it should include the directory where mysqld.exe is.
- Start mysqld and run it for some minutes.
- Take the first heap snapshot
umdh -p:6768 -f:dump1
Where -p: gives the process ID; the PID of my mysqld was 6768.
- Let mysqld run for some more minutes.
- Take the second heap snapshot
umdh -p:6768 -f:dump2
- Compare the snapshots
umdh -v dump1 dump2 > dump.compare.txt
- Examine the result output file. It is human readable, but all numbers are in hex, to scare everyone except geeks.
- Instruct the OS not to collect mysqld user mode stacks for allocations anymore
gflags -i mysqld.exe -ust
These are 10 steps and it sounds like much work, but in reality it takes 15 minutes the first time you do it and 5 minutes the next time.
Additional information is given in the Microsoft KB article about UMDH, KB 268343.
New VoiceXML/XQuery Demo
I’ve got a new VoiceXML/XQuery article coming out, and IBM have asked that a demo of the service is live.
The service is a voice interface to an RSS reader - you get to choose the topic and the feed (currently only four static feeds are provided), then it will read out the feed content.
You can try out the demo by calling:
- Skype: +99000936 9991260725
- US (freephone): (800) 289-5570, then using PIN 9991260725
Occasionally the hosting times out, in which case, please contact me and I’ll check it out and restart or reboot the service.
An introduction to Eclipse for Visual Studio users
I’m seeing more and more people moving to Eclipse as a development platform, even those Windows users who have traditionally used Visual Studio. As an Eclipse user for quite a while now I’m often asked how good it is, or how to use it.
Of course, telling people to simply try it out isn’t enough. Many people just don’t get Eclipse and cannot understand or translate the skills and experience they already have to the Eclipse environment. That’s where An introduction to Eclipse for Visual Studio users can help.
It’s a quick overview of the fundamentals of Eclipse from the perspective of a Visual Studio user. For a more in depth examination, there’s a tutorial Eclipse for Visual Studio developers, and another on migrating your applications from VS to Eclipse: Migrate Visual Studio C and C++ projects to Eclipse CDT.
I can recommend any (or indeed all) of these.
Mysterious crashes? - check your temporary directory settings
Just recently I seem to have noticed an increased number of mysterious crashes and terminations of applications. This is generally on brand new systems that I’m setting up, or on existing systems where I’m setting up a new or duplicate account.
Initially everything is fine, but then all of a sudden, as I start syncing over my files, shell profile and so on, applications will stop working. I’ve experienced it in MySQL, and more recently when starting up Gnome on Solaris 10 9/07.
Sometimes the problem is obvious, other times it takes me a while to realize what is happening and causing the problem. But in all cases it’s the same problem - my TMPDIR environment variable points to a directory that doesn’t exist. That’s because for historical reasons (mostly related to HP-UX, bad permissions and global tmp directories) I’ve always set TMPDIR to a directory within my home directory. It’s just one of those things I’ve had in my bash profile for as long as I can remember. Probably 12 years or more at least.
This can be counterproductive on some systems - on Solaris for example the main /tmp directory is actually mounted on the swap space, which means that RAM will be used if it’s available, which can make a big difference during compilation.
But any setting is counterproductive if you point to a directory that doesn’t exist and then have an application that tries to create a temporary file, fails, and then never prints out a useful trace of why it had a problem (yes, I mean you Gnome!).
I’ve just reset my TMPDIR in .bash_vars to read:
case $OSTYPE in
(solaris*) export TMPDIR=/tmp/mc;mkdir -m 0700 -p "$TMPDIR"
;;
(*) export TMPDIR=~/tmp;mkdir -m 0700 -p "$TMPDIR"
;;
esac
Now I explicitly create a directory in a suitable location during startup, so I shouldn’t experience those crashes anymore.
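A more defensive variant is to check the directory every time rather than trusting the profile, and to fall back to /tmp if the chosen location cannot be created. This is a sketch of that idea only: ensure_tmpdir is a made-up name, and the choice of /tmp as the fallback is an assumption that suits most Unix systems.

```shell
# Make sure TMPDIR points at a directory that exists and is writable,
# creating it if necessary and falling back to /tmp otherwise.
# ensure_tmpdir is a hypothetical helper name.
ensure_tmpdir ()
{
    local wanted=${TMPDIR:-/tmp}
    if mkdir -p -m 0700 "$wanted" 2>/dev/null && [ -w "$wanted" ]
    then
        export TMPDIR=$wanted
    else
        echo "TMPDIR $wanted is unusable, falling back to /tmp" >&2
        export TMPDIR=/tmp
    fi
}
```

Calling ensure_tmpdir at the end of the profile, after TMPDIR has been set, gives a visible warning instead of a mysterious crash later on.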
Brian is having the same issues
I mentioned the problem with setting up the stack on a new Solaris box yesterday and then realized this morning that I’d already added Brian Aker’s blog posting on the same issues to my queue (Solaris, HOW-TO, It works… Really…).
Brian mentions pkg-get, the download solution from Blastwave which I neglected to mention yesterday. It certainly makes the downloading and installation easier, but it’s far from comprehensive and some of the stuff is out of date.
To be honest I find that I install the stuff from Sun Freeware to get me going, then spend time recompiling everything myself by hand, for the plain and simple reason that I then know it is up to date and/or working or both. This is particularly the case for Perl, which often needs an update of the entire perl binary to get the updated versions of some CPAN modules.
Ultimately, though, it sucks.
Setting up the developer stack issues
There’s a great post on Coding Horror about Configuring the Stack.
Basically the gripe is with the complexity of installing the typical developer stack, in this case on Windows, using Visual Studio. My VS setup isn’t vastly different to the one Jeff mentions, and I have similar issues with the other stacks I use.
I’ve just set up the Ultra3 mobile workstation again for building MySQL and other stuff on, and it took about 30 packages (from Sun Freeware) just to get the basics like gcc, binutils, gdb, flex, bison and the rest set up. It took the best part of a day to get everything downloaded, installed, and configured. I haven’t even started on modules for Perl yet.
The Eclipse stack is no better. On Windows you’ll need the JDK of your choice, plus Eclipse. Then you’ll have to update Eclipse. Then add in the plugins and modules you want. Even though some of that is automated (and, annoyingly some of it is not although it could be), it generally takes me a few hours to get stuff installed.
Admittedly on my Linux boxes it’s easier - I use Gentoo and copy around a suitable make.conf with everything I need in it, so I need only run emerge, but that can still take a day or so to get everything compiled.
Although I’m sure we can all think of easier ways to create the base systems - I use Parallels for example and copy VM folders to create new environments for development - even the updating can take a considerable amount of time.
I suggest the new killer app is one that makes the whole process easier.
Setting a remote key through ssh
One of the steps I find myself doing a lot is distributing an ssh key so that I can log in and use different machines automatically. To help in that process I created a small function in my bash profile script (actually for me it’s in .bash_aliases):
function setremotekey
{
OLDDIR=`pwd`
if [ -z "$1" ]
then
echo Need user@host info
# Stop here rather than running ssh with no destination
return 1
fi
cd $HOME
if [ -e "./.ssh/id_rsa.pub" ]
then
cat ./.ssh/id_rsa.pub |ssh "$1" 'mkdir -p -m 0700 .ssh && cat >> .ssh/authorized_keys'
else
ssh-keygen -t rsa
cat ./.ssh/id_rsa.pub |ssh "$1" 'mkdir -p -m 0700 .ssh && cat >> .ssh/authorized_keys'
fi
cd $OLDDIR
}
To use, whenever I want to copy my public key to a remote machine I just have to specify the login and machine:
$ setremotekey mc@narcissus
Then type in my password once, and the function does the rest.
How? Well, it checks to make sure I’ve entered a user/host (or actually just a string of some kind). Then, if I haven’t created a public key before (which I might not have on a new machine), I run ssh-keygen to create it. Once the key is in place, I output the key text and then use ssh to pipe and append that to the remote authorized_keys file, creating the directory along the way if it doesn’t exist.
Short and sweet, but saves me a lot of time.
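Where OpenSSH’s ssh-copy-id helper is installed, it can do the remote append for you. The sketch below is one way the same function might be written to prefer it and fall back to the manual pipe otherwise. The name push_key is invented for illustration, and it assumes an RSA key in the default location, as in the function above.

```shell
# Copy the local public key to a remote account, preferring ssh-copy-id.
# push_key is a hypothetical name; the key path assumes the default RSA key.
push_key ()
{
    local key="$HOME/.ssh/id_rsa.pub"
    if [ -z "$1" ]
    then
        echo "Need user@host info" >&2
        return 1
    fi
    # Create a key pair first if this machine does not have one yet
    [ -e "$key" ] || ssh-keygen -t rsa || return 1
    if command -v ssh-copy-id >/dev/null 2>&1
    then
        ssh-copy-id -i "$key" "$1"
    else
        # Manual fallback: append the key over a single ssh connection
        ssh "$1" 'mkdir -p -m 0700 .ssh && cat >> .ssh/authorized_keys' < "$key"
    fi
}
```

Usage is the same as before, for example push_key mc@narcissus.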
Extra bash improvements
If you’ve read my Getting the most out of bash article at IBM developerWorks then you may be interested in some further bash goodness and improvements.
Juliet Kemp covers some additional tricks on Improving bash to make working with bash easier. Some of the stuff there I have already covered, but the completion extensions might be useful if you like to optimize your typing.
Even better, one of the comments provides the hooks to change your prompt to include your current CVS branch, another to include your current platform, and a really cool way of simplifying your history searching.
Controlling OS X volume through Cron
One of the biggest annoyances of working from home is that with the computers in the room next door, the volume of your computers can cause a problem if someone suddenly calls you on Skype, or your backup software suddenly kicks in and starts beeping.
I never remember to mute the volume, so I started looking for a way to do this automatically through cron at specific times. I also wanted to be sure that rather than setting a specific volume (and having to remember it), I could just use the OS X mute function.
The solution is to combine Applescript, which you can run from the command line using the osascript command, with the command line limitations of cron.
There are three components, the two Applescripts that mute and unmute the volume, and the lines in a crontab to run the scripts.
To mute the volume with Applescript:
set volume with output muted
To unmute:
set volume without output muted
Save both these into Applescripts (use the Applescript editor so they are compiled).
Then we can just set the scripts to execute when required:
0 9 * * * osascript /usr/local/mcslp/volume-unmute.scpt
0 19 * * * osascript /usr/local/mcslp/volume-mute.scpt
I’ve set this on the three machines and now we get a silent night!
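If you would rather not maintain compiled script files, osascript can also take the Applescript text directly through its -e option. The sketch below wraps the two statements from above in a small shell function; the name set_mute is purely illustrative and not part of the original setup.

```shell
# set_mute on|off: mute or unmute the OS X output volume through osascript.
# (set_mute is a hypothetical helper name used only for this sketch.)
set_mute() {
    case $1 in
        on)  osascript -e 'set volume with output muted' ;;
        off) osascript -e 'set volume without output muted' ;;
        *)   echo "usage: set_mute on|off" >&2; return 1 ;;
    esac
}
```

Cron cannot call a shell function directly, so in a crontab you would put the osascript -e command itself on the line, for example: 0 19 * * * osascript -e 'set volume with output muted'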
Making a single extractor
One of my new articles is on simplifying your command line (read more in System Administrators Toolkit: Standardizing your UNIX command-line tools), making your life easier as you move between different environments. The same principles can be applied just to make your life easier. Here’s a function I’ve had in my bash init script for years that gets round the issue of extracting a compressed archive file of various types, even if your tar isn’t aware of the compression type:
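function uz ()
{
# Quote the filename so that names containing spaces still work
file="$1"
case "$file" in
# Decompress to stdout and let tar read the stream from stdin
(*gz) gunzip -c "$file"|tar xf -;;
(*bz2) bunzip2 -c "$file"|tar xf -;;
# Relies on the z flag of the local tar
(*Z) tar zxf "$file";;
(*zip) unzip "$file";;
(*) echo "uz: unknown archive type: $file" >&2; return 1;;
esac
}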
Now I can extract any file with:
$ uz file.{gz|bz2|zip|Z}
And not worry that my Solaris tar isn’t bzip2 aware even though it is Gzip aware.
Copying multiple files with scp
I keep my .bash init scripts on one machine and copy them over to each machine on which I have a login. There are various bits of logic in there to ensure that the right PATH and other values are set according to the host and/or platform.
I then have a simple line that updates the local .bash scripts from the main box that holds the main copies, so that I can just run:
update-bash
That updates everything. I use scp and, depending on the system, use a preset key or require a password.
For copying multiple files there are many solutions; I could just use .bash*, but I’d also get the history and backup files. The typical advice is separate entries:
scp mc@narcissus:.bashrc mc@narcissus:.bash_aliases ~
This is less than optimal for a number of reasons. The first is that each location is treated individually, which requires multiple connections and multiple password prompts. You can, though, use normal shell-like expansion; just make sure you use quotes so that it isn’t parsed and expanded by the local shell instead of the remote one:
scp mc@narcissus:".bash{rc,_path,_aliases,_vars}" ~
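To see what the quotes change, here is a small bash sketch that counts how many arguments a command would receive in each case. The helper count_args is hypothetical and exists only for this illustration, and the example assumes bash, since brace expansion is not part of plain POSIX sh.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it reports how many arguments it was given.
count_args() { echo "$#"; }

# Unquoted: the local shell expands the braces into four separate arguments,
# which is equivalent to listing each remote file individually.
count_args mc@narcissus:.bash{rc,_path,_aliases,_vars}

# Quoted: a single argument is passed through untouched, leaving the
# expansion to the remote side.
count_args "mc@narcissus:.bash{rc,_path,_aliases,_vars}"
```

Under bash the first call reports four arguments and the second reports one, which is the difference between four separate remote locations and a single one.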
Stepped execution with cron and at
I had a query from a reader today as a follow up to my System Administrators Toolkit: Time and event management article at developerWorks:
How do I execute a script at a specific interval, for example 28 days, rather than on a specific day or date?
It is the one limitation of cron that it doesn’t support such an interval, although there are some systems (including many Linux installations) that provide an alternative method. There are some solutions to the problem that will work on any platform that uses the cron/at system.
One way is to run the script every 7 days, and have it record how many times it’s been called in a file.
All you have to do is, in the script, load the current count, work out if this is the fourth time, and run the script accordingly.
For example:
count=`cat counter`
count=`expr $count + 1`
if [ $count -eq 4 ]
then
echo 0 >counter
echo 4th time called, going for it
# Do everything else
else
echo $count >counter
fi
I suggest you put the counter file into a usable location, but you get the idea.
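As a slightly more defensive sketch of the same idea, the function below copes with a counter file that doesn’t exist yet and takes the file name and interval as parameters. The name run_every_nth and its arguments are assumptions made for this example, not something from the original article.

```shell
# run_every_nth FILE N: return success on every Nth call, failure otherwise.
# (run_every_nth is a hypothetical name chosen for this sketch.)
run_every_nth() {
    counter_file=$1
    interval=$2
    count=0
    # Load the previous count if the counter file already exists
    if [ -r "$counter_file" ]; then
        count=$(cat "$counter_file")
    fi
    count=$(( ${count:-0} + 1 ))
    if [ "$count" -ge "$interval" ]; then
        # Nth call: reset the counter and tell the caller to do the work
        echo 0 > "$counter_file"
        return 0
    fi
    echo "$count" > "$counter_file"
    return 1
}

# Possible use at the top of a script that cron runs every 7 days:
#   run_every_nth /var/tmp/myscript.count 4 || exit 0
```

The path in the usage comment is only a placeholder; any location the script can write to will do.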
The other alternative is to use at, rather than cron, and then add a line in the script to execute the script again in 28 days’ time. For example, using this line at the end of your script:
at 9pm + 28 days <myscript.sh
Because you are specifying the same time, but a different day, this will execute at the same time every 28 days.
If your script takes a long time to process and you run it, for example, at 23:59, put the ‘at’ line at the start of the script, rather than the end, so that the request gets registered on the same day.
Building an RPN to Equation Parser
In the final part of the examination of lex and yacc, here are the rules for building a parser that translates RPN into equation input (the reverse of the Equation to RPN parser).
Translating RPN into standard equation format is a lot more difficult. Although the fundamentals are similar to the RPN parser (we still use a stack for values that are popped off when we see an operator), recording that process is much more difficult.
In the RPN calculator, we can place the result of the calculation back onto the stack so that the value can be used. To resolve something into the equation format we need to record the equivalent expression, not the value. For that, we use a temporary string, and then check if the temporary string has a value and append further expressions to that string.
Also, to preserve precedence in the final calculation (a process handled automatically by the sequence of numbers and operators in RPN), we enclose each stage of the calculation in parentheses.
The resulting rules are shown below. Note that for the example, only the basic operators (+ - * /) are supported, but the principles are valid for any combination.
%%
list: /* nothing */
| list EOLN
| list expr EOLN { printf( "%s\n",exprstring); }
;
expr: primary
| expr primary MUL
{
/* Pop the right-hand value first; argument evaluation order in C is unspecified */
temp=pop();
if (strlen(exprstring) > 0)
{
sprintf(tmpstring,"(%s * %g)",exprstring, temp);
}
else
{
sprintf(tmpstring,"( %g * %g )",pop(),temp);
}
strcpy(exprstring,tmpstring);
}
| expr primary DIV
{
temp=pop();
if (strlen(exprstring) > 0)
{
sprintf(tmpstring,"(%s / %g)",exprstring, temp);
}
else
{
sprintf(tmpstring,"( %g / %g )",pop(),temp);
}
strcpy(exprstring,tmpstring);
}
| expr primary PLUS
{
/* Pop the right-hand value first; argument evaluation order in C is unspecified */
temp=pop();
if (strlen(exprstring) > 0)
{
sprintf(tmpstring,"(%s + %g)",exprstring, temp);
}
else
{
sprintf(tmpstring,"( %g + %g )",pop(),temp);
}
strcpy(exprstring,tmpstring);
}
| expr primary MINUS
{
temp=pop();
if (strlen(exprstring) > 0)
{
sprintf(tmpstring,"(%s - %g)",exprstring, temp);
}
else
{
sprintf(tmpstring,"( %g - %g )",pop(),temp);
}
strcpy(exprstring,tmpstring);
}
;
primary: NUMBER { push($1); }
;
%%
You can see the resulting output below:
4 5 + 6 *
(( 4 + 5 ) * 6)
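The core idea, keeping expression strings on the stack rather than values and wrapping each step in parentheses, can also be sketched outside yacc. The shell function below is only an illustration: the name rpn_to_infix is made up, it handles just the four basic operators, it omits the spaces that the yacc version prints, and it assumes well-formed input with each token passed as a separate argument.

```shell
# rpn_to_infix TOKEN...: rebuild a parenthesized equation from RPN tokens.
# The stack is a string in which every entry is preceded by a single space.
rpn_to_infix() {
    stack=""
    for token in "$@"; do
        case $token in
            '+'|'-'|'*'|'/')
                # Pop the right-hand expression first, then the left-hand one
                right=${stack##* }; stack=${stack% *}
                left=${stack##* };  stack=${stack% *}
                # Push the combined expression, not a computed value
                stack="$stack ($left$token$right)"
                ;;
            *)
                # Anything else is treated as a number and pushed as-is
                stack="$stack $token"
                ;;
        esac
    done
    # The last remaining entry is the complete expression
    echo "${stack##* }"
}

rpn_to_infix 4 5 + 6 '*'    # prints ((4+5)*6)
```

The multiplication sign has to be quoted on the command line so that the shell doesn’t expand it as a filename wildcard.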
As mentioned in the original IBM article, we can pipe sequences together to show the parsing and calculation of an expression from different formats. For example:
$ rpntoequ|calc
4 5 + 6 *
54
And even rpntoequ and equtorpn:
$ rpntoequ|equtorpn
4 5 + 6 *
4 5 + 6 *
The current RPN translator as shown here is not as advanced as the main RPN system, and so it doesn’t support all the options, or expression formats, but you can get the general idea.
You can download the code for this example: rpntoequ.tar.gz (Unix).
Building an Equation to RPN Parser
As part of the continuing examination of lex and yacc, here are the rules for building a parser that translates equations into RPN format.
The process is actually very simple. Because of the way the parser works, all you have to do is print out whatever component you see at each stage. For example, when you see a number, print it out, and when you see an operator, also print it out. The basic ruleset is shown below:
%%
list: /* nothing */
| list EOLN
| list expr EOLN { printf( "\n" ); }
;
expr: shift_expr
;
shift_expr: pow_expr
| shift_expr LEFTSHIFT pow_expr { printf("<< "); }
| shift_expr RIGHTSHIFT pow_expr { printf(">> "); }
;
pow_expr: add_expr
| pow_expr POW add_expr { printf("^ "); }
;
add_expr: mul_expr
| add_expr PLUS mul_expr { printf("+ "); }
| add_expr MINUS mul_expr { printf("- "); }
;
mul_expr: unary_expr
| mul_expr MUL unary_expr { printf("* "); }
| mul_expr DIV unary_expr { printf("/ "); }
| mul_expr MOD unary_expr { printf("%% "); }
;
unary_expr: postfix_expr
| MINUS primary %prec UNARYMINUS { printf("-"); }
| INC unary_expr { printf("++ "); }
| DEC unary_expr { printf("-- "); }
;
postfix_expr: primary
| postfix_expr INC { printf("++ "); }
| postfix_expr DEC { printf("-- "); }
| postfix_expr FACT { printf("! "); }
;
primary: NUMBER { printf("%g ",$1); }
| PI { printf("%g ", M_PI); }
| OPENBRACKET expr CLOSEBRACKET { }
| function_call
;
function_call: SIN OPENBRACKET expr CLOSEBRACKET { printf("sin "); }
| COS OPENBRACKET expr CLOSEBRACKET { printf("cos "); }
| TAN OPENBRACKET expr CLOSEBRACKET { printf("tan "); }
| ASIN OPENBRACKET expr CLOSEBRACKET { printf("asin "); }
| ACOS OPENBRACKET expr CLOSEBRACKET { printf("acos "); }
| ATAN OPENBRACKET expr CLOSEBRACKET { printf("atan "); }
;
%%
Why does it work?
It has to do with the way the parser evaluates the different components. When, for example, the parser identifies an addition with this rule:
add_expr: mul_expr
| add_expr PLUS mul_expr { printf("+ "); }
The code that the parser generates evaluates the sub-rules first, and in both cases the rules will ultimately lead to the numerical value. Each time the number is seen, the value is printed. Once both rules have been resolved, it then matches the full expression and outputs the plus sign.
In use, the parser generates all of the necessary RPN:
4+5*6
4 5 6 * +
(4+5)*6
4 5 + 6 *
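The yacc grammar gets the postfix order for free, because each action only fires once its sub-rules have been reduced. Without a parser generator, a different technique produces the same output: the shunting-yard algorithm, which holds operators on a stack until their operands have been emitted. The shell sketch below is not the method used in the article; the names prec and infix_to_rpn are invented for the example, only + - * / and brackets are handled, and every token must be passed as a separate argument.

```shell
# prec OP: precedence of an operator (0 for anything else, including "(").
prec() {
    case $1 in
        '+'|'-') echo 1 ;;
        '*'|'/') echo 2 ;;
        *)       echo 0 ;;
    esac
}

# infix_to_rpn TOKEN...: convert an infix equation to RPN (shunting-yard).
infix_to_rpn() {
    ops=""   # operator stack; every entry is preceded by a single space
    out=""   # output tokens in RPN order
    for token in "$@"; do
        case $token in
            '(') ops="$ops (" ;;
            ')')
                # Emit operators back to the matching open bracket
                while [ -n "$ops" ] && [ "${ops##* }" != "(" ]; do
                    out="$out ${ops##* }"; ops=${ops% *}
                done
                ops=${ops% *}    # discard the "(" itself
                ;;
            '+'|'-'|'*'|'/')
                # Emit stacked operators of equal or higher precedence first
                while [ -n "$ops" ] && [ "$(prec "${ops##* }")" -ge "$(prec "$token")" ]; do
                    out="$out ${ops##* }"; ops=${ops% *}
                done
                ops="$ops $token"
                ;;
            *) out="$out $token" ;;    # numbers go straight to the output
        esac
    done
    # Flush whatever operators remain on the stack
    while [ -n "$ops" ]; do
        out="$out ${ops##* }"; ops=${ops% *}
    done
    echo "${out# }"
}

infix_to_rpn 4 + 5 '*' 6            # prints 4 5 6 * +
infix_to_rpn '(' 4 + 5 ')' '*' 6    # prints 4 5 + 6 *
```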
You can download the source for the equation to RPN parser: equtorpn.tar.gz (Unix)
Building an RPN Calculator
There’s a tutorial shortly due to appear at IBM developerWorks that covers the process behind building a calculator using the lex and yacc (or flex and bison) tools to build a parser. The tutorial covers a natural expression parser, i.e. one capable of processing:
(4+5)*6
I suggest a couple of extensions in the tutorial, namely a Reverse Polish Notation (RPN) calculator, and translators that convert to/from RPN and standard equation format. Here, we’re going to start with looking at the RPN calculator.
The RPN system is straightforward to learn when you compare it with typical equations. For example, you might normally write:
45 + 63
In RPN, you would enter this as:
45 63 +
From a parsing point of view, the process is also easier, because you can perform the calculation by pushing the numbers on to the stack and then performing a calculation with those two numbers. This hugely simplifies the parser, because it only has to push numbers and pop them off when it sees the operator, rather than having to extract both the numbers and the operator from the input text. Even better, compound calculations become easier because the result of one calculation can be pushed back on to the stack for the next part.
For example, the following equation:
45 63 + 23 *
Can be read as:
- Push 45 on to stack
- Push 63 on to stack
- Pop value off stack, add it to another value popped off the stack
- Push result to stack
- Pop value off stack, multiply by value popped off stack
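The steps above can be sketched as a small shell function. This is only an illustration of the push and pop sequence, not the yacc calculator itself: the name rpn_eval is made up, it uses the shell’s integer arithmetic, and it assumes well-formed input with each token passed as a separate argument.

```shell
# rpn_eval TOKEN...: evaluate an RPN expression using integer arithmetic.
# The stack is a string in which every entry is preceded by a single space.
rpn_eval() {
    stack=""
    for token in "$@"; do
        case $token in
            '+'|'-'|'*'|'/')
                # Pop the right-hand value first, then the left-hand one
                right=${stack##* }; stack=${stack% *}
                left=${stack##* };  stack=${stack% *}
                case $token in
                    '+') result=$((left + right)) ;;
                    '-') result=$((left - right)) ;;
                    '*') result=$((left * right)) ;;
                    '/') result=$((left / right)) ;;
                esac
                # Push the result back for the next part of the calculation
                stack="$stack $result"
                ;;
            *)
                # Anything else is treated as a number and pushed
                stack="$stack $token"
                ;;
        esac
    done
    # The value left on top of the stack is the answer
    echo "${stack##* }"
}

rpn_eval 45 63 + 23 '*'    # prints 2484
```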
The lexical analysis component (i.e. the lex definitions) remains the same; it’s only the parser that changes. Before we examine the yacc rules, you need to see the simple stack system. It provides two functions: one pushes values on, and the other pops values off. All the values are stored in a simple array and a global stack pointer holds the current storage location so that values can be popped off or pushed back:
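#include <stdio.h>
#include "globdefs.h"
/* Index of the next free slot in the stack */
int sp=0;
/* Storage for the stack values; MAXVAL is the stack capacity */
double val[MAXVAL];
/* Push a value onto the stack, reporting an error if it is full */
void push(double f)
{
if (sp < MAXVAL)
val[sp++]=f;
else
printf("Error: stack full, can't push %g\n",f);
}
/* Pop the top value off the stack, returning 0.0 if it is empty */
double pop(void)
{
if (sp > 0)
return(val[--sp]);
else
{
printf("Error: stack empty\n");
return 0.0;
}
}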
The yacc rules for a simple RPN parser are shown below (the rest of the surrounding code is identical).
%%
list: /* nothing */
| list EOLN
{ printf( "%g\n" , pop()); }
| list exprlist EOLN
{ printf( "%g\n" , pop()); }
;
exprlist: shift_expr
| exprlist shift_expr
;
shift_expr: add_expr
| shift_expr LEFTSHIFT
{
temp=pop();
push(((int)pop()) << ((int)temp));
}
| shift_expr RIGHTSHIFT
{
temp=pop();
push(((int)pop()) >> ((int)temp));
}
;
add_expr: mul_expr
| add_expr PLUS
{ push(pop()+pop()); }
| add_expr MINUS
{
temp=pop();
push(pop()-temp);
}
;
mul_expr: unary_expr
| add_expr MUL
{ push(pop()*pop()); }
| add_expr DIV
{
temp=pop();
push(pop()/temp);
}
| add_expr MOD
{
temp=pop();
push(fmod(pop(),temp));
}
;
unary_expr: primary
| MINUS primary %prec UNARYMINUS { push(-pop()); }
| unary_expr INC { push(pop()+1); }
| unary_expr DEC { push(pop()-1); }
;
primary: NUMBER { push($1); }
| PI { push(M_PI); }
;
%%
You can see here that numbers are simply pushed onto the stack:
primary: NUMBER { push($1); }
| PI { push(M_PI); }
;
While any calculation is a case of popping off the values and putting them back on the stack:
add_expr: mul_expr
| add_expr PLUS { push(pop()+pop()); }
| add_expr MINUS { temp=pop(); push(pop()-temp); }
;
The ruleset is shorter, partly because this RPN calculator is not as advanced, but also because the process is much simpler: the rules don’t need to take into account the complex structure of a typical equation line.
In a future post we’ll cover the RPN to equation and equation to RPN parsers.
You can download the complete code for the RPN calculator as rpn.tar.gz (Unix).
Using awk with different input/output separators
I had to reformat some stuff from the man pages for inclusion in another document that would be converted to a proper table. Here’s a trick for using awk/gawk to take the input (multiple spaces) and output with tabs using different input and output separators.
BEGIN { OFS = "\t"; FS = "[ ][ ]+" }
{ print $1,$2,$3,$4 }
I only wanted the four columns from the original table, which is why I specified them explicitly here.
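As a quick check of how the separators behave, the sketch below feeds one sample line through the same awk program. The wrapper name reformat_columns and the sample text are invented for the illustration.

```shell
# reformat_columns is a hypothetical wrapper around the awk program above:
# split on runs of two or more spaces, join the first four fields with tabs.
reformat_columns() {
    awk 'BEGIN { OFS = "\t"; FS = "[ ][ ]+" }
         { print $1, $2, $3, $4 }'
}

# The single space inside "foo bar" is kept as part of the first field;
# only runs of two or more spaces separate fields, and "extra" is dropped.
printf 'foo bar   baz  qux    quux  extra\n' | reformat_columns
```

The output is the four fields foo bar, baz, qux and quux separated by tab characters.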
