Archive for the ‘technical’ tag
The Acid Test
Fun chemistry fact of the day: Acidity regulators regulate pH in general, not just acidity. Hence (presumably) why this smoothie bottle contains Citric Acid as an acidity regulator (my first thought was: shouldn’t it be an alkali?).
This is when I wish I’d done Chemistry A-Level rather than Further Maths.
Blog Moved
This blog has now moved to my new domain andrewferrier.com. You shouldn’t notice any change if you are using a web browser or a well-designed feedreader to read it, as all parts of the old blog (including permalinks, RSS feed, etc.) should permanently redirect to the new one. You might just want to check that your RSS reader is pointing to the new blog though, or alter your browser bookmarks. The redirection will disappear in a few months. I’d appreciate it if you can let me know if you see any problems with the new blog.
Update 2007-01-13: I should add that some feedreaders will treat all the items in the feed as new, because the GUID will have changed. Just mark them all as read. Apologies for the inconvenience.
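To illustrate why a changed GUID makes old items look new, here is a minimal sketch of how a feedreader might track what it has already seen. This is an assumption about typical behaviour, not the code of any particular reader; the function name, the item structure and the example.com/example.org URLs are all hypothetical.

```python
def new_items(seen_guids, feed_items):
    """Return the items whose GUID has not been seen before, and remember them.

    Hypothetical sketch: real feedreaders may also compare links, titles or dates.
    """
    fresh = [item for item in feed_items if item["guid"] not in seen_guids]
    seen_guids.update(item["guid"] for item in fresh)
    return fresh


# The same post, before and after a (hypothetical) domain move.
old_feed = [{"guid": "http://old.example.com/?p=1", "title": "The Acid Test"}]
moved_feed = [{"guid": "http://new.example.org/?p=1", "title": "The Acid Test"}]

seen = set()
new_items(seen, old_feed)               # first fetch: the post is new
repeated = new_items(seen, old_feed)    # same GUID again: nothing new
after_move = new_items(seen, moved_feed)  # GUID changed: reported as new again
```

Because the reader in this sketch keys purely on the GUID string, the post reappears after the move even though its content is identical.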
Exim: Remove ‘if error_message…’ From Your .forward
For users of Exim only:
It’s normally recommended to include the line ‘if error_message then finish endif’ in your .forward filter file, to make sure error messages don’t cause recursive problems in your mail system. I have found that this doesn’t work well in practice if you receive a lot of spam, because spammers increasingly forge your email address as the sender when they spam other people. This causes bounces back to you, which bypass your spam filter because of that line (at least in my setup, using sa-exim). Removing that line greatly reduced the amount of spam reaching my inbox.
Spam and OCR
It’s strange how the same techniques can be used to attack both sides of a problem. For some time now, some of the more sophisticated web spammers have been using OCR techniques to circumvent CAPTCHAs on websites in order to hijack free email accounts, submit comment spam on blogs, and similar forms of mischievousness.
The more capable e-mail spammers seem to have figured out that anti-spam technologies (normally rule-based detection, Bayesian learning, or a combination of the two) are getting pretty good at filtering out the crap they send. As a result, a lot of spam now being sent out is image-based - and anti-spammers are now using OCR to fight back against this new tide.
As I’ve mentioned before, I have a huge spam problem on my personal e-mail account (~4,000/week) - due to a combination of bad luck and some foolish naivety at a few points - and so I have a fairly highly-tuned SpamAssassin installation running at home, with plenty of custom rules and plugins. I’ve seen a rising amount of image spam on it, so I decided to give FuzzyOcr, a plugin for SpamAssassin, a try. The results are pretty impressive. FuzzyOcr uses the open-source gocr program as the engine, and ties it in with SpamAssassin and some logic. The OCR is fairly CPU-intensive, so unlike most SpamAssassin plugins, it only kicks in if the message is otherwise going to be below a certain scoring threshold. So far it has roughly halved the volume of spam that slips through into my inbox (previously ~40-50/day), which is a welcome improvement.
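The ‘only kicks in below a threshold’ idea can be sketched roughly as follows. This is not FuzzyOcr’s or SpamAssassin’s actual code (both are written in Perl); the function name, the rule callables and the threshold of 5.0 are all assumptions made for illustration.

```python
from typing import Callable, Iterable


def score_message(
    message: str,
    cheap_rules: Iterable[Callable[[str], float]],
    expensive_check: Callable[[str], float],
    spam_threshold: float = 5.0,  # assumed value, purely illustrative
) -> float:
    """Sum the cheap rule scores, running the expensive check only if needed."""
    score = sum(rule(message) for rule in cheap_rules)
    # The cheap rules already class this as spam: skip the CPU-heavy step.
    if score >= spam_threshold:
        return score
    # Otherwise the message would slip through, so the extra cost is worth paying.
    return score + expensive_check(message)
```

The point of the ordering is that the expensive check (OCR, in FuzzyOcr’s case) is only paid for on messages that the cheaper rules would otherwise let through.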
However, fun though they are as a technical challenge, technical approaches such as these always feel like fighting a losing battle. I might write a lengthier article on this at a later date, but I’d like to see ISPs take a far more hardline attitude with their peers that host spammers. There are also compelling economic solutions to the problem, mostly related to micro-payments for sending email. There are problems with those too (how do you roll them out gradually?), but you rarely see graphs of spam that have a downward trend - a solution to the spam problem would be most welcome.
Lightbulb Conundrum - Drinks, Anyone?
A pint is yours if you can solve this conundrum for me (a theoretical explanation you can convince me of will do; I have a practical workaround).
A few weeks ago I replaced some of the bulbs in my house with energy-saving ones. However, the ceiling light in my hall behaves in a very odd manner. Occasionally, after I switch it off at the wall, it flickers on very briefly (for about 1/10 second) about once every minute - even though the power is (allegedly) off. The flicker is fairly dim, so I only notice it at night. If I take the bulb out of the socket, the flicker stops. If I put it back, it starts again. This behaviour happily continues for hours - to the extent that I remove the bulb when it happens because it’s too distracting when trying to sleep.
Perhaps it’s some kind of residual charge in the bulb. But this doesn’t really seem to explain why it only flickers when the bulb is in the socket (even though the switch is off). It also doesn’t explain why it doesn’t happen in the rest of the house (they have the same brand of bulb). The only difference is that the hall has two switches - but they aren’t dimmer switches or anything special.
Any thoughts?
Free Hour
Once a year (today is the day) I wake up and realise I’ve been given a free hour. Does anyone else savour that moment?
(Of course, once a year, I lose an hour - but I prefer not to talk about that…)
Flaky Trackback / Pingbacks on Wordpress 2.0.x
It seems that pingbacks and trackbacks (which are pingbacks’ more awkward, older cousin) are a bit flaky on Wordpress 2.0.x (for more information on how both are supposed to work, see this excellent tutorial). I’ve long suspected that’s the case, because blog entries I’ve linked to haven’t had pingbacks appear, and it seems I’m not the only one with such problems. However, I’ve tested pingbacks with this blog in both directions against TestTrack, which enables you to test ping- and trackbacks, and it does seem to work. TestTrack is running a 2.1 alpha level of Wordpress, so let’s hope that sorts out the bugs when it arrives.
Feedreader
I’d been struggling for a while to find a decent RSS reader for Windows. However, I’ve now been using Feedreader for a few weeks, and am very happy with it. It fully supports nested folders/categories, which is nigh-on essential if you’re regularly monitoring as many feeds as I am (>100). You can effectively aggregate several feeds together by viewing them at the folder level. Feeds can be viewed using the text contained within the feed itself, or you can easily open the original blog entry inline. The OPML import/export support seems robust, and fully supports the nested folders. Feedreader will also discover feeds in a relatively intelligent way if you feed it a blog URL, as well as supporting searching across all cached blog entries.
All in all, pretty impressive.
OpenSSH Niggle #329
It appears that in some fairly recent version of OpenSSH, the support for the ~/.ssh/authorized_keys2 file was removed (along with the known_hosts2 file). It had apparently been deprecated in preference to the ~/.ssh/authorized_keys file a while ago. This caused me some grief when my ISP silently upgraded OpenSSH recently and my automatic backup scripts (which rely on key authentication) stopped working. Renaming the file fixed the problem.
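If you have several accounts to fix, the rename could be scripted along these lines. This is only a sketch: the function name is hypothetical, and the choices to append when the new file already exists and to set mode 0600 afterwards are my assumptions, not something OpenSSH prescribes.

```python
from pathlib import Path


def migrate_legacy_file(ssh_dir: Path, legacy_name: str, current_name: str) -> bool:
    """Move a legacy file (e.g. authorized_keys2) to its current name.

    Returns True if anything was changed, False if there was nothing to migrate.
    """
    legacy = ssh_dir / legacy_name
    current = ssh_dir / current_name
    if not legacy.is_file():
        return False
    if current.exists():
        # Assumption: append rather than overwrite, so existing keys survive.
        with current.open("a") as out:
            out.write("\n" + legacy.read_text())
        legacy.unlink()
    else:
        legacy.rename(current)
    # Assumption: restrict the file to its owner after the move.
    current.chmod(0o600)
    return True
```

For example, `migrate_legacy_file(Path.home() / ".ssh", "authorized_keys2", "authorized_keys")` would handle the case described above, and the same call with the known_hosts names would cover the other file.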
Imperial MEng Presentations
IBM Hursley invited three final-year MEng students from Imperial College to give us presentations on their individual MEng projects today (mine, from several years ago, can be found here). They were:
- Marc Hull, who talked about his project on Balancing simplicity and efficiency in web applications. Marc’s work focused on improving the development of stateful web applications, and in particular on object-relational mapping in Java, in an attempt to allow more straightforward persistence of objects to databases. This has always seemed to me to be an area lacking in usability and ease (see J2EE for plenty of examples), so anything that moves us closer to that is welcome.
- Matthew Sackman, who talked about his project Glint: Breeding Mobile Ambients with Actors (which won the IBM Project Prize for the best final year Individual Project). Essentially, Matthew seems to be attacking the area of concurrent and distributed computing, in order to improve its robustness against deadlock (and other concurrency problems). He has chosen to do this by writing a compiler for the GLINT language, which is based on an Actor model and is particularly suitable for modelling concurrent systems.
- Francis Russell, who talked about his project Delayed Evaluation and Runtime Code Generation as a means to Producing High Performance Numerical Software. Francis’s infrastructure shifts some code generation and execution to the runtime of a program (lazy evaluation). It does this by building up a DAG to represent expressions that ‘should’ have already been evaluated. The expression a node represents isn’t actually evaluated until it’s needed, which enables certain optimisations to be performed (which is useful, for example, in matrix arithmetic). The framework generates and executes the optimised code at runtime (and it also caches this generated code).
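To give a flavour of the delayed-evaluation idea in the last project, here is a toy sketch of an expression DAG that is only evaluated on demand. It is my own illustration, not Francis’s framework: it does no code generation or optimisation, it caches computed values rather than generated code, and the `Node` and `const` names are hypothetical.

```python
import operator


class Node:
    """A node in an expression DAG; nothing is computed until evaluate() is called."""

    def __init__(self, op=None, operands=(), value=None):
        self.op = op                # function applied to the operand values
        self.operands = operands    # child nodes (may be shared between parents)
        self._value = value
        self._done = op is None     # leaf constants are already 'evaluated'

    def __add__(self, other):
        return Node(operator.add, (self, other))

    def __mul__(self, other):
        return Node(operator.mul, (self, other))

    def evaluate(self):
        # Evaluate each node at most once; shared sub-expressions reuse the result.
        if not self._done:
            args = [node.evaluate() for node in self.operands]
            self._value = self.op(*args)
            self._done = True
        return self._value


def const(value):
    """Wrap a plain value as a leaf node."""
    return Node(value=value)


a, b = const(2), const(3)
total = a + b                   # builds a node; no addition happens yet
expr = total * total + a        # 'total' appears twice but is a single node
result = expr.evaluate()        # the whole DAG is only evaluated here
```

Because the full expression is available as a graph before anything runs, a real framework has the chance to rearrange or fuse operations first; this sketch only shows the deferral and the sharing of sub-expressions.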
I had the chance to meet these folks briefly (Marc and Matthew had also been here previously, when they were part of the team from Imperial who won the Thinkpad Challenge). It was interesting to see some academic work for a change - whilst I’d never be able to make a career out of that, bringing academia and business together always seems to reap benefits.
I wish Marc, Matthew and Francis luck if they choose to develop their projects further.