Andrew Ferrier’s Blog

Economics; Travel; Film; and Technology.

Archive for the ‘Software Engineering’ tag

More Flexible Firefly Smart Playlists with Perl, sqlite3 and m3u

with 3 comments

I use Firefly (previously called mt-daapd) as a media server for my Roku Soundbridge. It has a feature called ‘Smart Playlists’ that dynamically creates playlists based on certain criteria, but they aren’t that powerful - they don’t support sorting or other more advanced query features.

Fortunately, underlying Firefly is a sqlite database, which can be queried using standard SQL syntax. This enables a technique of creating static playlists that are automatically re-generated periodically instead.

The prerequisites for the following technique are:

  • Perl, with the File::Spec module (to convert from absolute paths to relative ones, which is what Firefly expects).
  • The sqlite3 command-line interface.

The three commands that follow will create a standard .m3u playlist with the top 100 most-played songs from Firefly’s database, and another playlist with all the non-Podcasts added in the last month, ordered by the time they were added. Neither of these is possible using Firefly’s query language.

sqlite3 /var/cache/mt-daapd/songs3.db 'select path from songs order by play_count desc limit 100' | PLAYLIST_DIR="$PLAYLIST_DIR" perl -nle 'require File::Spec; $_ = File::Spec->abs2rel($_, $ENV{PLAYLIST_DIR}); print;' > "$PLAYLIST_DIR/Most-played songs.m3u"
 
MONTHAGO=$(perl -e 'use Date::Calc::Object qw(:all); $date = Date::Calc->now(); $date += [0,-1,0,0,0,0]; print $date->mktime();')
 
sqlite3 /var/cache/mt-daapd/songs3.db "select path from songs where genre!='Podcast' and time_added > $MONTHAGO order by time_added desc" | PLAYLIST_DIR="$PLAYLIST_DIR" perl -nle 'require File::Spec; $_ = File::Spec->abs2rel($_, $ENV{PLAYLIST_DIR}); print;' > "$PLAYLIST_DIR/Music added in last month by most recent.m3u"

(obviously, if you use these, you’ll need to set PLAYLIST_DIR to your playlist directory, alter the other paths to suit, make sure the correct Perl modules are installed, remove any line breaks that were added to make the commands easier to read, etc.)

If configured correctly, Firefly will read these .m3u files during its next rescan, and use them as it would any other playlists. You can force a rescan with the following wget command:

wget --delete-after -q --http-user noone --http-password yourpasswd "http://localhost:3689/config-update.html?action=rescan"

Although not fully dynamic (they are not generated on request from the Soundbridge), if these commands are called from cron or similar, the playlist can be kept up-to-date ‘enough’.
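As a rough illustration of the cron approach, a crontab entry along these lines would regenerate the playlists once a day. The script path and the schedule are assumptions: update-playlists.sh stands for a wrapper you would write yourself, containing the sqlite3 commands and the wget rescan call.

```crontab
# Hypothetical entry: run the playlist update at 04:15 every day.
# /usr/local/bin/update-playlists.sh is an assumed wrapper script,
# not something shipped with Firefly.
15 4 * * * /usr/local/bin/update-playlists.sh
```

How often to run it is a trade-off between how fresh the playlists are and how frequently you want Firefly to rescan.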

Written by andrewferrier

January 5th, 2008 at 3:30 pm

Source Code Crazy

with one comment

I’ve thought for a while that build, source-code management, and bug tracking software (which I’m collectively calling meta-software) could, and should, be so much simpler. I’ve written previously about my contention that bugs and features are the same thing, but the problem is wider. Software has a tendency to acquire features over time, and software that’s used to make other software is no exception. Here are some assorted thoughts about how to improve the situation:

  • Always use integrated source-code libraries and bug tracking. This is something that CMVC and other systems do excellently, and up till recently was fairly poorly served by the open-source software community, although projects such as Trac are doing a good job of closing this gap. The ability to see what changes are associated with what bug is invaluable. Anything else is a recipe for mistakes.
  • Get rid of all the excuses. There are only two valid reasons for permanently closing a bug: (i) it’s fixed; (ii) the developer disagrees that the change will improve the software for the user. Anything else isn’t OK. A corollary of my rule about bugs and features being the same thing is that bugs can’t be returned just because they are feature requests. All bugs that aren’t targeted for a release currently being worked on are still open bugs (just with a different target field). All bugs we don’t want to fix because they don’t seem sensible to fix right now (which could be marked as such) are still open bugs. All bugs in an external dependency are still bugs (the whole system doesn’t work). All bugs that can’t be recreated were seen once (assuming you trust your testers) and are still bugs. A good bug tracker is a database, and will let you see any subset of these any time you want, so no-one needs to get blamed unfairly.
  • Make sure each bug only has a few panic fields. ‘Severity’, ‘Importance’, ‘Priority’, ‘Ease of Recreation’, ‘Impact on Customer’, ‘Impact on Developer’, ‘Impact on Tester’ are all ambiguous. Pick a maximum of two, preferably just one, and make sure everyone knows exactly what they mean. After all, only one field really matters - how much does this affect the user of our software? Everything else should be secondary.
  • What Joel says about explaining off-buttons to uncles applies equally to the design of procedures around software development - everyone will think their exception to the process is vital until you’re drowning in exceptions. This relates more to the conventions surrounding software development than to meta-software itself, but it’s still relevant. Keep it simple - regular builds on a schedule everyone knows; keep everything in one place; reduce the number of parties required to make a decision about any change to the bare minimum. Scott Berkun has a lot to say about this in his excellent book The Art of Project Management.

Fundamentally, though, maybe none of the above will help. Meta-software is perhaps destined to suffer from featuritis more than other software precisely because usability is not so important for its userbase (in my experience, most developers don’t like bad interfaces, but can also cope with them). Only time will tell if developers will be set free.

Written by andrewferrier

December 12th, 2006 at 9:38 pm

Sexual Synchronicity Economics

without comments

I’ve written about synchronicity vs. asynchronicity before, but I wanted to revisit the subject because it seems to be so key to modern services; as more and more communication mechanisms evolve out of available technology and entrepreneurs’ imagination, understanding customers’ usage patterns will be important when developing businesses around them. An excellent article by Gregor Hohpe, Starbucks Does Not Use Two-Phase Commit (included in Joel Spolsky’s Best Software Writing Vol. 1), is an examination of why understanding computer science concepts such as 2PC (and, I would argue, synchronicity) is important when engaging in business process engineering. There’s a large overlap between business and software engineering here, and this is why IBM sells products like WebSphere Process Server together with business consultants to help customers implement them. There are a number of other essays in Spolsky’s excellent book which also discuss related subjects.

Clay Shirky, in his essay A Group is Its Own Worst Enemy (also included in the same volume; the online copy is edited slightly differently from the printed one), notes how online (synchronous) discussions frequently descend into talk about sex - and that sexual banter is much more common in synchronous communication than asynchronous (how often have you flirted with someone over the phone compared to email? - please, no anecdotes in the comments section). I’m not a psychologist, but I assume that this has something to do with it being hard to retain the thrill of adult banter over the course of a (potentially lengthy) asynchronous discussion. The same arguments probably apply in a less dramatic fashion to non-sexual communication.

There’s a related observation to be made about the perceived economics of people’s time. In general, most folks implicitly value synchronous time as higher than asynchronous - if I ask advice of a mentor over a half-hour coffee, I feel more indebted to him than if he spends half an hour answering my email. I suspect the reasons are a combination of my having accurate information (I know exactly how long he spent drinking the coffee), the start-up and tear-down time (he actually took 5 minutes to get to the coffee shop), and knowing that I have his undivided attention (he wasn’t multi-tasking). Nevertheless, we continue to rate synchronous time more highly than its opportunity cost, relative to asynchronous time, would justify.

To relate the two assertions, wouldn’t you rather spend half an hour in person with your spouse / significant other / other politically correct phrase than an hour writing and exchanging emails with them? Synchronous communication has a strange attraction that its poor cousin doesn’t - despite all of asynchronicity’s time-shifting advantages. This is going to be a big challenge for a multi-time-zone world.

Written by andrewferrier

November 24th, 2006 at 10:30 am

Spam and OCR

without comments

It’s strange how the same techniques can be used to attack both sides of a problem. For some time now, some of the more sophisticated web spammers have been using OCR techniques to circumvent CAPTCHAs on websites in order to hijack free email accounts, submit comment spam on blogs, and similar forms of mischievousness.

As the more capable e-mail spammers seem to be figuring out that anti-spam technologies are getting pretty good at filtering out the crap they send, normally using rule-based detection, Bayesian learning, or a combination of the two, a lot of spam now being sent out is image-based - and anti-spammers are now using OCR to fight back against this new tide.

As I’ve mentioned before, I have a huge spam problem on my personal e-mail account (~4,000/week) - due to a combination of bad luck and some foolish naivety at a few points - and so I have a fairly highly-tuned SpamAssassin installation running at home, with plenty of custom rules and plugins. I’ve seen a rising amount of image spam on it, so I decided to give FuzzyOcr, a plugin for SpamAssassin, a try. So far, the results are pretty impressive. FuzzyOcr uses the open-source gocr program as the engine, and ties it in with SpamAssassin and some logic. The OCR is fairly CPU-intensive, so unlike most SpamAssassin plugins, it only kicks in if the message is otherwise going to be below a certain scoring threshold. So far it has roughly halved the volume of spam that slips through into my inbox (previously ~40-50/day), which is a welcome improvement.

However, fun though they are as a technical challenge, technical approaches such as these always feel like fighting a losing battle. I might write a lengthier article on this at a later date, but I’d like to see ISPs take a far more hardline attitude with their peers that host spammers. There are also compelling economic solutions to the problem, mostly related to micro-payments for sending email. There are problems with those too (how do you roll them out gradually?), but you rarely see graphs of spam that have a downward trend - a solution to the spam problem would be most welcome.

Written by andrewferrier

November 10th, 2006 at 2:12 pm

Two Google Ideas

with 5 comments

Google have created a powerful brand based on creating simplicity from complexity (what all good IT is about). Their tools aren’t perfect, but they’ve made life easier for billions, and so I think they still deserve some free feedback from time to time. So, a few thoughts:

  • Mr. Google, please develop a podcast search engine. So much interesting content is now being released as podcasts (quick plug for my favourite: EconTalk), that it would be useful to be able to search them. All you have to do is invent a speech-to-text interpreter that actually works reliably. Simple. [Note: as I sometimes do, I wrote this post in advance of it being published. I've since discovered that such a tool already exists. However, I thought I'd leave the original prose here: Google, if you get one out soon, you could still corner the market]
  • Mr. Google, please stop developing so many interfaces - and plug them all together. If I want to do an exhaustive search for something, I now have to search Google Web, Google Images, Google Groups, Google News, Google Video, Google Blog Search, Google Book Search, Google Scholar, and possibly others. This is not a good thing - you’re straying from the simple search you started with. Some of those searches do show up in the main search results, but you could do a better job of tying them together to show what I’m actually looking for. This could be a real competitive edge, especially since the basic searches that MSN and others provide are now actually quite reasonable.

Google still have an edge in providing what people want - for a company so technically-focused, they either have talented marketers or are just lucky. Please, Google, keep it up.

Written by andrewferrier

November 4th, 2006 at 1:07 pm

Silly Word of the Day #94

with one comment

Marchitecture. I shamelessly stole this from a presentation I attended the other day (names withheld to protect the innocent). If it resonates with you, it probably doesn’t need explaining, but marchitecture is IT architecture that is used for marketing reasons rather than technical ones. Sometimes the marchitecture looks the same as the ‘real’ architecture, sometimes not. Wikipedia’s definition seems a bit narrow (I’m not sure what electronic architecture is anyway), but hey. No original research seems to be one of the more widely violated Wikipedian principles.

Written by andrewferrier

October 18th, 2006 at 9:48 am

Imperial MEng Presentations

without comments

IBM Hursley invited three final-year MEng students from Imperial College to give us presentations on their individual MEng projects today (mine, from several years ago, can be found here). They were:

I had the chance to meet these folks briefly (Marc and Matthew had also been here previously, when they were part of the team from Imperial who won the Thinkpad Challenge). It was interesting to see some academic work for a change - whilst I’d never be able to make a career out of that, bringing academia and business together always seems to reap benefits.

I wish Marc, Matthew and Francis luck if they choose to develop their projects further.

Written by andrewferrier

October 9th, 2006 at 5:32 pm

IBM and Open-source

with 2 comments

One of the things I’ve felt IBM’s been strong at in recent years is the way we embrace open-source as a development model, and as a model for providing software to our customers. Eclipse, which has a lot of support from IBM, is a well-known example, but there are plenty of others. I honestly believe that the provision of open-source software gives us a competitive edge over IT organisations of comparable size. Sure, we don’t do it for all our products, but for toolchains like Eclipse it has delivered real benefits - the number of plugins developed for it is testament to that.

Greg at IBM Eye has provided a brief recap of the many software domains in which IBM supports open-source. I think Greg’s analysis is too narrow - there are other benefits to open-source software apart from savings on purchase cost - some of them bogus and some not - but quality and maintainability are certainly side-effects of open-source software, and are worth having even if they’re hard to quantify.

Written by andrewferrier

September 13th, 2006 at 8:35 am

Reuse and SOA

with one comment

Joe McKendrick discusses SOA and reuse in a recent blog entry, essentially drawing on some comments from David Chappell that reuse didn’t do as well as predicted in the era of object-orientation, and that SOA isn’t faring well in this department either. Dave Linthicum, in his latest podcast, also discusses this topic.

I’m not sure I can comment that widely on the state of current SOA projects, and I would agree that SOA may suffer from similar management problems to those of object-orientation: if developers of SOA systems aren’t rewarded for saving time with a reuse strategy, they won’t be enthused to do so. This is an important part of any software project, and encouraging reuse is a best practice that shouldn’t be restricted to object-orientation or SOA.

However, whilst I agree that SOA has other benefits apart from encouraging reuse, I have a fairly high opinion of its potential in that respect. It’s important to understand what we mean by reuse. Reuse rarely means using an object or service as it is. There is often a mismatch between the interface offered by the service (object) being consumed, and the service (object) that needs to call this interface. Expecting anything else is unrealistic (even if future reuse plans are made). This is often solved using something like a façade pattern in object-oriented languages, and some form of mediation with services (such as that offered by WebSphere ESB). The latter is often easier, because there is a lower degree of coupling than inside a single programming language, and because programming code is not often needed, and this is why I believe SOA reuse is simpler - if done well. Of course, some work is still required, but this greater ease of reuse makes it a realistic strategy for more scenarios.

I would agree, however, that, as is often the case, the project management problems here are the greatest ones.

Written by andrewferrier

September 10th, 2006 at 5:35 pm

Google Test Automation Conference

with 2 comments

I spent last Thursday and Friday in London at the Google offices in Victoria for the first Google Test Automation Conference. The presentation topics ranged widely, considering the relatively narrow scope of the conference, but most were well developed and interesting, even if some retrod familiar topics. Some of the highlights included:

  • Steve Loughran and Julio Guijarro, HP Labs. This presentation was about Smartfrog, a system deployment framework, which Steve and Julio were working on as part of a strategy for system testing. They demonstrated several examples of how the system might work in practice. Smartfrog looks pretty flexible, and I plan to spend some time looking into it. Frameworks for deployment have an inherent problem in catering to the wide variety of platforms, configuration mechanisms, deployment combinations and so on that are necessary in practice. Anything that gets closer to this is therefore welcome. Smartfrog also has the interesting property that the XHTML it produces as output is sufficiently well-formed that, although it has an embedded CSS stylesheet for presentation in a web browser, it can also be parsed as XML data without much effort, and thus act as a machine-readable data source as well. This might seem obvious to some folks, and I’m willing to bet it’s not the first time it’s been done, but it seemed novel to me.
  • Robert Chatley (a fellow alumnus from Imperial College) and Tom White, Kizoom. Tom and Robert were talking about what they called literate functional testing. Essentially this involves creating tests, in this case written in Java, that use plain English for method names, variables, and so on. This means that once punctuation is stripped out, Java code - test assertions - become statements that are readily understandable by non-programmers (such as the business analysts in their organisation). Their framework will shortly be available on Google Code.
  • Andreas Leitner, ETH Zurich: Andreas discussed automated testing using contracts. Essentially this means using a language which has pre-conditions and post-conditions on methods, and optionally assertions on objects. He has developed a testing framework called AutoTest, using the Eiffel language, which has these features built in (similar extensions are available for more mainstream languages such as Java). Once these restrictions are placed on a program, generated input can be used to determine whether methods behave as they should. A number of strategies are available for generating this data, and they are pluggable into Andreas’s framework. The simplest is of course to generate the data randomly, but other, more sophisticated strategies are available to improve coverage. Andreas stressed that this type of testing, which is essentially fuzz testing with intelligence, should be used to complement human-created unit tests, not to replace them.
  • Goranka Bjedov, Google: Goranka explained some of the background to performance testing, including the differences between performance testing, stress testing, load testing, scalability testing, etc., and how to deploy and manage performance testing systems.
  • The conference finished with 10 lightning talks. Subjects covered included jMock (a mock objects framework that complements JUnit), justifying automated testing in financial terms, testing heresies, Yandex (the largest search engine in Russia, their share ~60%, Google’s ~6%), ‘Automated Testing: Why bother?’, automated tricks for manual testing, and the Perl-inspired Test Anything Protocol. I hadn’t seen lightning talks before, and I thought they were a fantastic idea - similar in some ways to the Straight 8 showings I wrote about a while ago - if you don’t like what you’re seeing, something else is coming along soon. More presentations should be like this.

My thanks to the guys at Google for hosting this conference, particularly for free. The original call for attendees was on their research blog. Some of the conference topics were quite academic and in-depth, but that provided a good contrast to the more practical topics. Google’s offices and facilities are also impressive - definitely worth visiting if you get the chance.


Written by andrewferrier

September 9th, 2006 at 7:04 pm