Tuesday, September 17, 2013

A personal matter

Many of you already know this, but I wanted to make an 'official' statement about my going into part-time retirement as far as deegree is concerned.

For personal reasons, I had to give up being self-employed. Fortunately, I quickly found a new home at the WhereGroup, where I'll be working (among other things) on Mapbender.

Besides switching sides from server to client, that also means I'll be switching from deegree to Mapbender. That doesn't necessarily mean that I'll never use or develop something for deegree ever again. But it certainly means that I won't be opening new pull requests with 100+ commits any time soon.

Since I'll only be making minor contributions from now on, it doesn't make any sense for me to remain a member of the TMC, so I'll step down from that position.

I sincerely wish things had worked out differently. With over 40 percent of the commits it almost feels like my third child. I wish you all the best of luck!

See you around (perhaps at the FOSSGIS code sprint in Essen?)!

Friday, June 7, 2013

Bolsena #4 - More JAXB fun

Adding to the JAXB fun I had yesterday, I decided to fix a related problem with binding our schemas at compile time: the maven-jaxb2-plugin sometimes tried to load schemas from the net, although an XML catalog file was provided and configured.

The first thing I noticed was that only some of the related modules had that problem. Very strange, so I ran maven in debug mode (-X) for one module where it worked properly and for one where it failed.

The plugin was configured identically in both cases (we manage plugins from our parent pom.xml). The only difference I noticed was that in the working module all project dependencies were passed to the xjc call, whereas in the failing module only the plugin's own dependencies were added. So obviously the schemas could not be loaded from the classpath there.

I was not able to find the reason for this, but fortunately just adding the project dependencies as plugin dependencies fixed the problem. This means that, apart from the unit tests, deegree can now also be compiled offline without problems.

If you want to know the details on xjc + catalogs, read this guide. It might be a little out of date, but most of it is still valid.

Thursday, June 6, 2013

Bolsena #3 - JAXB fun

In deegree we make heavy use of JAXB for unmarshalling configuration files. That works quite well, but it had a drawback when schemas included other schemas: the included schemas were always loaded from the internet.

Using the JAXP SchemaFactory, I thought it would be pretty easy to avoid loading schemas from the net and use the ones included in our .jars instead. But somehow that didn't work out; the base schemas were still loaded from the internet.

Smart people found out that the order in which the schema URLs are handed to the SchemaFactory matters, and it turns out to be true! I've just opened a pull request that fixes the problem in deegree.

For it to work, the schemas need to be passed in reverse order of inclusion: the schema without dependencies goes first, then the schemas that depend on it, and so on.
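
To illustrate the ordering, here is a minimal sketch using only the JDK's javax.xml.validation API. The two tiny schemas, their urn:example namespaces and the class name are made up for this example and are not deegree's schemas; they are inlined as strings where the real ones would come from a .jar. The sketch uses xs:import without a schemaLocation, so it never touches the network.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class SchemaOrderSketch {
    // Hypothetical base schema: it has no dependencies.
    static final String BASE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:base'>"
        + "<xs:simpleType name='Name'><xs:restriction base='xs:string'/></xs:simpleType>"
        + "</xs:schema>";

    // Hypothetical schema that depends on a type from the base schema.
    static final String CONFIG_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:b='urn:example:base' targetNamespace='urn:example:config'>"
        + "<xs:import namespace='urn:example:base'/>"
        + "<xs:element name='Config' type='b:Name'/>"
        + "</xs:schema>";

    // Hands the schemas to the SchemaFactory in exactly the given order
    // and reports whether they could be compiled into one Schema.
    static boolean compiles(String... xsdsInOrder) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Source[] sources = new Source[xsdsInOrder.length];
        for (int i = 0; i < xsdsInOrder.length; i++) {
            sources[i] = new StreamSource(new StringReader(xsdsInOrder[i]));
        }
        try {
            factory.newSchema(sources);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Schema without dependencies first, dependent schema afterwards.
        System.out.println("dependencies first: " + compiles(BASE_XSD, CONFIG_XSD));
        // The opposite order, for comparison on your own JDK.
        System.out.println("dependencies last:  " + compiles(CONFIG_XSD, BASE_XSD));
    }
}
```

With the dependencies first, the factory already knows the base namespace when it reaches the import. How the opposite order behaves may depend on the parser implementation, so it's worth running both on your own setup.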

Bolsena #2 - coordinate system ramblings

Time for an update, even though there's not much to discuss on actual progress.

First, some good news: the resource dependencies pull request is finally here! It would be great if many of you tested it. With 189 commits it's the biggest change since our move to GitHub, and although the unit and integration tests pass, some details might still need tuning.

Since my last post, I've been thinking about and experimenting with the coordinate subsystem in deegree. The API is currently heavily based on the GML representation of coordinate systems, with references everywhere: for axes, datums, ellipsoids and so on.

While I'm a friend of models that are 'complete', I'm not sure whether that's the best approach for coordinate systems. There are two major use cases for the CRS package. One is to keep the information on what system is being used with what identifier, the other is to transform coordinates from one system to another. Other use cases would be to import/export coordinate system definitions from/to GML, WKT, proj4 etc.

A review of the code revealed that all identifiers are stored in lower case. So exporting to GML, or finding out exactly which identifiers exist for a given system, is impossible, because the identifiers in their original spelling are no longer available. The convenient use case of having e.g. your layer configuration set up with the correct CRS identifier from the data store also sometimes becomes impossible, namely when the underlying data store is not configured explicitly with a CRS identifier.
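
A small sketch of why lower-casing on import loses information, and of one possible way to keep case-insensitive lookup without it. This is not deegree's CRS code; the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry, not part of deegree's CRS package.
class CrsIdentifierSketch {
    // Lower-cased identifier -> identifier as originally registered.
    private final Map<String, String> originalByLowerCase = new HashMap<>();

    // Lossy normalization: the original spelling cannot be recovered from the result.
    static String normalize(String identifier) {
        return identifier.toLowerCase(Locale.ROOT);
    }

    // Keeps the original spelling as the value, the normalized form only as lookup key.
    void register(String identifier) {
        originalByLowerCase.put(normalize(identifier), identifier);
    }

    // Case-insensitive lookup that returns the identifier as it was registered.
    Optional<String> originalSpelling(String anyCase) {
        return Optional.ofNullable(originalByLowerCase.get(normalize(anyCase)));
    }

    public static void main(String[] args) {
        CrsIdentifierSketch registry = new CrsIdentifierSketch();
        registry.register("EPSG:26912");
        System.out.println(normalize("EPSG:26912"));               // prints epsg:26912
        System.out.println(registry.originalSpelling("epsg:26912")); // prints Optional[EPSG:26912]
    }
}
```

Storing only the result of normalize() corresponds to the current situation; keeping the original as the map value is the bit that makes a faithful export possible.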

So to summarize, the current package has several shortcomings. First, there is the identifier mess, which is not easily fixed short of a complete reimport of the CRS definitions (which would override some manually modified definitions). Second, the model is just too complex and makes it hard to compare two definitions. Third, transformations are slow and sometimes not thread safe; in other places they are synchronized and thus don't scale on multicore machines. Fourth, the whole system is statically initialized and makes heavy use of global static state.

I've tried unsuccessfully to fix some of these issues over the past few days, but I fear a complete rewrite is the only thing that will do the trick.

Monday, June 3, 2013

Bolsena 2013 #1

It's that time of year again, where OSGeo hackers from around the globe meet in a former monastery to collaborate and code under the Italian sun.

Things are a little different this year. Compared with past years, we've got a record number of people attending! Also, sadly, the Italian sun is missing. I hope that people are right when they tell me it's going to get better...

So back to business. I'm in the process of completing the resource dependencies branch, and I can probably create the pull request today or tomorrow. Things are looking good: the web console is already adapted, no tests are failing and it works.

The Mapbender people have installed deegree on their computers and are working on integrating a simple workflow that creates a new WMS/layer based on a shape file in a remote deegree instance using our REST API. That's already working, and we're currently trying to get it running in a more user-friendly way. Not a bad start!

On a related note, I've created a new pull request that adds some more features to the REST API: querying for all supported coordinate systems, checking whether a coordinate system is supported by deegree, and an experimental call to retrieve all known identifiers for a WKT encoded coordinate system.

In theory the equality relation is defined on coordinate systems (not taking identifiers into account obviously), but in practice I was not able to compare the Utah UTM zone (EPSG:26912) to a WKT encoded variant. I guess that's another reason why a rewrite/cleanup of the CRS package is needed.

Stay tuned for more!

Wednesday, May 22, 2013

Hans Moleman issues

XML schema validation, especially for GML, can be hard. The mysterious Xerces 'honor all schema locations' flag springs to mind (this is a mystery yet to be fully understood). Often, slow schema validation processes (which seem to fetch schemas from the web) can be traced to Hans Moleman. No, sorry, wrong link, to Hans Moleman.

So what's happening? And what does Hans Moleman have to do with it?

As the GML experts among you may know, GML application schemas depend on the GML schema, which in turn consists of many schemas (the number varies between versions). These depend on other schemas, for example the W3C XLink schema, which in turn pulls in the W3C XML schema (the schema for the xml namespace itself: http://www.w3.org/XML/1998/namespace).

So even when validating a feature collection against a local version of a GML application schema, the schema parser might still get to a point where it needs to fetch dependent schemas from the internet. And since the xml.xsd is the last one in the chain, it's also the one that gets requested the most.

According to W3C people, they had ~130 million accesses to this file per day, and decided to completely block e.g. the Java default HTTP User-Agent and others. Apparently they later had a change of heart and don't block it any more, but the xml.xsd URL has a delay of several seconds upon loading (see http://www.w3.org/2001/xml.xsd).

So when validating multiple documents, which all need the xml.xsd, with all schemas loaded freshly every time, you'll get a delay of several seconds where your computer seems to do nothing at all.

We thought about the problem of remote schemas quite a while ago, and made use of a custom Xerces entity resolver to load OGC and W3C schemas from a local artifact which we ship with deegree. There are other solutions as well: our JAXB schema generation, for example, makes use of standard XML catalog files to avoid fetching schemas from the web.
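
The general idea of such a resolver can be sketched as follows. This is not the deegree resolver: it uses the standard org.xml.sax.EntityResolver interface instead of the Xerces one, and a plain in-memory map (an assumption of this example) stands in for the schema artifact on the classpath.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: serves known schema URLs from local copies.
class LocalSchemaResolver implements EntityResolver {
    // Remote schema URL -> local copy of the schema document.
    private final Map<String, String> localCopies;

    LocalSchemaResolver(Map<String, String> localCopies) {
        this.localCopies = new HashMap<>(localCopies);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = systemId == null ? null : localCopies.get(systemId);
        if (content == null) {
            // Unknown URL: returning null lets the parser fall back to its default behaviour.
            return null;
        }
        InputSource source = new InputSource(new StringReader(content));
        // Keep the original URL as system id so relative references still resolve against it.
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) {
        Map<String, String> copies = new HashMap<>();
        // Placeholder content; a real setup would load the actual xml.xsd from a .jar.
        copies.put("http://www.w3.org/2001/xml.xsd",
                   "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'/>");
        LocalSchemaResolver resolver = new LocalSchemaResolver(copies);
        System.out.println(resolver.resolveEntity(null, "http://www.w3.org/2001/xml.xsd") != null);
    }
}
```

A resolver like this can be registered on a SAX XMLReader via setEntityResolver; for schema compilation with a SchemaFactory, the analogous hook is an LSResourceResolver.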

But unfortunately the CITE WFS 1.0.0 tests (and others) do not (although newer versions tend to load required schemas from the classpath as well).

With some reverse engineering using an eclipse plugin (see the other post from today), I was able to fix this (they were already using a custom entity resolver, but it loaded everything from the web all the time). Now a complete deegree build, including the integration tests, needs only 13 minutes on a fast machine!

For those interested, have a look at our deegree-compliance-tests module.

All the library sources

One of the nicer features in eclipse is that it is able to automatically browse not only through your own sources, but through library sources as well, if they're available. But what if they're not?

In that case eclipse shows the class in a byte code view, with a button to attach a source .jar. Unfortunately it is often the case that you don't have the sources, either because they were not uploaded to maven central, or because they're closed altogether.

In any case, it is obviously often desirable while debugging to see what a library function actually expects, or why it fails. Contrary to popular belief, the actual code is always the ultimate documentation, because it's up to date even if human language docs are not.

So recently, while chasing after a Hans Moleman issue (more on that later), I needed sources from binary classes. Of course my first thought was 'decompiler', so I searched the eclipse marketplace for 'decompiler'.

I found JadClipse for eclipse 4.x, installed it, and voila: when I double click on a class file with no sources attached, the decompiler automatically decompiles it and shows me the source. Now that's what I call a plugin without hassle! That's a perfect counterpart to the -DdownloadSources flag for the maven eclipse mojo; now I never need to go without a library source again.

Wednesday, April 10, 2013

deegree developers to the front

Now that 3.2 has been released with our great new users handbook, I think it's time to start working on developer documentation as well.

We've got plenty of outdated development pages in the old wiki (which I'm not going to link here, it's too embarrassing :-)). Instead of bringing the old docs up to speed, I've decided not to describe the status quo, but instead to describe things which are to come.

I've been working on a new implementation of the workspace and resource concept in deegree, which fixes a couple of outstanding issues with the old one. In particular, it adds resource level dependencies.

In case you want to have a look at the code, check out the resource-dependencies branch on GitHub.

Since we decided to have developer documentation close by the code, I've added the description of the new concepts to the GitHub wiki. Have a look at the workspace documentation which describes the new concepts.

I'm unsure whether the code will make it into 3.3 (it'll probably be released in around four weeks or so), but 3.4 is definitely the target.

I invite all interested developers to have a look at the code and the docs. Let's make the workspace right this time!

Wednesday, March 20, 2013

Setting up eclipse working sets using maven

Setting up eclipse with maven using a combination of the eclipse plugin for maven and the deegree maven plugin works pretty well. Unfortunately, we've got more than a hundred modules in the deegree code right now, which means the project list gets really long in eclipse.

Of course, eclipse supports working sets, which can be used to group projects. But it's a lot of click work to organize all those projects into working sets, especially if you remove and re-add the projects often (e.g. if you switch branches often).

Since I really dislike clicking often, I've decided to add a new mojo to the deegree maven plugin which saves me the trouble of configuring working sets and adding projects to them.

Since deegree module names follow a certain pattern (deegree-<modulegroup>-<name>, e.g. deegree-featurestore-sql), it's a good guess to use the module groups as working set names. For example, all deegree-core-* modules will be put into the d3-core working set, and all deegree-featurestore-* modules into d3-featurestore.
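
The naming rule can be sketched in a few lines of Java. This is only an illustration of the pattern described above, not the mojo's actual code; the class and method names are made up, and skipping modules that don't match the pattern is an assumption of the sketch.

```java
import java.util.Optional;

// Hypothetical helper, not taken from the deegree maven plugin.
class WorkingSetNames {
    // Derives the working set name from a module name of the form
    // deegree-<modulegroup>-<name>; other names yield an empty result.
    static Optional<String> workingSetFor(String moduleName, String prefix) {
        String[] parts = moduleName.split("-", 3);
        if (parts.length < 3 || !parts[0].equals("deegree") || parts[1].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(prefix + parts[1]);
    }

    public static void main(String[] args) {
        System.out.println(workingSetFor("deegree-core-base", "d3-"));        // Optional[d3-core]
        System.out.println(workingSetFor("deegree-featurestore-sql", "d3-")); // Optional[d3-featurestore]
        System.out.println(workingSetFor("some-other-project", "d3-"));       // Optional.empty
    }
}
```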

To set up the working sets using the plugin, run the following line in the deegree root:

mvn org.deegree:deegree-maven-plugin:1.19-SNAPSHOT:setup-eclipse-working-sets -Declipse.workspace=$HOME/workspaces/test/

Use -Declipse.workspace to specify the workspace to modify, and optionally -Dworkingset-prefix to configure something other than d3- as a working set name prefix.

If you're generally interested in what the deegree maven plugin can do for you, check out the documentation.

Please note that this plugin is highly experimental, and has only been tested on the latest eclipse version. It might just destroy your eclipse workspace configuration! Use at your own risk!

Please also note that 1.19 of the deegree maven plugin is not yet released, so you'll have to check out my feature branch at GitHub and build it yourself (1.19 is probably going to be released on the 2nd of April).

Edit:

1.19 has been released, so you can use this out of the box now. Please note that it is indeed experimental; sometimes the working sets seem to get lost for some reason. In that case, repeating the procedure works (for me).

Tuesday, January 29, 2013

deegree/Occam Labs in academia

Jürgen Weichand, author of the WFS 2.0 support for Quantum GIS, recently published his master's thesis (German only) on the topic of implementing and using INSPIRE DownloadServices.

While a large focus of the thesis lies on the client side (the WFS 2.0 plugin for Quantum GIS was developed in the context of the thesis), some server side solutions were also evaluated. Since UMN MapServer does not have a WFS 2.0 implementation, only GeoServer and deegree were considered on the open source side; the GO Publisher WFS was chosen as the closed source example.

Looking at table 6.7 on page 67 it seems we're doing pretty well. I'm sure the upcoming 3.2 release will improve the situation even further. And it's one of the first publications explicitly mentioning us at Occam Labs as developers of deegree!

We probably could have earned an even more prominent place in the thesis if we had had a nicer web administration console ready (chapter 6.2.6 mentions this as the probable reason why GeoServer was chosen for an example implementation, simple features only!). At least the deegree handbook, which is nearing completion, will allow future thesis writers (and all others of course) to better understand how easy it is to set up a WFS serving rich features with deegree.

Edit: I did not mean to imply that GeoServer can't serve complex features (as Andrea pointed out, it has been able to do that for years), it's just not as easy to do using the web interface (which Jürgen Weichand also pointed out in his thesis).

Tuesday, January 15, 2013

deegree goes GitHub

There's breaking news in the deegree world, as deegree (at least deegree as in the deegree webservices) is now at GitHub! Have a look at the official repository yourself. And help yourself to a fork...

Our main reason for moving to GitHub (besides moving to git as a tool) was to make it easier for people to contribute. And that's exactly what happened! We hadn't even figured out how to properly work with pull requests and git's famously cheap branches when our first pull request from an external contributor arrived!

I also extracted the deegree maven plugin into a separate repository. Have a look at the documentation I added in its wiki.

We decided to (gradually) move all developer related documentation to the wiki of the deegree webservices code base, so that should be the primary source of information for now. Links to the old wiki were added where appropriate.

So I hope to see more of those pull requests coming!

On a related note, on Friday (the 18th of January) I'll move some more deegree related projects, like the old deegree 2 code base, to GitHub as well, and then the svn repository will be shut down/made read only. In case some old project picks up again, I offer to migrate any of the old projects to GitHub, but for building the unchanged code base, the read-only svn should be good enough.