Tuesday, November 13, 2012

RESTful API memo: PUT and POST differences

Before you start designing a RESTful API, have a look at section 9 of Hypertext Transfer Protocol -- HTTP/1.1 (RFC 2616):

"The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line."

In other terms, POST is meant to handle appends to existing resources or incremental creations of subordinate resources:

"The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI. The posted entity is subordinate to that URI in the same way that a file is subordinate to a directory containing it, a news article is subordinate to a newsgroup to which it is posted, or a record is subordinate to a database."

PUT, instead, seems more appropriate for one-shot creations, creating or replacing an entire resource in a single transaction:

"The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI."

Differences between PUT and POST:

"The fundamental difference between the POST and PUT requests is reflected in the different meaning of the Request-URI. The URI in a POST request identifies the resource that will handle the enclosed entity. That resource might be a data-accepting process, a gateway to some other protocol, or a separate entity that accepts annotations. In contrast, the URI in a PUT request identifies the entity enclosed with the request -- the user agent knows what URI is intended and the server MUST NOT attempt to apply the request to some other resource."

Another remarkable difference is that PUT requests are required to be idempotent, while POST requests are not:

"Methods can also have the property of 'idempotence' in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. Also, the methods OPTIONS and TRACE SHOULD NOT have side effects, and so are inherently idempotent."

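To make the difference concrete, here is a minimal sketch of a toy in-memory "origin server". The ResourceStore class, its method names and the /articles URIs are all made up for illustration; they are not part of the HTTP specification or of any real framework. PUT stores the entity under exactly the URI the client supplied, while POST lets the server pick the URI of a new subordinate resource.

```python
import itertools


class ResourceStore:
    """Toy in-memory 'origin server' keyed by URI (hypothetical, for illustration)."""

    def __init__(self):
        self.resources = {}             # URI -> stored entity
        self._ids = itertools.count(1)  # server-side id generator used by POST

    def put(self, uri, entity):
        """PUT: create or replace the whole resource at the client-supplied URI."""
        created = uri not in self.resources
        self.resources[uri] = entity
        return 201 if created else 200

    def post(self, collection_uri, entity):
        """POST: append a new subordinate; the server chooses its URI."""
        new_uri = f"{collection_uri.rstrip('/')}/{next(self._ids)}"
        self.resources[new_uri] = entity
        return 201, new_uri


store = ResourceStore()

# Repeating the same PUT leaves exactly one resource behind: idempotent.
store.put("/articles/42", {"title": "PUT vs POST"})
store.put("/articles/42", {"title": "PUT vs POST"})

# Repeating the same POST creates a new subordinate every time: not idempotent.
_, first = store.post("/articles", {"title": "Hello"})
_, second = store.post("/articles", {"title": "Hello"})

print(sorted(store.resources))
```

After the two PUTs and the two POSTs the store holds three resources: one under /articles/42 and two distinct subordinates of /articles.
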
Saturday, November 10, 2012

Creating your private Git repository on Dropbox in less than 5 minutes

GitHub is the tool I use daily to manage my public software projects, and I love it. But sometimes I have to quickly and temporarily share private projects with colleagues, or even in a mixed environment with customers and consultants from other companies. When there is no time or money to buy private remote repos from GitHub, or even to install a Git repo on some server, and privacy constraints make it impossible to publish the code in a public GitHub repo, Dropbox comes to the rescue.

In this example I'm working on a simple Web application in Flask, which is a cool Python micro-framework. I created a "flask_sample" folder which contains the code I want to version with Git and share with other colleagues.

I promised it will take less than 5 minutes, so let's start.

Move to your Dropbox folder (in my case it's in /Users/mturatti/Dropbox/) and create a folder to host all your remote git repositories:

$ cd /Users/mturatti/Dropbox/
$ mkdir git

Then create, inside it, the folder that will host this remote repository:

$ cd git
$ mkdir flask_sample.git
$ cd flask_sample.git

It's time to create a bare Git repository:

$ git init --bare

You'll see it creates a structure similar to the following:

mturatti:~/Dropbox/git/flask_sample.git$ ls -l
total 24
-rw-r--r--   1 mturatti  staff   23  9 Nov 18:38 HEAD
-rw-r--r--   1 mturatti  staff  112  9 Nov 18:38 config
-rw-r--r--   1 mturatti  staff   73  9 Nov 18:38 description
drwxr-xr-x  10 mturatti  staff  340  9 Nov 18:38 hooks
drwxr-xr-x   3 mturatti  staff  102  9 Nov 18:38 info
drwxr-xr-x  11 mturatti  staff  374  9 Nov 19:09 objects
drwxr-xr-x   4 mturatti  staff  136  9 Nov 18:38 refs

Now you have a Git structure in place which can act as a shareable remote repository, even if in practice it's local to your hard disk. The fact that it lives in a Dropbox folder does the magic in terms of backups, sharing and synchronization.

Initialize Git in your software project as usual (in my case the local project lives in /Users/mturatti/src/flask_sample):

$ git init

This creates the usual hidden .git folder.
The last configuration step is to add the previously created bare repository as a remote of your local project:

$ git remote add origin file:///Users/mturatti/Dropbox/git/flask_sample.git

Note that we are using the file:// protocol for the remote Git repository here.
If you check the content of the .git/config file you'll see the new origin (the [remote "origin"] section below):

mturatti:~/src/flask_sample$ cat .git/config 

[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = false
[remote "origin"]
url = file:///Users/mturatti/Dropbox/git/flask_sample.git
fetch = +refs/heads/*:refs/remotes/origin/*

At this point you can start the usual Git lifecycle. For example, after you have added and committed all your files locally, you can "push to origin", which will push your code to your remote Git repository saved on Dropbox:

$ git push origin master

The last step will be to share the Dropbox folder with your colleagues, so that they can also add this as a remote repository and start cloning / pulling / pushing from this origin.
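
One detail worth keeping in mind: the file:// URL depends on where Dropbox lives on each colleague's machine, so everyone has to use their own path. As a small sketch (the dropbox_remote_url helper is hypothetical, and it assumes a plain absolute POSIX path without spaces or special characters), the URL and the matching Git commands can be derived like this:

```python
def dropbox_remote_url(dropbox_dir, project):
    """Return the file:// URL of <dropbox_dir>/git/<project>.git.

    Hypothetical helper: assumes an absolute POSIX path with no characters
    that would need escaping in a URL.
    """
    if not dropbox_dir.startswith("/"):
        raise ValueError("dropbox_dir must be an absolute path")
    return f"file://{dropbox_dir.rstrip('/')}/git/{project}.git"


url = dropbox_remote_url("/Users/mturatti/Dropbox", "flask_sample")

# For an existing local project:
print(f"git remote add origin {url}")
# For a colleague starting from scratch:
print(f"git clone {url}")
```

Each colleague would pass the location of their own Dropbox folder instead of /Users/mturatti/Dropbox.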





Saturday, September 29, 2012

Time, Cost, Quality and Agile Consulting

Types of Consulting Engagements

“Sometimes it's a little better to travel than to arrive” 
― Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values
In my software consulting experience I have been engaged in many different kinds of projects, but in all cases they fall into two main categories:

  1. Time and Material (T&M)
  2. Fixed Price (FP)
In most situations, before deciding that option #2 is achievable, some amount of T&M analysis must be performed in advance, in order to define the context and the scope of a possible subsequent FP engagement. That's not always possible: for example, when the project is part of a public tender, you have to bid the lowest possible price, balancing the need to add a good amount of contingency and stay on the safe path against the risk of sabotaging your own chances of winning the tender.

What marks the difference between T&M and Fixed Price? In T&M a customer is basically paying for your time, either because deliverables and scope can't be clearly set in advance, or because it's already established that requirements are going to change so much that a Fixed Price engagement is out of the question: it's too risky. A Fixed Price project is based on a set of much more stringent assumptions about context and about functional and technical requirements, which (hopefully) allow for a very accurate estimate of deliverables.

Usually customers are keener on FP because, of course, they think it will constrain the final price by putting much more responsibility on the consultant, while T&M seems a way to create a continuous stream of expenses. However, in reality, there is a more fundamental law which governs any kind of software project, regardless of the rules of engagement, and it is related to the unavoidable, strong relationship among three distinct, fundamental quantities: Time, Cost and Quality.

The Basic Conjecture of Time, Cost and Quality

“When analytic thought, the knife, is applied to experience, something is always killed in the process.” 
― Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values
It's almost incredible how many people actually think they can leverage Fixed Price to be in total control, at the same time, of these three quantities:
  1. The elapsed time spent from start to finish, so the final delivery date;
  2. The total cost, in terms of direct money and indirect materials;
  3. The overall quality of the final product.
This assumption has been historically proven false by practice, for any non-trivial software project or consulting engagement. It's a conjecture and not a theorem or a physical law, but reality has taught me that, in software development and software consulting, it is possible to be accurately in control of only two out of three of these quantities.

Some examples are needed: if a project has a fixed delivery date and a fixed price, then the only quantity left that one can possibly control is quality. Could this explain why so many FP projects suffer from poor perceived quality?
On the other hand there is another fundamental speculation which states that "Nine women can't deliver a baby in one month". It means that, if delivery dates and quality are fixed, one is tempted to keep adding resources, in terms of people and infrastructure, losing control of costs. In practice this tactic even ends up increasing delivery time, because adding people to a late project usually delays it even more.
A third case is when we try to fix both cost and quality, and accept that elapsed time can't be predicted accurately. This is the case, for example, of companies trying to outsource development to offshore facilities, where a cheaper labor force can easily be hired. Statistically this strategy has led many projects to both an indefinite development time and poorer quality.

It's all about one single truth: software development is inherently not a traditional engineering activity. Actually, Programming is Gardening, not Engineering.

Agile Development and Agile Consulting

"Simplicity is the ultimate sophistication". ~ Leonardo da Vinci.
So, at first sight, it seems there is no escape from the T, C & Q rule. But we are not doomed. In fact, as I wrote before, the rule applies to any non-trivial software project. So the trick here is to transform big projects or big consulting engagements into a finite sequence of very focused, well-defined, small activities or mini-engagements. This is why time-boxing or feature-boxing usually works effectively, and that's what Agile Methodologies are, more or less, trying to achieve: transforming complexity into something more manageable and predictable, by splitting big activities into small, possibly trivial, short tasks, which can be handled in a few hours or days by very few people.

I think that agile methodologies can and should be successfully applied also to pure software consulting, so to the kind of engagement usually performed in T&M. The main pillars of this strategy are nothing new and can be summarized as:
  • Focus on User Stories;
  • Short iterations, usually no more than two or three weeks long;
  • Continuous Integration and Continuous Delivery of valuable pieces of software;
  • Acceptance tests at the end of each iteration or when a single deliverable is ready.

As a side note: if you are strictly required to be on-site then that is not a Fixed Price project by definition! Fixed Price engagements MUST be off-site, exactly because you don't want to waste time renegotiating the scope every single moment. It is necessary to have customers involved daily and keep things very flexible, but asynchronous interruptions must be avoided at all costs. The main objective is to understand what final users want and adapt when requirements change, but this must be addressed by the process, not by individuals. T&M is very different: as the customer pays for your time, they are fully entitled to interrupt you and change tasks even in the middle of them. That's why committing to any detailed deliverable in a T&M assignment is extremely dangerous.

The Need for Good Architectural Decisions

"We are searching for some kind of harmony between two intangibles: a form which we have not yet designed and a context which we cannot properly describe." ~ Christopher Alexander.
The missing piece, for a Use Case made of several User Stories, is a comprehensive and reasonably complete Technical Architecture. In other words, I do believe in that kind of bottom up, emerging software design coming from an Agile, iterative process, but I think this must be developed within the frame of a clear up-front Architecture.

I mean that refactoring code and design is not only necessary, but even desirable. However, in my experience, refactoring wrong initial architectural decisions can be extremely expensive and usually leads to big failures. I strongly believe that the role of an experienced architect is key to producing quality software systems, and it seems to me that this fact is too often underestimated in the field of Agile Methodologies. Do not fool yourself by believing your so-called "rockstar developers" (horrible term!) alone can also actually imagine, design and implement a complete and working technical architecture.

Speaking of consulting, even a short T&M engagement should be performed in the context of a well-designed architecture, because even the best expert can be unable to deliver anything useful if the architectural context is broken. Focus your first steps at the customer site on two main things: understanding their existing architecture and development process, and starting to fix them if they are clearly broken. Otherwise the risk of failing and not getting paid will be too high, regardless of whether it's T&M or FP.

Tuesday, April 10, 2012

Managing Multiple JDK on Mac OS X

Recently I had to install OpenJDK 7 on my Apple MacBook while keeping the original JDK 6 as my main Java environment. After browsing the Internet I came up with a decent set of instructions (tested with OS X Lion 10.7.3).

In summary:
  1. Get the OpenJDK from http://code.google.com/p/openjdk-osx-build/ (I chose JDK 7u4, which at present seems to be the latest stable build). Note: JDK 7 is now a regular download from Oracle: http://www.oracle.com/technetwork/java/javase/downloads/index.html
  2. Install the downloaded .dmg package;
  3. Change your Java Preferences accordingly, by moving the "OpenJDK 7" item to the top (by default Java SE 6 is the first item);
  4. Automatically set the JAVA_HOME variable, so that shell tools work.
To automatically set up the JAVA_HOME variable it's necessary to add a few lines to the .profile.
So, edit this file (it's in your home directory) adding the following lines:

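# Change your JAVA_HOME to a given JDK version, e.g. "setjdk 1.7"
function setjdk() {
   if [ $# -ne 0 ]; then
      # Ask OS X for the home of the requested JDK version
      export JAVA_HOME=$(/usr/libexec/java_home -v "$@")
   fi
   java -version
}
# Automatically set the JAVA_HOME to the system default JDK
export JAVA_HOME=$(/usr/libexec/java_home)
echo "JAVA_HOME=$JAVA_HOME"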

Note that the optional setjdk function lets you change JAVA_HOME dynamically if you switch items in your "Java Preferences"; many Java tools won't work if JAVA_HOME is not in sync with the System settings.

Whenever you change the default JDK using the "Java Preferences" tool, any new terminal will automatically pick up the new JAVA_HOME by executing /usr/libexec/java_home, so executing the setjdk function is usually not necessary, unless you really don't want to close and re-open the terminal (opening the terminal reloads the .profile). Alternatively, issue the command source ~/.profile in your shell.

That's it, now you can install multiple JDKs and select them dynamically, by just using the "Java Preferences" tool, without touching any system file by hand.


Saturday, April 7, 2012

XML Schema and WSDL modules for Netbeans 7.x

A couple of years ago I wrote a blog entry about how to install the missing XML Schema Editor and related utilities from the dev update center in Netbeans 6.9. Now there is an unofficial update center:

http://deadlock.netbeans.org/hudson/job/xml/lastSuccessfulBuild/artifact/build/updates/updates.xml

It contains the development branches of these and instructions on how to install, thanks to Geertjan Wielenga.

I installed the plugin on Netbeans 7.1 [update: I also installed it on 7.2] and it seems to work, even if I did not test it intensively, primarily because these days I no longer work that much with XML Schemas and WSDL files.

There is an apparently disabled Hudson project for the XML Tools. Now, if you want to put this nice plugin back into the regular plugin repository, please vote for this issue!

Thursday, November 24, 2011

Starting with CMIS and Maven

This post aims to be a short how-to for setting up a CMIS development environment based on Maven and Apache Chemistry, specifically the OpenCMIS Java API, part of the Chemistry project.

I won't cover Maven installation and configuration here, so I assume you have Maven 2 or 3 up and running. With Maven you'll be independent from any specific IDE, so that you can manage your development cycle from the command line only.

Glossary
  • CMIS (Content Management Interoperability Services) =>"is a specification for improving interoperability between Enterprise Content Management systems. OASIS, a web standards consortium, approved CMIS as an OASIS Specification on May 1, 2010. CMIS provides a common data model covering typed files, folders with generic properties that can be set or read. In addition there may be an access control system, and a checkout and version control facility, and the ability to define generic relations. There is a set of generic services for modifying and querying the data model, and several protocol bindings for these services, including SOAP and Representational State Transfer (REST), using the Atom convention. The model is based on common architectures of document management systems."
  • Apache Chemistry => "Apache Chemistry provides open source implementations of the Content Management Interoperability Services (CMIS) specification."
  • OpenCMIS => "Apache Chemistry OpenCMIS is a collection of Java libraries, frameworks and tools around the CMIS specification. The goal of OpenCMIS is to make CMIS simple for Java client and server developers. It hides the binding details and provides APIs and SPIs on different abstraction levels. It also includes test tools for content repository developers and client application developers."
  • Apache Maven => "Apache Maven is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information."
Ingredients
  1. A simple text editor or any decent Java IDE 
  2. Maven 2 or 3
  3. A CMIS server for real-world testing
In my case I'm using IntelliJ IDEA, which is excellent. I'm an old Netbeans guy and both IDEs offer superior Maven integration, but it happens that I'm just having a look at IntelliJ these days.

To cover point #3 I have selected Alfresco, which so far is the reference CMIS server implementation. OpenCMIS offers a basic CMIS server implementation for your self-contained unit tests, but for end-to-end integration testing I prefer to link to a real ECM system.

You can download the latest Alfresco Community Edition for free from here. At present the brand new 4.0 is available.

Setup

Note: I won't cover Alfresco's installation and configuration here because it's not in the scope of this post. You can already find plenty of excellent online resources for that.

Just open a shell, move into a folder of your choice and run the following Maven command to create a very basic Java project through the quickstart archetype:

mvn archetype:generate -DgroupId=com.myapps \
                       -DartifactId=my-first-cmis \
                       -Dversion=1.0-SNAPSHOT \
                       -DarchetypeArtifactId=maven-archetype-quickstart \
                       -DinteractiveMode=false

You'll end up with the following usual project structure:

project
|-- pom.xml
`-- src
    |-- main
    |   `-- java
    |       `-- App.java
    `-- test
        `-- java
            `-- AppTest.java

The pom.xml file is the center of Maven's universe. We need to edit it, adding a few lines of XML so that we can build with the OpenCMIS libraries.
Here is the default pom.xml created by the archetype:

<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.myapps</groupId>
  <artifactId>my-first-cmis</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>my-first-cmis</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Now we need to put the following XML snippet into pom.xml to activate the OpenCMIS libraries:

<dependency>
   <groupId>org.apache.chemistry.opencmis</groupId>
   <artifactId>chemistry-opencmis-client-impl</artifactId>
   <version>0.6.0</version>
</dependency>

At present the latest stable OpenCMIS release is 0.6.0; you can modify the pom.xml file accordingly whenever a new version is released.

This is the final POM file:
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.myapps</groupId>
  <artifactId>my-first-cmis</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>my-first-cmis</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>org.apache.chemistry.opencmis</groupId>
      <artifactId>chemistry-opencmis-client-impl</artifactId>
      <version>0.6.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Now that your development environment is ready and you can build both from the shell and the IDE, you can start exploring some examples.

Issuing a mvn clean compile command in your shell will start the process.
If it's the first time you run Maven, it will try to download many dependencies. Don't worry and be patient: all subsequent runs will be very fast.


Friday, March 4, 2011

Purge Alfresco archived nodes

I was looking for a way to automatically purge the Alfresco trashcan and, after a while, I think I came up with what looks like a decent solution.

DISCLAIMER: The procedure described in this article has not been tested intensively and comes without any implied warranty of fitness for a particular purpose. You should check the code, test it and decide for yourself if it fits your needs, backing up all your data before any experiment.

The problem
After some time, deleted contents can fill Alfresco's trashcan, and removing nodes manually with the UI can be impractical (users always forget about this). Alfresco does not actually delete content, but moves deleted nodes into the archive store, which is like a trashcan. Deleted contents can stay there forever, until users decide to clean up the trashcan. In a big repository this could lead to a huge waste of resources.

I need a service I can invoke programmatically to empty the trashcan, for example by scheduling a task with an external job. I don't like deploying into Alfresco a scheduled task controlled by the embedded Quartz: I think it's cleaner to move the scheduling outside and always deploy the bare minimum into Alfresco.

Even after the trashcan has been emptied, this just means nodes are marked as "orphans", moved into alf_data/contentstore.deleted, and can then be physically removed by an asynchronous contentStoreCleaner task. So there is a safety net in Alfresco to avoid accidental deletions at all costs.


Cleaning-up archived nodes
I have developed a simple Java-backed Web Script for Alfresco 3.4 (it should work with Alfresco 3.2+) which can be invoked to clean up the archived nodes. Below are its major components:

purge.get.desc.xml
Web Script descriptor

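    <webscript>
      <shortname>Purge all</shortname>
      <description>Purge all archived nodes</description>
      <url>/purge</url>
      <authentication>user</authentication>
      <transaction>none</transaction>
    </webscript>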

purge.get.html.ftl
Freemarker template

purge-context.xml
Spring bean's configuration

I created it/alfresco/utils folders under /Company Home/Data Dictionary/Web Scripts Extensions where I created both purge.get.desc.xml and purge.get.html.ftl.

The Spring context file purge-context.xml goes under /tomcat/shared/classes/alfresco/extension in the main alfresco installation folder.

Our bean makes use of nodeArchiveService.

Here's the Java Code:


The Purge project under Netbeans 6.9.1


The bean is injected with nodeArchiveService and calls method purgeAllArchivedNodes.

The single most important line of code is:
this.nodeArchiveService.purgeAllArchivedNodes(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);

We are passing the STORE_REF_WORKSPACE_SPACESSTORE constant, which is "the store that the items originally came from", as per JavaDocs:

purgeAllArchivedNodes

void purgeAllArchivedNodes(org.alfresco.service.cmr.repository.StoreRef originalStoreRef)
Permanently delete all archived nodes.
Parameters:
originalStoreRef - the store that the items originally came from
Calling the WebScript
After starting Alfresco, to get a list of available Web Scripts and check that this new one has been installed correctly, point the browser to http://localhost:8080/alfresco/service/index and then click the "Browse all Web Scripts" link. Remember to authenticate as admin, so that the Web Script can be run with administrator privileges.

The "Purge" Web Script should be the first one

To invoke its execution and clean-up the trashcan you can call:
http://localhost:8080/alfresco/service/purge

If everything went fine you should see the following response page:
Alfresco Community Edition v3.4.0 (c 3335) :
Purged all archived nodes. Elapsed time: 438 ms.
Then verify all users' trashcans are now empty:


As we now have our RESTful purge Web Script in place, it's easy to call it from an external script, maybe scheduled via a cron job for a periodic clean-up. Alternatively it's possible to use the Quartz engine embedded into Alfresco, but my personal preference is to avoid putting too many responsibilities into Alfresco: if you need to change the scheduling, an external scheduler is easier to maintain.
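
As a rough sketch of such an external script, the snippet below uses only the Python standard library. It assumes the Web Script is reachable at the URL shown above and that the /alfresco/service endpoint accepts HTTP Basic authentication; the function names and the credentials are placeholders of mine, not part of Alfresco.

```python
import base64
import urllib.request

# URL of the purge Web Script described in this post.
PURGE_URL = "http://localhost:8080/alfresco/service/purge"


def build_purge_request(username, password, url=PURGE_URL):
    """Build a GET request for the purge Web Script with a Basic auth header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def purge_trashcan(username, password, timeout=60):
    """Call the Web Script and return (HTTP status, response body)."""
    request = build_purge_request(username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body


# Example call (placeholder credentials, needs a running Alfresco):
# status, body = purge_trashcan("admin", "secret")
```

Saved as a small script with the last line enabled, this could be run periodically by cron or any other external scheduler.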