URIs Versus Attachments

Firstly, some terminology:

If these definitions strike you as somewhat recursive, you’re right, they are. But that’s how they are actually defined and used!

Benefits

There are a number of significant benefits to using URIs to exchange resources external to messages (e.g. files, quotes, reference material):

  1. They are universally understood, both by human readers of the message traffic, and by the machines at either end.
  2. Retrieving a resource is as simple as performing an HTTP GET on its URI. Contrast this with the relative complexity (and frequent incompatibility) of the various SOAP with Attachments implementations on different platforms.
  3. The message itself stays very small. With attachments, the bulk of the message is a Base64 encoding of the resource, which could be very large (e.g. a 500 MB software dump).
  4. The receiver can retrieve the resource when it’s needed, and does not have to swallow the resource immediately.

Using URIs could mean that SOAP with Attachments is not needed at all: to transfer any resource between client and service provider, the client would pass a URI to that resource in the web service message, and the service provider would retrieve the resource from that URI.
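As a rough illustration of the retrieval step, here is a minimal Python sketch using only the standard library. The function names fetch_resource and inline_payload_size are hypothetical, and the sketch assumes the URI can be read with a plain GET; authentication and error handling are left out.

```python
import base64
import urllib.request


def fetch_resource(uri: str, timeout: float = 30.0) -> bytes:
    """Retrieve the resource behind `uri` with a plain GET and return its bytes."""
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        return response.read()


def inline_payload_size(resource: bytes) -> int:
    """Size in bytes of the same resource if it were embedded as Base64 text."""
    return len(base64.b64encode(resource))
```

Base64 emits four characters for every three bytes of input, so an embedded resource is roughly a third larger than the resource itself, whereas the message carrying only a URI stays the same size however large the resource is.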

Drawbacks

However, it’s not all sweetness and light. There are a few potential downsides of using URIs to the exclusion of attachments.

Security

When you provide a URI to a confidential resource (e.g. your translation memory, or TM), you obviously need security around it. One option is to rely on the standard security of the server (e.g. an HTTP server) that provides the resource. The issue is that the security then sits outside the realm of the web service. I know that this is what we’ve been trying to do, but we need to tread carefully.

Availability

Does the resource have to be permanently available? In this scheme, ideally yes.

For example, in the case of the associateResource operation, the client will associate a resource (e.g. a TM) to a job. Does the service provider have to retrieve that resource immediately, or can they retrieve it on demand (e.g. when the analysis phase of the project requires the TM)?

Ideally, you want a resource that is always available, and accessible from the client to the service provider. However, this approach throws up a number of additional issues:

  1. The client is obliged to maintain the resource at that location 24×7, and failure to do so could mean that the service provider can’t continue working (e.g. if the TM cannot be retrieved when it’s needed).
  2. There are the obvious security risks of having a confidential resource permanently available, even if there is some security mechanism protecting it.

In addition, we would have to consider what happens if the resource is not available. Assuming the service provider’s work is impacted, how does the service provider signal to the client that the resource is not available, and that they cannot continue?

Permanence

Another, similar issue is that of permanence – that is, how do you accommodate the fact that the resource may change during the lifetime of the job?

If the resource changes on the client side during the job, the service provider will always retrieve the latest version. This may not be what is required – the client may want the TM frozen in the state it was in when the job was initiated.

If this is the case, the client is obliged to maintain different versions of the same resource at different revision levels, adding to their logistical requirements.
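To make that logistical burden concrete, the following is a minimal in-memory sketch of a client-side store that gives every revision its own URI. The class name RevisionStore and the /rev/N URI layout are assumptions made for illustration; nothing in this proposal prescribes them.

```python
from typing import Dict, List


class RevisionStore:
    """Client-side store that never overwrites a published revision."""

    def __init__(self, base_uri: str) -> None:
        self._base = base_uri.rstrip("/")
        self._revisions: Dict[str, List[bytes]] = {}

    def publish(self, name: str, content: bytes) -> str:
        """Store a new revision and return a URI that always refers to it."""
        revisions = self._revisions.setdefault(name, [])
        revisions.append(bytes(content))
        return f"{self._base}/{name}/rev/{len(revisions)}"

    def read(self, uri: str) -> bytes:
        """Return the content frozen behind a revision URI."""
        prefix = self._base + "/"
        if not uri.startswith(prefix):
            raise KeyError(uri)
        parts = uri[len(prefix):].rsplit("/", 2)
        if len(parts) != 3 or parts[1] != "rev" or not parts[2].isdigit():
            raise KeyError(uri)
        name, revision = parts[0], int(parts[2])
        revisions = self._revisions.get(name, [])
        if not 1 <= revision <= len(revisions):
            raise KeyError(uri)
        return revisions[revision - 1]
```

The client would pass the revision URI returned by publish when the job is initiated, so later changes to the TM produce a new URI rather than altering what the service provider retrieves.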

Update and Exchange

Another issue that is worth considering is that of updating a resource – e.g. a TM. Consider the following scenario:

  1. The client initiates the job
  2. The client associates the TM resource to the job via a URI
  3. The service provider translates the job, and in the course of that, updates the TM.
  4. How can the client retrieve the updated TM? The original URI points to the pre-translation TM.

One approach would be that when the client associates the TM resource to the job, the service provider retrieves that resource, and returns a new URI pointing to where the resource can now be located on their side. When the service provider updates the TM, it is the resource pointed at by the new URI. Hence, the client can retrieve the latest and greatest copy of their TM at any time.
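This first approach can be sketched on the service provider's side as follows. It is only an illustration under stated assumptions: the class ProviderResourceStore, its method names, and the use of a random identifier in the new URI are all hypothetical, and the fetch callable stands in for whatever retrieval mechanism the provider uses.

```python
import uuid
from typing import Callable, Dict


class ProviderResourceStore:
    """Provider-side copies of client resources, each under a provider URI."""

    def __init__(self, base_uri: str, fetch: Callable[[str], bytes]) -> None:
        self._base = base_uri.rstrip("/")
        self._fetch = fetch  # retrieves the bytes behind a client URI
        self._copies: Dict[str, bytes] = {}

    def associate(self, client_uri: str) -> str:
        """Retrieve the client's resource once and return the new URI for it."""
        content = self._fetch(client_uri)
        provider_uri = f"{self._base}/{uuid.uuid4().hex}"
        self._copies[provider_uri] = content
        return provider_uri

    def update(self, provider_uri: str, content: bytes) -> None:
        """Replace the provider-side copy, e.g. after translation updates the TM."""
        if provider_uri not in self._copies:
            raise KeyError(provider_uri)
        self._copies[provider_uri] = content

    def read(self, provider_uri: str) -> bytes:
        """What the client would receive when retrieving the new URI."""
        return self._copies[provider_uri]
```

The URI returned by associate is what would go back to the client in the response, and it keeps pointing at the current copy however many times the provider updates it.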

Another option would be for the client to make their TM resource updatable directly by the service provider, using HTTP POST. That is, the service provider could POST a new copy of the TM whenever they had updated it. However, this approach leads to real problems of concurrency, locking, etc. when a client has to deal with more than one service provider, so is probably not worth seriously considering.

Workarounds

One possible workaround for some of the above issues would be to flag the resource to tell the receiver whether it is permanent. If it is, the receiver could retrieve it on demand; otherwise, they would have to retrieve it immediately and store it locally.

<xsd:element name="resource">
  <xsd:complexType>
    <xsd:attribute name="uri" type="xsd:anyURI" use="required"/>
    <xsd:attribute name="permanent" type="xsd:boolean" use="required"/>
  </xsd:complexType>
</xsd:element>

In a message, this would look like:

<associateResourceRequest xmlns="urn:oasis:names:tc:wstrans:v1:types">
  <ticket jobId="job-001" projectId="proj-002" userId="user-003"/>
  <resource uri="http://www.simship.com/resource1.pdf" permanent="0"/>
</associateResourceRequest>

Here permanent="0" (the XML Schema boolean form of false) means that the resource is not permanent, and that the receiver (the service provider in this case) must retrieve the resource immediately, and store it locally.
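On the receiving side, the decision could be sketched in Python as below. The function names and the "immediate" / "on-demand" labels are hypothetical; the namespace, element and attribute names are taken from the message above, and the accepted boolean spellings are those XML Schema defines (true, false, 1, 0).

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Namespace used by the example message above.
NS = {"t": "urn:oasis:names:tc:wstrans:v1:types"}


def parse_xsd_boolean(text: str) -> bool:
    """Interpret the lexical forms allowed for an XML Schema boolean."""
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not an XML Schema boolean: {text!r}")


def plan_retrieval(message_xml: str) -> Tuple[str, str]:
    """Return (uri, strategy) for the resource in an associateResourceRequest."""
    root = ET.fromstring(message_xml)
    resource = root.find("t:resource", NS)
    if resource is None:
        raise ValueError("message contains no resource element")
    uri = resource.attrib["uri"]
    permanent = parse_xsd_boolean(resource.attrib["permanent"])
    # A permanent resource can wait until it is needed; anything else
    # has to be fetched now and stored locally.
    return uri, ("on-demand" if permanent else "immediate")
```

For the example message, plan_retrieval yields the resource URI together with "immediate", which matches the behaviour described above.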