Talk:Delta update pattern
From Xcri
Alan Paull said:
recDateTime
The recStatus attribute could be used in XCRI-CAP to support delta harvesting (comparison of elements and their contents). If delta harvesting is used, recStatus may be sufficient, and XCRI feeds could contain successive update files (full or partial data sets). The publisher would have to ensure that update files were published in sequence, and the aggregator would have to process each update in that sequence. Responsibility for data maintenance would therefore be shared between the publisher and the aggregator, with fairly tight coupling between their processes. However, there may not be much appetite for this type of delta matching.
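To make that coupling concrete, here is a minimal Python sketch of an aggregator applying successive update files strictly in order. It assumes a hypothetical integer sequence number on each update file and a simple id-to-content store. Only the meaning of recStatus "3" (deleted) comes from this page (Case 6 below); treating every other recStatus value as "add or replace" is an assumption.

```python
from dataclasses import dataclass, field

# From Case 6 on this page: recStatus "3" marks a deleted element.
# Treating any other value as "add or replace" is an assumption of this sketch.
DELETED = "3"


@dataclass
class UpdateFile:
    sequence: int                        # hypothetical publisher-assigned sequence number
    records: list[tuple[str, str, str]]  # (element id, recStatus, content)


@dataclass
class Aggregator:
    store: dict[str, str] = field(default_factory=dict)
    last_sequence: int = 0

    def apply(self, update: UpdateFile) -> None:
        # Tight coupling: a gap or a repeat in the sequence means the
        # aggregator cannot safely continue without help from an operator.
        expected = self.last_sequence + 1
        if update.sequence != expected:
            raise ValueError(f"expected update {expected}, got {update.sequence}")
        for element_id, rec_status, content in update.records:
            if rec_status == DELETED:
                self.store.pop(element_id, None)
            else:
                self.store[element_id] = content
        self.last_sequence = update.sequence
```

Applying update 4 straight after update 2 raises ValueError and leaves the store untouched, which is exactly the point at which the publisher and aggregator have to coordinate.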
It may be worth noting that in the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), the model for delta harvesting on which the XCRI-CAP design is based, a unique identifier and a datestamp are normally required items, while a status attribute is optional. We may wish to explore the OAI-PMH route further.
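For comparison, an OAI-PMH harvester reads the identifier and datestamp from each record header, and a deletion is flagged by the optional status="deleted" attribute on that header. The Python sketch below extracts those three fields with the standard library only. The sample response is hypothetical and cut down for illustration; a real ListRecords response also carries responseDate, request and, for records that are not deleted, metadata elements.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class HeaderInfo(NamedTuple):
    identifier: str
    datestamp: str
    deleted: bool


def read_headers(xml_text: str) -> list[HeaderInfo]:
    """Return identifier, datestamp and deletion flag for each record header."""
    root = ET.fromstring(xml_text)
    result = []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        if identifier is None or datestamp is None:
            raise ValueError("header lacks an identifier or a datestamp")
        # status is optional; "deleted" is its only defined value.
        result.append(HeaderInfo(identifier.strip(), datestamp.strip(),
                                 header.get("status") == "deleted"))
    return result


# Hypothetical, cut-down response used only to exercise read_headers().
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:example.ac.uk:course-1</identifier>
      <datestamp>2009-03-01</datestamp>
    </header></record>
    <record><header status="deleted">
      <identifier>oai:example.ac.uk:course-2</identifier>
      <datestamp>2009-03-02</datestamp>
    </header></record>
  </ListRecords>
</OAI-PMH>"""

for info in read_headers(SAMPLE):
    print(info)
```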
An alternative or addition to recStatus is to use a 'date of last update' on the most significant elements to support harvesting. This method means that the XCRI publisher can provide a single up-to-date whole data set, and the processing of the data is entirely the responsibility of the aggregator. Wherever new or updated data has been provided, the aggregator simply adds or replaces the existing data.
It is suggested that a further attribute be introduced to support this type of data maintenance: recDateTime. This attribute would hold a standard date or dateTime value giving the date (or the date and time) at which the element was last updated. See: http://www.xcri.org/forum/topic.php?id=47.
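Because recDateTime may be either a date or a dateTime, an aggregator needs a common form before it can compare two values. The following is a minimal Python sketch using a hypothetical helper, parse_rec_datetime, under two assumptions not stated on this page: a date-only value stands for midnight at the start of that day, and a value without a time zone offset is in UTC.

```python
from datetime import datetime, timezone


def parse_rec_datetime(value: str) -> datetime:
    """Normalise a recDateTime value (date or dateTime) to an aware UTC datetime.

    Assumptions for this sketch: a date-only value means 00:00 on that day,
    and a value with no time zone offset is taken to be UTC.
    """
    text = value.strip()
    if text.endswith("Z"):
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11.
        text = text[:-1] + "+00:00"
    if "T" not in text:
        # Date only (YYYY-MM-DD, optionally followed by an offset): add midnight.
        text = text[:10] + "T00:00:00" + text[10:]
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

With values normalised this way, the comparisons in the cases below reduce to ordinary greater-than, equal and less-than tests, even when one feed sends dates and another sends dateTimes.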
It is envisaged that an aggregator would examine the recDateTime attribute and compare it with the currently held data. Seven cases arise, as follows; Case 5 and Case 6 represent alternative ways of indicating deleted elements.
Case 1: New element does not exist in currently held data. Aggregator adds element to currently held data.
Case 2: New recDateTime is after the currently held recDateTime (greater than). The data is therefore new, and the aggregator replaces the current element with the new element.
Case 3: New recDateTime is the same as the currently held recDateTime (equal). The data has already been received and the aggregator makes no change to the element.
Case 4: New recDateTime is before the currently held recDateTime (less than). ERROR. Either previously notified data has been deleted from the remote system, or there has been a local system error; intervention from an operator is required.
Case 5: An element that exists in the currently held data does not exist in the new data set. Element deleted. [This is dangerous, as the new data set might be incomplete, in which case this should be treated as an error. It may be preferable to use Case 6 for deleted elements.]
Case 6: New element is empty or null. If recStatus = "3", aggregator deletes element, else treat as ERROR.
Case 7: recDateTime missing. ERROR. Intervention from an operator is required.
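The seven cases can be sketched as a decision function in Python. This is an illustration under assumptions, not part of any XCRI specification: the names Element, Action, classify and reconcile are hypothetical; the empty/null check (Case 6) is made before the missing-recDateTime check (Case 7) so that a deletion marker need not carry a date; a held element without a recDateTime is treated as an error; and Case 5 is flagged as an error rather than acted on, following the caution above.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Action(Enum):
    ADD = "add"            # Case 1
    REPLACE = "replace"    # Case 2
    NO_CHANGE = "none"     # Case 3
    DELETE = "delete"      # Case 6 with recStatus "3"
    ERROR = "error"        # Cases 4, 5 (flagged), 6 without "3", and 7


@dataclass
class Element:
    rec_datetime: Optional[datetime]  # None models a missing recDateTime
    content: Optional[str]            # None or "" models an empty/null element
    rec_status: Optional[str] = None


def classify(held: Optional[Element], new: Element) -> Action:
    """Decide what to do with one incoming element (Cases 1-4, 6 and 7)."""
    if not new.content:                         # Case 6: empty or null element
        return Action.DELETE if new.rec_status == "3" else Action.ERROR
    if new.rec_datetime is None:                # Case 7: recDateTime missing
        return Action.ERROR
    if held is None:                            # Case 1: not currently held
        return Action.ADD
    if held.rec_datetime is None:               # assumption: held data always has one
        return Action.ERROR
    if new.rec_datetime > held.rec_datetime:    # Case 2: newer data
        return Action.REPLACE
    if new.rec_datetime == held.rec_datetime:   # Case 3: already received
        return Action.NO_CHANGE
    return Action.ERROR                         # Case 4: older than held data


def reconcile(held: dict[str, Element],
              incoming: dict[str, Element]) -> dict[str, Action]:
    """Classify every element id; ids absent from the new data set are Case 5."""
    actions = {eid: classify(held.get(eid), el) for eid, el in incoming.items()}
    for eid in held.keys() - incoming.keys():
        # Case 5: the new data set may be incomplete, so flag rather than delete.
        actions[eid] = Action.ERROR
    return actions
```

Returning an action per element, rather than changing the store directly, keeps the error cases visible to an operator before any held data is altered.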

