Feature Proposal: Fix brain-dead delete

Motivation

It has always been a pain in the ass that TWiki supported "delete" with a move to the Trash web. I guess this implementation was expedient for the RCS store, because there appeared to be no simple way to maintain the revision history of a deleted topic. I imagine the vision was of the Trash web being a sort of recycle bin. However it creates all sorts of problems, for example:
  1. Have to delete attachments by renaming them to TrashAttachment, which gets crowded and dirty
  2. Loss of context; no way to tell where a deleted web comes from
A redesign of this "feature" is long overdue. I'm not saying "get rid of Trash web", I'm saying let's review the requirements and see what we can do to make it all work a lot better.

Description and Documentation

Usual interpretations of must, should and could. The proposal is to re-implement the "delete" function subject to the following requirements (thank you Microsoft for the work you put into designing this paradigm):
  1. Delete web, topic or attachment
  2. Two delete modes: "Move-to-trash" and "Wipeout"
  3. Move-to-trash mode:
    1. Must be no loss of history (including where things came from)
    2. Must list deleted webs, topics and attachments
      • List should show who deleted and when
    3. Must be able to restore deleted webs, topics and attachments from trash
      • Must be by admins at least. Could also be by anyone with write access on destination
      • Could restore to another place than the original one
    4. Must be subject to CHANGE access being available on the container object and recursively on the deleted object and its contents
      • would the creation of a new item with the same name result in the 'restoring' of the past history, and then the adding of the new item to the HEAD? -- SD
      • No :-) -- C.
    5. Should keep the read access control: read-protected documents should not become public once in the trash
      • (tricky if the protection was inherited by web settings)
  4. Wipeout mode
    1. Must wipe out the deleted object
    2. Must be subject to DELETE access being available on the container object and recursively on the deleted object and its contents
    3. Wiped objects must not be restorable
    4. Wiped object could be archived in longer term storage. Cron job?
  5. "Empty trash" function
    1. Must be subject to DELETE access
    2. Must wipeout the contents of the trash cache
  6. Foswiki::Func API to all this functionality
WIBNIF it can be triggered via a cronjob in a "destroy all items deleted more than N days ago" mode -CN
  • cronjob should be able to archive trash for longer term storage.
DELETE access would default to an admin group, but clearly could be delegated as appropriate.
  • careful with read-protected documents. Maybe we should force shift-delete of them (the admin will have backups of the site anyways) -CN
This is a tremendous self-contained, but far reaching, coding project for someone.
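
The access rules in the requirements above can be sketched as follows. This is only an illustration under stated assumptions: `DeleteMode`, `Node`, `has_access` and `may_delete` are hypothetical names, not part of Foswiki::Func, and access rights are modelled as plain sets of user names.

```python
from dataclasses import dataclass, field
from enum import Enum


class DeleteMode(Enum):
    MOVE_TO_TRASH = "move-to-trash"
    WIPEOUT = "wipeout"


# Assumed mapping taken from the requirements: trash needs CHANGE, wipeout needs DELETE.
REQUIRED_ACCESS = {DeleteMode.MOVE_TO_TRASH: "CHANGE", DeleteMode.WIPEOUT: "DELETE"}


@dataclass
class Node:
    """A web, topic or attachment; `allow` maps an access right to user names."""
    name: str
    allow: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def has_access(user, right, node):
    return user in node.allow.get(right, set())


def may_delete(user, mode, container, target):
    """Check the container, then the target and, recursively, its contents."""
    right = REQUIRED_ACCESS[mode]
    if not has_access(user, right, container):
        return False
    stack = [target]
    while stack:
        node = stack.pop()
        if not has_access(user, right, node):
            return False
        stack.extend(node.children)
    return True
```

A single attachment without the required right is enough to block the whole operation, which is what "recursively on the deleted object and its contents" implies.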

Implementation

Implementation options (on the store part) range from using a nested mirror of the existing web structure in the TrashWeb, a context file that maps the existing trash mechanism, to leaving the files where they are, but adding a 'DELETED' attribute that the store would look for before listing the item.

(So basically, replicating the different ways that DELETE with history is implemented in CVS, and then adding it to the Store functions.)

CrawfordCurrie favours a systematic approach; starting with putting the sort of API used by WebDAV+Versioning over the top of the existing store. I have been working on such an API.

-- Contributors: CrawfordCurrie, SvenDowideit, ColasNahaboo, GeorgeClark - 07 Dec 2008

Discussion

One thing that I've set up on a couple of sites is a background task that archives the Trash web weekly and retains it for a couple of years. We've never had to go back and retrieve archived trash, but the capability is there if we need it. I've added an archive option above. -- GeorgeClark - 07 Dec 2008 - 17:19

I would help out creating the user interface for move to trash / restore from trash. -- CarloSchulz - 09 Dec 2008 - 08:57


The following was moved from MakeDeletionSmarter, a duplicate proposal

Feature Proposal: Web, topic and attachment deletion is poor; make it better

Motivation

The current system makes it really hard to recover anything, because the histories are thrown away, deleted stuff can overwrite, and the rules for conflict resolution are unclear and impossible to reverse.

Description and Documentation

I have been thinking about how we delete topics (copy to trash). It would be good if we had (1) an API and (2) a better methodology that would allow recovery of trashed topics.

Here's what I have in mind:
  1. when deleting a web, prepend D$year$mo$dayT$hour$min$sec to the web name before moving to Trash
  2. when deleting a topic, prepend D$year$mo$dayT$hour$min$sec and the web name (with s#[./]#_#g)
  3. when deleting an attachment, move it to the topic named as in (2)
  4. if any of the above would try to copy over an existing topic in trash, append sequential numbers until there is no conflict
I think these rules allow you to reconstruct everything except the version of the topic/attachment that was deleted. Is that important?
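
A minimal sketch of rules 2 and 4, for illustration only. The function names are hypothetical, and the `_` separator between the flattened web name and the topic name is an assumption; the rules above do not say how the parts are joined.

```python
import re
from datetime import datetime


def trash_prefix(when):
    """D$year$mo$dayT$hour$min$sec, built from a datetime."""
    return when.strftime("D%Y%m%dT%H%M%S")


def trashed_topic_name(web, topic, when, sep="_"):
    # Same substitution as s#[./]#_#g in rule 2.
    flat_web = re.sub(r"[./]", "_", web)
    # `sep` is an assumed separator, not part of the proposal.
    return trash_prefix(when) + flat_web + sep + topic


def avoid_conflict(name, existing):
    """Rule 4: append sequential numbers until the name is free."""
    if name not in existing:
        return name
    n = 1
    while f"{name}{n}" in existing:
        n += 1
    return f"{name}{n}"
```

Note that the substitution is lossy: a web name that already contains an underscore cannot be told apart from a flattened subweb path, so a "restore" that reverses the encoding would need the original location recorded somewhere else as well.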

The API would be at least "moveToTrash" for webs, topics and attachments; but it could also include "restore" of a web/topic/attachment by reversing the encoding in the name.

Examples

Impact


Implementation

-- Contributors: CrawfordCurrie - 29 Jul 2009

Discussion

Another problem: the topic or the attachment cannot be restored to its original location if that location has since been occupied by a new object of the same name. The user might then want to decide whether to copy over it or create a new revision on top. Maybe the user should have been warned beforehand that he is about to create a new topic under a name that already existed in the past. He should consider creating a new revision instead of starting a new history from scratch ... and so on.

So maybe moving things to a trash web is no longer an up-to-date approach? What about marking the object as deleted instead and leaving it where it is until the user decides to clean up?

Oh, we don't have an Empty Trash Bin API either.

Sorry, it was you who opened the can of worms ;-)

-- MichaelDaum - 01 Aug 2009

Further to Micha's suggestion to mark as deleted and leave in place - we already use an _ to denote a non-seeable web, why not use it for topic and attachment deletion?

(um, in the text file based store I mean)

-- SvenDowideit - 01 Aug 2009

My work foswiki has attachments that start with an underscore. I would not like those files to appear to have been deleted one day when I upgrade.

Why not take a leaf from CVS' book? Move deleted topics to an (invisible) sub-web named (for example) _deleted that is created on demand. Attachments could be handled similarly: for the file-based store, they could be moved to a new directory (e.g. foswiki/pub/Web/TheTopic/_deleted), also created on demand.

Alternatively, for an RCS-based store, delete the file for the topic or attachment and leave its ,v file. If a topic is deleted, then that topic's pub directory could be renamed to mark it as belonging to a deleted topic (e.g. prepend "." or append ".deleted").

-- MichaelTempest - 01 Aug 2009

@Sven: Afaik the semantics of the underscore prefix in webnames was: "this is a template web" and not "this web is hidden". Am I wrong?

-- OliverKrueger - 01 Aug 2009

@Oliver, yup, you're not quite right :-) The underscore prefix is a 'convention' that was said to mean "this is a template web", but it's never actually done anything other than hide the web. When you go to ManagingWebs, you can use any web as a template to create a new web...

Michael's reason for not doing it is true though, so my idea isn't useful :-)

-- SvenDowideit - 01 Aug 2009

Thanks, all. I was well aware what sort of worms I was digging up by raising this. I was hoping to be able to get away with a variant of the "move to trash" approach, as it seems to work reasonably well for Windows. When the scenario Micha describes is encountered, Windows prompts "The folder already contains a file named...." which is pretty much identical to what should happen when you copy to an existing name. Unfortunately that fails in any environment where histories are maintained.

A long time ago Sven proposed that file deletion should be a process of file elision i.e. hiding without physically deleting, and Michael has re-proposed this above. This would address the problem of a new history being created over the top of an old one, but also creates a lot of knock-on effects throughout the code. However biting this bullet would, IMHO, improve the usability of Foswiki significantly, so is worth doing.

We are talking about a significant amount of redesign here. The Store, Func and Meta APIs have to be re-engineered, which may cause significant issues for backward compatibility. The UI screens relating to topic deletion have to be redesigned, and the documentation rewritten.

On the plus side, many APIs and UIs can remain unchanged, because an attempt to modify an elided object can be treated as an access violation.
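
As a toy illustration of that last point, here is an in-memory store in which an elided topic is hidden from listings and any attempt to modify it is rejected. `ElidingStore` and `AccessDenied` are hypothetical names; this is not the Foswiki Store API.

```python
class AccessDenied(Exception):
    """Stand-in for whatever access-violation error a real store would raise."""


class ElidingStore:
    def __init__(self):
        self._topics = {}     # topic name -> text
        self._elided = set()  # names hidden by elision, content kept

    def save(self, name, text):
        # Modifying an elided object is treated as an access violation.
        if name in self._elided:
            raise AccessDenied(f"{name} is elided")
        self._topics[name] = text

    def elide(self, name):
        if name not in self._topics:
            raise KeyError(name)
        self._elided.add(name)

    def list_topics(self):
        # Elided topics stay in the store but are not listed.
        return sorted(n for n in self._topics if n not in self._elided)
```

Callers that already handle an access failure on save would need no changes to cope with elided topics in this model.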

-- CrawfordCurrie - 02 Aug 2009

It is important that whatever we do does not complicate the maintenance of the installation for the admin.

When people delete a topic today the deleted topic ends up in Trash web.

The admin can maybe once per year - with his head under the arm - delete all the topics in the trash web where the file is more than 2 months old. That is what I do.

If we create some scheme where the deleted topics remain in the web under some elision (not sure how this would be implemented) I fear pruning them becomes rather complicated. Moving to a trash like we know from Windows is logical and emptying the trash is logical and natural.

The issue the original proposal here addresses is valid and I kind of like the proposal. The important requirements:

  • The user must be able to delete a topic or attachment even if a topic or attachment already exists in the Trash web. Either by auto-renaming or forcing a rename. I prefer an autorename.
  • The user or admin must be able to restore a deleted topic or deleted web.
  • If someone wants to restore a deleted topic, and a new topic already exists with the same name, the new topic must not be overwritten. The user should be prompted to give the restored topic a new name - or alternatively (and less user friendly) be notified that the restored topic has been renamed by some suffix added.
  • Topics and attachments being deleted and restored must retain their history.
I cannot see any real need to be able to merge topic histories when restoring. I doubt the gain justifies the pain for this special case.

And as already pointed out, the admin really needs an "empty trash" user interface. I would suggest that such an empty trash have a "deleted older than..." selection. If done properly, such a script could then also be run as a cron job.
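
A "deleted older than..." purge could look roughly like this. It is a sketch under assumptions: the function name is hypothetical, the trash is a flat directory of files, and the file modification time stands in for the deletion time (as in the manual clean-up described above).

```python
import time
from pathlib import Path


def purge_older_than(trash_dir, days, now=None, dry_run=False):
    """Remove regular files in trash_dir whose mtime is older than `days`.

    Returns the sorted names of the files that were selected.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    selected = []
    for path in Path(trash_dir).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            selected.append(path.name)
            if not dry_run:
                path.unlink()
    return sorted(selected)
```

With `dry_run=True` the same function can back a confirmation screen in the user interface, and without it the function can be called from a cron job.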

-- KennethLavrsen - 02 Aug 2009

Good point, and that's one reason I'm thinking the elision approach is better. If we follow the first proposal, imagine the scenario where a lot of topics have been deleted from various webs. They are all renamed using some (to the user) impenetrable algorithm, and mixed up together in the Trash web. Automation can help, but at the end of the day is no different to the automation that can help with elision.

Note that separation can still be achieved in the current store implementation. For example, elided topics and subwebs could be stored in a .elisions subdirectory. This would only need to be checked when (1) a topic is created and (2) elisions are specifically asked for. Manual maintenance of same is as simple as find . -name .elisions. Elided topic and subweb names would be unchanged from their original names. Pub deletions would parallel this.
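
For illustration, the .elisions idea for topics in a file-based store might be sketched like this. The function names are hypothetical, topics are assumed to be plain .txt files directly inside the web directory, and ,v history files and pub directories are left out.

```python
import shutil
from pathlib import Path

ELISIONS = ".elisions"


def elide_topic(web_dir, topic_file):
    """Hide a topic by moving its file into <web_dir>/.elisions, name unchanged."""
    web_dir = Path(web_dir)
    target_dir = web_dir / ELISIONS
    target_dir.mkdir(exist_ok=True)  # created on demand
    target = target_dir / topic_file
    if target.exists():
        raise FileExistsError(f"{topic_file} already elided")
    shutil.move(str(web_dir / topic_file), str(target))
    return target


def list_topics(web_dir):
    # Normal listings never look inside .elisions.
    return sorted(p.name for p in Path(web_dir).glob("*.txt"))


def list_elisions(web_dir):
    """Case (2): elisions are specifically asked for."""
    d = Path(web_dir) / ELISIONS
    return sorted(p.name for p in d.glob("*.txt")) if d.is_dir() else []


def name_was_elided(web_dir, topic_file):
    """Case (1): consulted when a topic is about to be created."""
    return (Path(web_dir) / ELISIONS / topic_file).exists()
```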

With regard to pub deletions, this may be the right time to get rid of the duplication of the FILEATTACHMENT meta-data, which has been nothing but trouble since it was invented. Otherwise elision of attachments is going to have to modify topics :-(
  • Hm, but attachments, like forms, are not independent entities. They are always a sub-object of a topic, so to speak. Adding or even renaming them could indeed be seen as a topic modification and cause a new history entry. Note also that SEARCH will find topics that have a hit in an attachment name or comment. - MD
-- CrawfordCurrie - 02 Aug 2009

Yea well.

Let's outline the area we are touching here by taking a step back and forget about Foswiki for a minute to come back to it from a bit different angle.

First some words on MS Windows: "move to trash" is only a UI metaphor in many OSes nowadays, where the deleted material stays in place until "empty trash" is activated. For instance, deleting a large picture from a USB stick would preferably not result in it being copied to a trash location somewhere on a different store. Instead, it remains where it is and is finally deleted when the user says so. From then on restoring it becomes impossible. That's the feature.

Now version control comes into play. The strict view is: every modification is creating a new revision. Deletion of the document itself is also a document modification, as well as restoring it or modifying its ID if allowed.

The notion of document IDs (or URIs) needs a closer look. These are webtopic names associated with wiki words on Foswiki. Changing a document's wiki word is creating a new history entry as well. This also entails a change of the document ID as wiki words are used as document IDs on Foswiki.

In a model using a strict view on revision control, restoring a deleted topic will become restoring a previous revision of it. Also, restoring a previous revision might revert a rename action. In Foswiki renaming a wiki word potentially entails a large amount of changes all over the wiki, even depending on the way the rename was performed in the attempt not to break wiki words. There are at least three ways to rename a topic depending on the settings the user selected during that operation.

So juggling with topic locations, their wiki word and as a consequence the ID they occupied in the webtopic namespace is very problematic while holding up the strict view on revision control at the same time.

Now webs and attachments. Webs do have an ID similar to topics. The namespace for attachment IDs is the one topic they are uploaded to. We haven't yet formalized unique attachment IDs in a global namespace using some webtopic prefix (there's been some thinking about it in diverse plugins and in the TopicObjectModel). Moving an attachment out of the namespace where it was created will obviously cause any sort of problems that this proposal tries to overcome using an intelligent naming scheme.

Moving/renaming/deleting a complete web is not covered by RCS at all. So restoring it while another one of the same name has been created in the meantime is even more of a problem.

As long as we don't deal with object IDs (web, topic and attachment names) in a more formal way, moving them around will always be a problem. Therefore building a "move to trash" on top is a risk.

From a more practical pov and with some short term release plan in the back of my mind, we can't do the step towards a proper revision control of all objects from where we are at the moment. So the original proposal is okay to do the best to mitigate the problem for the next release.

A more thorough solution fits well into the TOM plan of further releases, AFAICS.

-- MichaelDaum - 02 Aug 2009

There has always been a problem that renaming a topic doesn't leave an audit trail behind it, so links to that topic will always be broken. If a renamed topic also left an entry in the web it came from, then those links could continue to be satisfied by redirection to the new topic.

Just for illustration, let's say we used the .elisions subdirectory approach.
  • A topic rename would create an entry (of some kind) in the .elisions directory in the source web, indicating where the topic had been moved to. This would allow broken references to be redirected.
  • A topic delete would simply move the topic (and all attachments) to .elisions
  • It would no longer be essential to rename links to the renamed topic; they could automatically be satisfied by the entry in .elisions; though I suspect we may want to continue this practice.
  • If .elisions entries are themselves revision controlled, then a full audit trail persists. Importantly, a link to a specific revision of a topic can continue to be satisfied by reference to the elision record.
Thus, renaming version 5 of a topic consists of:
  1. Creating an elision record, by moving the topic file to .elisions
  2. Creating version 1 of the newly named topic, and
    1. Populating it with the content of the original topic
    2. Backlinking it to the source elision record
Thus sufficient information is retained to undo the operation, and external links to "version 5" of the topic can be satisfied via the elision record.

Deleting a topic consists of:
  1. Creating an elision record, by moving the topic file to .elisions
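
The rename and delete steps above can be modelled in memory as follows. This is only a sketch: `Topic`, `ElisionRecord` and `Web` are hypothetical names, and each name is assumed to be elided at most once (a real store would need a history of elision records per name).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Topic:
    text: str
    rev: int
    moved_from: Optional[str] = None  # backlink to the source elision record


@dataclass
class ElisionRecord:
    topic: Topic                    # the elided topic, history intact
    moved_to: Optional[str] = None  # set for renames, None for deletes


class Web:
    def __init__(self):
        self.topics = {}
        self.elisions = {}

    def rename(self, old, new):
        if new in self.topics:
            raise KeyError(f"{new} already exists")
        original = self.topics.pop(old)
        self.elisions[old] = ElisionRecord(original, moved_to=new)
        # Version 1 of the new topic, populated from and backlinked to the original.
        self.topics[new] = Topic(original.text, rev=1, moved_from=old)

    def delete(self, name):
        self.elisions[name] = ElisionRecord(self.topics.pop(name))

    def resolve(self, name):
        """Follow elision records so links to a renamed topic still land."""
        seen = set()
        while name not in self.topics:
            record = self.elisions.get(name)
            if record is None or record.moved_to is None or name in seen:
                return None
            seen.add(name)
            name = record.moved_to
        return name
```

Because the elision record keeps the original topic with its revision number, a link to "version 5" of the old name can still be answered from the record.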
-- CrawfordCurrie - 02 Aug 2009

I was thinking more of this: create a revision 6 record that includes all the information about how revision 5 was renamed, by extending the meta data. This meta data would allow the operation to be reverted, going back from 6 to 5, including renaming and moving. That's what I was aiming at in my previous comment. This is a more general description of what a storage backend would implement.

An RCS and DB store could implement the renamings and elisions quite differently. A filesystem based store would use a filename schema so that you don't have to open up the file to identify an elision. An SQL DB would not need to encode elisions into the file/topic name and use some meta data column instead.
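
For the SQL case, a minimal sketch using Python's standard sqlite3 module. The table layout and the `elided_at` column are assumptions made up for this example, not an existing Foswiki schema.

```python
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE topics ("
        " web TEXT NOT NULL, name TEXT NOT NULL, body TEXT,"
        " elided_at TEXT,"  # NULL means the topic is live
        " PRIMARY KEY (web, name))"
    )
    return db


def elide(db, web, name, when):
    # The topic name is untouched; only the meta data column changes.
    db.execute(
        "UPDATE topics SET elided_at = ? WHERE web = ? AND name = ?",
        (when, web, name),
    )


def live_topics(db, web):
    rows = db.execute(
        "SELECT name FROM topics WHERE web = ? AND elided_at IS NULL ORDER BY name",
        (web,),
    )
    return [r[0] for r in rows]
```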

-- MichaelDaum - 02 Aug 2009

I was somewhat reluctant to illustrate how it might be implemented because, as you point out, different back-ends will implement it differently. However I felt it was important to illustrate to Kenneth how this might be done, so he can see how it is no more complex than the original proposal.

The place we really need to focus is on the requirements on the store, irrespective of the implementation. At a high level these can be summarised as:
  1. Support topic, attachment and web histories through delete and rename operations
Note that this does not imply an undo, just the history.

An interesting problem is the representation of a rev of a web. Obviously a rev of a web identifies the set of topics and subwebs it contains, and the specific revs of those topics and subwebs. At the moment the only way to recreate such a rev is from a timestamp, and to then iterate over all the contents of all webs to identify, from the histories encoded in meta-data, those topics that participated in the web at that time. Clearly this is unacceptable.
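
To make the cost concrete, here is a toy version of that timestamp-based reconstruction. The data layout is an assumption: each topic history is a time-ordered list of `(when, rev, present)` tuples, where `present` is False once the topic has been deleted or moved out of the web.

```python
def web_at(histories, timestamp):
    """Return {topic: rev} for the topics present in the web at `timestamp`."""
    snapshot = {}
    for topic, events in histories.items():  # every topic history is visited
        latest = None
        for when, rev, present in events:
            if when > timestamp:
                break
            latest = (rev, present)
        if latest is not None and latest[1]:
            snapshot[topic] = latest[0]
    return snapshot
```

Every call walks the history of every topic, which is the iteration over all contents that the paragraph above objects to.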

-- CrawfordCurrie - 03 Aug 2009

The thoughts of the foswiki gurus are very interesting, the technical and philosophical discussions are excellent.

But seriously, from the point of an (inexperienced) wiki admin: in a wiki, nothing should be deleted - with the following exceptions:
  • the attachment of a bogus file, with the real need for a good solution for duplicate file names (for instance, immediate deletion in the file system)
  • the top revision of a topic if the content contains an affront or embarrassing text (but in this case, people quickly create a new revision and come 3 days after to the admins "Hey, please delete this revision...")
  • a whole topic including the history, if the content is taken over in another topic and the history of this process is not needed.
In all these cases: if we want to delete something, we really want to get rid of the garbage. No restore needed. No trashcans. No restore of a certain revision - before you ask us admins to delete, you should think about it. And then we can still refuse, because we are the admins. (Because of Tasks.Item1879, at our installation normal users are not even allowed to rename a topic. Stop: the second factor is WebNotify.)

Reading this proposal from the beginning, I agree with the title and the need for a better way to delete attachments or topics with duplicate names. But I do not agree to the Motivation: if one decides to blast something and one is allowed to do so, then this something should be annihilated. If it should be archived, then the discussion above can take place. But the principal point is the decision: should this thing be devastated cause it stinks or is there the need to keep it in a dry and cold place because one will need it in the future?

People today are familiar with the ways software developers guard against the indecision of their customers, but the only times I deleted the wrong files (it happened more than once) I was so annoyed by the confirmation prompts that I did not read what they said. Exactly one reminder ("You are about to delete 1.543.211 files with extension .mp3") is adequate. Or none: on my Atari ST I didn't need this at all. Deleted is deleted.

So I think the functionality of a restorable deletion of webs, topics and attachments is a not needed offer to the users to crucify the admins, and where the admins are untroubled, it hits the developers.

-- WolfgangRaus - 03 Aug 2009

Wolfgang, you are welcome to try and get that idea past an ISO9001 auditor! They will have your bollocks for breakfast ;-) Most quality systems require that histories of everything are maintained, including deletions. Nothing escapes. It is this requirement that we have always been interested in addressing.

-- CrawfordCurrie - 03 Aug 2009

Some years ago, I was an internal auditor in the IT department of the municipality, and believe me: an ISO 9000/9001 auditor is a gentle sausage dog, we were the dobermans ;-)

What's the point? There are two scenarios described:
  • the open access model where everyone can delete ("move to trashcan") any topic and can restore any revision of any topic, just as one thinks that is needed.
  • the quality system, where the deletion of data has to be a well documented process (and regularly requires two persons: the 4-eyes-principle). Deletion means the removal of data from the system. Backups are made, and the data is deleted. The records are preserved in a strongbox, and the backup media in another strongbox.
I never had the need to restore deleted data in a high quality software system. The deletion always had a good reason. What you are discussing is the emulation of the trashcan functionality of some (today I think all) operating systems or their desktop environments. If you really think of addressing the requirements of a quality system (opening the can of worms), you must support the 4-eyes-principle, the automated generation of deletion records, a special application user role (let's name it "auditor"), (distributing the worms all over the ship) limit the role of the Admingroup, and prevent anyone from touching the filesystem, just to list a few points.

And off the beaten track, we admins of small (users < 500) foswikis with a few protected webs/subwebs (in my case ~50) look on and hope that the coded result will be something that our installations can bear. And that is the reason why I try to argue my point of view here: the typical intranet installation of Foswiki in a non-IT department cannot deal with giving all users the right to delete something. The reasons are not technical (WebNotify, security issues); people fear that something they write could be deleted. 90% of our users are proud of their ability to control the computer with the mouse and actually click the same icons and menus all the time, 5% can produce new Excel tables and are therefore the real gurus (they think so), and the rest have heard of web 2.0, but don't know the difference between a discussion forum and a weblog. [I hope none of them reads this... :-) ] - I do not want them to have the right to delete or to rename, not even their own stuff. And on the other side, our installations are far away from a certified high quality system.

If the discussion about the technical implementation of elisions (an ingenious approach) or the other propositions will lead to a solution which we can use in our installations, then not only I will be happy. Please don't forget the tiny intranet installations.

-- WolfgangRaus - 03 Aug 2009

Okay ... philosophical level. Here's my try ;-)

Content in blogs and forums does age well. Content more or less disappears into the past by design. People do ask the same question again in a forum. Sometimes that's because the retrieval means for old content are so bad in forum software that it is much quicker to ask again and let people react. Totally valid.

Blogs have all of their content stringed on a time flow. So old news is simply harder to dig out of that stream. Who cares about my blabla from yesterday anyway. Same with twitter. Very fast aging.

Well, and wikis are terribly bad at forgetting content. Most of the time bogus content keeps hanging around forever, unless you throw away project webs when the projects are done... which is actually a good practice sometimes. Webs are a convenient way to throw away big chunks of outdated content in one go. And you can even plan ahead for the content to be thrown away. Some is refactored into a knowledge base, like a final report - the rest simply goes over the fence.

See, Wolfgang, there's a completely different way to think about deletions. Forgetting is quite healthy sometimes. Forgetting the right thing is even better, however that is supported on a technical or on an organizational level. Keeping knowledge up to date also means deleting some of it from time to time while replacing it with newer documents.

And that's why we have to provide it as a technical feature including restoring deleted content in case you made an error. If you use this feature or not is your decision.

-- MichaelDaum - 03 Aug 2009

Good discussion, this.

In my daily work I see wikis used in many different ways. One common use I see Foswiki put to - irrespective of organisation size - is as a documentation repository, which is often linked to a QMS. Yes, of course you can use external systems, such as backup, to fulfill the history requirements of such a system, but it is a selling point for people considering Foswiki that it has this "nothing is ever lost" capability. It ticks the right boxes for management.

At the same time wikis, to me, are all about refactoring - iterative refinement of ideas, continuous improvement, collaborative authoring. It has always been my goal to support refactoring more explicitly, with tools that actively help you - for example - collapse time-bound threads into theses, find synergies between documents, reorganise taxonomies etc etc. Histories are inextricably bound up with this.

-- CrawfordCurrie - 04 Aug 2009

Back to the more practical part and to the original proposal.

Idea from Sven Vetter (trivadis.com) ... Instead of generating new names in the trash web to avoid collisions it would be better to rename only the old object thus moving it out of the way and then store the newly trashed object under its original name. Makes it easier to find.
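
A small sketch of this variant, with a plain dict standing in for the Trash web. The function name and the numeric suffix scheme are assumptions made for illustration.

```python
def move_to_trash(trash, name, obj):
    """Store `obj` under `name`; an older occupant is renamed out of the way."""
    if name in trash:
        n = 1
        while f"{name}{n}" in trash:
            n += 1
        trash[f"{name}{n}"] = trash.pop(name)  # the older item gets the suffix
    trash[name] = obj
```

The most recently trashed object is always found under its original name, and older ones collect numbered suffixes in the order they were displaced.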

-- MichaelDaum - 04 Aug 2009

Moved in content from MakeDeletionSmarter, which was a duplicate proposal (now deleted).

What this topic really needs is someone to pick up the banner of this work. I have too many other things to do.

-- CrawfordCurrie - 17 Aug 2009

On a large wiki we had to abandon use of rename altogether. With hundreds of thousands of topics in Main, the actual implementation of rename performs linearly in the number of other pages in the same web and times out in either Apache or the browser. Worse, it happens multiple times: once to select possible topics to repair, once to rename all other links in the original web, and again to try every possible match line by line to fully qualify relative links in the moving page.

We first implemented a simple delete without fixing other links, just to keep the service from melting.

Then we rolled out RedirectPlugin (re-implemented actually, ours was too old), which allowed the broken links previously deleted to be repaired and redirected to new, even off-wiki locations. This has been working pretty well, and an option to leave behind redirects would allow renames to happen as a maintenance job, not in the request itself. It also works around permission issues that prevent some inbound link repairs, and inbound links from outside the wiki (like bookmarks).

By leaving the page in place and just replacing the content with a redirect, performance gradually slows as chains of redirects develop, so some long-term strategy to mitigate that (midnight sweep and link fix perhaps) is still needed, but we haven't observed an immediate problem with just leaving the content in the history of the "deleted" or "renamed" topics. The rest of the wiki links just have to be updated before someone else comes along and adds new, unrelated content to the rename-stub topic. We haven't observed people trying to reuse the redirect-stub topics yet - they just choose new topic names.
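
One possible shape for such a sweep, as a sketch only: redirects are modelled as a dict from stub topic to target, which is an assumption about the data and not how RedirectPlugin stores them.

```python
def final_target(redirects, topic):
    """Follow redirects from `topic`; return None if the chain loops."""
    seen = set()
    while topic in redirects:
        if topic in seen:
            return None
        seen.add(topic)
        topic = redirects[topic]
    return topic


def flatten(redirects):
    """Point every stub straight at its final target; loops are left as-is."""
    flat = {}
    for source, target in redirects.items():
        end = final_target(redirects, source)
        flat[source] = target if end is None else end
    return flat
```

After flattening, every stub is one hop from its destination, so lookups no longer slow down as chains grow.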

-- DrakeDiedrich - 17 Mar 2010
Topic revision: r9 - 17 Mar 2010, DrakeDiedrich