Feature Proposal: Support $comma formatting token

Motivation

While coding for ENCODEnlsToBR, I realised I need a $comma formatting token to protect commas in data that may itself contain commas, when that data is to be interpreted as a comma-separated list. This is something we make increasing use of.

It seems sensible and consistent to include that $comma formatting token in the standard set of formatting tokens, alongside $dollar etc. I think it's unlikely to clash with any existing text (except perhaps $command, which will probably be written as $dollarcommand anyway).

The $comma token is already supported by SpreadSheetPlugin.

See also SupportDollarPercent

Usage example: I have a number of topics with form fields that contain free text, which may include any Unicode character. A typical form field value is:

Andrea, Kathy, and Leigh visited NitCon; see NitConReport for details

I then have to report unique occurrences of this field from a search which goes over N topics in M webs.
%CALC{"$LISTUNIQUE(%SEARCH{... separator="," format="$formfield(RecentEvents)"}%)"}%
This, of course, doesn't work because of the commas in the form field data. In theory I could use FilterPlugin to parse the list on a different SEARCH separator (e.g. a semicolon), but then I have the problem that the new separator may also occur in the data field.

Another approach is to encode the form field values before attempting to parse them (using either SpreadSheetPlugin or FilterPlugin):
%ENCODE{"%CALC{"$LISTUNIQUE(%SEARCH{... separator=","
 format="$percentENCODE{\"$formfield(Progress)\" old=\"$dollardollar,$dollarcomma\" new=\"DOLLAR,COMMA\"}$percent"}%)"}%"
 old="COMMA,DOLLAR" new="$comma,$dollar"}%
(Note that the example won't work as written, because %CALC is expanded according to weird rules. In the real use case the $LISTUNIQUE step is done by a bespoke plugin instead of %CALC.)
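To make the encode-before-parse idea concrete, here is a minimal Python sketch of the same round trip, assuming the placeholders DOLLAR and COMMA used in the %ENCODE example never occur in the data. The names protect, restore, list_unique and unique_field_values are hypothetical, and list_unique only approximates $LISTUNIQUE by splitting naively on commas and keeping the first occurrence of each item.

```python
def protect(value: str) -> str:
    """Hide characters that would confuse a comma-splitting list parser."""
    # Same placeholders as the %ENCODE example: "$" -> DOLLAR, "," -> COMMA.
    return value.replace("$", "DOLLAR").replace(",", "COMMA")


def restore(value: str) -> str:
    """Undo protect() once the list has been processed."""
    return value.replace("COMMA", ",").replace("DOLLAR", "$")


def list_unique(csv: str) -> list[str]:
    """Rough stand-in for $LISTUNIQUE: naive comma split, first occurrence wins."""
    items = [item.strip() for item in csv.split(",")]
    return list(dict.fromkeys(items))


def unique_field_values(values: list[str]) -> list[str]:
    # Encode each value before joining, so the only commas left are separators.
    joined = ",".join(protect(v) for v in values)
    return [restore(item) for item in list_unique(joined)]


if __name__ == "__main__":
    field = "Andrea, Kathy, and Leigh visited NitCon; see NitConReport for details"
    values = [field, "No events", field]
    # Without encoding, the commas inside the field break each value apart.
    print(list_unique(",".join(values)))
    # With encoding, each field value survives as a single list item.
    print(unique_field_values(values))
```

The weak spot is the same as in the TML version: if the data can contain the placeholder text itself, the round trip is no longer lossless.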

This is just one example of a common problem encountered when manipulating lists of values that may contain arbitrary strings.

Description and Documentation

Add this entry to FormatTokens:
$comma Comma (,)
The code to handle it is a simple copy-paste of the code for one of the other tokens.
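As a rough illustration of what such a copy-paste amounts to, here is a Python sketch of a single-pass token expander. It is not Foswiki's code. The table only contains tokens named on this page plus the proposed $comma. The optional () suffix follows the $comma() form suggested in the discussion.

```python
import re

# Illustrative table only: tokens mentioned on this page, plus the proposed $comma.
STANDARD_TOKENS = {
    "percnt": "%",
    "dollar": "$",
    "comma": ",",  # the proposed addition
    "nop": "",
    "n": "\n",
}

# Longer names are tried first, so "$nop" is not read as "$n" followed by "op".
_NAMES = sorted(STANDARD_TOKENS, key=len, reverse=True)
_TOKEN_RE = re.compile(
    r"\$(" + "|".join(map(re.escape, _NAMES)) + r")(?:\(\))?"
)


def expand_standard_tokens(text: str) -> str:
    """Replace every $token or $token() with its literal value, in one pass."""
    return _TOKEN_RE.sub(lambda m: STANDARD_TOKENS[m.group(1)], text)
```

With this sketch, `a$commab` becomes `a,b`, `$comma()nd` becomes `,nd`, and `$dollarcommand` becomes the literal text `$command`. An unescaped `$command` becomes `,nd`, which is the clash mentioned in the motivation.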

I already have the changes as part of the ENCODEnlsToBR proposal; if that is rejected, it would be easy to check in only the $comma part.

Examples

%ENCODE{"blah,blah" old="$comma,b" new=";,z"}% produces zlah;zlah
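The example can be modelled with a small Python function. This is a hypothetical sketch, not the real %ENCODE implementation. It makes three assumptions:

* old and new are comma-separated lists of equal length.
* $comma and $dollar inside an item stand for literal characters.
* All replacements happen in a single left-to-right pass.

```python
import re


def _unescape(item: str) -> str:
    # Only the tokens this sketch needs: $comma and $dollar, with optional "()".
    return re.sub(
        r"\$(comma|dollar)(?:\(\))?",
        lambda m: "," if m.group(1) == "comma" else "$",
        item,
    )


def encode(text: str, old: str, new: str) -> str:
    """Hypothetical model of %ENCODE{"text" old="..." new="..."}%."""
    # Split the lists first, then unescape: a raw "," can never be part of an item.
    olds = [_unescape(s) for s in old.split(",")]
    news = [_unescape(s) for s in new.split(",")]
    if len(olds) != len(news) or "" in olds:
        raise ValueError("old and new need the same number of non-empty items")
    mapping = dict(zip(olds, news))
    # Longest keys first, so that overlapping keys prefer the longer match.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

Here `encode("blah,blah", "$comma,b", ";,z")` gives `zlah;zlah`. Without $comma there would be no way to name a comma as one of the items in `old`.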


Implementation

Nothing special about the implementation.

Community Vote

This proposal has been discussed for the 14-day period. A concern has been raised, and there is no sign of a consensus being reached.

The spec is clear. We can all see what the proposal is about, so it is time for a community vote. The vote runs for 7 days, starting 24 Mar 2010 and ending 31 Mar 2010 at midnight GMT. Because the change is less than 50 bytes of code, it can be checked in around the feature freeze deadline.

The question is: Do you agree that the token $comma meaning "," should be added to the existing FormatTokens?

Name Vote (Yes/No)
Yes
No
Yes
Yes
Yes
Yes
Yes

-- Contributors: CrawfordCurrie - 11 Mar 2010

Discussion

It is easy enough to avoid the clash with $command by using $comma() instead of $comma.

-- MichaelTempest - 11 Mar 2010

Am I the only one who thinks the $foo vs. $foo() work-around just adds more confuddlement for ordinary users?

All these quirks add up to a lot of trial and error that my users, at least, just don't have time for.

But then, I suppose we will always have these artefacts when working with string soup.

-- PaulHarvey - 11 Mar 2010

I'm afraid so. If we had our time again, we'd develop a rational and consistent scheme for escapes that didn't have all these horrible exceptions, one-offs and confustications. But we don't.

-- CrawfordCurrie - 12 Mar 2010

More cruft rolling down the alley. :-(

Commas don't cut it. Lists are more complex. List items are more complex.

Please have a look at %FORMATLIST to see how to properly parse a list and how to work around list items being split up accidentally. I've already outlined its list parser elsewhere.

-- MichaelDaum - 12 Mar 2010

I agree that it's cruft, and I don't like $comma any more than I like any of the other $bollocks. However I am not proposing completely reworking everyone's code for parsing lists. I'm observing that comma-separated lists are already endemic, and it would help to have $comma to support them.

-- CrawfordCurrie - 12 Mar 2010

Nice try for an excuse.

-- MichaelDaum - 12 Mar 2010

Michael, I'm not going to recode everyone's list parsers for them. So we can either:
  1. Only support $comma as a special case in ENCODE
  2. Support it consistently everywhere, as proposed here.
I take it you are saying it has to be (1)?

-- CrawfordCurrie - 14 Mar 2010

A question, not meant as an argument for or against $comma: what prevents using \, instead?

-- KennethLavrsen - 14 Mar 2010

The main argument against \, is that the list "parsers" throughout the code are very stupid; they split(/\s*,\s*/, $var), so they would still split on the comma in \,. As I already remarked to Michael, I do not propose to sanitise everyone's list parsers, even if I could find them.
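A small Python sketch shows the difference. naive_split uses the same pattern as the Perl split quoted here. Python's re.split keeps trailing empty fields, unlike Perl's split, but that does not matter for these inputs. A \, escape is cut in half, while $comma passes through the naive parser untouched and can be expanded after splitting. The expansion step here is a plain string replacement, for illustration only.

```python
import re


def naive_split(value: str) -> list[str]:
    """Split a list the way many plugins do: on commas plus surrounding spaces."""
    return re.split(r"\s*,\s*", value)


def split_then_expand(value: str) -> list[str]:
    # Split first, then turn $comma back into a literal comma inside each item.
    return [item.replace("$comma", ",") for item in naive_split(value)]


if __name__ == "__main__":
    # The \, escape does not protect anything: the item is split at its comma.
    print(naive_split(r"Andrea\, Kathy, Leigh"))
    # $comma contains no comma, so the item survives and is expanded afterwards.
    print(split_then_expand("Andrea$comma Kathy, Leigh"))
```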

This proposal is dead as long as Michael maintains his concern without proposing any reasonable alternative.

-- CrawfordCurrie - 24 Mar 2010

This proposal is simple. It is fully spec'ed. It is easy to decide on a yes or no. And it seems there are no more ideas or arguments.

So we take it to a community vote. I have added a vote table at the top of the proposal.

Please stop making counter proposals during vote. It is a matter of yes and no.

-- KennethLavrsen - 24 Mar 2010

Just to make my point clear: standard escapes, i.e. $percnt, $dollar, $nop, $n and \ (backslash), only have a right to exist because they are used inside an argument position of a macro with the sole purpose of delaying macro expansion. There is no other reason for them. It follows that all other $thing escapes are unnecessary, as they play absolutely no role in the TML parser.

Please, let's keep the set of standard escapes to the absolute minimum.

Btw, this proposal lacks any example of why and how $comma would be indispensable.

-- MichaelDaum - 24 Mar 2010

I had doubts too, but finally decided to support the proposal after reading the discussion on IRC tonight. The tokens already contain symbols that are not only for delaying macros. We added $gt after the same kind of discussion. Yet many of us have found these new tokens quite useful and nice to work with. So I do not mind expanding them a little further in future as well.

I also have the opinion that if one of the most productive contributors wants something in an open source project, I will not vote against it if it does not harm anyone. I do not need to see myself using a feature to support it. I mainly look for usability, compatibility, and clarity, and at whether the proposals can be understood by beginners. This proposal is backwards compatible, does not harm anyone, and it seems to be important for Crawford and his users. And any beginner can understand the table of tokens. That is enough for me to support it, even if I would not have raised this proposal myself.

Crawford, thanks for the clarification of why \, does not work. Yes. I have seen several plugins where split is used in a way that does not support \,. In fact, I wrote one myself just 3 weeks ago. $comma will work there.

-- KennethLavrsen - 24 Mar 2010

Yes, this proposal is simple to grasp. That doesn't justify it at all.

Other feature proposals have been rejected even though they were backwards compatible (RulebasedViewTemplates).

Again: this proposal lacks any example showing that $comma is an absolute must.

I was against $gt and $lt for the very same reasons, I am against more $cruft of that sort, because it defeats the initial purpose of the standard escapes. Now I see those opened the floodgates. Please, this is really not the way to go.

-- MichaelDaum - 25 Mar 2010

I added an example of usage.

-- CrawfordCurrie - 25 Mar 2010

According to the example above, this is actually a deficiency of LISTUNIQUE as it has got a hard-coded comma as a list separator. That's what you are trying to work around.

Well, here's how to handle this using FilterPlugin:
%FORMATLIST{

   "%SEARCH{
       ... 
       separator="\n" 
       format="$formfield(RecentEvents)"
   }%"

   split="\n"
   unique="on"
}%
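In Python terms, the FORMATLIST call above roughly amounts to three steps: join the values with a newline, split on newline, and drop duplicates. The sketch below is only an approximation of that behaviour, not FilterPlugin's code, and the name unique_lines is hypothetical. It shows that the approach depends on the separator never occurring inside a value. A multi-line value, as a textarea field can hold, is split apart.

```python
def unique_lines(values: list[str]) -> list[str]:
    """Approximate SEARCH separator="\\n" followed by FORMATLIST split="\\n" unique="on"."""
    joined = "\n".join(values)  # like SEARCH with a newline separator
    # Split on newline and keep the first occurrence of each item, in order.
    return list(dict.fromkeys(joined.split("\n")))
```

For single-line values such as `["a, b", "c", "a, b"]` this gives `["a, b", "c"]`, commas intact. A value like `"first line\nsecond line"` comes back as two separate items.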

-- MichaelDaum - 25 Mar 2010

That works iff there is no \n in the search result, as I noted under the example. Yes, I'm working around a deficiency in CALC, but get real - deficiencies exist and the more the core can do to help overcome them, the easier it is for end users.

-- CrawfordCurrie - 25 Mar 2010

It won't have a \n in it as long as your RecentEvents isn't a textarea formfield. Otherwise use any other char that won't occur (how about a \0) for separating and splitting. There's a tokenize parameter to %FORMATLIST as well to address exactly this issue. It will collapse any matching element within the list before splitting the list up, and then expand tokens afterwards.
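The collapse-split-expand idea behind the tokenize parameter can be sketched in Python as follows. This is a hypothetical model of the technique as described here, not FilterPlugin's implementation. tokenized_split and the private-use placeholder characters are assumptions of the sketch, and the tokenize pattern has to match whole list items for it to help.

```python
import re

# Private-use characters assumed never to appear in the data or the separator.
OPEN, CLOSE = "\ue000", "\ue001"
_PLACEHOLDER = re.compile(re.escape(OPEN) + r"(\d+)" + re.escape(CLOSE))


def tokenized_split(text: str, tokenize: str, split: str = ",") -> list[str]:
    """Collapse tokenize matches, split on the separator, then expand them again."""
    saved: list[str] = []

    def collapse(match: re.Match) -> str:
        # Replace the protected text with a numbered placeholder.
        saved.append(match.group(0))
        return f"{OPEN}{len(saved) - 1}{CLOSE}"

    collapsed = re.sub(tokenize, collapse, text)

    def expand(part: str) -> str:
        # Put the original protected text back into each list item.
        return _PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], part)

    return [expand(part) for part in re.split(split, collapsed)]
```

For example, `tokenized_split('"Andrea, Kathy",Leigh', r'"[^"]*"')` keeps the quoted item whole and returns two items.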

-- MichaelDaum - 26 Mar 2010

That's fine, but (1) RecentEvents is a textarea, (2) we're not talking about FORMATLIST, we're talking about other macros that currently barf on unencoded characters, such as CALC, and (3) with SEARCH you cannot use a \0 as a separator; it's an attr value, and attr values don't support \ escapes.

I don't consider installing another plugin - especially one like FilterPlugin that comes with a bunch of extra baggage - to be a reasonable alternative to $comma.

-- CrawfordCurrie - 26 Mar 2010

I'm not entirely sure why $comma would be a disaster. Even the $command clash isn't a real problem: the format extraction work I've begun in Foswiki::Search, and will continue in 2.0, shows that $format operators will need to be processed in a sympathetic order (i.e. do the longer ones first, then trickle down to the shortest, with a segue into ()).

By the time Foswiki 2.0 is released, we can expect a Foswiki::Func::formatResult that takes an iterator of results and a hash of custom function pointers, so that a macro can add its own extra $format operators, with the core common ones on by default.
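The longest-first ordering and the per-macro operator hash can be sketched together in Python. format_results, CORE_OPERATORS and the example operators are hypothetical names for illustration, not the Foswiki::Func API. Each operator is a function of the current result, and custom operators are merged over the core ones.

```python
import re
from typing import Callable, Iterable, Mapping, Optional

Operator = Callable[[dict], str]

# Hypothetical core operators, on by default.
CORE_OPERATORS: dict[str, Operator] = {
    "comma": lambda result: ",",
    "dollar": lambda result: "$",
    "n": lambda result: "\n",
}


def format_results(
    results: Iterable[dict],
    fmt: str,
    custom: Optional[Mapping[str, Operator]] = None,
    separator: str = "",
) -> str:
    """Expand $operator or $operator() in fmt once per result and join the output."""
    operators = {**CORE_OPERATORS, **(custom or {})}
    # Longest names first: a "$command" operator must win over "$comma" if both exist.
    names = sorted(operators, key=len, reverse=True)
    pattern = re.compile(r"\$(" + "|".join(map(re.escape, names)) + r")(?:\(\))?")
    return separator.join(
        pattern.sub(lambda m: operators[m.group(1)](result), fmt)
        for result in results
    )
```

Without a custom `command` operator, `$command` still expands as `$comma` followed by `nd`. If a macro registers `command`, the longest-first order picks it instead, and `$comma()` stays available to force the short token.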

-- SvenDowideit - 27 Mar 2010

... which should remind us that both proposed ways to handle lists - either via back-and-forth encoding of $comma or using %FORMATLIST - are flawed as they both propagate a result set through the parser.

Both approaches flatten a result set and then try to reconstruct the original set using some kind of parsing. That's very, very bad, even more so when done the crude way proposed here.

Instead, the original result set as constructed by %SEARCH should be stored temporarily and made available via some kind of ID.
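Here is a minimal Python sketch of that idea. A per-request store keeps a result set intact and hands out an ID, so a later consumer can fetch the items directly instead of re-parsing a flattened string. ResultSetStore, its methods and the ID format are hypothetical names for illustration, not an existing Foswiki API.

```python
import itertools


class ResultSetStore:
    """Hypothetical per-request store for result sets, addressed by ID."""

    def __init__(self) -> None:
        self._sets: dict[str, list[str]] = {}
        self._counter = itertools.count(1)

    def store(self, items) -> str:
        set_id = f"resultset{next(self._counter)}"
        self._sets[set_id] = list(items)  # keep the items as-is, no flattening
        return set_id

    def fetch(self, set_id: str) -> list[str]:
        return list(self._sets[set_id])  # a copy, so callers cannot alter the stored set


if __name__ == "__main__":
    store = ResultSetStore()
    field = "Andrea, Kathy, and Leigh visited NitCon; see NitConReport for details"
    set_id = store.store([field, "No events", field])
    # A LISTUNIQUE-style consumer works on real items, so embedded commas never matter.
    print(list(dict.fromkeys(store.fetch(set_id))))
```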

-- MichaelDaum - 29 Mar 2010

Of course. But I will remind you that there are other sources of lists - for example, lists of group members in the TopicUserMapping, lists returned by other macros supported by plugins that have not had the benefit of result set treatment. I understand your arguments, and agree with most of them; however I do not find them helpful or constructive with respect to this proposal.

For better or for worse, $comma is already supported by %CALC and %ENCODE. Adding it does not open the floodgates, it is a harmless and consistent extension to the set of standard escapes.

-- CrawfordCurrie - 29 Mar 2010

Well, I gave you a solution to your problem and exactly outlined how it works. You could have been using it for quite some time already. How could I be more constructive?

-- MichaelDaum - 29 Mar 2010

You gave me what you thought was a solution to the problem. However it does not address the problem, as I described above. This is getting ridiculous; we are wasting energy here.

-- CrawfordCurrie - 29 Mar 2010

I think we have enough support for this to go through on majority vote. There's no time limit on a vote, so this is just based on gut feel. Michael's concern will remain here for all time, for him to point us at, and crow about, when it all goes pear-shaped.

-- CrawfordCurrie - 01 Apr 2010

There is a time limit on the vote.

It is 7 days. This was also clearly announced in the mailing list email where the vote was called for.

The vote expired on 31 Mar at midnight GMT, so it is correct to call the vote accepted, with 6 votes for and 1 vote against.

-- KennethLavrsen - 01 Apr 2010

Thanks Kenneth. This work has been merged to trunk.

-- CrawfordCurrie - 02 Apr 2010
Topic revision: r33 - 05 Jul 2015, GeorgeClark