The bug
The bug manifests itself when you view attachments through the viewfile script, which is the case for example when you have a
RewriteRule? like the one below in place to protect attachments from public access:
RewriteRule ^wiki/pub/([^\/]+)/([^\/]+)/([^\/]+)$ cgi-bin/viewfile/$1/$2?filename=$3 [L,PT]
(Note: I also use the german UTF-8 site charset and specify upper and lower national characters manually in the configure script, but I believe this bug will also appear with other charsets, as long as a special character gets URL-encoded.)
To reproduce it:
- Create A topic that contains special characters in its name, e.g. german umlauts ("Kursübersicht").
- Attach a simple text file to this topic.
- Then try to view the attachment with the viewfile script (or by clicking on it if you have a RewriteRule like the one above...)
- The viewing fails with an "Attachment ? does not exist" exception.
Explanation and bugfix
Because the URL to the attachment includes the topic name, which contains a special character, the URL and thus
path_info (from which the topic name is taken in lib/Foswiki/UI/ViewFile.pm) will contain an URL-encoded special character. However, lib/Foswiki/UI/ViewFile.pm does not seem to url-decode the topic name before untainting and validating it. The call to
isValidTopicName in Foswiki.pm then returns 0 because the topic string contains %, which are filtered out by Foswikis default NameFilter. So in lib/Foswiki/UI/ViewFile.pm,
$topic remains undefined. If you change the following line
# The next element on the path has to be the topic name
$topic = Foswiki::Sandbox::untaint(
shift(@path),
\&Foswiki::Sandbox::validateTopicName);
to this:
# The next element on the path has to be the topic name
my $decodedTopicName = Foswiki::urlDecode(shift(@path));
#$decodedTopicName = $session->UTF82SiteCharSet($decodedTopicName);
$topic = Foswiki::Sandbox::untaint(
$decodedTopicName,
\&Foswiki::Sandbox::validateTopicName);
then things work as they should with the special character topic names. If you uncomment the UTF8SiteCharSet-Line, then normal topic names break, but it might be needed in some cases (it's done for the filename and I'm not all too sure what's going on in it...)
BTW, while the filename is being treated for url-encoded characters, the web name also doesn't get treated, so the same bug would probably appear if you had a web with special characters in its name.
Would someone please confirm this and create a bug fix that's more than just my quick hack
--
PascalSchuppli - 25 Oct 2010
I would hesitate to just implement this fix.
I have tested with what we believe will be 1.1.1 and I have the following settings that should work well with all latin languages setup with an iso character set
- $Foswiki::cfg{UseLocale} = 1;
- $Foswiki::cfg{Site}{Locale} = 'en_US.ISO-8859-1';
- $Foswiki::cfg{Site}{LocaleRegexes} = 1;
- $Foswiki::cfg{UpperNational} = '';
- $Foswiki::cfg{LowerNational} = '';
- $Foswiki::cfg{Site}{CharSet} = 'iso-8859-1';
You may want to use en_US.ISO-8859-15 instead and the language could be another instead of en_US but that should mean little to how Foswiki works in practical life.
And then I can create topics and attach attachments with Danish letters like æøå or german letters like ü, ö, ä.
We know that UTF8 is not well supported yet and we are looking for someone who are experts in this area. Mainly for cyrillic and Chinese/Japanese. We have not seen any interest from arab or hebrew and it is impossible to develop these things when you have no clue how the language works character/right to left etc.
Pascal if you are setting your site up for German and you have not yet had it in production at all, then my advice is - use iso character set iso-8859-15 and forget UTF8 for now
--
KennethLavrsen - 25 Oct 2010
Hmmm, well, how about you test if my hacky bugfix breaks the code with a normal (iso-8859-1) charset and if it doesn't, implement it? Because this
IS a bug in the code and it needs fixing; you can either fix it now or have someone rediscover it five years from now when UTF-8 gets moved up from experimental status to production. It took me three hours to figure out why attachments on topics with umlauts wouldn't work, so that's three hours of work someone else won't have to do in the future if you fix it now that you know what causes the problem.
I know UTF-8 is still experimental, but I've been using it for quite a while in production wikis (telling people not to use Umlauts in topic names as a workaround and finally getting fed up with this and looking for the reason this doesn't work today) and it's been working better and better over the last two years (to the point where even the WYSIWYG-Editor now works; it didn't use to). I could report a few other bugs that have to do with UTF-8, but none of them are problematic enough to be show stoppers. This one is seriously annoying. Please fix it now.
BTW, I'm surprised that things work fine with iso-8859-1, I expected this bug to be
solely related to the failure to url-decode the topic name before passing it to the topic name validation, which should break iso-8859-1 as well. Strange that it doesn't.
--
PascalSchuppli - 25 Oct 2010
Deferring to 1.2 and the UTF-8 project.
--
GeorgeClark - 13 Dec 2011
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. see CopyrightStatement. 
