Feature Proposal: Support an XML schema for webs/topics

Motivation

We need to be able to expose Foswiki resources to XML-supporting web services. This is also necessary for using any XML database resources, such as dbxml.

Description and Documentation

Add an XML generator to Foswiki::Meta. I have already implemented a simple (almost 1:1) XML generator. Here's an (untested) DTD:
<!ELEMENT web (topic*, web*)>
<!ATTLIST web
   name CDATA #REQUIRED>

<!ELEMENT topic (body, form*, topicmoved?, topicparent?)>
<!ATTLIST topic
   name CDATA #REQUIRED
   format CDATA #IMPLIED
   date CDATA #IMPLIED
   version CDATA #IMPLIED
   rev CDATA #IMPLIED
   author CDATA #IMPLIED>

<!ELEMENT body (#PCDATA)>

<!ELEMENT form (field+)>
<!ATTLIST form
   name CDATA #REQUIRED>

<!ELEMENT topicmoved EMPTY>
<!ATTLIST topicmoved
   to CDATA #REQUIRED
   from CDATA #REQUIRED
   date CDATA #REQUIRED
   by CDATA #REQUIRED>

<!ELEMENT topicparent EMPTY>
<!ATTLIST topicparent
   name CDATA #REQUIRED>

<!ELEMENT fileattachment EMPTY>
<!ATTLIST fileattachment
   name CDATA #REQUIRED
   version CDATA #IMPLIED
   path CDATA #IMPLIED
   size CDATA #IMPLIED
   date CDATA #IMPLIED
   user CDATA #IMPLIED
   comment CDATA #IMPLIED
   attr CDATA #IMPLIED
   movedfrom CDATA #IMPLIED
   movedby CDATA #IMPLIED
   movedto CDATA #IMPLIED
   moveddate CDATA #IMPLIED>

<!ELEMENT field EMPTY>
<!ATTLIST field
   name CDATA #REQUIRED
   value CDATA #REQUIRED
   title CDATA #IMPLIED>

Examples

<web name="TemporaryMetaTestsTestWebMetaTests">
 <topic name="TestTopicMetaTests" format="1.1" version="1.1" date="12345678" rev="1" author="BaseUserMapping_666">
  <body>
   <![CDATA[BLEEGLE
]]>
  </body>
 </topic>
 <topic name="WebPreferences" format="1.1" version="1.1" date="12345678" rev="1" author="BaseUserMapping_666">
  <body>
   <![CDATA[Preferences]]>
  </body>
 </topic>
 <web name="SubWeb">
  <topic name="WebPreferences" format="1.1" version="1.1" date="12345678" rev="1" author="BaseUserMapping_666">
   <body><![CDATA[Preferences]]>
   </body>
  </topic>
 </web>
</web>
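
To make the shape of such a generator concrete, here is a minimal sketch in Python (not the Foswiki::Meta implementation, which is Perl and is not shown on this page). The function names `topic_to_xml` and `web_to_xml` and their plain-data inputs are assumptions made for illustration. Note that ElementTree escapes body text rather than emitting CDATA sections, so the serialized output differs from the example above in that respect.

```python
import xml.etree.ElementTree as ET

def topic_to_xml(name, info, text, form_name=None, fields=None):
    """Build a <topic> element from plain data (hypothetical inputs)."""
    topic = ET.Element("topic", {"name": name, **info})
    body = ET.SubElement(topic, "body")
    body.text = text  # escaped on output; ElementTree does not write CDATA
    if form_name:
        form = ET.SubElement(topic, "form", {"name": form_name})
        for field_name, value in (fields or {}).items():
            ET.SubElement(form, "field", {"name": field_name, "value": value})
    return topic

def web_to_xml(name, topics=(), subwebs=()):
    """Build a <web> element holding its topics first, then its subwebs."""
    web = ET.Element("web", {"name": name})
    for child in list(topics) + list(subwebs):
        web.append(child)
    return web

info = {"format": "1.1", "version": "1.1", "date": "12345678",
        "rev": "1", "author": "BaseUserMapping_666"}

sub = web_to_xml("SubWeb", [topic_to_xml("WebPreferences", info, "Preferences")])
root = web_to_xml(
    "TemporaryMetaTestsTestWebMetaTests",
    [topic_to_xml("TestTopicMetaTests", info, "BLEEGLE\n"),
     topic_to_xml("WebPreferences", info, "Preferences")],
    [sub],
)
xml_text = ET.tostring(root, encoding="unicode")
```

Nesting subwebs as child elements keeps the web hierarchy implicit in the tree, which is the property the example document relies on.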

Impact

%WHATDOESITAFFECT%

Implementation

-- Contributors: CrawfordCurrie - 30 Nov 2009

Discussion

Some comments:
  • I guess fileattachment should be a possible child elem of topic as well.
  • How do we deal with non-standard meta data like META:COMMENT and the like?
  • I really would like to see more sub-topic xml structuring, i.e. for tables, paragraphs and sections.
  • META:PREFERENCES are missing.
  • We should use a foswiki namespace.
  • It might be of advantage to have a web attr of the topic node in addition to nesting it ... or one or the other.
  • Ideally, this would not be called "support for xml generation" only. There are more objects in foswiki that would be worth xml-ifying and which are not subsumed by the current Meta API. So adding a toXml to the Meta class is just a first step.
  • Internal modules and extensions would ideally operate on the dom being passed around. See Xwiki's rendering module and the role of the document object in it.
Here's a quick brainstorming on an xml schema:

<foswiki:topic name="..." rev="..." author="..." date="..." web="...">

   <foswiki:section name="..." type="...">
      ...foswiki:paragraph...foswiki:table...foswiki:section...
   </foswiki:section>

   <foswiki:paragraph>
      ...foswiki:paragraph...foswiki:table...foswiki:section...
   </foswiki:paragraph>
      
   <foswiki:table type="html|wiki">
     <foswiki:tr>
       <foswiki:th>....</foswiki:th>
       <foswiki:td>....</foswiki:td>
     </foswiki:tr>
   </foswiki:table>

   <foswiki:preference scope="local|set"> ...value... </foswiki:preference>
   <foswiki:preference scope="local|set"> ...value... </foswiki:preference>

   <foswiki:acl type="allow|deny" action="view|change|rename">
     <foswiki:user id="..." />
     <foswiki:user id="..." />
   </foswiki:acl>
    
   <foswiki:attachment name="..." date="..." size="..." comment="..." url="..." />
   <foswiki:attachment name="..." date="..." size="..." comment="..." url="..." />

   <foswiki:meta type="custom" name="..." ... />
   <foswiki:meta type="custom" name="..." ... />
    
   <foswiki:form name="...">
     <foswiki:field name="..." title="...">...value...</foswiki:field>
     <foswiki:field name="..." title="...">...value...</foswiki:field>
   </foswiki:form>

   <foswiki:form name="...">
     <foswiki:field name="..." title="...">...value...</foswiki:field>
     <foswiki:field name="..." title="...">...value...</foswiki:field>
   </foswiki:form>

</foswiki:topic>

Later on, we might also think about storing more information about a site in XML, e.g. backlinks, wanted pages, user info and ACLs:

<foswiki:group id="...">
   <foswiki:acl type="allow|deny" action="view|change|rename">
      ...
   </foswiki:acl>

   <foswiki:user id="..." />
   <foswiki:user id="..." />
</foswiki:group>

<foswiki:user id="..." login="..." displayname="..." password="..." />
<foswiki:user id="..." login="..." displayname="..." password="..." />
<foswiki:user id="..." login="..." displayname="..." password="..." />

If we have a web node, then we might also think about this:

<foswiki:web name="..." summary="...">
   <foswiki:preference name="..." finalized="yes|no"> 
      ...
   </foswiki:preference>

   <foswiki:acl type="allow|deny" action="view|change|rename">
     <foswiki:user id="..." />
     <foswiki:user id="..." />
   </foswiki:acl>

   <foswiki:web name="..." summary="...">
   ...
   </foswiki:web>
</foswiki:web>

-- MichaelDaum - 30 Nov 2009

I am very excited to see others thinking about this.

In fact when I first picked up our tmwiki/foswiki installation, TWiki:Plugins.XmlQueryPlugin confirmed in my mind that we had a viable platform for XML interoperability (a requirement I will have to work on in 2010).

It is important however that we think about re-using existing schemas (or at least justify why we wouldn't) for as much of our metadata as possible. DCMI at a minimum (mentioned in the DITA standard, see SupportDITA).

I will see if I can convince my colleagues (who are closer to the soup of XML standards than I) to help this effort.

  • Later: After an initial discussion, it seems we could also use FOAF for describing users along with DCMI to describe a good chunk of our topic metadata. Using existing standards where possible will make the generated XML easier for integrators who will be consuming it.

-- PaulHarvey - 02 Dec 2009

The starting point was of course an almost-verbatim map of the existing schema. While using standard schemata has attractions, it is also potentially a lot more work, as code has to be developed to decide what to do about extra/missing bits that don't map to the Foswiki schema. How good a fit could we get?

On Michael's points:
  • I guess fileattachment should be a possible child elem of topic as well
    • I thought about this, but decided against for no good reason (probably just laziness again)
  • How do we deal with non-standard meta data like META:COMMENT and the like?
    • Good question. A DTD would require extension, but by maintaining 1:1 name mappings from the XML to the Foswiki element (e.g. <comment> would map to META:COMMENT) it lets us write a SAX-based parser independent of the DTD. The only problem would come if one of these non-standard meta-data were structured, in the way FORM/FIELD are.
  • I really would like to see more sub-topic xml structuring, i.e. for tables, paragraphs and sections.
    • The reason I didn't try to extend the schema into the text data (paras, tables etc) is the old one; there are many different potential views (flat macros, expanded macros, HTML, heading based, table based, list based, structured TML, chained etc etc) and I don't think this is the right way to drive that decomposition. This topic is just about generating XML.
  • META:PREFERENCES are missing.
    • Fair enough
  • We should use a foswiki namespace.
    • Yes to the namespace, I was just lazy.
  • It might be of advantage to have a web attr of the topic node in addition to nesting it ... or one or the other.
    • Hmmm. I'd be against that, because it makes it awkward to move a topic by simply juggling pointers in the DOM. When the XML is parsed, then the DOM can have a parent attribute derived at that time, but outputting it seems restrictive to me.
  • Ideally, this would not be called "support for xml generation" only. There are more objects in foswiki that would be worth xml-ifying and which are not subsumed by the current Meta API. So adding a toXml to the Meta class is just a first step.
    • Sure, but topics was my initial focus (and initial requirement)
  • Internal modules and extensions would ideally operate on the dom being passed around. See Xwiki's rendering module and the role of the document object in it.
    • That is indeed the ultimate goal of a lot of my internal restructuring of the Foswiki::Meta object.
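
The point about 1:1 name mappings can be illustrated with a minimal sketch of a DTD-independent SAX handler, written here in Python rather than Perl. The `<comment>` element and its attributes are hypothetical, as is the choice to treat `web`, `topic` and `body` as structural containers; only the mapping rule (element name upper-cased and prefixed with `META:`) comes from the discussion above.

```python
import xml.sax

# Containers in the proposed XML that are not themselves META records
# (an assumption made for this sketch).
STRUCTURAL = {"web", "topic", "body"}

class MetaHandler(xml.sax.ContentHandler):
    """Map every non-structural element to a (META:NAME, attributes) record."""

    def __init__(self):
        super().__init__()
        self.records = []

    def startElement(self, name, attrs):
        if name not in STRUCTURAL:
            # <comment ...> becomes META:COMMENT with no DTD lookup needed
            self.records.append(("META:" + name.upper(), dict(attrs.items())))

sample = b"""<topic name="TestTopic">
  <body>text</body>
  <comment author="SomeUser" text="hi"/>
  <topicparent name="WebHome"/>
</topic>"""

handler = MetaHandler()
xml.sax.parseString(sample, handler)
```

A handler this simple records elements as a flat list, so it would lose the nesting of a structured element such as form/field; that is the limitation noted for structured non-standard meta-data.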

-- CrawfordCurrie - 04 Dec 2009

  • The generated XML doesn't have to use a fully qualified foswiki: namespace prefix on each element; it can be set as the default namespace at the top of the document, and then it will be implied.
  • Is the reason we're bothering with the format version because we'll not be expanding macros?
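
As a small check of the default-namespace point, the sketch below parses the same document written both ways and compares the resulting element names. The namespace URI `urn:example:foswiki` is a placeholder assumed for illustration; no actual Foswiki namespace URI has been agreed on this page.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:foswiki"  # hypothetical namespace URI

# Every element carries an explicit prefix.
prefixed = (
    '<foswiki:topic xmlns:foswiki="%s" name="T">'
    "<foswiki:body>x</foswiki:body>"
    "</foswiki:topic>" % NS
)

# The namespace is declared once as the default and implied below it.
defaulted = '<topic xmlns="%s" name="T"><body>x</body></topic>' % NS

def tags(xml_text):
    """Return the namespace-qualified tag of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

same = tags(prefixed) == tags(defaulted)
```

A namespace-aware parser sees identical qualified names in both forms, so the choice only affects how verbose the serialized XML is.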

But I have to ask - who is this XML for? If you're just wanting an XML dump for disaster recovery/legacy reasons, then what has been proposed so far is fine. But even if we went this way, we would have to waste a lot of time documenting the concepts behind the elements and attributes in the DTD - or are we just going to point integrators to Foswiki::Meta.pm? I suppose that doesn't matter, if the XML is only going to be handled amongst the Foswiki user/developer community...

But if we're hoping to keep Foswiki relevant as an information/knowledge management system for another 10 years, it has to play nice with the semantic web, linked data world.

Things like Wolfram Alpha are emerging which leverage this stuff. In my case there are a slew of biodiversity data aggregation services (mostly in development) out there which are not going to bother trying to understand an obscure XML schema, with or without comprehensive documentation.

I am confident we could mix dc:, foaf:, skos: etc. namespace elements into our XML at very little (Perl code dev) cost and then fall back to foswiki: namespace elements for concepts that don't map well into other standards.

The beauty of this? Aggregation/indexing/discovery services probably wouldn't care a whole lot about the things we have to fall back to the foswiki: namespace for anyway. They can ignore the elements they don't understand, and take note of the parts they do: author, title, dates, revisions, etc.
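
A minimal sketch of that idea, in Python: a consumer that knows only Dublin Core reads a topic mixing dc: elements with fallback foswiki elements and skips everything else. The document layout, the `urn:example:foswiki` namespace URI and the values are illustrative assumptions; the Dublin Core element-set namespace URI is the published one.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Hypothetical topic: dc: elements where a mapping exists, default
# (placeholder) foswiki namespace for everything else.
doc = """<topic xmlns="urn:example:foswiki"
                xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>TestTopicMetaTests</dc:title>
  <dc:creator>BaseUserMapping_666</dc:creator>
  <dc:date>2009-12-05</dc:date>
  <preference name="SKIN">pattern</preference>
</topic>"""

def dublin_core(xml_text):
    """Collect only Dublin Core elements, ignoring every other namespace."""
    prefix = "{%s}" % DC
    found = {}
    for el in ET.fromstring(xml_text).iter():
        if el.tag.startswith(prefix):
            found[el.tag[len(prefix):]] = el.text
    return found

metadata = dublin_core(doc)
```

The consumer never needs to know what a `preference` element means, which is the behaviour expected of aggregation and indexing services.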

The benefits would be huge IMHO.

I am hoping that these mappings wouldn't impose any additional burden on development over what you've already proposed, if my colleague can come up with the mappings for you.

-- PaulHarvey - 05 Dec 2009

Well, my reasons for starting this are quite pedestrian. I'm working on transforming Foswiki SEARCH queries to XPath, and I need a vehicle for testing. Since all the XPath engines I can find require an underlying XML DOM, it's easiest to generate XML for them to work on. I'm targeting a schema that is close to the %META schema to keep it simple. I was not anticipating using this schema with a web service; I'm not against it, I just hadn't thought about it.
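
For illustration, here is a minimal Python sketch of running path queries against XML in the shape proposed above. The translation from a SEARCH query to XPath is not shown; the expressions are hand-written assumptions about what a name-based query might become, and ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Small hypothetical site in the proposed web/topic shape.
xml_text = """<web name="Main">
 <topic name="WebHome" author="BaseUserMapping_666"><body>Home</body></topic>
 <topic name="WebPreferences" author="BaseUserMapping_666"><body>Preferences</body></topic>
 <web name="SubWeb">
  <topic name="WebPreferences" author="BaseUserMapping_666"><body>Sub preferences</body></topic>
 </web>
</web>"""

root = ET.fromstring(xml_text)

# All topics named WebPreferences, at any depth of web nesting.
matches = root.findall(".//topic[@name='WebPreferences']")
bodies = [topic.findtext("body") for topic in matches]

# Only the topics that sit directly in the top-level web.
direct = [topic.get("name") for topic in root.findall("./topic")]
```

Because subwebs are nested elements, the difference between searching one web and searching it recursively reduces to the choice between `./topic` and `.//topic`.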

With regard to standards, correct me if I'm naive, but surely there are XSLT engines out there that can be used to transform to/from any standard you choose?

-- CrawfordCurrie - 05 Dec 2009

Okay, I see more clearly now. Is the XML you're generating entirely for back-end purposes? I was thinking from the point of view of what a topic/web would look like through a viewxml script, or ?content-type=application/rdf+xml. If it could be trivial (plugin perhaps) to put an XSLT engine in between your "native" XML and the public-facing XML, that would make me happy...

-- PaulHarvey - 05 Dec 2009

Nothing is ever trivial, but I don't think it would be hard to interpose an XSLT stage. The only problem would be deriving a DTD that covered meta registered by extensions; but since we are (currently) restricting these to unstructured elements, that isn't a problem.

More tricky is the idea of using this to drive a TOM definition (breaking down and extracting structure from topic content). That would inevitably require structured elements in the DTD. But I'm not too scared by that, because nothing proposed so far has drifted too far from the basic DTD I proposed above.

-- CrawfordCurrie - 05 Dec 2009
Topic revision: r9 - 05 Dec 2009, CrawfordCurrie