Drupal and its input formats

Let's face it, Drupal's input formats are currently a mess. Directly after installation, the CMS offers "Filtered HTML" and "Full HTML", with "Filtered HTML" being the default. In theory, this works fine. Trusted users can use the entire set of tags, while untrusted users are limited to a stripped down version.
In practice, there are a number of serious shortcomings: trusted users have to manually switch to "Full HTML" whenever they need a tag not included in the filtered set. Upon doing so, however, they lose the line break converter. For tech savvy folks, this is just annoying. For anyone else, this poses a real problem (just try to teach your grandma to use HTML!).

The naive approach to solving this problem is, of course, to engage in dirty hacks: allowing more tags in "Filtered HTML", putting the line break converter into "Full HTML", and adding various ad hoc custom formats.
In the end, the website in question sports half a dozen (potentially redundant) input formats and tons of nodes that require some specific filter setup in order to display properly.

Ok, the big question is: how do you prevent your website from turning into a big mess in the first place? For starters, the golden rule is: DO NOT TOUCH THE TWO DEFAULT INPUT FORMATS! Just leave them the way they are and preferably do not even use them. Keep them around as a reference. With that being said, let's come up with a robust set of general purpose input formats. In order to do so, we have to think about the users first. In general, they can either be trusted or not trusted, in addition to being tech savvy or not tech savvy, which leaves us with four combinations:

  • Tech savvy and trusted (e.g. webmasters)
  • Not tech savvy and trusted (e.g. editorial staff in general)
  • Tech savvy and not trusted (e.g. guests)
  • Not tech savvy and not trusted (e.g. guests)

These four combinations call for up to four, but not necessarily exactly four, input formats: the last two groups can share the same input format (in practice it would be hard to distinguish one guest from another anyway).
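To make the mapping concrete, here is a minimal sketch in Python (not Drupal code) of how the two user traits collapse into three formats. The format names are the ones introduced in the following paragraphs; the `User` class and `input_format_for` function are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical model of the two traits discussed above."""
    trusted: bool
    tech_savvy: bool


def input_format_for(user: User) -> str:
    """Return the input format suggested for a user."""
    if not user.trusted:
        # Both untrusted groups share one format, since guests
        # cannot be told apart anyway.
        return "Safe Markup"
    return "Markdown HTML" if user.tech_savvy else "WYSIWYG HTML"


if __name__ == "__main__":
    for trusted in (True, False):
        for savvy in (True, False):
            print(trusted, savvy, input_format_for(User(trusted, savvy)))
```

Four combinations go in, three formats come out, which is exactly why a fourth format is not needed.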

Let's start with the tech savvy and trusted users. These usually don't mind using HTML and might in fact even demand to be able to use it. The only reasons for them not to choose "Full HTML" are the lack of the line break converter and being forced to manually encode things like angle brackets. This group of users can be satisfied rather easily. Download the Markdown filter and create a new input format "Markdown HTML". On the new format's configuration page, check "HTML corrector" and "Markdown". Optionally, also throw "Line break converter" in if you do not like the way "Markdown" handles line breaks. Rearrange the filters in this order: "HTML corrector", "Markdown", "Line break converter".

Next in line are the trusted, but not tech savvy folks. These typically want all of what HTML has to offer without actually having to learn HTML. In other words, they want a WYSIWYG editor. This is even easier. Create a new input format "WYSIWYG HTML" and make it a clone of "Full HTML" (you'll see later why to clone). Leave it like this for the moment.

Last but not least is the group of untrusted users, who get the same input format regardless of whether they are tech savvy or not. Their input format, called "Safe Markup", is the most difficult of all, as it involves making several choices. Two good choices are giving the format an editor (preferably a lightweight one) and, of course, not allowing all HTML tags. To make a point of not allowing HTML, download and install the BBCode filter. Create the new input format with that filter enabled. The problem with BBCode is that it allows the image tag. This is not desirable, as it allows malicious users to place web bugs. To strip undesired HTML tags, also enable "HTML filter" with the same set of tags allowed as in "Filtered HTML" (you did keep that input format around as a reference, didn't you?). Rearrange the filters so that "BBCode" comes before "HTML filter".
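Why the order matters can be shown with a toy model in Python. This is only a sketch: the two functions below are crude stand-ins, not the real BBCode or HTML filter modules, and the allowed tag list is an assumption meant to resemble a typical "Filtered HTML" setup. The tracker URL is made up.

```python
import re

# Assumed allow-list for illustration; use whatever your own
# "Filtered HTML" format allows.
ALLOWED_TAGS = {"a", "em", "strong", "cite", "code", "ul", "ol", "li"}


def bbcode_filter(text: str) -> str:
    """Toy BBCode converter that only knows [b], [i] and [img]."""
    text = re.sub(r"\[b\](.*?)\[/b\]", r"<strong>\1</strong>", text, flags=re.S)
    text = re.sub(r"\[i\](.*?)\[/i\]", r"<em>\1</em>", text, flags=re.S)
    text = re.sub(r"\[img\](.*?)\[/img\]", r'<img src="\1" />', text, flags=re.S)
    return text


def html_filter(text: str) -> str:
    """Toy tag stripper: removes every tag that is not on the allow-list."""
    def keep_or_drop(match: re.Match) -> str:
        return match.group(0) if match.group(1).lower() in ALLOWED_TAGS else ""
    return re.sub(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>", keep_or_drop, text)


def run_filters(text: str, filters) -> str:
    """Apply the filters one after another, in the given order."""
    for apply_filter in filters:
        text = apply_filter(text)
    return text


comment = "[b]Hi[/b] [img]http://tracker.example/bug.gif[/img]"

# BBCode first, HTML filter second: the generated <img> is stripped.
good = run_filters(comment, [bbcode_filter, html_filter])

# Wrong order: the HTML filter sees no tags yet, so the <img> survives.
bad = run_filters(comment, [html_filter, bbcode_filter])

if __name__ == "__main__":
    print(repr(good))
    print(repr(bad))
```

With "BBCode" first, the image tag it produces is still subject to the tag filter. With the order reversed, the tag filter runs on text that contains only square brackets and lets the web bug through.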

After setting up the input formats, make sure that "Safe Markup" can be used by all users and is enabled as the default input format. The original "Filtered HTML" and "Full HTML" should be available to no one (only kept as a reference and maybe for use by the site administrator); the remaining two input formats should be assigned as needed.
This concludes the groundwork. Next is setting up editors.
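The resulting access rules can be written down as a small table and sanity-checked. The following Python sketch is only a model of the setup described above; the role names are assumptions, and in Drupal itself this is configured through the input format settings rather than in code.

```python
# Which roles may use which format. Role names are hypothetical.
FORMAT_ACCESS = {
    "Filtered HTML": set(),  # kept as a reference only
    "Full HTML": set(),      # kept as a reference only
    "Markdown HTML": {"webmaster"},
    "WYSIWYG HTML": {"editor"},
    "Safe Markup": {"anonymous", "authenticated", "editor", "webmaster"},
}
DEFAULT_FORMAT = "Safe Markup"
REFERENCE_FORMATS = ("Filtered HTML", "Full HTML")


def formats_for(role: str) -> list:
    """List the formats a role may use, sorted by name."""
    return sorted(name for name, roles in FORMAT_ACCESS.items() if role in roles)


def check_setup() -> bool:
    """Verify the two rules from the text above."""
    all_roles = set().union(*FORMAT_ACCESS.values())
    default_open_to_all = FORMAT_ACCESS[DEFAULT_FORMAT] == all_roles
    references_locked = all(not FORMAT_ACCESS[name] for name in REFERENCE_FORMATS)
    return default_open_to_all and references_locked


if __name__ == "__main__":
    print(formats_for("editor"))
    print(check_setup())
```

The site administrator account is left out of the table on purpose, since it is the one possible exception mentioned above.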

Editors are (or rather: should be) handled by the WYSIWYG API. You might also want to download IMCE and the IMCE bridge in order to handle image uploads. The editors of choice are TinyMCE and BUEditor (installation instructions are given on the WYSIWYG API configuration page). The reason for installing two editors instead of one is that, contrary to its name, TinyMCE is anything but tiny and most certainly not something to put on comment forms accessed by anonymous users.
After creating two profiles, one for each editor, assign them to the corresponding input formats (TinyMCE for "WYSIWYG HTML" and BUEditor for "Safe Markup"). The "Markdown HTML" format does not require an editor.

At this point, we have input formats based upon the capabilities of the users. Editors are automatically toggled when the user switches to an input format that requires them. The only thing left to do is to automatically select the default input format for the user. This is a bit tricky and for the most part depends on the site's setup. The basic idea, however, is to use Better Formats to select the default filter per role/node type and User Default Filter to allow users to set their own preferences.
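The selection logic can be sketched as follows. This Python model shows one plausible precedence (user preference, then role/node type default, then site default); it is an assumption for illustration and not a description of how Better Formats and User Default Filter actually interact. Role names, node types and function names are made up.

```python
from typing import Optional, Set

SITE_DEFAULT = "Safe Markup"

# Hypothetical defaults per (role, node type).
ROLE_TYPE_DEFAULTS = {
    ("webmaster", "page"): "Markdown HTML",
    ("editor", "story"): "WYSIWYG HTML",
}

# Editor profile per format; formats without an entry get a plain textarea.
EDITOR_FOR_FORMAT = {
    "WYSIWYG HTML": "TinyMCE",
    "Safe Markup": "BUEditor",
}


def default_format(role: str, node_type: str, allowed: Set[str],
                   user_pref: Optional[str] = None) -> str:
    """Pick the preselected format, ignoring anything the user may not use."""
    if user_pref in allowed:
        return user_pref
    candidate = ROLE_TYPE_DEFAULTS.get((role, node_type))
    if candidate in allowed:
        return candidate
    return SITE_DEFAULT


def editor_for(fmt: str) -> Optional[str]:
    """Return the editor toggled for a format, or None if it has none."""
    return EDITOR_FOR_FORMAT.get(fmt)


if __name__ == "__main__":
    fmt = default_format("editor", "story", {"WYSIWYG HTML", "Safe Markup"})
    print(fmt, editor_for(fmt))
```

Checking every candidate against the allowed set keeps a stale preference from ever preselecting a format the user has no access to.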