Glossary
Access Control
Acquisition
Aggregation
API (Application Programming Interface)
Application Server
Approval
Archive
Blog (Weblog)
CMS
Chat
Check-In/Out
Collaboration
Chunk
Community of Interest
Content
Customer Relationship Management
Database-backed
Development
Delivery
Discussion Group
Element
Extensible
Forum
Framework
Free Software
Globalization
Information Architect
Instant Messaging
Life Cycle
Localization
Mailing Lists
Metadata
Middleware
Multilingual
News Group (Usenet)
News Portal
Object-oriented Database
Ontology
Open Source
Personalization
Portal
Portlet
Presentation
Privileges
Resource Description Framework (RDF)
Sandbox
Scheduling
Staging
Reuse
RDF (Resource Description Framework)
Repository
Roles
RSS
SCORM
Semantic Web
Structure
Sub-site
Syndicate
Tags, Tagging
Taxonomy
Template
URL Rewriting
Versioning
Web Publishing
Web Services
Wiki
Workflow
WYSIWYG
XHTML
XML (eXtensible Markup Language)
Access control is the system of privileges and permissions that guarantees security for the content. Only authorized persons can reach each element of content. Authentication establishes the identity of a site visitor. Authorization is usually given to a "role" like writer, copy editor, graphics editor, etc. Then individuals can be assigned to roles to establish their access to the content.
Access control granularity describes the fineness of access distinctions. Permissions can be as fine as individual files, or can be set at folder/directory levels. They can be as fine as every individual having different privileges.
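The role-based scheme described above can be sketched in a few lines. This is only an illustration: the role names come from the text, while the permission names, the user names, and the is_authorized helper are hypothetical.

```python
# Hypothetical role-to-permission table; role names follow the text above.
ROLE_PERMISSIONS = {
    "writer": {"read", "edit"},
    "copy editor": {"read", "edit", "approve"},
    "graphics editor": {"read"},
}

# Hypothetical assignment of individuals to roles.
USER_ROLES = {
    "alice": {"writer"},
    "bob": {"copy editor", "graphics editor"},
}


def is_authorized(user, action):
    """Return True if any of the user's roles grants the action."""
    roles = USER_ROLES.get(user, set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)


print(is_authorized("alice", "approve"))  # False
print(is_authorized("bob", "approve"))    # True
```

Finer granularity would mean keying the permission table by folder, file, or content element as well as by role.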
Acquisition is the collection phase of assets for a CMS. They can be text files, images, audio or video files, animations including Flash, multimedia files. During acquisition all assets need to be tagged and then stored in the data repository (in the database or file system or both). Digital Rights Management (DRM) establishes the licensing requirements for all assets. Modern assets may be accompanied by metadata with authorship, terms of use, etc. A sophisticated CMS helps to manage this metadata.
Aggregation describes the receiving (or consuming) of RDF feeds (news feeds or data feeds) by a CMS. The producer of the feeds is called a syndicator.
API
Archiving generally refers to earlier versions of content or content no longer in use. For a blog it is all the posts earlier than the current displayed selection. With today's vast hard drive capacities, many organizations are archiving everything as permanent institutional memory.
Blog is short for a Weblog. A blog is a single web page with dated chunks of information called posts. Posts are arranged in reverse chronological order so the most recent is always at the top. Each post has an anchor that allows a hyperlink to the post from anywhere on the web. This allows blog syndication.
The earliest blogs were annotated lists of hyperlinks visited by the blog owner.
The etymological root of "log" has nothing to do with logical or "ology." It refers to a tree log (a piece of wood) that was thrown over the stern of a ship. A knotted line attached to the log allowed an estimate of the ship's speed over the water. Similar knots in a weighted line allowed depth estimates, as in "mark twain." Since notes were made in a book by the ship captain, it became known as a log book.
Today's blogs are often journals or personal diaries and may contain few hyperlinks. They may be filled with political commentary. In a company setting they may be a work progress record.
Blog has also become a verb. Weblog has become "we blog." Blogging tools automate the maintenance of the blog HTML. The blogger enters plain text and the tool generates the post. The tool may also make the blog searchable by visitors.
A single-page blog may be the minimal CMS.
A CMS is a Content Management System. Today it generally means Web Content Management, sometimes called WCM.
The CMS acronym is also used widely in the academic community for a Course Management System, technology for e-learning in universities and corporate training centers.
CMS must be distinguished from Document Management, Knowledge Management, Assets Management, Rights Management, Customer Relations Management, Business Process Management, and Enterprise Information Management.
Checking out a piece of content puts a file lock on the content so someone else does not change it before the current user's changes are saved. A good system tells a new user who has the content, and provides means (email, IM/Chat, phone numbers) to contact the current user to release it. It also gives administrators override privileges to seize content.
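A minimal sketch of such a check-out lock, assuming an in-memory table of lock holders. The class and method names are hypothetical and do not come from any particular CMS.

```python
class CheckOutError(Exception):
    """Raised when a check-out or check-in request cannot be honored."""


class ContentLocks:
    def __init__(self):
        self._holders = {}  # content id -> name of the user holding the lock

    def holder(self, content_id):
        """Return the user who has the content checked out, or None."""
        return self._holders.get(content_id)

    def check_out(self, content_id, user):
        current = self._holders.get(content_id)
        if current is not None and current != user:
            # Tell the new user who has the content so they can make contact.
            raise CheckOutError(f"{content_id} is checked out by {current}")
        self._holders[content_id] = user

    def check_in(self, content_id, user):
        if self._holders.get(content_id) != user:
            raise CheckOutError(f"{user} does not hold {content_id}")
        del self._holders[content_id]

    def seize(self, content_id, admin):
        """Administrative override: take the lock regardless of the holder."""
        self._holders[content_id] = admin
```

A real system would persist the lock table and check that the caller of seize actually has administrative privileges.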
Some systems allow two or more users to change content, but they must then provide an easy way to reconcile differences. This can be done by color tagging different authors' versions, or by a sophisticated "diff" comparator program. The best difference reconciliation is always done by a human editor.
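As a rough illustration of a "diff" comparator, Python's standard difflib module can list the lines that differ between two versions. The sample text and the changed_lines helper are made up for this sketch.

```python
import difflib


def changed_lines(old_lines, new_lines):
    """Return only the removed (-) and added (+) lines between two versions."""
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile="original", tofile="edited", lineterm=""
    )
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


original = ["Content is king.", "Templates carry style."]
edited = ["Content is king.", "Templates carry presentation."]

for line in changed_lines(original, edited):
    print(line)
```

The output only flags the differences; deciding which version wins is still left to the human editor.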
Chunking is dividing information into sections that can carry a meaningful semantic tag (or possibly a style tag). It also tries to divide information into small enough sections that they can be comprehended and remembered easily. A content element might be a single chunk or contain smaller chunks.
Content is the fundamental information in a page, as distinguished from its presentation or style or form.
Customer Relationship Management (Community Relationship Management)
Content may also include all the information about a site's users, company employees, vendor partners, business contacts, customers, prospects, etc. An integrated CMS will also manage this content. It could include a complete "clickthrough" record of every visitor to a website.
"Database-backed" describes web sites that keep important information in a database. They allow the web site to support information "transactions" with the user, with the results of all transactions stored permanently in the database. They allow a web site to have a memory.
Every organization and business has a database, even if it's just a collection of written records. The magic starts when those records are stored in a powerful "relational" database management system that allows "relations" to be made between the data in records that might not have seemed related at first. Still more powerful databases are called object-oriented. They provide very large containers for multi-megabyte media files and their accompanying metadata. The containers can inherit properties from prototypes, simplifying design and reuse.
And the really dazzling part comes when the database content is made accessible over the web (securely of course and with varying levels of permissions) to the many people inside and outside the organization who have an interest in the data.
Professor Philip Greenspun of MIT and ArsDigita (now the Red Hat CMS) has made the case for the database in his book "Database-Backed Web Sites" and the more recent "Philip and Alex's Guide to Web Publishing."
A major objective of the world's largest corporations is to combine all their databases into a central "data warehouse," from which information can be delivered to the web using content management systems. With the cost of databases and web servers so low, even the world's smallest enterprises can now afford their own "enterprise information systems" and manage their enterprise over the web.
Delivery is another term for the final step in the CMS that publishes the pages. The delivery servers are sometimes called production servers, which is often confused with the production of content (usually called development). Delivery servers are often caching servers, and they may only provide "static" versions of pages that were generated from "dynamic" pages.
Personalization of pages requires that Delivery be from dynamic servers.
A content element usually occupies one of the places in a content template. Elements are usually embedded in the template with unique delimiting tags. All CM systems have developed their own unique tags. Generally they followed the tag style of their underlying application framework. Interoperability efforts are directed toward tags that use XML-style delimiters and names that can be found in an XML namespace and an ontology. The Dublin Core is the most widely used ontology.
Forum (Bulletin Board, Discussion Group, Newsgroup, Community of Interest)
Forums, Discussion Groups, Newsgroups, Mailing Lists and List-Servs, are all examples of online communities of interest. The content generated consists of all their members' messages to the Forum. Content management tools are scant, usually limited to moderating messages by editing or deleting them.
A community keeps track of its members. You are required to join to access any forums or mailing lists. Registration is usually free. They may be supported by advertising on the site. Communities may offer space on the community server for members to have their own personal web pages. They may gather information about members in a database, which then can be searched to locate specific members.
Forums and Discussion Groups may use a web interface (unlike list servs). They show "threaded" discussions (emails with responses to that email). Some may include email addresses for all participants, so you can contact other members of the forum. Forums usually send nothing to your email address (unless you elect such an option). You visit them to see what is happening. The largest provider of forum web sites is Yahoo Groups (formerly eGroups.com). Some sites publish the number of subscribers. Because of the "network effect," sites with the largest number of members are generally the most valuable.
Newsgroups are part of the original Usenet, with recognizable domain names, like sci.lang.translation. They are available via the nntp news protocol (the news server setting in your browser), or through web-based interfaces like deja.com, recently acquired by Google, which has the best multilingual support of any web search engine. Newsgroups may be completely open or moderated. Unmoderated groups are notorious for postings which may be completely off-topic (OT) and add a lot of noise to these important information channels.
Many web users do not distinguish between these Forums, Bulletin Boards, Discussion Groups, Newsgroups, and even Mailing Lists. Simple "list-servs" broadcast an email to a (generally anonymous) list of subscribers. If the sender does not include a return email address, the message is anonymous. For this reason, we think of standard list-servs not as communities, but as "anonymities." Some mailing lists may optionally provide access to the member email addresses.
An application Framework is a set of tools for developing applications. A CMS Framework lets you develop Content Management Systems.
Since most CMS products consist of web server "middleware" that runs on a web application server (in front of a database server), a CMS Framework is usually identified with that middleware, like Microsoft's .NET and Active Server Pages (ASP), Sun's Java, J2EE, and JSP, PHP, or Macromedia Cold Fusion.
A Framework may support multiple programming or scripting languages, or just the native language in which the Framework is written. Zope is a Framework optimized for CMS development written in the Python language for an Apache server.
Information Architect (Taxonomy)
The Information Architect is responsible for chunking and tagging strategies, template designs, forms for content element entry, metadata collection, reuse, syndication and aggregation feeds, and database models or schemas. How all the information of an organization is represented in the content of the CMS is an architecture problem. This art of classifying information is sometimes called taxonomy, the division of things into genus and species. All species are a genus for subspecies, etc.
Content has a Life Cycle of Creation, Review, Approval, Delivery and Usage on the visible site, and then Archiving.
Mailing Lists (List Servs)
Mailing Lists are the major tool supporting communities of interest. Content consists of the individual emails to the lists. This content is archived in repositories that can be viewed by author, subject, date, and discussion thread. They are generally not indexed and searchable (except by Google, which searches every site). The most popular list serv (mailing list server) today among CMS developers is Mailman (with its Pipermail archiver), programmed in Python.
Mailing lists are joined by sending an email to the list server from the email account that wants to receive the list. Sometimes a standard instruction must appear in the subject or body of the message - to subscribe, to unsubscribe, or perhaps to subscribe to a once-per-day "digest" of the email messages. A digest reduces the number of emails (active lists generate dozens per day) to a single email with brief summaries (usually the subject line) of each email message.
Most list-servs will send you a required confirmation of the request to join (so that strangers cannot join in your name). This confirmation contains detailed instructions on unsubscribing, getting the digest if available, etc. It may also point you to a web site associated with the list. There you can expect to find access to an archive of the list's past messages. If the archive is searchable, you will be able to locate messages on specific topics or that contain keywords. The list may also offer a FAQ (Frequently Asked Questions) with tips on how to use the mailing list and its archive.
Metadata is data that accompanies a piece of content. It describes the content, and might provide optional information like a caption, abstract, or keywords for search engines. It could include a creation date, publication date, and expiry date. Video data could include camera settings, times and geographical data of the shoot, cameraperson and editor names, etc. It could include copyright information and terms of use. Document metadata might include the full list of the Dublin Core ontology properties. It is usually stored in a relational database or an object-oriented database (the CMS Repository).
Middleware (Web Application Server)
Middleware is software that runs in the middle tier of a three-tier system of User Interface (Front End or Client), Application Server, and Database Server (Backend). Whereas web client software has a limited number of languages (generally HTML and Javascript, sometimes Java applets or Active-X Controls) and the Database is generally accessed via standard SQL, middleware languages and development frameworks used on the application server are diverse and many - notably Perl, PHP, ASP and .NET (Visual Basic, VBScript, C#), JSP, Java, C++, Tcl, Python, etc.
Be aware that your choice of CMS will be a commitment to a distinct software developer community. If you have developers in your organization already, make sure they buy in to your CMS selection.
Multilingual (Globalization, Localization, Translation)
Multilingual websites can serve pages in a specific language requested by the browser. They must have translated/localized pages on the server for each language supported. The server must recognize the browser's language request. Globalized websites attempt to serve multiple cultures and languages with a single page. Localized websites are completely customized to fit a local culture (a locale), which might distinguish Brazilian Portuguese from that in Portugal. It involves much more than language translation.
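Browsers normally state their language request in the HTTP Accept-Language header. The following is a simplified sketch of how a server might match that header against the languages it has pages for. The pick_language function and its fallback rule (from a regional tag such as pt-BR to the base language pt) are assumptions for illustration, not a full implementation of HTTP language negotiation.

```python
def pick_language(accept_language, supported, default="en"):
    """Choose the best supported language for an Accept-Language header."""
    choices = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        choices.append((quality, tag.strip().lower()))

    # Highest quality first; the sort is stable, so header order breaks ties.
    choices.sort(key=lambda choice: choice[0], reverse=True)

    by_lower = {name.lower(): name for name in supported}
    for quality, tag in choices:
        if quality <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        base = tag.split("-")[0]  # fall back from a locale to its language
        if base in by_lower:
            return by_lower[base]
    return default


print(pick_language("pt-BR,pt;q=0.8,en;q=0.5", ["en", "pt-BR"]))  # pt-BR
```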
A multilingual CMS needs a workflow system that notifies localizers, perhaps in different countries around the world, of the existence of new pages that must be localized.
A multilingual shortcut is to provide links to "gists," computer-translated versions of a page. See Open Internet Lexicon.
A News Portal is a specialized CMS that syndicates its headlines and stories into news feeds. The canonical news portal is Slashdot. It categorizes stories into topics and allows moderation by users that decides on the ranking order of stories.
The standard news portal has three dimensions or categories into which stories are collected - Topics, Sections, and Authors. Each user can set up a profile that limits visible postings to only some of these sections, topics, or authors.
Blogs are a subset of the typical news portal, with a single author and no Topics or Sections.
Ontology.
Open-source means that source code is openly available, and freely distributable in the sense that anyone may duplicate it and send it to others. They may charge for the distribution. They may use it for their own private purposes, modify it, and even sell the modifications. It does not, however, mean that it is freely usable.
The Open Source Initiative (www.opensource.org) defines "OSI Certified" software as follows:
Open Source promotes software reliability and quality by supporting independent peer review and rapid evolution of source code. To be OSI certified, the software must be distributed under a license that guarantees the right to read, redistribute, modify, and use the software freely.
The case for Open Source is made most eloquently by Eric S. Raymond in his book "The Cathedral and the Bazaar", and most practically by the astonishing development of Linux. "With enough eyeballs, all bugs are shallow" describes development scrutinized by interested parties anywhere in the world. The result is software more robust than any closed proprietary software developed by the world's largest software corporations.
The Free Software Foundation (www.fsf.org - Creators of the GNU/Linux Operating System of which Linux is the more famous kernel) ironically recognizes the right to charge for some uses as long as the software cannot be "closed." They say that:
``Free software'' is a matter of liberty, not price. To understand
the concept, you should think of ``free speech'', not ``free beer.''
``Free software'' refers to the users' freedom to run, copy,
distribute, study, change and improve the software. More precisely,
it refers to four kinds of freedom, for the users of the software:
A program is free software if users have all of these freedoms. When FSF speaks of free software, they are referring to freedom, not price.
Thus, you should be free to redistribute copies, either with or
without modifications, either gratis or charging a fee for
distribution, to anyone anywhere
Excerpts from the GNU General Public License (GPL) require that:
if you distribute copies of a (GPL) program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. (GPL programs) protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.
Software is open-source in the Free Software Foundation sense if it is freely distributable and modifiable by anyone.
Personalization may be genuine one-to-one marketing in which the CMS keeps track of the individual. Amazon currently has the most powerful personalization on the web. It uses every past purchase, and products browsed, to form a pattern on which to base purchase recommendations.
Less powerful personalization systems aggregate visitors by shared properties, tailoring the content delivery to broad categories.
Personalization of pages requires that Delivery be from dynamic servers.
Portal is widely used in web content management. It is also popular to describe Enterprise Information Systems. The term is seriously overused, leading to lots of confusion.
The basic web portal is a web site (or a web page) that (primarily?) contains links to other sites (or pages).
Since almost every web page has hyperlinks, we need more than this.
A portal can contain content, but portal content is there as an introduction or lead-in to more similar information on another page.
We can now distinguish different kinds of web portals - Directory Portals (like Yahoo), News Portals (like Slashdot), Navigation Portals (a frameset that repeats throughout the site with nav links), and Dictionary/Lexical Portals (useful for glossaries).
Enterprise Information Portals are variously called Corporate Portals, Business Portals, or simply Enterprise Portals. Today these portals always have web front ends. The main portal may have navigation links that take you to sub-sites (or sub-portals) with similar navigation schemes but different looks for divisions of the enterprise.
A portal template, like any CMS template, contains blocks where content elements or content objects will be placed. These elements may be HTML fragments, text converted to HTML on the fly, XML converted to HTML by XSLT, news feeds including Javascript or XML RDF (called RSS for RDF Site Summary), images, applets, Flash files, etc. Collectively, these content objects are sometimes called portlets.
A portlet is any content object that appears in some block in a portal. It comes from applets and servlets (which might be providing the portlet content via a web service).
Presentation is a general term for the layout and style of a document. It figures in the CMS mantra - "separate the presentation from the content."
Privileges or permissions control access to content. The "granularity" of privileges refers to how fine permissions can be divided. Privileges can be per user, or based on their Role. A privilege can be given to a Group of users, then users assigned to the group. It can be per page or even per content element. It can depend on the time. For example, a user may edit a page scheduled for a future date, but not change it once it is published.
Reuse (SCORM, sharing)
Reuse describes a content element that can appear in more than one place in the site. Maintaining an element in one place ensures uniformity and accuracy. Because style and layout are controlled by the enclosing template, the same information appears to integrate into each use. Note that a content element can be a complete templated object with many sub-elements, an RSS news feed, or simply an HTML or XML fragment from a database.
SCORM (Shareable Content Object Reference Model) is a proposed standard for reusing content objects in e-learning applications.
Resource Description Framework
RDF is the tool for adding semantics (meaning) to content objects in web pages. This meaning can be discovered by parsers and inference engines, so it goes beyond the added meaning provided by human understandable XML tags. It allows computer programs to discover the meaning. RDF is used for library catalogs, for large directories, media asset repositories, and for personal media collections like books, CDs, and digital images. RDF is most often sent between computers using XML as its syntax. XML began as a markup language (somewhere between HTML and SGML) but is now most often used as a data exchange standard.
Where XML has the architecture of a hierarchical tree, RDF consists of sets of triples each of which has the form of a Statement like Subject - Predicate - Object. Collections of RDF triples form an RDF Vocabulary and when they cover a field of knowledge they constitute an Ontology.
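The Subject - Predicate - Object form can be illustrated with plain Python tuples. This is only a sketch of the data model, not an RDF library. The example.org URIs and values are made up, and dc:title and dc:creator stand for Dublin Core properties.

```python
# Each statement is a (subject, predicate, object) triple.
triples = {
    ("http://example.org/page1", "dc:title", "Glossary"),
    ("http://example.org/page1", "dc:creator", "Jane Doe"),
    ("http://example.org/page2", "dc:creator", "Jane Doe"),
}


def match(statements, subject=None, predicate=None, obj=None):
    """Return the triples that match; None acts as a wildcard."""
    return sorted(
        t for t in statements
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Which resources did Jane Doe create?
for subject, _, _ in match(triples, predicate="dc:creator", obj="Jane Doe"):
    print(subject)
```

Because every statement has the same three-part shape, a program can answer such questions without knowing in advance how any particular page was structured.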
See also RDF Site Summary (RSS)
Repository is the general term for all the content being managed by a CMS. It can be stored in a file system, a relational database, or an object-oriented database designed for media assets and large multimedia objects and all the metadata associated with such objects.
RDF Site Summaries (RSS)
RSS was created by Netscape as a method of exchanging syndicated news feeds. It is also known as Rich Site Summary and Really Simple Syndication by the Userland team, who have a competing set of RSS standards proposals. Netscape standards are RSS 0.91 and 1.0. Userland standards are 0.92 and 2.0.
A Sandbox is a special web site running a CMS that allows anyone to sign in and test drive. Some organizations may limit access to users who have previously registered and been given a special access name and password.
Generally the database behind a Sandbox site is reinitialized on a regular basis so new visitors do not have to look at doodlings left behind by previous guests.
A Sandbox is perhaps the most powerful way to experience a CMS short of having the company put up a special version of their CMS for your exclusive testing. Bob Boiko recommends that you ask a company to set up a demo CMS for your testing purposes.
Scheduling allows for content to be automatically added to or removed from a site based upon date. Content scheduled in the future can generally be edited by users with lower privileges than required for currently playing "live" content, which is only changed if errors are detected.
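Date-based scheduling comes down to comparing today's date with each item's publication window. A minimal sketch, assuming hypothetical publish_on and expire_on fields on each content item:

```python
from datetime import date


def is_live(item, today):
    """True if today falls inside the item's publish/expire window."""
    starts = item["publish_on"]
    ends = item.get("expire_on")  # None means the item never expires
    return starts <= today and (ends is None or today < ends)


items = [
    {"title": "Spring sale", "publish_on": date(2024, 3, 1), "expire_on": date(2024, 6, 1)},
    {"title": "Summer sale", "publish_on": date(2024, 6, 1), "expire_on": date(2024, 9, 1)},
    {"title": "About us", "publish_on": date(2024, 1, 1)},
]

live = [item["title"] for item in items if is_live(item, date(2024, 6, 15))]
print(live)  # ['Summer sale', 'About us']
```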
A Staging server is used in Content Management development to test content (do QA, Quality Assurance) before it is deployed to a production or delivery server. It should have all the same characteristics as the delivery server, which is sometimes very difficult to achieve.
The Semantic Web is Tim Berners-Lee's description of web pages with XML/RDF tagging of their component parts. XML is the syntax used to exchange data and RDF provides the semantics (meaning). This allows computer programs to read a web page and make sense of its contents. Search engines of the future will be infinitely more powerful if they can understand (draw inferences about) the web page content.
Structure refers to the markup of content, which allows the identification of structural elements like the Head or Body of an HTML document, or a Heading or Paragraph. It also refers to sections of a document that can be tagged with semantic XML tags like
Plain text is said to be unstructured. Style markup like Bold or Font tags is not properly structure, but style information in Word documents is often used to guess at the underlying structure of a document in order to convert it to XML.
A Sub-site is a distinct section of a website with its own navigation system. Large organizations often have sub-sites for their divisions, their geographically separated offices, etc.
The main page of a website with distinct sub-sites is often regarded as a Portal. The most intuitive sub-sites use a navigation scheme consistent with that of the main site.
Syndicating content makes it accessible as a web service. Typically an RSS news feed is provided in the form of a specific hyperlink to be embedded in a page template. The resulting content object is delivered over the web from the content syndication server.
The resulting web page is said to be aggregating content. Note the potential confusion when companies in the syndication business are themselves described as news aggregators.
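On the aggregating side, consuming a feed mostly means parsing its XML and pulling out the headlines and links. The sketch below uses Python's standard xml.etree.ElementTree on a small, made-up RSS 2.0 sample. A real aggregator would first fetch the feed from the syndicator's URL.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration.
FEED = """<rss version="2.0"><channel><title>Example News</title>
<item><title>First story</title><link>http://example.org/1</link></item>
<item><title>Second story</title><link>http://example.org/2</link></item>
</channel></rss>"""


def headlines(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


for title, link in headlines(FEED):
    print(title, link)
```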
Tagging refers to adding style information or semantic information to a piece of content with HTML or XML tag attributes or unique style or semantic tags. Tags are the delimited markup information that surrounds content, e.g., <title>The Title</title>. Most CM systems have created their own proprietary tag names and delimiters to indicate sites in a content template where a content element is to be embedded. (See Templates) The tag is then replaced by the actual content element for delivery as HTML, for example.
Modern XML-style tags can be variables or containers and can include attributes that are validated by a DTD or XSD schema document.
A Template carries the Presentation information, including Style and Positioning. The Template Editor lets you create layouts - usually an arrangement of blocks on the page.
Into these blocks go the content elements or content objects, which could be HTML or XML fragments, RSS news feeds and other remote server-driven applications (servlets), or client-side application programs (applets). Each block (or just a span) is indicated by delimiting tags, usually in the style of their application framework. ASP tags look like <%title%>, Blogger tags like <%BlogTitle%>, etc. The new trend is to use XML-style tags defined in a namespace, like <title>
The Template is sometimes described as a Portal in which case the content objects are sometimes called Portlets.
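Replacing template tags with content elements can be sketched with a regular expression. The example assumes the <%name%> delimiter style mentioned above. The render function and the sample template are hypothetical.

```python
import re

# Matches placeholders such as <%title%>.
TAG = re.compile(r"<%(\w+)%>")


def render(template, elements):
    """Replace each <%name%> tag with the matching content element.

    Tags without a matching element are left in place.
    """
    def substitute(found):
        return elements.get(found.group(1), found.group(0))

    return TAG.sub(substitute, template)


page = render(
    "<h1><%title%></h1><p><%body%></p>",
    {"title": "Glossary", "body": "Terms."},
)
print(page)  # <h1>Glossary</h1><p>Terms.</p>
```

Keeping the template and the elements dictionary separate is the "separate the presentation from the content" idea in miniature: the same elements can be poured into a different template without being edited.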
Versioning (Version Control)
Versioning keeps dated or serialized copies of all the different versions of a piece of content. This allows the Rollback of content to a previous version. It also provides archiving of content.
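A minimal sketch of serialized versions with rollback, keeping everything in memory. The VersionedContent class is hypothetical. Here rollback saves the old text again as the newest version, so no history is lost.

```python
from datetime import datetime, timezone


class VersionedContent:
    def __init__(self, text):
        self._versions = []  # list of (serial, timestamp, text)
        self.save(text)

    def save(self, text):
        """Store a new version and return its serial number."""
        serial = len(self._versions) + 1
        self._versions.append((serial, datetime.now(timezone.utc), text))
        return serial

    @property
    def current(self):
        """Text of the most recent version."""
        return self._versions[-1][2]

    def rollback(self, serial):
        """Restore an earlier version by saving it again as the newest one."""
        for number, _, text in self._versions:
            if number == serial:
                return self.save(text)
        raise KeyError(f"no version {serial}")


doc = VersionedContent("first draft")
doc.save("edited draft")
doc.rollback(1)
print(doc.current)  # first draft
```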
"Web Publishing" is another name for Content Management. Major features include scheduling content onto the web, searching all page files, infinite undo and backups, and archiving all pages to preserve institutional memory. Documents need not be HTML web pages, but today the majority of documents in a web-based publishing system are in HTML or XML formats.
"Web Services" describes one web server (or even a smart client applet) getting information or an application program from a remote web application server. News Feeds (RSS) are examples of a web service.
Distributed applications put part of their business logic on remote servers where key data can be accessed locally. For example, a web service provider might publish current stock prices. A web service requestor looking for stock quotes can use UDDI (Universal Description, Discovery, and Integration) to retrieve the published interface (the public methods and properties of the remote web service) written in WSDL (Web Services Description Language).
Then it can use XML-RPC (Remote Procedure Call) or SOAP (Simple Object Access Protocol) to exchange information with the web service, most simply using the HTTP protocol.
Web services over HTTP have become wildly successful compared to previous attempts to build distributed applications using complex and platform-dependent schemes like CORBA and DCOM.
A Wiki is a collection of web pages with novel linking structures. Each link is the name of a page in the Wiki, with a special capital-letter plus medial-cap syntax that makes it a "WikiWord."
The Wiki builds itself when pages are edited to include a WikiWord. The Wiki recognizes the WikiWord, looks in its database to find the page and add a hyperlink to the WikiWord when the HTML for the page is generated. If the page does not yet exist, it prepends a special question-mark hyperlink to the WikiWord. If a user clicks on the question mark, the Wiki creates a blank page and an editor for that page appears.
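The WikiWord recognition step can be sketched with a regular expression. The pattern below is one plausible reading of the "capital-letter plus medial-cap" rule. The URL scheme and the render_wiki_text function are assumptions for illustration, not the behavior of any specific Wiki engine.

```python
import re

# A capitalized word followed by at least one more capitalized word part.
WIKI_WORD = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")


def render_wiki_text(text, existing_pages):
    """Turn WikiWords into hyperlinks when the page HTML is generated."""
    def link(found):
        word = found.group(0)
        if word in existing_pages:
            return f'<a href="/{word}">{word}</a>'
        # Page does not exist yet: prepend a question-mark link that creates it.
        return f'<a href="/{word}?action=edit">?</a>{word}'

    return WIKI_WORD.sub(link, text)


print(render_wiki_text("See FrontPage and NewTopic.", {"FrontPage"}))
```

Ordinary capitalized words such as "See" are left alone because they contain no medial capital.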
Every page has an "EditText" link. Anyone in the world can edit the page. Standard Wikis do not have versioning and rollback.
Standard Wiki pages contain no HTML. They are simple text fields in the database. All HTML is generated, turning WikiWords into hyperlinks, adding unique Wiki links like FrontPage, FindPage, LikePages, RoadMaps, RecentVisitors, RecentChanges, and creating BackLinks and TopicPages.
A Wiki feels like a glossary, but it is not, since the entries are special WikiWords.
It is a CMS. It manages the content in pages with its simple but powerful tools.
Workflow (Notification, Roles)
Workflow is the management of who exactly is working on a content element or template, what exactly they are doing, and when. The workflow reporting system sends messages to others working on a page, with details of actions taken. Different workers can have assigned Roles. Notification may be sent to the Roles rather than the individuals.
Typical roles are writers, copy editors, editors, illustrators, graphic artists, rights clearance managers, (multilingual) localizers, and publishers.
"WYSIWYG" is an acronym for What You See Is What You Get. It was introduced in the early 1980's to describe desktop-publishing programs where the layout on the graphical screen was a fair approximation of the printed version of the document.
