Scope of this document
Forges are software-project hosting systems. In order to support cross-forge migration of projects, a common representation and ontology of project states will be required. This document informally constructs such an ontology and describes how the representations of different forge systems map to it.
The ontology described here is not yet complete. It is being developed in conjunction with forgeplucker, which is as yet only capable of extracting tracker state from forges.
To motivate this, it is worth bearing in mind the forgeplucker goal: extraction of a complete enough state that it could in principle be uploaded into another forge site and create a running project.
CFO, the Common Forge Ontology
Ontology terms are in all caps. Terms suffixed with ?? are not yet implemented. Terms suffixed with ? have unresolved issues or ambiguities, or are not widely implemented per spec. Comma-separated lists enclosed in [] are controlled vocabularies.
The CFO is divided into JSON objects, each with a CLASS attribute. They are listed below.
IDENTITY objects ?
An IDENTITY object identifies a person. An IDENTITY has the following parts:
- NICK
-
Nickname of a developer, usually the person part of an email address
- EMAIL
-
String that is the fully qualified email address of a person
- NAME
-
String that is the name of a person.
The NICK may be the special value null representing an anonymous or unspecified user, in which case the NAME and EMAIL parts are not used.
What to do when only part of the IDENTITY object is available is undecided.
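As an illustration, an IDENTITY might be serialized along the following lines. This is a minimal sketch, not the forgeplucker encoding: the exact key spellings (including "CLASS"), the helper name make_identity, and the example person are all assumptions, and storing None for a missing NAME or EMAIL is only a placeholder for the undecided partial-identity case.

```python
import json

def make_identity(nick, name=None, email=None):
    """Build a dict shaped like a CFO IDENTITY (key spellings are assumed)."""
    identity = {"CLASS": "IDENTITY", "NICK": nick}
    if nick is not None:
        # NAME and EMAIL are only used for a non-anonymous user.
        identity["NAME"] = name
        identity["EMAIL"] = email
    return identity

# A null NICK stands for an anonymous or unspecified user.
anonymous = make_identity(None)
# Hypothetical developer, for illustration only.
developer = make_identity("jrandom", "J. Random Hacker", "jrandom@example.org")

print(json.dumps(anonymous))
print(json.dumps(developer, indent=2))
```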
PULL-INTERVAL OBJECTS
A PULL-INTERVAL object represents the time interval associated with the capture of project state or some portion of it. The capture is guaranteed to contain all changes prior to the capture start date, and no changes after the capture end date.
A PULL-INTERVAL has the following parts:
-
CAPTURE-START:: Timestamp, when the capture of the parent object began
-
CAPTURE-END:: Timestamp, when the capture of the parent object completed
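The guarantee above leaves changes made during the capture window in an indeterminate state. A minimal sketch of how a consumer might classify a change against a PULL-INTERVAL follows; the ISO 8601 timestamp format, the function name capture_status, and the three result labels are assumptions, since this document does not fix a timestamp encoding.

```python
from datetime import datetime

def capture_status(change_time, interval):
    """Classify a change time against a PULL-INTERVAL-shaped dict."""
    start = datetime.fromisoformat(interval["CAPTURE-START"])
    end = datetime.fromisoformat(interval["CAPTURE-END"])
    when = datetime.fromisoformat(change_time)
    if when < start:
        return "included"   # guaranteed to be in the capture
    if when > end:
        return "excluded"   # guaranteed not to be in the capture
    return "uncertain"      # made while the capture was running

# Hypothetical capture that ran for five minutes.
interval = {
    "CAPTURE-START": "2010-02-16T12:00:00",
    "CAPTURE-END": "2010-02-16T12:05:00",
}
print(capture_status("2010-02-16T12:02:00", interval))
```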
RELEASE objects and subobjects ??
A RELEASE is product files and metadata describing a software release. It has the following parts:
-
ID:: A string naming the release
-
DATE:: Date of the release
-
DESCRIPTION:: Text of release notes
-
PRODUCT_LIST:: One or more PRODUCT objects
A PRODUCT is a shippable form of a release, or release component. Possible PRODUCTs include source tarballs, RPMs, Debian packages, or SHA1 checksum files. PRODUCT objects have the following parts:
-
FILENAME:: Product filename
-
DESCRIPTION:: Optional one-line description
-
MIMETYPE:: MIME type
-
ARCHITECTURE:: Tag for the product’s target architecture
-
TYPE:: Controlled vocabulary describing the project resource type
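A RELEASE with a single PRODUCT could look roughly like the dict below, together with a small structural check. This is a sketch under assumptions: the ARCHITECTURE and TYPE values are placeholders (their vocabularies are not yet defined), the check_release helper is hypothetical, and treating every RELEASE part as required is a choice made here rather than something the ontology states.

```python
def check_release(release):
    """Return a list of structural problems in a RELEASE-shaped dict."""
    problems = []
    for part in ("ID", "DATE", "DESCRIPTION", "PRODUCT_LIST"):
        if part not in release:
            problems.append("missing " + part)
    products = release.get("PRODUCT_LIST", [])
    if not products:
        problems.append("PRODUCT_LIST must hold one or more PRODUCT objects")
    for product in products:
        if "FILENAME" not in product:
            problems.append("PRODUCT without FILENAME")
    return problems

release = {
    "ID": "1.0",
    "DATE": "2010-02-16",
    "DESCRIPTION": "First public release.",
    "PRODUCT_LIST": [
        {
            "FILENAME": "example-1.0.tar.gz",
            "DESCRIPTION": "Source tarball",   # optional one-liner
            "MIMETYPE": "application/gzip",
            "ARCHITECTURE": "any",             # placeholder value
            "TYPE": "source",                  # placeholder value
        }
    ],
}
print(check_release(release))
```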
PROJECT-STATE OBJECTS ??
A small subset of this is currently implemented as PROJECT objects.
A PROJECT-STATE object is intended to capture the entire state of a project for export and re-import.
A PROJECT-STATE has the following parts:
- SHORTNAME
-
a project identification string that is a legal Unix path segment.
- LONGNAME
-
a name for human use.
- DESCRIPTION
-
a text description of the project.
- PRIVILEGE-LIST
-
List of PRIVILEGE objects with the following parts.
-
IDENTITY:: Identity of a project member; may not be Anonymous
-
CAPABILITY-LIST:: Zero or more CAPABILITY tokens.
-
TITLE:: String describing the member’s project role
-
- SUBSYSTEM-LIST
-
One or more SUBSYSTEM parts, which may be one of the following:
-
REPOSITORY with the following parts.
-
NAME:: a string identifying the repository.
-
TYPE:: a string identifying the version control system.
-
CONTENT:: a file or directory.
-
-
LISTS:: Zero or more MAILING-LIST?? objects
-
WEBSPACE:: a directory.
-
RELEASE-LIST:: Zero or more RELEASE objects.
-
TRACKER:: Refer to the TRACKER definition below.
-
CAPTURED:: A PULL-INTERVAL for the project state capture.
-
Most forges limit projects to one REPOSITORY and one WEBSPACE. This ontology supports more than one repository because that may be useful for modeling repo conversions.
A CAPABILITY consists of any of the following:
-
ADMINISTRATOR:: can modify capabilities (boolean)
-
TRUSTED:: can examine tracker items marked private (boolean)
-
A TRACKER-PERMISSION::
-
ID:: The ID of the tracker
-
TECHNICIAN:: can the user be assigned artifacts from here (boolean, optional)
-
MANAGER:: can the user modify artifacts on here (boolean, optional)
-
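To make the PRIVILEGE and CAPABILITY structure concrete, here is one possible encoding and a lookup over it. The sketch assumes that each CAPABILITY token is a small dict keyed by its kind, that omitted optional flags count as false, and that tracker IDs are short strings such as "bugs"; none of these choices is fixed by the ontology, and the tracker_flag helper is hypothetical.

```python
privilege = {
    "IDENTITY": {"NICK": "jrandom"},   # hypothetical project member
    "TITLE": "Developer",
    "CAPABILITY-LIST": [
        {"ADMINISTRATOR": False},
        {"TRUSTED": True},
        {"TRACKER-PERMISSION": {"ID": "bugs", "TECHNICIAN": True}},
    ],
}

def tracker_flag(privilege, tracker_id, flag):
    """Look up TECHNICIAN or MANAGER for one tracker; absent means False."""
    for capability in privilege["CAPABILITY-LIST"]:
        permission = capability.get("TRACKER-PERMISSION")
        if permission and permission["ID"] == tracker_id:
            return bool(permission.get(flag, False))
    return False

print(tracker_flag(privilege, "bugs", "TECHNICIAN"))
print(tracker_flag(privilege, "bugs", "MANAGER"))
```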
TRACKER objects and subobjects
The TRACKER object is intended to contain the entire state of the project’s issue trackers. It has several subobject types. All but ARTIFACT, the central and most complex one, are described here. ARTIFACT is described in the next section.
In the descriptions below, a nick is the site-local username of a project developer.
A TRACKER has the following parts:
- CAPTURED
-
A PULL-INTERVAL for the tracker capture.
- ARTIFACT-LIST
-
One or more ARTIFACT objects. Refer to the ARTIFACT definition below
- VOCABULARIES
-
A map from ARTIFACT slots to lists of possible string values
- SUBSCRIBERS??
-
Map of types to lists of SUBSCRIBER objects describing persons to be notified when an artifact with that type is added to the tracker.
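Since VOCABULARIES maps ARTIFACT slots to their permitted string values, a consumer can use it to spot values that fall outside the vocabulary. The following is a minimal sketch; the function name and the sample data are hypothetical, and slots without a vocabulary entry are treated as free-form.

```python
def out_of_vocabulary(artifact, vocabularies):
    """Return the slots whose value is not in that slot's vocabulary."""
    return {
        slot: value
        for slot, value in artifact.items()
        if slot in vocabularies and value not in vocabularies[slot]
    }

# Hypothetical tracker vocabulary and artifact.
vocabularies = {
    "STATUS": ["Open", "Closed"],
    "PRIORITY": ["1 - Lowest", "5 - Medium", "9 - Highest"],
}
artifact = {"STATUS": "Open", "PRIORITY": "11", "SUMMARY": "Crash on start"}

print(out_of_vocabulary(artifact, vocabularies))
```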
A COMMENT has the following parts:
- SUBMITTER?
-
IDENTITY, the submitting user. SourceForge and Trac use a plain nickname instead.
- DATE
-
A timestamp.
- TEXT
-
Comment text.
A FIELDCHANGE has the following parts:
Reverting a change is as simple as replacing the new value with the old one in the relevant field of the ARTIFACT.
- BY
-
Nick of developer making a change to field value (not an IDENTITY)
- DATE
-
A timestamp.
- FIELD
-
An ARTIFACT fieldname
- OLD
-
The value before the change
- NEW
-
The value after the change
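The revert rule for a FIELDCHANGE can be sketched as follows. The function name revert and the sample nick are hypothetical, and the check that the field still holds the NEW value is a safety measure added here, not something the ontology requires.

```python
def revert(artifact, change):
    """Return a copy of the artifact with one FIELDCHANGE undone."""
    field = change["FIELD"]
    if artifact.get(field) != change["NEW"]:
        # The field has changed again since; a blind revert would lose data.
        raise ValueError("field %s no longer holds the changed value" % field)
    reverted = dict(artifact)
    reverted[field] = change["OLD"]
    return reverted

artifact = {"ID": "42", "STATUS": "Closed"}
change = {
    "BY": "jrandom",               # hypothetical nick
    "DATE": "2010-02-16T12:00:00",
    "FIELD": "STATUS",
    "OLD": "Open",
    "NEW": "Closed",
}
print(revert(artifact, change))
```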
An ATTACHMENT has the following parts:
- URL
-
Where the file can be fetched
- FILENAME
-
A string, the name of the file as attached (optional)
- DATE
-
Date of attachment (optional)
- ID
-
A string that is a file attachment ID, conventionally numeric
- BY
-
IDENTITY
- DESCRIPTION
-
One-line description of file contents (optional)
A SUBSCRIBER has the following parts:
- CC-TO
-
Email address
- COMMENT
-
Why this person is subscribed
ARTIFACT objects
The ARTIFACT object is the most complex and variable one in the entire ontology. It is described by several tables.
The Internal Fieldname table maps slots in the ontology to the form attributes used on different forge systems. The Savane column describes both Savane and Savane-Cleanup. A - value means the slot is unsupported on that system. If the entry is Unnamed, the attribute is supported but not editable in the artifact detail form, and must be scraped from presentation-level HTML rather than by parsing INPUT elements. Otherwise the entry names the form parameter that corresponds to this slot.
CFO field | Berlios | SourceForge | Savane |
---|---|---|---|
ATTACHMENTS | - | Unnamed | Unnamed |
ASSIGNED-TO | assigned_to | assigned_to | assigned_to |
GROUP | bug_group_id | bug_group_id | bug_group_id |
CATEGORY | category_id | category_id | category_id |
COMMENTS | Unnamed | Unnamed | Unnamed |
VERSION? | - | - | category_version* |
DATE | Unnamed | Unnamed | Unnamed |
DEPENDENTS?? | Unnamed | Unnamed | Unnamed |
DISCUSSION-LOCK | - | - | discussion_lock |
EFFORT?? | - | - | effort |
FIX-RELEASE?? | - | - | fix_release* |
HISTORY | Unnamed | Unnamed | Unnamed |
ID | Unnamed | Unnamed | Unnamed |
KEYWORDS | - | - | keywords |
ORIG_EMAIL?? | - | - | Unnamed |
ORIG_NAME?? | - | - | originator_name |
ORIG_PHONE?? | - | - | originator_phone |
PERCENT | - | - | percent_complete |
PLAN-RELEASE | - | - | plan_release* |
PLATFORM | - | - | platform_version_id |
PRIORITY | priority | priority | priority |
PRIVACY | - | - | privacy |
RELEASE | - | - | release* |
REPRODUCIBILITY | - | - | reproducibility_id |
RESOLUTION | resolution_id | resolution_id | resolution_id |
SEVERITY | - | - | severity |
SIZE | - | - | size_id |
STATUS | status_id | status_id | status_id |
SUBMITTER | Unnamed | Unnamed | Unnamed |
SUBSCRIBER | - | Unnamed | Unnamed |
SUMMARY | Unnamed | Unnamed | summary |
VOTES? | - | - | Unnamed |
Under Savane, slots with starred field names can be defined in one of two ways: as a free-text field, or as a select box with an administrator-defined controlled vocabulary. In the latter case, the fieldname gets an _id suffix.
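The suffix rule for starred Savane fields is simple enough to express directly. In this sketch the set of starred names is copied from the table above, while the function name and its boolean parameter are hypothetical.

```python
# Starred Savane field names from the Internal Fieldname table.
STARRED = {"category_version", "fix_release", "plan_release", "release"}

def savane_fieldname(base, is_select_box):
    """Return the form parameter name for a Savane field."""
    if base in STARRED and is_select_box:
        # Select boxes with a controlled vocabulary get an _id suffix.
        return base + "_id"
    return base

print(savane_fieldname("fix_release", True))
print(savane_fieldname("fix_release", False))
```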
The Intended Semantics table describes the intended semantics of each slot.
CFO field | Meaning |
---|---|
ATTACHMENT-LIST | List of ATTACHMENT objects |
ASSIGNED-TO | Nick of the developer responsible |
BUG-GROUP | Controlled by administrator |
CATEGORY | Controlled by administrator |
CC-LIST | List of subscribers to be notified on changes |
COMMENT-LIST | User comments on the artifact |
VERSION? | Version of system component impacted by artifact |
DATE | Original submission date of the artifact |
DEPENDENTS | List of IDs of bugs that depend on this one |
DISCUSSION_LOCK | Whether or not discussion is closed on this bug |
EFFORT?? | Estimated number of hours to address |
FIX-RELEASE?? | Release in which the artifact was actually fixed |
HISTORY | Change history of the bug record |
ID | A string-valued bug ID, conventionally numeric in form. |
KEYWORDS | Comma-separated tags associated with the artifact |
ORIG_EMAIL?? | Originator’s email address (if different from Submitter) |
ORIG_NAME?? | Originator’s name (if different from Submitter) |
ORIG_PHONE?? | Originator’s phone number |
PERCENT?? | Percentage completion of task |
PLAN-RELEASE | Planned release for this artifact to be fixed in |
PLATFORM | Platform (operating system) affected by artifact |
PRIORITY | How quickly the artifact should be handled |
PRIVACY | Public or private |
RELEASE | Release impacted by the artifact |
REPRODUCIBILITY | How reproducible is this bug? |
RESOLUTION | Resolution status of the bug |
SEVERITY | Impact on the system |
SIZE | Estimated size of code to be added or changed |
STATUS | Various statuses allowed for a bug (TODO: a full list should be produced) |
SUBMITTER | Identity of the bug submitter (string or IDENTITY) |
SUMMARY | One-line summary of the bug |
VOTES?? | User votes for this bug |
On Savane, the "PLATFORM" field is actually labeled "Operating System".
The Field Type table describes what kind of data each slot contains.
CFO field | Data type |
---|---|
ATTACHMENT-LIST | List of ATTACHMENT objects |
ASSIGNED-TO | IDENTITY |
BUG-GROUP | String |
CATEGORY | String |
CC-LIST | List of SUBSCRIBER objects |
COMMENT-LIST | One or more COMMENT objects |
COMP_VERSION | Controlled by administrator |
DATE | Timestamp |
DEPENDENTS | List of IDs |
DISCUSSION_LOCK? | Enumerated: Locked, Unlocked |
EFFORT | String interpreted as numeric |
FIX-RELEASE | Controlled or string |
HISTORY | List of FIELDCHANGE objects |
ID | Sequence number |
KEYWORDS | List of strings |
ORIG_EMAIL?? | String |
ORIG_NAME?? | String |
ORIG_PHONE?? | String |
PERCENT | String interpreted as numeric |
PLATFORM | Controlled by administrator |
PLAN-RELEASE | Controlled or string |
PRIORITY | Enumerated |
PRIVACY | Enumerated |
RELEASE | Controlled or string |
REPRODUCIBILITY | Enumerated |
RESOLUTION | Controlled by administrator |
SEVERITY | Enumerated |
SIZE | Enumerated |
STATUS | Controlled by administrator |
SUBMITTER | IDENTITY |
SUMMARY | String text |
VOTES | String interpreted as numeric |
The following tables describe the default controlled vocabularies associated with each slot. Starred fields can have possible values added by the administrator.
CFO field | Berlios | SourceForge | Savane |
---|---|---|---|
BUG-GROUP | None* | None* | None* |
CATEGORY | None* | None* | None* |
COMP-VERSION | - | - | None* |
PRIORITY | 1 - Lowest, 2, 3, 4, 5 - Medium, 6, 7, 8, 9 - Highest | 1 - Lowest, 2, 3, 4, 5 - Medium, 6, 7, 8, 9 - Highest | 1 - Lowest, 2, 3, 4, 5 - Medium, 6, 7, 8, 9 - Highest |
PERCENT | - | - | 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100% |
PLATFORM | - | - | None*, GNU/Linux, Microsoft Windows, *BSD, MacOS |
PRIVACY | - | - | Public, Private |
REPRODUCIBILITY | - | - | None, Every Time, Intermittent, Once |
SEVERITY | - | - | 1 - Wish, 2 - Minor, 3 - Normal, 4 - Important, 5 - Blocker, 6 - Security |
SIZE | - | - | None, Low <30, Medium 30-200, High >200 |
The PERCENT value is encoded as the numeric string 1. Other values are encoded without the percent.
CFO field | Berlios | SourceForge | Savane |
---|---|---|---|
STATUS | Open | Open | Open |
Closed | Closed | Closed | |
Deleted | |||
Pending | |||
RESOLUTION | None | None | None* |
Fixed | Accepted | Fixed | |
Invalid | Duplicate | Invalid | |
Wont Fix | Fixed | Wont Fix | |
Later | Invalid | Postponed | |
Remind | Later | In Progress | |
Works For Me | Out of Date | Works For Me | |
Duplicate | Postponed | Duplicate | |
Rejected | Confirmed | ||
Remind | Need Info | ||
Wont Fix | Ready For Test | ||
Works For Me |
CFO field | Berlios | SourceForge | Savane |
---|---|---|---|
STATUS | Open | Open | Open |
Closed | Closed | Closed | |
Deleted | Deleted | ||
Pending | |||
RESOLUTION | - | None | None* |
Accepted | Fixed | ||
Duplicate | Wont Fix | ||
Fixed | Works For Me | ||
Invalid | Ready For Test | ||
Later | In Progress | ||
Out of Date | Postponed | ||
Postponed | Confirmed | ||
Rejected | Need Info | ||
Remind | Duplicate | ||
Wont Fix | Invalid | ||
Works For Me |
CFO field | Berlios | SourceForge | Savane |
---|---|---|---|
STATUS | Open | Open | Open |
Closed | Closed | Closed | |
Deleted | Deleted | ||
Pending | |||
RESOLUTION | - | None | None* |
Accepted | Done | ||
Duplicate | Wont Do | ||
Fixed | Works For Me | ||
Invalid | Ready For Test | ||
Later | In Progress | ||
Out of Date | Postponed | ||
Postponed | Need Info | ||
Rejected | Duplicate | ||
Remind | Invalid | ||
Wont Fix | |||
Works For Me |
CFO field | Berlios | SourceForge | Savane |
---|---|---|---|
STATUS | Open | Open | Open |
Closed | Closed | Closed | |
Deleted | Deleted | ||
RESOLUTION | - | Pending | None* |
None | Done | ||
Accepted | Wont Do | ||
Duplicate | Works For Me | ||
Fixed | Ready For Test | ||
Invalid | In Progress | ||
Later | Confirmed | ||
Out of Date | Postponed | ||
Postponed | Need Info | ||
Rejected | Duplicate | ||
Remind | Invalid | ||
Wont Fix | |||
Works For Me |
CFO field | Berlios | SourceForge | Savane |
---|---|---|---|
STATUS | Open | - | Open |
Closed | - | Closed | |
RESOLUTION | - | - | None* |
Done | |||
Cancelled | |||
Works For Me | |||
Ready For Test | |||
In Progress | |||
Postponed | |||
Need Info |
Pending is a non-closed status in which development work is considered done, but some other work is needed (QA, documentation, etc.). The ideal workflow is Open → Pending → Closed. Of course, the real workflow is up to the developers.
Additional metadata not covered by the ontology
Under Berlios, SourceForge, and Savane, every tracker has a list of canned responses associated with it.
Berlios, SourceForge, and Savane all have a notion of "squads" (permission groups) which we don’t capture.
SourceForge trackers have a set of preferences that should probably be considered part of project state.
-
Whether or not to allow posting by anonymous users
-
Whether or not closed artifacts should be reopened when the submitter attaches a new comment.
-
Timeouts associated with overdue and Pending status on SourceForge.
-
Canned texts to be shown on the artifact submission and tracker browse pages.
-
Hidden vs. visible preferences for all metadata other than ID and Summary.
Savane has Cookbook and News features we don’t capture.
Under Savane, every modifiable ARTIFACT field has some additional attributes:
-
Whether it is optional, mandatory, or mandatory only if it was presented to the submitter.
-
Who it is visible to (Project members, logged-in users, anonymous users)
-
Screen rank
-
Whether or not artifact changes are stored in the HISTORY list.
Under Savane, it is also possible to selectively allow or disallow various transitions in enumerated and controlled values.
Savane also allows the definition of up to 10 each of custom text areas and select boxes, and up to 5 customizable date fields. These have fieldnames of the form *_custom.
Incompatibilities
Berlios and SourceForge lack many artifact fields found in Savane and SavaneCleanup.
The RESOLUTION slot is screen-labeled "Resolution" on Berlios, but "Status" on Savane. The STATUS slot is labeled "Status:" on Berlios, but "Open/Closed" on Savane. Trac doesn’t distinguish status from resolution.
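Because the same screen label means different things on different forges, a scraper needs a per-forge lookup rather than a global one. A minimal sketch, using only the labels quoted above; the table and function names are hypothetical.

```python
# Screen label -> CFO slot, per forge, as quoted in the text above.
SCREEN_LABELS = {
    "Berlios": {"Resolution": "RESOLUTION", "Status:": "STATUS"},
    "Savane": {"Status": "RESOLUTION", "Open/Closed": "STATUS"},
}

def slot_for_label(forge, label):
    """Map a forge's screen label to the CFO slot it represents."""
    return SCREEN_LABELS[forge][label]

print(slot_for_label("Berlios", "Status:"))
print(slot_for_label("Savane", "Status"))
```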
Berlios and SourceForge have separate bug and feature trackers. Savane has no separate feature tracker; projects separate those by using the Bug Category or Bug Group field.
Under SourceForge, there is no predefined Task tracker, but it is possible to create custom trackers that don’t fit one of the named types.
Savane ATTACHMENT objects lack the DESCRIPTION slot. SourceForge and Trac ATTACHMENT objects lack MIMETYPE.
Berlios and SourceForge allow setting (in tracker preferences) the address of a commit list for new submissions, and its subscriptions have no explanation attached. Savane allows setting a CC list for notification on all state changes, and each subscription can have an explanatory comment. The CFO SUBSCRIBER slot assumes Savane-like behavior which Berlios, Trac and SourceForge cannot support.
Open Questions
We need a definition of MAILING-LIST.
What are the controlled vocabularies for PRODUCT::TYPE, PRODUCT::ARCHITECTURE?
There are other kinds of objects that might be considered part of a project state: blogs, wikis, and forums leap to mind. Which of these should we try to capture?
How much of the state described under "Additional metadata" should be brought into CFO? Is there a principled distinction between "state of the project" and "state of the forge"?
Formalizing the ontology
The PROJECT class is a pretty close match to DOAP. We’ll need an extension field cfo#captured.
The IDENTITY class can be represented by FOAF.
Appendix: Forge permissions models
These permissions models are documented here because CFO will have to capture at least some portions of them. But that has not yet been done.
Berlios
Each project member may be assigned a role. The role set is as follows:
-
Undefined
-
Developer
-
Project Manager
-
Unix Admin
-
Doc Writer
-
Tester
-
Support Manager
-
Graphic/Other Designer
-
Doc Translator
-
Editorial/Content Writer
-
Packager (.rpm, .deb etc)
-
Analysis / Design
-
Advisor/Mentor/Consultant
-
Distributor/Promoter
-
Content Management
-
Requirements Engineering
-
Web Designer
-
Porter (Cross Platform Devel.)
Each developer has two general permissions:
- Administrator
-
May edit project member permissions
- Release Technician
-
May create project releases
There are five trackers: bugs, tasks, support, patches, and features. For each tracker, a member may have either or both of the following privilege bits:
-
Technician - may be assigned issues on this tracker
-
Administrator - may modify issues on this tracker
There are two other flags per member:
-
The user may be enabled as a forum moderator
-
The user may be enabled as a document-manager editor
Mailing lists have separate permissions and passwords, being managed through Mailman.
Savane
There are no named developer roles as on Berlios.
Each developer has two general permissions:
- Administrator
-
May edit project member permissions
- Trusted User
-
May view private bugs (the interface does not use "Trusted")
There are four trackers: bugs, tasks, patches, and support. There are two special features: the cookbook manager and the news manager. For each feature, a member may have either or both of the following privilege bits:
-
Technician - may be assigned items on this feature
-
Manager - may modify items on this tracker (like Berlios Administrator)
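One way the per-tracker privilege bits of Berlios and Savane might be folded into a CFO TRACKER-PERMISSION is sketched below. This is only a possible mapping, since CFO does not yet capture these models: the input shape (a set of bit names per tracker) and the function name are assumptions, and the Berlios "Administrator" bit is treated as equivalent to the Savane "Manager" bit, as noted above.

```python
# Name of the "may modify items" bit on each forge.
MANAGER_BIT = {"Berlios": "Administrator", "Savane": "Manager"}

def to_tracker_permission(forge, tracker_id, bits):
    """Convert a set of forge privilege bit names to a TRACKER-PERMISSION."""
    return {
        "ID": tracker_id,
        "TECHNICIAN": "Technician" in bits,
        "MANAGER": MANAGER_BIT[forge] in bits,
    }

print(to_tracker_permission("Berlios", "bugs", {"Technician", "Administrator"}))
print(to_tracker_permission("Savane", "patches", {"Manager"}))
```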
Mailing lists have separate permissions and passwords, being managed through Mailman.
Change History
Version 1.0, 2009-10-24: Initial version.
Version 1.1, 2009-10-24: Add CC-LIST tracker slot and subscriber object. Add RELEASE and PRODUCT objects. Describe Pending status. ATTACHMENT object mutates some. CC-LIST changes to SUBSCRIBER. NICK member added to IDENTITY.
Version 1.2, 2009-11-14: Define CAPABILITY. Appendices on permissions models.
Version 1.3, 2010-02-16: Changed for switch from trackers to type fields. Added ?? for things not yet implemented and ? for things poorly implemented.