I really enjoyed listening to Addison Berry talk this afternoon at OSCON 2009. She has clearly thought a lot about the social aspects of getting folks to create and maintain documentation for Open Source projects. I learned quite a bit from her talk and hope to learn from the rest of you as we go along.
My own focus tends to be on technological solutions. Having used a variety of documentation tools, I think I have a notion of how to build a generalized framework for mechanized documentation that I would want to use. I'd love to get some feedback on my approach, however, based on others' desires. As Addison correctly points out, Open Source documentation needs to be a group effort; if my framework only pleases me, it won't get much action...
Looking at Wikipedia's Comparison of documentation generators page, it's clear that there are lots of tools around. However, I don't see any that are Open Source and handle (specifically, parse) a wide variety of languages. I'd like to address this problem eventually, even if my initial efforts are limited to Ruby.
Another failing, IMHO, is the fact that these tools create read-only results. If the reader wants to ask a question, make a comment, or correct a typo, s/he will typically need to grab a copy of the project's source code, find the right file, generate a patch, and submit it through a bug-tracking system. If Wikipedia required this level of commitment, it would have (deservedly) died on the vine...
So, at minimum, I'd like a tool that parses program source code and comments. It would also be nice if it did various kinds of code quality analysis, generated useful metrics, etc. Basically, anything that a computer can do in an unsupervised manner to gather information from the code base...
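As a rough illustration of the "parse source code and comments" step, here is a minimal Ruby sketch that uses the standard library's Ripper lexer to pull comments out of a source string, plus a deliberately crude comment-ratio metric. The helper names `extract_comments` and `comment_ratio` are made up for this example; they are not part of Ontiki or any existing tool.

```ruby
require "ripper"

# Hypothetical helper: return every "#" comment in a Ruby source string,
# with the line number it appears on. Ripper.lex yields tokens shaped like
# [[line, column], type, text, ...]; comments have the type :on_comment.
# (=begin/=end blocks are separate :on_embdoc tokens and are ignored here.)
def extract_comments(source)
  Ripper.lex(source)
        .select { |token| token[1] == :on_comment }
        .map do |token|
          { line: token[0][0], text: token[2].sub(/\A#/, "").strip }
        end
end

# Hypothetical, deliberately crude metric: the fraction of non-blank lines
# that are whole-line comments. It works on raw text, so it can be fooled
# by "#" at the start of a line inside a heredoc.
def comment_ratio(source)
  lines = source.lines.map(&:strip).reject(&:empty?)
  return 0.0 if lines.empty?

  lines.count { |line| line.start_with?("#") }.fdiv(lines.size)
end

sample = <<~RUBY
  # Adds two numbers.
  def add(a, b)
    a + b # sum
  end
RUBY

extract_comments(sample).each { |c| puts "#{c[:line]}: #{c[:text]}" }
puts comment_ratio(sample)
```

Using a real lexer rather than a regular expression means a `#` inside a string literal is not mistaken for a comment, which is the sort of detail an unsupervised pass needs to get right.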
It should then feed both the original code and the derived information into a wiki, allowing humans to annotate and discuss the results, contribute their own observations, etc. Ideally, the system would be able to send this information back to the original projects.
I would also like the system to be able to integrate information from multiple sources. The typical Rails project, for example, might use a number of Gems and Plugins. Wouldn't it be nice if the user could specify which ones are of interest, causing the system to present a customized collection of APIs?
Finally, I'd like users to be able to use the wiki as a programmable report generator. If a user thinks of a new way to analyze or present information, s/he should be able to script it up and have the wiki generate the results.
If you find these ideas interesting, please take a look at my working notes on Ontiki, an experimental framework for mechanized documentation. In particular, look at PARSE, my initial use case. Then, let me know where I'm going off the Rails :).
-r
looks interesting
Your description above looks really interesting! So I took a peek through the wiki and your initial use case. I'd be interested in seeing the wiki information fleshed out to include the kinds of problems that Ontiki will solve (the use cases didn't give enough information on how Ontiki will differ from existing tools, or on which real problems it solves that have no current solution). Please especially include information about pulling comments back into the "code." While this isn't a problem for every project, it definitely is for a lot of projects.
features and comment handling
I'll try to fill in the wiki a bit more, but here's an immediate (if brief) response...
In PARSE (Punish All Ruby Software Equally), Ontiki will be gathering information from a variety of sources: RubyGems metadata, YARD and MetricFu output, etc. It will then make all of this information available for use by the front-end wiki pages, to be organized and presented as desired.
This approach addresses several problems found in existing systems. The documentation generators I am aware of pull information from the source code and/or comments. By providing for other information sources, PARSE can gather a more complete picture.
Also, existing generators tend to have certain "built-in" report formats. By allowing report generation to be specified (in a Turing-complete manner) at presentation time, Ontiki allows users to generate their own reports, based on their own needs.
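To give a feel for what I mean by user-specified reports over information from several sources, here is a small Ruby sketch. Everything in it is an assumption made for illustration: the `FactStore` and `ReportRegistry` classes, the source names, and the sample data are invented, and none of it reflects actual Ontiki code.

```ruby
# Hypothetical store: facts about each project, keyed by the source
# that supplied them (gem metadata, a metrics tool, and so on).
class FactStore
  def initialize
    @facts = Hash.new { |hash, project| hash[project] = {} }
  end

  # Record the data that one named source reported for a project.
  def add(project, source, data)
    @facts[project][source] = data
  end

  # Yield each project name along with its per-source facts.
  def each_project
    return enum_for(:each_project) unless block_given?

    @facts.each { |project, by_source| yield project, by_source }
  end
end

# Hypothetical registry: a report is just a named block of Ruby that
# receives the store, so users can add reports without touching the core.
class ReportRegistry
  def initialize(store)
    @store = store
    @reports = {}
  end

  def define(name, &block)
    @reports[name] = block
  end

  def run(name)
    @reports.fetch(name).call(@store)
  end
end

# Made-up sample data, purely for demonstration.
store = FactStore.new
store.add("widget", :gem_metadata, { version: "1.2.0" })
store.add("widget", :metrics, { comment_ratio: 0.25 })
store.add("gadget", :metrics, { comment_ratio: 0.05 })

reports = ReportRegistry.new(store)
reports.define(:poorly_commented) do |facts|
  facts.each_project
       .select { |_name, by_source| by_source.dig(:metrics, :comment_ratio).to_f < 0.1 }
       .map(&:first)
end

p reports.run(:poorly_commented)
```

Because a report is an ordinary block, it can do anything Ruby can do, which is what I mean by "Turing-complete". A real wiki would obviously need to sandbox user-supplied code; the sketch ignores that.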
The question of pulling comments back is a thorny one. I can make information available to developers, but I cannot compel them to make use of it. Worse, each Open Source repository (and often, project) has its own way to accept bug reports, comments, etc. (See http://www.cfcl.com/rdm/weblog/archives/001685.html for a small rant on the subject. :)
So, my plan is to try to "play nicely" with some of the more common ways that projects accept feedback. For example, it's quite possible that Ontiki could create and submit bug reports, including patches, to bugzilla, trac, etc. Failing that, I'll probably resort to email notification.
In closing, I'd like to emphasize that I don't see either PARSE or Ontiki as a prescription for how everyone should handle these sorts of problems. It's more like an experiment, designed to explore my own notions ...
Some pointers
Just found this thread. I haven't had a chance to read the Ontiki info yet. But I wanted to point out some projects that appear to have similar goals, and may be sources of ideas and inspiration.
You may have missed it in the doc generator comparison, but Doxygen is GPL and supports quite a number of languages, though not Ruby AFAIK.
pydocweb is Python-specific, but it provides wiki-based editing of API documentation, where the wiki-edits can be merged back into the doc comments in the source code. It is being used in collaborative efforts to update the API docs for NumPy.
I think your tool will get a lot more uptake if you make it easy to plug in new generators for different output formats. One of the nifty things about Sphinx (which is more general-purpose than an API doc generator, though it can be used that way) is that you can add a new output type by creating a new "builder" class. AFAIK, adding an output type to a tool like Doxygen is not so simple.
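To make the builder idea concrete, here is a minimal Ruby sketch of that kind of plug-in point. It is not Sphinx's actual API (Sphinx is written in Python); the `Builder`, `register`, and `create` names and the shape of the `doc` hash are invented for illustration.

```ruby
require "erb"

# Hypothetical base class: each output format is a subclass that
# registers itself under a short name.
class Builder
  @registry = {}

  class << self
    attr_reader :registry

    # Called from a subclass body to make that subclass available by name.
    def register(name)
      Builder.registry[name] = self
    end

    # Look up a registered builder and instantiate it.
    def create(name)
      Builder.registry.fetch(name).new
    end
  end

  def build(_doc)
    raise NotImplementedError, "#{self.class} must implement build"
  end
end

class TextBuilder < Builder
  register :text

  def build(doc)
    "#{doc[:name]}\n#{doc[:summary]}\n"
  end
end

class HtmlBuilder < Builder
  register :html

  def build(doc)
    name = ERB::Util.html_escape(doc[:name])
    summary = ERB::Util.html_escape(doc[:summary])
    "<h1>#{name}</h1>\n<p>#{summary}</p>\n"
  end
end

# Made-up sample entry, purely for demonstration.
doc = { name: "Widget.spin", summary: "Spins the widget <fast>." }
puts Builder.create(:text).build(doc)
puts Builder.create(:html).build(doc)
```

Adding another output type then means writing one more subclass and calling `register`, with no changes to the code that gathers the documentation.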
As far as doing "unsupervised" data collection from the codebase, that is very much what Ohloh's automated analysis does for a wide variety of open source projects.