1. OpenACS Title Slide
2. What is a Web Community?
* Sites like slashdot.org, Yahoo! Groups, imdb.com, blogspot.com, and even
amazon.com (which has community features)
* Not web communities: most ecommerce sites, most sites advertising
a company, service or band.
* common features: bulletin boards (bboards), news, comments,
user-submitted stories
* advantages: building a web community creates interest and
publicity indirectly. The site is useful beyond
advertising. It shares knowledge and reduces the need for the organization
to produce all the content.
* disadvantages: requires programmers and maintainers. Static sites
can be run w/ almost no thought besides some basic UI design and
the use of Dreamweaver etc.
Needs of a web community:
1. magnet content authored by experts
2. means of collaboration (bboard, comments, etc)
3. powerful facilities for browsing and searching both magnet
content and contributed content (site-wide search)
4. means of delegation of moderation (filters to block posters,
content rating)
#5. means of identifying members who are imposing an undue burden on
# the community and ways of changing their behavior and/or
# excluding them from the community without them realizing it (bozo-filter)
6. means of software extension by community members themselves
(open source)
3. Who wrote OpenACS, who uses it, and why is it open source?
* Started by arsDigita, later taken over by the OpenACS gang.
Used by:
* Development Gateway (World Bank) www.developmentgateway.org - ACS
* Knowledge Management System for Siemens
Corporation. (intranet application) - OpenACS/ACS hybrid
* Deutsche Bank Intranet - ACS
* site59.com: Last Minute Travel site (www.site59.com) - ACS
* scorecard.org: Environmental site which at one point served
30 db-backed page hits a second on an old Sun
Pizza Box (Sun UltraSparc II) - proto-ACS
* photo.net: Community Site for camera enthusiasts serves
hundreds of thousands of hits a day.
(www.photo.net) - ACS
* Why open source: software companies make most of their money via services, not
licenses. In the web world this is especially the case. Open sourcing reduces
development costs and gains free publicity, free bug fixes and
packages.
4. History of OpenACS
* Philip Greenspun, Ben Adida, and crew wrote a website for Hearst Publications
in the mid-90s.
* Used Illustra database, moved to ORACLE as ORACLE was a much better database.
* Philip founded aD to build and market the ACS
* in the process of building aD convinced AOL to open source AOLserver
* PostgreSQL came out and was a full featured open source database
* aD gets VC money
* Ben Adida and some others started to port the ACS to PostgreSQL
to make it built entirely on an open source platform.
* aD decides to totally rewrite the ACS
* 4.0 is released in mid-2000
* arsDigita decides to become more "market savvy" and move away from TCL
to Java
* VC-appointed CEO starts to run company like a dot-com
* Philip tries to take back over the company
* OpenACS crew ports ACS 4.0
* VCs spend most of remaining capital paying off Philip
* aD goes under and is bought out by RedHat
* OpenACS is used by many small OSS companies to work on lots of
projects, one of two or three major community systems (OSS).
5. OpenACS 3-tiered Architecture (Diagram)
Browser  <- ->  WebServer          <- ->  Database
viewer          application logic         data model & storage
**** Outline general use cases .. multiple users accessing website at same time.
6. What is a database?
* Method of storing, organizing and rapidly retrieving data
* Robust to multiple writes and reads at the same time
* Through the '70s mostly hierarchical databases (file-system on steroids)
* Hierarchical databases were not robust to changing data models
* Hence the relational database was born
* Basically a bunch of spreadsheets (columns and rows) with a declarative
language (SQL) used to retrieve the data
7. Responsibilities of Postgres
+
8. Postgres vs. File System -- ACID Fundamentals
* ACID section
* efficient retrieval of data via indexes (searching a million-row file for one
row, compounded when crossed w/ another million-row file to coordinate
the search)
* event listeners (triggers)
* good system for coordinating data retrieval (joins)
store information about the user in one table
store information about the user's purchases in another table
easily find out who bought pants on Oct-21st
* more overhead on writes and reads
maintaining indices etc.
* embedded procedural language for performing common tasks inside
the database.
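The join, index and ACID points above can be sketched with a small runnable example. This is only an illustration using Python's built-in sqlite3 module instead of Postgres. The table names, column names, sample users and dates are all made up for the example.

```python
import sqlite3

def build_store():
    """Create an in-memory database with users and purchases (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE purchases (
            purchase_id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(user_id),
            item TEXT NOT NULL,
            bought_on TEXT NOT NULL
        );
        -- An index like this is what keeps the lookup fast on a million-row table.
        CREATE INDEX purchases_item_date_idx ON purchases (item, bought_on);
    """)
    db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    db.executemany(
        "INSERT INTO purchases (user_id, item, bought_on) VALUES (?, ?, ?)",
        [(1, "pants", "2002-10-21"), (2, "shirt", "2002-10-21"), (2, "pants", "2002-10-22")],
    )
    db.commit()
    return db

def who_bought(db, item, day):
    """Join the two tables to find the names of users who bought `item` on `day`."""
    rows = db.execute(
        """SELECT u.name FROM users u
           JOIN purchases p ON p.user_id = u.user_id
           WHERE p.item = ? AND p.bought_on = ?
           ORDER BY u.name""",
        (item, day),
    )
    return [name for (name,) in rows]

def failed_purchase_leaves_no_trace(db):
    """Atomicity: an error in the middle of a transaction rolls back every write in it."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute(
                "INSERT INTO purchases (user_id, item, bought_on) "
                "VALUES (1, 'hat', '2002-10-23')"
            )
            raise RuntimeError("simulated crash before the transaction finished")
    except RuntimeError:
        pass
    (count,) = db.execute("SELECT COUNT(*) FROM purchases WHERE item = 'hat'").fetchone()
    return count == 0
```

With flat files, both the cross-referencing in who_bought and the rollback in failed_purchase_leaves_no_trace would have to be written by hand.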
9. Webserver Layer
* what is a webserver
* HTTP .. simple open protocol for Client-Server
* anatomy of a standard page
* some static, some dynamic, some database dynamic content.
10. AOLserver vs. Apache
* why AOLserver was used instead of Apache
12. TCL -> why TCL?
* Toy language
* weak on data structures (only the list and associative array)
* not buzzword compliant
* weaker on heavy infrastructure if not used carefully
* slow
* turing complete
* satisfies 90% of a website's needs (Vignette StoryServer uses it too and they
charge, or used to charge, tens of thousands of dollars)
* rapid development .. can develop sites in much less time than Java
* on the web everything is a string .. but your fundamental data isn't
13. AOLserver Native Services
1. Database API & pooling
* ns_db api
* pooling vs. new connections
* no database swamping
2. Filters
* violates one URL - one file
* can be used for authorization or redirection
* invisible to developers, so you can stack 3 million of them,
slowing requests, and not realize it.
3. Templating
* ADPs to mix TCL and HTML code
* scares HTML-monkeys
4. Connection API
* Unified way to get basic information about requests and
the client. Only based on client .. not on information special
to the system.
ns_conn url
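The "pooling vs. new connections" and "no database swamping" points under item 1 can be illustrated with a toy pool. This is a sketch in Python with sqlite3, not the ns_db implementation. The class ConnectionPool and its methods are hypothetical names chosen for the example.

```python
import queue
import sqlite3
from contextlib import contextmanager

class ConnectionPool:
    """A fixed-size pool: exactly `size` connections are opened, once, up front."""

    def __init__(self, database, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a handle be passed between threads
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def handle(self, timeout=None):
        """Borrow a connection; block (up to `timeout` seconds) if none is idle."""
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty on timeout
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back

    def idle_count(self):
        """Number of connections currently waiting in the pool."""
        return self._idle.qsize()
```

Because requests wait for an idle handle instead of opening a new connection, the database never sees more than `size` concurrent clients, however busy the webserver is.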
14. Why are they insufficient?
15. 3.x vs. 4.x
* Flat structure (use examples from documents)
* good for single look-and-feel websites, monolithic structure
* everything installed in one batch ..
* services tended not to be autonomous ..
* pile of code .. not well designed
vs.
* strong on infrastructure
* packages allow real separation of functionality, tendency to design
more reusable components
* didn't have to install everything
* good for monolithic and multi-purpose deployment
Sections:
Database Services
1. Data Model (Compared to Vignette)
* Vignette had some basic utilities and a very basic data model which
was insufficient for building a Vignette site. You ended up having
to write a lot of your data model while building on it.
* OACS has strong data model for site-wide services. Data
modelling is a major portion of site-design. Data model tested
in a wide variety of situations so it tends to be pretty robust.
* Data model is easily extensible .. the integration w/ the database
is tight so it is easy to optimize. (See database independence)
2. Database API (modifications, example of advantage of TCL .. show Java code)
* db_1row
3. Basic Object Interface
* All things which require site-wide services are an extension of
ACS_OBJECTS
*
4. Database Functions
5. "Database Independence"
6. XQL
Website Structure
0. 1 URL = 1 File
1. Packages (directory structure slide)
2. Package Instances
3. Site Map & Site Nodes
Request Processor
0. Why it exists
1. Anatomy of a request
2. How it handles a request
3. Templating (SLIDE?)
3a. ad_page_contract, adp's vs. html::template
* looping
* conditional logic (if-then)
* includes
* reverse-includes (master)
4. Subsites
Permissions:
1. Problem defined
1aa. users, objects, privileges
1ab. Users and Persons
3. users, parties, groups
4. contexts
5. API .. what it gives you
6. Utter Failure
* doesn't scale
* doesn't meet needs
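To make the vocabulary above concrete (users, parties, groups, objects, privileges, contexts), here is a toy model in Python. It is a sketch under simplifying assumptions, not the OpenACS data model or API: groups are flat, a context is just a parent object from which grants are inherited, and PermissionStore and has_permission are hypothetical names.

```python
class PermissionStore:
    """Toy permission model: grants on objects, inherited through contexts."""

    def __init__(self):
        self.context_of = {}  # object_id -> parent object_id (its context)
        self.groups_of = {}   # party_id -> set of group ids the party belongs to
        self.grants = set()   # (object_id, party_id, privilege) triples

    def set_context(self, object_id, context_id):
        self.context_of[object_id] = context_id

    def add_member(self, group_id, party_id):
        self.groups_of.setdefault(party_id, set()).add(group_id)

    def grant(self, object_id, party_id, privilege):
        self.grants.add((object_id, party_id, privilege))

    def has_permission(self, object_id, party_id, privilege):
        """True if the party, or a group it belongs to, holds the privilege on
        the object or on any object up its context chain."""
        parties = {party_id} | self.groups_of.get(party_id, set())
        seen = set()
        current = object_id
        while current is not None and current not in seen:
            seen.add(current)  # guard against accidental context cycles
            if any((current, p, privilege) in self.grants for p in parties):
                return True
            current = self.context_of.get(current)
        return False
```

Example: grant "admin" on a forum to a moderators group, and every post whose context is that forum inherits the grant for the group's members.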
Security
1. Basic Problem / Security Scenarios
1. Packet Sniffer
2. Left computer on (browser history, showing on screen, etc.)
3. Hacker/Defecting DB Admin
2. HTTP vs. HTTPS
2a. ad_secure_conn_p
2b. HTTP authorization code is insecure
3. Passwords, emails, one-way hashes
4. Authorization/Authentication
5. How to steal an identity
6. Always check your passwords
7. Don't store data
8. 2 signs that a website should not be trusted ..
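Item 3 (passwords and one-way hashes) can be sketched as follows. This is a generic illustration using Python's standard library, not a description of how OpenACS stores passwords. The iteration count and the function names are assumptions made for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware

def hash_password(password, salt=None):
    """Return (salt, digest). Only these are stored, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def check_password(password, salt, stored_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)
```

Since the hash is one-way, a defecting DB admin who reads the table gets salts and digests, not passwords. It also means the site can reset a password but can never email the old one back.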
Self-Documenting Server:
1. ad_proc, ad_library, ad_whatever
# the -flags were pretty meaningless last time i checked
ad_proc -flags proc_name { args } {
    javadoc style @info
} {
    code ....
}
ad_library {
    javadoc style @info
}
stores data in memory array and you can read the documentation through
the /doc interface.
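The same idea, documentation captured in memory when a procedure is defined and browsable later, can be mimicked in a few lines of Python. This is an analogy only, not how ad_proc is implemented. DOC_REGISTRY, documented, greet and doc_page are all hypothetical names.

```python
import inspect

DOC_REGISTRY = {}  # proc name -> {"args": ..., "doc": ..., "public": ...}

def documented(public=True):
    """Decorator that records a function's signature and docstring at definition time."""
    def register(func):
        DOC_REGISTRY[func.__name__] = {
            "args": str(inspect.signature(func)),
            "doc": inspect.getdoc(func) or "",
            "public": public,
        }
        return func
    return register

@documented(public=True)
def greet(name, greeting="Hello"):
    """Build a greeting line.

    @param name      who to greet
    @param greeting  salutation to use
    @return the greeting as a string
    """
    return f"{greeting}, {name}"

def doc_page(proc_name):
    """Render what a /doc-style browser might show for one procedure."""
    entry = DOC_REGISTRY[proc_name]
    visibility = "public" if entry["public"] else "private"
    return f"{proc_name}{entry['args']} [{visibility}]\n{entry['doc']}"
```

The point is that the documentation lives next to the code and is registered as a side effect of loading it, so the running server always describes the code it is actually running.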
A Typical ACS Page:
1. Database hits
2. ad_page_contract
3. template
Cache (Poorly Done problem):
1. Memory Caching
* It's fast, w/ AOLserver it is easy to share information between
threads since they share a memory address space.
* Causes memory usage to increase, if caches are commonly used and
never purged they may result in RAM being used up and then going
to SWAP space which slows down every action on the system.
* In a multiple front end server environment there may be cache
inconsistency. There is no efficient mechanism to update the
caches on each of the servers. Someone may reload the same
page 4 times and see 4 different results.
* Cache does not persist between server reboots (depending on
stability of system this may not be a major concern but
wait until you are slashdotted).
2. Database Caching
* Works between multiple front ends.
* Consistent between reboots.
* More expensive to write and read.
* With a massive # of front ends with replicated databases you
will have cache inconsistency again.
3. Squid Caching
* Great for mostly static content
* Squid can act as a proxy/load balancer and can cache in memory
frequently requested pages which don't change, without even
forwarding the requests to the webservers.
* Tiny variations like, "Welcome Tristan" instead of "Welcome Armen"
can stop the page from being cached.
4. Amazon/Google-Style redirect caching
* Probably the best solution in massive deployments.
* A user is redirected to the same server over and over again.
* Google has special indexes based on search terms so you are always
directed to a machine which is specially tuned for your search
criterion.
* All the advantages of memory caching and squid caching without the
problems of cache inconsistency.
* Resolve memory leaks by having the cache flush old unused data.
5. Since OpenACS was designed with one to a couple of front ends
in mind, it focuses on memory caching. util_memoize
stores data as a set of key-value pairs with a timestamp. The
oldest data is flushed as memory usage grows above a certain
amount. Database caching is easy to implement.
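The behaviour described for util_memoize (key-value pairs with a timestamp, oldest data flushed first) can be sketched in Python. This is not the OpenACS implementation: MemoizeCache is a hypothetical class, and it caps the number of entries as a stand-in for a memory limit.

```python
import time

class MemoizeCache:
    """Key -> (timestamp, value) cache that drops its oldest entries when full."""

    def __init__(self, max_entries=1000, clock=time.monotonic):
        self._entries = {}  # key -> (stored_at, value)
        self._max_entries = max_entries
        self._clock = clock  # injectable so tests can control time

    def memoize(self, key, compute, max_age=None):
        """Return the cached value for `key`, calling `compute()` on a miss
        or when the cached entry is older than `max_age` seconds."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and (max_age is None or now - hit[0] <= max_age):
            return hit[1]
        value = compute()
        self._entries.pop(key, None)  # re-insert so dict order follows recency
        self._entries[key] = (now, value)
        self._evict_oldest()
        return value

    def flush(self, key):
        """Drop one entry, e.g. after the underlying data changed."""
        self._entries.pop(key, None)

    def _evict_oldest(self):
        # Entry count is used here as a simple proxy for memory usage.
        while len(self._entries) > self._max_entries:
            oldest = min(self._entries, key=lambda k: self._entries[k][0])
            del self._entries[oldest]
```

A cache like this lives in one server process, which is exactly why it is fast and why it becomes inconsistent once there are several front ends.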
5. OpenACS vs. Zope vs. Roll Your Own
* OACS - Tightly integrated w/ the database
* Zope - Uses custom object database for many parts, can also run
on top of a RDBMS.
--
* OACS - standard site-wide method of handling users, permissions,
site-wide search, templating, packaging, site-maps
* Zope - ditto .. may run into trouble when concepts need to exist
in two places .. like users.
--
* OACS - most work done in editor of choice on top of OS of choice
* Zope - lots of work done in browser interface ..
--
* OACS - non-simplistic install, highly customizable
* Zope - easy to install, less obviously customizable
--
* OACS - depending on level of customization upgrading may be painful if it
involves changes to the database
* Zope - Probably easier to upgrade
--
* OACS - TCL, weak on data structures, simple to learn and
implement in, lots of custom constructs inside of OACS
designed to accelerate development.
* Zope - Python, strong on data structures, excellent language ..
DTML, semi-programming language w/ HTML-like syntax
Python is famous for being a compact and simple language,
in its documentation Zope proudly (and
probably incorrectly) indicates that it ignores the benefits
of Python.