To gather the data, we initially turned to physical newspaper articles, but that would have meant manually typing all of the text or attempting OCR. As an alternative, we found the articles published on the internet and simply copied them into our XML files.
This is where the fun began. We had to mark up each file with a system of tags, each with its own meaning. The <who> tags contain people's names, while the "ref" attribute links each person to a distinct pointer value. That "ref" attribute helped us link people to their quotes, because the <who> tags share the same attribute value with our <quote> tags. Some sample markup is below:
<body><sec><location>COLUMBUS, Ohio,</location> <date>June 18</date> — The <sect type="Episcopal">Episcopal</sect> Church elected <who ref="Schori">Bishop Katharine Jefferts Schori</who> of <location>Nevada</location> as its presiding bishop on Sunday, making her the first woman to lead a church in the worldwide <sect type ="Anglican">Anglican</sect> Communion.</sec> Many <sect type="Episcopal">Episcopalians</sect> gathered here for the church's triennial general convention cheered the largely unexpected choice of <sec><who ref="Schori">Bishop Jefferts Schori</who>, 52, the lone woman and one of the youngest of the seven candidates for the job.</sec> <sec>Her election was a milestone for the <sect type="Episcopal">Episcopal</sect> Church, which<fact> began ordaining women only in 1976</fact>.</sec> She takes on her new responsibilities at a particularly fraught moment in the history of the<sec> <sect type="Episcopal">Episcopal</sect> Church, the American branch of the <sect type="Anglican">Anglican</sect> Communion, the <fact>world's third-largest church body</fact>, with<stat> 77 million members</stat>. </sec>She was elected to succeed <sec>Presiding <who ref="Griswold">Bishop Frank T. Griswold</who></sec>, who will retire in November when his nine-year term ends.
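The way a shared "ref" value ties a <who> tag to its <quote> tags can be sketched in a few lines of Python. This is only an illustration, not the method we used on the project: the element and attribute names follow our markup, but the sample document, the placeholder quote text, and the function name quotes_by_speaker are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: tag and attribute names follow the article markup,
# but the quote text is placeholder content, not taken from any article.
SAMPLE = """<body>
  <sec><who ref="Schori">Bishop Katharine Jefferts Schori</who> spoke first:
    <quote ref="Schori">Placeholder quote one.</quote></sec>
  <sec><who ref="Griswold">Bishop Frank T. Griswold</who> replied:
    <quote ref="Griswold">Placeholder quote two.</quote></sec>
  <sec><who ref="Schori">Bishop Jefferts Schori</who> added:
    <quote ref="Schori">Placeholder quote three.</quote></sec>
</body>"""


def quotes_by_speaker(xml_text):
    """Group quote text under the person whose ref matches the quote's ref."""
    root = ET.fromstring(xml_text)

    # Map each ref value to the first name recorded for it in document order.
    names = {}
    for who in root.iter("who"):
        names.setdefault(who.get("ref"), "".join(who.itertext()).strip())

    # Collect each quote under the matching speaker (fall back to the raw ref).
    grouped = defaultdict(list)
    for quote in root.iter("quote"):
        ref = quote.get("ref")
        grouped[names.get(ref, ref)].append("".join(quote.itertext()).strip())
    return dict(grouped)


RESULT = quotes_by_speaker(SAMPLE)
```

Because both tags carry the same "ref" value, a shortened name such as "Bishop Jefferts Schori" still lands under the same speaker as the full name.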
To ensure that all of our tags were correct and where they were supposed to be, we created a schema file to keep all of the documents consistent. The schema can be found here.
To transform our data into something usable, we first had to decide what we wanted to find. For example, finding the quotes in our articles involved a simple XPath expression, //quote. A bare list of quotes wouldn't help us much, though, so we had to get creative with XPath and manual coding. After retrieving the quotes and their speakers in the form of a table, I copied and pasted the table into an XML file for additional transformation. RegEx came in handy for quickly transforming the large sets of data. Expressions such as ^(.*)$ --> <li>\0</li> allowed us to quickly turn a table into a nested list of quotes. Most of the work was still done manually to ensure that no mistakes were made.
The parish locations were great pieces of data, but using the KML Encoder did not yield the results we were looking for. It is an amazing tool to make the graphs, but it gave incorrect results because some parishes shared their names with parishes in other states and even other countries. To fix those errors, we manually checked each result to ensure it was correct.
We designed the SVG graphs manually in order to keep them clean and consistent. We retrieved the data through XQuery and then made each graph by hand to ensure it would fit into its <div> container on our webpage. We have provided a sample of our XQuery in a text document here. Another application of XQuery was finding the number of times each person was mentioned: we searched for the @ref attributes matching the person and totaled the count. Build off of the example we gave you, and see what you can find.
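For readers who would rather experiment outside an XML editor, two of the steps above can be approximated in Python. This is a rough sketch under stated assumptions, not our actual XQuery or editor workflow: the function names wrap_lines_as_list_items and count_mentions are hypothetical, the sample is a shortened version of the excerpt shown earlier, and the pattern uses .+ instead of .* so that blank lines are left alone.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened, hypothetical version of the article excerpt; only <who> matters here.
SAMPLE = """<body>
  <sec>The Church elected <who ref="Schori">Bishop Katharine Jefferts Schori</who>.</sec>
  <sec><who ref="Schori">Bishop Jefferts Schori</who>, 52, was the lone woman.</sec>
  <sec>Presiding <who ref="Griswold">Bishop Frank T. Griswold</who></sec>
</body>"""


def wrap_lines_as_list_items(text):
    """Wrap every non-empty line in <li>...</li>.

    Mirrors the find/replace ^(.*)$ --> <li>\\0</li>; in Python the whole
    match is written \\g<0>, and .+ (an assumption) skips blank lines.
    """
    return re.sub(r"^.+$", r"<li>\g<0></li>", text, flags=re.MULTILINE)


def count_mentions(xml_text):
    """Count how many <who> tags carry each ref value."""
    root = ET.fromstring(xml_text)
    return Counter(who.get("ref") for who in root.iter("who") if who.get("ref"))


LIST_ITEMS = wrap_lines_as_list_items("first quote\n\nsecond quote")
MENTIONS = count_mentions(SAMPLE)
```

Counting by the "ref" value rather than by the text inside the tag means that "Bishop Katharine Jefferts Schori" and "Bishop Jefferts Schori" are tallied as the same person.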
Bringing all of that information together was interesting. There is so much information in each article that we had to design the site carefully to make it all fit. We adapted the "Spry Accordion" style of website design to create a clean format that allows you to control what you see. The JavaScript for our adaptation can be found here. The rest of the site was designed using <div> containers for an easy-to-read layout, and you can view the source code for the website by right-clicking the page and choosing to view the source. Using <div> containers can give you interesting layouts. CSS stylesheets were a great tool for the site's formatting, and ours can be found here. By giving each collection of data its own accordion panel, we created this site, and we encourage you to explore other options for site design.
Overall, this site wasn't constructed with a single language. In order to present the marked-up XML data, we had to use a variety of languages and styles to bring it all together. Whether it was simple XML, SVG, or complex JavaScript, we managed to pull through and make the site what it is. I hope you enjoyed the fruits of our labor.
Signed: Cassie Miller & Ryan Brooks