Programmation appliquée en Scala

Copyright © Cay S. Horstmann 2015 Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License

XML Literals

XML Nodes

Attributes

Embedded Expressions

What is This?

<table>
  {(1 to 10) map (i =>
  <tr>
    {(1 to 10) map (j => <td>{ i * j }</td>)}
  </tr>)}
</table>
  1. A NodeSeq whose child contains 100 elements
  2. An Elem whose child is a NodeSeq of length 10
  3. An Elem whose child is a NodeSeq of length 100
  4. An Elem whose descendants include 100 Text nodes

XPath-Like Expressions

What is This?

(<img src="hamster.jpg"/><img src="frog.jpg"/> \\ "@src").text
  1. "hamster.jpgfrog.jpg"
  2. <src>hamster.jpg</src><src>frog.jpg</src>
  3. NodeSeq("hamster.jpg", "frog.jpg")
  4. Something else

Pattern Matching

Transforming XML
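
The lab below refers to an example that replaces all ul elements with ol. A minimal sketch of that kind of transformation follows, assuming the RewriteRule and RuleTransformer classes from scala.xml.transform. The object name UlToOl and the sample input are made up for illustration.

```scala
import scala.xml._
import scala.xml.transform._

object UlToOl {
  // Rewrites every ul element as ol, keeping its attributes and children.
  val rule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "ul" => e.copy(label = "ol")
      case other => other
    }
  }

  // Hypothetical sample input.
  val input = <div><ul><li>Fred</li><li>Wilma</li></ul><p>text</p></div>

  // The transformer applies the rule to every node in the tree.
  val transformer = new RuleTransformer(rule)
  val output: Node = transformer(input)

  def main(args: Array[String]): Unit = println(output)
}
```

The rule only looks at one node at a time. That is why it suits a rename like this one, where each element can be rewritten without knowing its siblings.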

Loading and Saving XML

Lab

Part 1: Analyzing XML

  1. To process XML, you have to add a library to your project. Make a project unit11. Right-click on the project in the tree display at the left and select Properties → Java Build Path → Libraries → Add External JARs. Then navigate to the directory where the Scala IDE is installed, go to its plugins directory, and pick org.scala-lang.modules.scala-xml_1.0.2.jar.
  2. Then restart the Scala IDE.
  3. Make a worksheet images.sc. Now read a web page:
    import scala.xml._
    import scala.xml.parsing._
    val source = scala.io.Source.fromURL("http://horstmann.com/index.html")
    val parser = new XhtmlParser(source)
    val doc = parser.initialize.document
    
    What do you get?
  4. Using \\, find all img tags. How many are there?
  5. Make a sequence of their src attributes. (Hint: \\ returns a NodeSeq.)
  6. Repeat with the site http://heig-vd.ch. Why doesn't it work?
  7. Repeat with the site http://www.yverdon-les-bains.ch/accueil/. How many images do you get?
  8. Where does the image with source null come from? (If you can't find it, don't worry—there will be a homework assignment to help.)
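
To experiment with \\ before pointing it at a real web page, it can help to try it on a small XML literal. The following is a sketch under that assumption. The object name ImgQuery, the sample markup, and the file names are invented for illustration.

```scala
import scala.xml._

object ImgQuery {
  // Hypothetical sample document standing in for a parsed web page.
  val page: Elem =
    <html>
      <body>
        <p><img src="a.png" alt="A"/></p>
        <div><img src="b.png"/><span><img src="c.png"/></span></div>
      </body>
    </html>

  // \\ searches all descendants, at any depth, and returns a NodeSeq.
  val imgs: NodeSeq = page \\ "img"

  // For each img node, \ "@src" selects the src attribute; text gives its value.
  val sources: Seq[String] = imgs.map(n => (n \ "@src").text)

  def main(args: Array[String]): Unit = {
    println(imgs.length)
    println(sources)
  }
}
```

Mapping over the nodes keeps one string per image, whereas calling text on the whole result of \\ "@src" concatenates all values into a single string.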

Part 2: Generating XML

  1. When you need to enter a lot of quizzes, you'd like to use a simple format, like this:
    What did the loop of the preceding problem do?
    
    Print all characters in the string s
    *Print the string s in reverse order
    Print every other character in the string s
    Count the number of characters in s
    
    You'd like them to look like this:
    What did the loop of the preceding problem do?
    1. Print all characters in the string s
    2. Print the string s in reverse order
    3. Print every other character in the string s
    4. Count the number of characters in s
    Check out the HTML of this slide to see how this is done. What class and onclick attributes need to be put where?
  2. Now download the following files: Also check out the script and link tags at the beginning of these slides to see how the JS/CSS are included.
  3. Put a file quiz.txt into some known location, such as ~/Downloads with the sample question above.
  4. In the unit11 project, make a worksheet quizwriter.sc:
    import scala.xml._
    
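    // The JVM does not expand ~, so build the path from the user.home property
    val source = scala.io.Source.fromFile(System.getProperty("user.home") + "/Downloads/quiz.txt", "UTF-8")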
    val lines = source.getLines.toVector
    val i = lines.lastIndexWhere(_.trim.length == 0)
    val questionText = lines.take(i).map(l => <p>{l}</p>)
    val doc = <html>...</html>
    
  5. Now get the question choices to work.
  6. Don't forget the heading.
  7. When you are done, save your work:
    XML.save(System.getProperty("user.home") + "/Downloads/quiz.html", doc)
    Look at the file in the browser. Does it work?
  8. Did you ever generate XML before? How did you do it? Was it easier/harder?
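
One possible way to approach the choices is sketched here, with the sample question held in memory instead of being read from quiz.txt. The object name QuizSketch and the class values "right" and "wrong" are placeholders chosen for illustration. They are not the names that the slides' CSS and JavaScript expect, and the onclick attribute is left out.

```scala
import scala.xml._

object QuizSketch {
  // Sample input held in memory; the lab reads these lines from quiz.txt.
  val lines = Vector(
    "What did the loop of the preceding problem do?",
    "",
    "Print all characters in the string s",
    "*Print the string s in reverse order",
    "Print every other character in the string s",
    "Count the number of characters in s")

  // Everything after the last blank line is a choice.
  val i = lines.lastIndexWhere(_.trim.isEmpty)
  val choices = lines.drop(i + 1)

  // A leading * marks the correct choice; it is stripped from the output.
  // "right" and "wrong" are placeholder class values.
  def choiceItem(line: String): Elem =
    if (line.startsWith("*")) <li class="right">{line.drop(1)}</li>
    else <li class="wrong">{line}</li>

  // A sequence of nodes can be embedded directly in an XML literal.
  val list: Elem = <ol>{choices.map(choiceItem)}</ol>

  def main(args: Array[String]): Unit = println(list)
}
```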

Part 3: Transforming XML

  1. Up to HTML 4, headings were more decorative than structural. You were supposed to write
    <body>
      <h1>Main heading</h1>
      Introduction
      <h2>Subheading</h2>
      Contents
      <h2>Another Subheading</h2>
      More Contents
    </body>
    
    But the contents of a subsection were not nested in the contents of a section, unless you chose to include them in a div. HTML 5 suggests that you use the section element to mark the semantics:
    <body>
      <h1>Main heading</h1>  
      Introduction
      <section>
        <h2>Subheading</h2>
        Contents
      </section>
      <section>
        <h2>Another Subheading</h2>
        More Contents
      </section>
    </body>
    
    We'd like to achieve that transformation. Why is it harder than the example of replacing all ul with ol?
  2. Look at the child node sequence of body. We could loop over it, but that wouldn't be the functional way. Instead, we'd like to group it into a sequence of sequences. Then we can map over the sequence of sequences and wrap each of them (except the first) in a section.
    It seems as if there ought to be some standard function to do that. Look over all methods of Seq. Which ones produce a sequence, iterator, or map of sequences?
  3. None of those will work for us, though. What we want is a function
    def splitWhere[T](xs: Seq[T])(p: (T) => Boolean): Seq[Seq[T]] 
    
    so we can call
    val splits = splitWhere(body.child)(...)
    val intro = splits(0)
    val section1 = <section>{splits(1)}</section>
    What do you need for the ...?
  4. Here is an icky implementation of splitWhere for testing. Don't code like that! You'll get to produce a nicer version in the homework.
    def splitWhere[T](xs: Seq[T])(p: (T) => Boolean): Seq[Seq[T]] = {
      // Don't code like this!
      import scala.collection.mutable.ArrayBuffer
      val result = new ArrayBuffer[ArrayBuffer[T]]()
      result += new ArrayBuffer[T]()
      for (x <- xs) {
        // Start a new group whenever the predicate holds
        if (p(x)) result += new ArrayBuffer[T]()
        result.last += x
      }
      // Copy to immutable lists so the result also type-checks on Scala 2.13+, where Seq is immutable
      result.map(_.toList).toList
    }
    You'll also need to figure out how to get to the body. Remember that doc \\ "body" is a NodeSeq.
  5. Now produce a rewritten body with the intro followed by the sections
    val sections = splits.tail.map(s => ...)
    val newBody = ...
  6. Now replace the body with the new body, using a transformer. Then write out the result.
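
To see what splitWhere produces before applying it to XML nodes, it may help to run it on plain strings. The sketch below adapts the lab's test implementation so that the example is self-contained. The object name SplitDemo and the sample data are invented for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitDemo {
  // The lab's test implementation, lightly adapted (not the nicer homework version).
  def splitWhere[T](xs: Seq[T])(p: T => Boolean): Seq[Seq[T]] = {
    val result = ArrayBuffer(ArrayBuffer[T]())
    for (x <- xs) {
      // Start a new group whenever the predicate holds
      if (p(x)) result += ArrayBuffer[T]()
      result.last += x
    }
    result.map(_.toList).toList
  }

  // Strings starting with "H" play the role of headings.
  val words = Seq("intro", "H one", "a", "b", "H two", "c")
  val splits = splitWhere(words)(_.startsWith("H"))

  def main(args: Array[String]): Unit = splits.foreach(println)
}
```

Each group after the first starts with an element that satisfies the predicate. If the very first element satisfies it, the first group is empty, which matters when splits(0) is treated as the introduction.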