Programmation appliquée en Scala

XML Literals

Simply use XML notation in Scala:

val head = <head><title>Fred's Memoirs</title></head>

Has type scala.xml.Elem

Can have sequence of nodes
```
val items = <li>Fred</li><li>Wilma</li>
```
Is a scala.xml.NodeSeq
Caution: Sometimes, the compiler thinks you use XML when you don't intend to:
```
x <y
```
Note: A future version of Scala will use xml"..." instead.

XML Nodes

scala.xml.Node superclass of all XML node types
Most important subclasses: Elem and Text
If e is an Elem, e.label is the tag name
e.child is the NodeSeq of children (it's child, not children, like in XPath)
NodeSeq is a subtype of Seq[Node] with support for XPath-like ops
Caution: Node extends NodeSeq as a sequence of length 1

Can assemble nodes in NodeBuffer

val items = new scala.xml.NodeBuffer
items += <li>Fred</li>
items += <li>Wilma</li>

Caution: When NodeBuffer is converted to NodeSeq, don't mutate it any more!

XML Nodes

Attributes

image.attributes gets a map-like key/value association

val image = <img src="logo.jpg" alt="San Jos&eacute; State University Logo"/>
val src = image.attributes("src")

The result is a NodeSeq because it may contain EntityRef
If not an issue for you, use image.attributes.toAttrMap, which yields a Map[String, String]

Embedded Expressions

Include Scala inside XML:

<ul><li>{items(0)}</li><li>{items(1)}</li></ul>

NodeSeq are spliced into the XML. Everything else wrapped into Atom[T]
Caution: String becomes Atom[String], not Text
The nested Scala code can again contain XML nodes:
```
<ul>{items.map(i => <li>{i}</li>)}</ul>
```

Note: To place { } inside XML, double them:

<h1>The Natural Numbers {{1, 2, 3, ...}}</h1>

Can also have expressions in attributes:
```
<img src={makeURL(fileName)}/>
```
Caution: Do not use quotes:
```
<img src="{makeURL(fileName)}"/>
```
does not execute the expression in { ... }

What is This?

<table>
  {(1 to 10) map (i =>
  <tr>
    {(1 to 10) map (j => <td>{ i * j }</td>)}
  </tr>)}
</table>

A NodeSeq whose child contains 100 elements
An Elem whose child is a NodeSeq of length 10
An Elem whose child is a NodeSeq of length 100
An Elem whose descendants include 100 Text nodes

XPath-Like Expressions

Can use \ and \\ with NodeSeq like / and // in XPath

val list = <dl><dt>Java</dt><dd>Gosling</dd>
  <dt>Scala</dt><dd>Odersky</dd></dl>
val languages = list \ "dt"
  // Yields a NodeSeq of <dt>Java</dt> and <dt>Scala</dt>

Wildcard locates any element:
```
doc \ "body" \ "_" \ "li"
```
\\ locates descendants at any depth:
```
doc \\ "img"
```
String starting with @ locates attributes:
```
img \ "@alt"
```

What is This?

(<img src="hamster.jpg"/><img src="frog.jpg"/> \\ "@src").text

"hamster.jpgfrog.jpg"
<src>hamster.jpg</src><src>frog.jpg</src>
NodeSeq("hamster.jpg", "frog.jpg")
Something else

Pattern Matching

Can use XML literals in match expressions:
```
n match {
  case <img/> => ...
}
```
Matches a img with any attributes and no children
Match a single child:
```
  case <li>{_}</li> => ...
```
Match children:
```
  case <li>{_*}</li> => ...
```

Bind children to variables:

case <li>{child}</li> => child.text
case <li>{Text(item)}</li> => item
case <li>{children @ _*}</li> => children.map(...)

Caution: In the last example, children is a Seq[Node] and not a NodeSeq

Transforming XML

Rewrite tree to edit elements

Change all ul to ol:

import scala.xml._
import scala.xml.transform._
val rule1 = new RewriteRule {
  override def transform(n: Node) = n match {
    case e @ <ul>{_*}</ul> => e.asInstanceOf[Elem].copy(label = "ol")
    case _ => n
  }
}

Provide one or more rules to RuleTransformer:

val transformed = new RuleTransformer(rule1, rule2, rule3).transform(root);

copy makes a copy of an Elem, modifying the tag or children:
```
list.copy(child = list.child ++ <li>Another item</li>)
```

Use % to change an attribute:

image % Attribute(null, "alt", "An image of a hamster", Null)

Loading and Saving XML

Loading with standard SAX parser:

val root = scala.xml.XML.loadFile("myfile.xml")

ConstructingParser preserves comments, CDATA sections, and optionally white space:

import scala.xml.parsing._
val parser = ConstructingParser.fromFile(new File("myfile.xml"), preserveWS = true)
val root = parser.document.docElem

And yet another parser for XHTML:

val parser = new XhtmlParser(scala.io.Source.fromFile("myfile.html"))
val doc = parser.initialize.document

Saving:
```
XML.save("myfile.xml", root)
XML.write(writer, root)
```
Each has different options—check the scaladoc

Lab

Scary looking lab

You work with a buddy
One of you (the coder) writes the code, the other (the scribe) types up answers
When you get stuck, ask your buddy first!
Switch coder/scribe roles each lab
The coder submits the worksheet. Include the scribe's name in the worksheet!
The scribe submits answers. Include the coder's name in the report!

Part 1: Analyzing XML

In order to process XML, you have to add a library to your project. Make a project unit11. Right-click on the project in the tree display at the left, select Properties → Java Build Path → Libraries → Add External JARs, then locate to the directory where the Scala IDE is installed, go to its plugins directory, and pick org.scala-lang.modules.scala-xml_1.0.2.jar.
Then restart the Scala IDE.

Make a worksheet images.sc. Now read a web page:

import scala.xml._
import scala.xml.parsing._
val source = scala.io.Source.fromURL("http://horstmann.com/index.html")
val parser = new XhtmlParser(source)
val doc = parser.initialize.document

What do you get?

Using \\, find all img tags. How many are they?
Make a sequence of their src attributes. (Hint: \\ returns a NodeSeq.)
Repeat with the site http://heig-vd.ch. Why doesn't it work?
Repeat with the site http://www.yverdon-les-bains.ch/accueil/. How many images do you get?
Where does the image with source null come from? (If you can't find it, don't worry—there will be a homework assignment to help.)

Part 2: Generating XML

When you need to enter a lot of quizzes, you'd like to use a simple format, like this:
```
What did the loop of the preceding problem do?

Print all characters in the string s
*Print the string s in reverse order
Print every other character in the string s
Count the number of characters in s
```
You'd like them to look like this:
What did the loop of the preceding problem do?
1. Print all characters in the string s
2. Print the string s in reverse order
3. Print every other character in the string s
4. Count the number of characters in s
Check out the HTML of this slide to see how this is done. What class and onclick attributes need to be put where?
Now download the following files:
- mc.js
- mc.css
- box.png, box_cross.png, box_tick.png
Also check out the script and link tags at the beginning of these slides to see how the JS/CSS are included.
Put a file quiz.txt into some known location, such as ~/Downloads with the sample question above.

In the unit11 project, make a worksheet quizwriter.sc:

import scala.xml._

val source = scala.io.Source.fromFile("~/Downloads/quiz.txt", "UTF-8")
val lines = source.getLines.toVector
val i = lines.lastIndexWhere(_.trim.length == 0)
val questionText = lines.take(i).map(l => <p>{l}</p>)
val doc = <html>...</html>

Now get the question choices to work.
Don't forget the heading.
When you are done, save your work:
```
XML.save("~/Downloads/quiz.html", doc)
```
Look at the file in the browser. Does it work?
Did you ever generate XML before? How did you do it? Was it easier/harder?

Part 3: Transforming XML

Up to HTML 4, headings were more decorative than structural. You were supposed to write
```
<body>
  <h1>Main heading</h1>
  Introduction
  <h2>Subheading</h2>
  Contents
  <h2>Another Subheading</h2>
  More Contents
</body>
```
But the contents of a subsection was not nested in the contents of a section, unless you chose to include it in a div. HTML 5 suggests that you use the section element to mark the semantics:
```
<body>
  <h1>Main heading</h1>  
  Introduction
  <section>
    <h2>Subheading</h2>
    Contents
  </section>
  <section>
    <h2>Another Subheading</h2>
    More Contents
  </section>
</body>
```
We'd like to achieve that transformation. Why is it harder than the example of replacing all ul with ol?
Look at the child node sequence of body. We could loop over it, but that wouldn't be the functional way. Instead, we'd like to group it into a sequence of sequences:
- the elements before the h2
- the elements from the first h2 up to, but not including, the second
- ...
- the elements from the last h2 to the end
Then we can map over the sequence of sequences, and wrap each of them (except the first) in a section.
It seems as there ought to be some standard function to do that. Look over all methods of Seq. Which ones produce a sequence, iterator, or map of sequences?

None of those will work for us, though. What we want is a function

def splitWhere[T](xs: Seq[T])(p: (T) => Boolean): Seq[Seq[T]]

so we can call

val splits = splitWhere(body.child)(...)
val intro = splits(0)
val section1 = <section>{splits(1)}</section>

What do you need for the ...?

Here is an icky implementation of splitWhere for testing. Don't code like that! You'll get to produce a nicer version in the homework.

def splitWhere[T](xs: Seq[T])(p: (T) => Boolean): Seq[Seq[T]] = {
  	// Don't code like this!
  	val result = new scala.collection.mutable.ArrayBuffer[scala.collection.mutable.ArrayBuffer[T]]()
  	result += new scala.collection.mutable.ArrayBuffer[T]()
  	for (x <- xs) {
  		if (p(x)) result += new scala.collection.mutable.ArrayBuffer[T]()
  		result.last += x
  	}
  	result
  }

You'll also need to figure out how to get to the body. Remember that doc \\ "body" is a NodeSeq.

Now produce a rewritten body with the intro followed by the sections
```
val sections = splits.tail.map(s => ...)
val newBody = ...
```
Now replace the body with the new body, using a transformer. Then write out the result.