Tuesday, March 16, 2010

Read and print lines from a URL - Revisited

And now that we know how to:

1. import Java classes
2. read and print a file using with-open and doseq

We can read and print lines from a URL just like we would read from a file.


; import the necessary classes
user=> (import '(java.io BufferedReader IOException InputStreamReader PushbackReader) java.net.URL)

; define a BufferedReader
user=> (def bufferedReader
         (new BufferedReader
              (new InputStreamReader
                   (. (new URL "http://www.yahoo.com") openStream))))

; read and print one line at a time
user=> (with-open [rdr bufferedReader]
         (doseq [line (line-seq rdr)]
           (println line)))


; or do it all in one shot without separately defining a bufferedReader
user=> (with-open [rdr (new BufferedReader
                            (new InputStreamReader
                                 (. (new URL "http://www.yahoo.com") openStream)))]
         (doseq [line (line-seq rdr)]
           (println line)))

; do it all in one shot, using the shortcut "." instead of "new"
user=> (with-open [rdr (BufferedReader.
                         (InputStreamReader.
                           (. (URL. "http://www.yahoo.com") openStream)))]
         (doseq [line (line-seq rdr)]
           (println line)))

Pretty elegant, but a little hard for me to understand the syntax - probably due to the conditioning effect of my background in imperative programming.
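
One way to make the syntax easier on the eyes is to give the pattern a name. The following is only a sketch: print-url-lines is a hypothetical helper name (it is not part of Clojure), and it simply wraps the same calls used above in a function.

```clojure
(import '(java.io BufferedReader InputStreamReader) 'java.net.URL)

; print-url-lines is a hypothetical helper name, not part of Clojure.
; it opens the URL, prints each line, and lets with-open close the reader.
(defn print-url-lines [url-string]
  (with-open [rdr (BufferedReader.
                    (InputStreamReader.
                      (.openStream (URL. url-string))))]
    (doseq [line (line-seq rdr)]
      (println line))))

; usage: (print-url-lines "http://www.yahoo.com")
```

Since java.net.URL also understands file: URLs, the same function can be tried on a local file without a network connection.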

Anyway ... oh, but how about reading binary files? Oh my. Need to learn that first. We'll come back to that another time.

List files in a directory

; import the java.io.File class
(import java.io.File)

; get a list of files in the current directory - recursively, including subdirectories
(file-seq (File. "."))
(#<File ...> #<File ...> #<File ...> ... rest of the files ...)

; for each file in the current directory, print the file object
(doseq [file (file-seq (File. "."))] (println file))
... list of File objects ...
#<File ...>
#<File ...>
#<File ...>
#<File ...>
#<File ...>
... rest of the files ...

(doseq [file (file-seq (File. "."))] (println (. file getName)))
... list of file names ...
Seqable.java
SeqEnumeration.java
SeqIterator.java
Sequential.java
Settable.java
... rest of the files ...
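
file-seq returns directories as well as files, so a natural next step is to filter the sequence. This is a rough sketch, assuming we only want regular files; regular-file-names is a hypothetical helper name.

```clojure
(import 'java.io.File)

; regular-file-names is a hypothetical helper name.
; it walks dir recursively with file-seq, keeps only regular files
; (not directories), and returns their names as strings.
(defn regular-file-names [dir]
  (map #(.getName %)
       (filter #(.isFile %) (file-seq (File. dir)))))

; print only the names ending in ".java"
(doseq [n (regular-file-names ".")
        :when (.endsWith n ".java")]
  (println n))
```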

Monday, March 15, 2010

Read a file in clojure

Ok, I have tried quite a bit, and failed to figure out on my own how to read a file. I did come close to doing it, but the solution was throwing a NullPointerException, and the code was not LISPy.

Now, in most other languages, a quick search will yield tons of examples on how to read a file. But for clojure, the information is surprisingly hard to come by!

Luckily, after 3 days of trying to figure it out and looking on the web, I came upon a nice solution at Irrational Exuberance. Thank you, Will Larson, for the excellent page! I wish it would somehow show up a little earlier in a google search - woulda saved me a lot of time.

What I have learned from the above site is shown below.

It is easy (after you are shown how) to read an entire file into memory. Use the slurp function:

user=> (slurp "c:/clojure-1.1.0/.gitignore")
"classes/*\n*jar\npom.xml\nclojure.iws\nclojure.ipr\nnbproject/private/\n*.zip\ndist\n"

For larger files:

C:\clojure-1.1.0>java -cp clojure.jar clojure.main
Clojure 1.1.0
user=>
; import the necessary readers.
; set the namespace to "tokenize"
; Note: setting the namespace also changes the prompt
(ns tokenize
  (:import (java.io BufferedReader FileReader)))
java.io.FileReader

tokenize=>
; and this is how it's done.
; the line-seq function "Returns the lines of text from rdr as a lazy sequence of strings."
(with-open [rdr (BufferedReader. (FileReader. "c:/clojure-1.1.0/.gitignore"))]
  (doseq [line (line-seq rdr)]
    (println line)))
**********file contents***********
classes/*
*jar
pom.xml
clojure.iws
clojure.ipr
nbproject/private/
*.zip
dist
**********file contents***********
nil
; Control-Z (in Windows) to exit REPL
tokenize=> ^Z


C:\clojure-1.1.0>
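
One thing to watch with line-seq: because it is lazy, the lines have to be consumed while the reader is still open. The sketch below returns the lines instead of printing them; read-lines is a hypothetical helper name. Without the vec, the function would hand back a lazy sequence whose reader is already closed, and reading the remaining lines would fail.

```clojure
(import '(java.io BufferedReader FileReader))

; read-lines is a hypothetical helper name.
; line-seq is lazy, so the lines must be realized before with-open
; closes the reader; vec forces the whole sequence into a vector.
(defn read-lines [path]
  (with-open [rdr (BufferedReader. (FileReader. path))]
    (vec (line-seq rdr))))

; usage: (count (read-lines "c:/clojure-1.1.0/.gitignore"))
```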

Thursday, March 11, 2010

Read and print lines from a URL

My goal for the next few days is to understand the clojure programming style, and how to think about it. The Reader page has lots of basic information needed to use clojure effectively. The following code:

1. imports some Java classes,
2. defines a BufferedReader for a URL, and
3. uses the with-open macro to read a couple of lines from the reader.
4. tries - and fails! - to write a loop to read the entire stream

It works. Kinda. I still haven't figured out how to detect EOF in the loop. I guess I could look it up, but I wouldn't learn very much that way.
Plus, it's not good code yet - it does not have the lispy, functional feel to it at all. Once I have it working, I will request a code review in the clojure group.


; import necessary classes
(import java.net.URL)
(import '(java.io BufferedReader IOException InputStreamReader PushbackReader))

; define a BufferedReader
(def bufferedReader
  (new BufferedReader
       (new InputStreamReader
            (. (new URL "http://www.yahoo.com") openStream))))


; read and print 2 lines. 1st line is blank
user=> (with-open [rdr bufferedReader]
         (println (.readLine rdr))
         (println (.readLine rdr)))

nil

; the following does print every line. however it ends up throwing a
; NullPointerException: at EOF .readLine returns nil (Java null), and subs
; is then called on that nil value.
; And the comparison to -1 is wrong anyway: -1 is what Reader.read() returns
; at EOF, not readLine, and (subs line 0) is the whole line, not its 1st char.
; So lots of scope for improvement
; define bufferedReader again before running the following, because the
; earlier with-open closed it and we get a "Stream closed" IOException
(with-open [rdr bufferedReader]
  (loop [line (.readLine rdr)]
    (println line)
    (recur
      (if (not (= -1 (subs line 0)))
        (.readLine rdr)))))

; the substring function subs.
; remember that the indexes are always between characters.
; index 0 is before 'h'. index 1 is between 'h' and 'e'
user=> (subs "hello" 2)
"llo"
user=> (subs "hello" 2 3)
"l"
user=> (subs "hello" 1 3)
"el"

; read a line from a file
; works great! only I didn't write it. :) I got it from some web site.
(with-open [rdr (java.io.BufferedReader. (java.io.FileReader. "c:\\python31\\LICENSE.txt"))]
  (println (.readLine rdr)))
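
For the record, the EOF problem in the loop above comes down to one fact: .readLine returns nil (Java null) at the end of the stream. A minimal sketch of a loop that stops there is shown below. print-all-lines is a hypothetical helper name, and a StringReader stands in for the URL stream so it can be tried without a network connection.

```clojure
(import '(java.io BufferedReader StringReader))

; print-all-lines is a hypothetical helper name.
; .readLine returns nil (Java null) at end of stream, so the loop
; recurs only while line is non-nil.
(defn print-all-lines [rdr]
  (loop [line (.readLine rdr)]
    (when line
      (println line)
      (recur (.readLine rdr)))))

; try it on an in-memory reader instead of a URL
(with-open [rdr (BufferedReader. (StringReader. "first\nsecond"))]
  (print-all-lines rdr))
```

The when form does the EOF test: once line is nil, nothing is printed and recur is not called, so the loop ends and returns nil.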

Wednesday, March 10, 2010

Instantiate java classes, assign to variables

The goal is to read from a URL and print the returned html, much like this Java code:


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class URLReader {
private static void getURL() throws IOException {
URL yahoo = new URL("http://www.yahoo.com/");
BufferedReader in = new BufferedReader(new InputStreamReader(yahoo
.openStream()));

String inputLine;

while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);

in.close();
}
public static void main(String[] args) throws Exception {
getURL();
}
}



So there is some figuring out to do. We need to import some classes, instantiate them, call methods on the object instances, etc. Here, we are trying to learn this one step at a time.

In a future post, we will learn how to put all these together and print the returned html. I say "future" post because I don't know how to do this simple task. I guess it requires a new way of thinking compared to imperative languages like Java. And I don't find this easy at all - how to loop, which functions to use in which context, etc. - thinking in terms of function chains. But we will persist ... and learn. :)

Anyway. What follows are some of the individual commands.

We first go to the clojure prompt by entering this command in a DOS window

C:\clojure-1.1.0>java -cp clojure.jar clojure.main
Clojure 1.1.0
user=>

; import a Java class, java.net.URL in this case
user=> (import java.net.URL)
java.net.URL

; create a new URL object
user=> (new URL "http://www.yahoo.com")
#<URL http://www.yahoo.com>

; create a new URL object and invoke its getHost() method
; In Java: new java.net.URL("http://www.yahoo.com").getHost()
user=> (. (new URL "http://www.yahoo.com") getHost)
"www.yahoo.com"

; create a new URL object and invoke its openStream() method
; In Java: new java.net.URL("http://www.yahoo.com").openStream()
user=> (. (new URL "http://www.yahoo.com") openStream)
#<... object reference of the InputStream ...>

; assign the InputStream associated with the URL to urlStream
user=> (def urlStream (. (new URL "http://www.yahoo.com") openStream))
#'user/urlStream
; print its value. prints object reference
user=> urlStream
#<... object reference of the InputStream ...>

; import multiple classes at a time
user=> (import '(java.io BufferedReader IOException InputStreamReader))
java.io.InputStreamReader

; create an InputStreamReader from the urlStream
; use an import to first import the InputStreamReader
user=> (def inputStream (new InputStreamReader urlStream))
#'user/inputStream

; create a BufferedReader from the InputStreamReader (held in the var named inputStream)
; BufferedReader must be imported first (done above)
user=> (def bufferedReader (new BufferedReader inputStream))
#'user/bufferedReader
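
The same steps can also be written with the shorthand interop forms, and with let instead of a separate def for every intermediate object. This is just a sketch: the names host and first-line are made up for illustration, and a ByteArrayInputStream stands in for the URL stream so that nothing here needs a network connection.

```clojure
(import 'java.net.URL
        '(java.io BufferedReader InputStreamReader ByteArrayInputStream))

; (URL. "...") is shorthand for (new URL "..."), and
; (.getHost url) is shorthand for (. url getHost).
(def host (.getHost (URL. "http://www.yahoo.com")))

; the same reader chain as above, with let instead of three defs.
; a ByteArrayInputStream stands in for the URL stream here (an
; assumption for illustration, so no network is needed).
(def first-line
  (let [stream (ByteArrayInputStream. (.getBytes "<html>"))
        reader (BufferedReader. (InputStreamReader. stream))]
    (.readLine reader)))
```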

Tuesday, March 9, 2010

Collection functions and iteration

Learned some more things today:

1. how to check if an item is in a list
2. for each item in a list / iterate over a list, do something. equivalent java code is mentioned.

; whether an item is in a set. Thanks to Christian Vest Hansen from the clojure google group!
user=> (contains? #{"3e" "2 tired" "1 more"} "3e" )
true

; Thanks to Meikel Brandmeyer of the clojure google group for the use of some, first and filter
; using some: if an item is in a list, return the first match. if not, return nil
user=> (some #{"2 tired"} (list "3e" "2 tired" "1 more"))
"2 tired"
user=> (some #{"clojure is not easy to learn"} (list "3e" "2 tired" "1 more"))
nil

; using first and filter: if an item is in a list, return the first match. if not, return nil
user=> (first (filter #(= % "3e") (list "3r" 25 "3t" "3e")))
"3e"
user=> (first (filter #(= % "34") (list "3r" 25 "3t" "3e")))
nil
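
If a plain true/false answer is wanted, the some idiom can be wrapped in a small function. A sketch, where in-list? is a hypothetical helper name:

```clojure
; in-list? is a hypothetical helper name.
; (some #{x} coll) returns the matching item or nil; boolean turns
; that into true/false.
(defn in-list? [coll x]
  (boolean (some #{x} coll)))

(in-list? (list "3e" "2 tired" "1 more") "3e")    ; true
(in-list? (list "3e" "2 tired" "1 more") "4e")    ; false
```

One caveat: because the set returns the matched item itself, this idiom cannot find nil or false in a list. For those cases a predicate such as #(= x %) is the safer choice.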

; get the first item in a list
user=> (first (list "a" "b" "c"))
"a"
; get items in a list other than the first
user=> (rest (list "a" "b" "c"))
("b" "c")

; for a list with one item, rest returns an empty list
user=> (first (list "a"))
"a"
user=> (rest (list "a"))
()

; do something for each item in a list.
; note that nil is the return value of doseq. it's not in the list!
user=> (doseq [x '("q" "r" "s")] (println x))
q
r
s
nil

equivalent to the following Java code:

String[] x = {"q", "r", "s"};
for (String s: x) {
System.out.println(s);
}
Of course, due to dynamic typing in clojure, the list could contain practically any data type.

user=> (doseq [x (list "q" "r" "s")] (println x))
q
r
s
nil

; print each item in a sequence
user=> (doseq [x (seq [1 3 4 2 3])] (println x))
1
3
4
2
3
nil
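
doseq is meant for side effects such as printing, which is why it returns nil. When the goal is a new collection rather than output, map is the usual tool. A small sketch for comparison (the name upper is made up for illustration):

```clojure
; doseq runs println for its side effect and returns nil.
; map returns a new (lazy) sequence of results instead.
(def upper (map #(.toUpperCase %) (list "q" "r" "s")))

; doall forces the lazy sequence; the result is ("Q" "R" "S")
(doall upper)
```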

Monday, March 8, 2010

Tried some functions that operate on collections

To get to the "user=>" prompt, use the following from the DOS prompt (DOS is dead. Long live DOS!):
java -cp clojure.jar clojure.lang.Repl

; does the collection (vector in this case) contain index 4?
user=> (contains? [1 2 3 4] 4)
false

; does the vector contain index 0?
user=> (contains? [1 2 3 4] 0)
true

; could not get it to work for lists. contains? checks for a key (an index, for vectors), not a value,
; so it works for maps, sets and vectors, but not for lists. it doesn't throw an exception here either
user=> (contains? (list "3e" "2 tired" "1 more") "3e" )
false
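
The results above make more sense once contains? is read as "is this a key?" rather than "is this a value?". A few more cases as a sketch:

```clojure
; contains? asks "is this a key?", not "is this a value?"
(contains? {:a 1 :b 2} :a)          ; true: :a is a key of the map
(contains? {:a 1 :b 2} 1)           ; false: 1 is a value, not a key
(contains? #{"3e" "2 tired"} "3e")  ; true: a set's items are its keys
(contains? [1 2 3 4] 3)             ; true: 3 is a valid index
```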

; create a vector
user=> (vector "a" "c" 3)
["a" "c" 3]

; number of items in collection (list)
user=> (count (list 1 2 34 "er") )
4

; number of items in collection (vector)
user=> (count (vector 1 2 34 "er") )
4

; counts number of items in collection
user=> (count (list 1 2 3 4 "er" 34) )
6
user=> (count (list 23 "3er" "oel" 5) )
4

; return unique items in a list collection
user=> (set (list 1 2 3 5 6 3 2))
#{1 2 3 5 6}

; return unique items in a vector collection
user=> (set [1 2 3 5 6 3 2])
#{1 2 3 5 6}

; same as the previous one. return unique items in a vector collection
user=> (set (vector 1 2 3 5 6 3 2))
#{1 2 3 5 6}
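
A set makes no promise about the order of its items, so the neat 1 2 3 5 6 printed above should not be relied on. When the original order matters, distinct is an alternative. A short sketch:

```clojure
; distinct removes duplicates but returns a sequence that keeps
; the order of first appearance.
(distinct [1 2 3 5 6 3 2])        ; (1 2 3 5 6)

; sets are compared by content, not by order
(= (set [3 2 1 1]) #{1 2 3})      ; true
```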

; cast a number to short (the value is too big for a short, so it is truncated to its low 16 bits here)
user=> (short 8383747474747448)
28728
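
Where does 28728 come from? A short holds 16 bits, so the cast keeps the number modulo 2^16 = 65536. The sketch below checks the arithmetic. Note the assumption in the second half: unchecked-short is not in Clojure 1.1; as far as I know it exists in newer versions, where plain short rejects out-of-range values instead of truncating.

```clojure
; a short holds 16 bits, so truncation keeps the value modulo 2^16.
(rem 8383747474747448 65536)        ; 28728

; assumption: on newer Clojure versions, (short n) raises an error
; for out-of-range values, and unchecked-short does the truncation.
(unchecked-short 8383747474747448)  ; 28728
```

The remainder is below 32768, so it fits in a signed short as a positive number. A remainder of 32768 or more would show up as a negative short.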

; sort a collection
user=> (sort (vector "how" "are" "you" "?") )
("?" "are" "how" "you")
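
sort also accepts a comparator, and sort-by sorts on a derived key. A small sketch using the same vector:

```clojure
; reverse the default ordering by swapping the arguments to compare
(sort #(compare %2 %1) (vector "how" "are" "you" "?"))
; ("you" "how" "are" "?")

; sort-by sorts on a derived key, here the length of each string.
; the sort is stable, so the three 3-letter words keep their order.
(sort-by count (vector "how" "are" "you" "?"))
; ("?" "how" "are" "you")
```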