A recommendations engine written in Scala with Spark
Have a look at the wiki for Scala and Spark fundamentals
Note: This project is still Work In Progress (WIP)
This codebase is the result of my working through Frank Kane's course: Apache Spark 2 with Scala - Hands On with Big Data
$ git clone https://github.com/srmds/recommendation-engine-spark
Logging is done via log4j
A template for the log4j properties is included at src/main/resources/log4j.properties.template.
- Create a custom log4j.properties file from the template. From the root of the project, run:
$ cp src/main/resources/log4j.properties.template src/main/resources/log4j.properties
To make logging less verbose and log only our own explicit log lines, change the default logging settings.
- Set the logging level from INFO to ERROR:
Change the following line:
log4j.rootCategory=INFO, console
to:
log4j.rootCategory=ERROR, console
Note: the custom log4j.properties file should not be checked into version control and is therefore added to the .gitignore file.
$ ./gradlew clean build
$ ./gradlew run
$ ./gradlew clean run
- Spark - 2.1.0
rating (stars) | count (votes) |
---|---|
1 | 6110 |
2 | 11370 |
3 | 27145 |
4 | 34174 |
5 | 21201 |
See here for full analysis
Source file (100,000 rows): datasets/movielens/ml-100k/u.data
Elapsed time: 298 ms
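
The ratings histogram above boils down to counting how often each star rating occurs. The following is a minimal sketch of that idea, not the project's actual Spark job: plain Scala collections stand in for Spark RDDs so the snippet runs without Spark on the classpath. The object name `RatingsHistogramSketch`, the function `countByRating`, and the sample rows are hypothetical, and the tab-separated `userId, movieId, rating, timestamp` layout of u.data is an assumption.

```scala
object RatingsHistogramSketch {
  // Assumption: each line is "userId<TAB>movieId<TAB>rating<TAB>timestamp".
  def countByRating(lines: Seq[String]): Seq[(Int, Int)] =
    lines
      .map(_.split("\t")(2).toInt)   // keep only the rating column
      .groupBy(identity)             // rating -> all of its occurrences
      .map { case (rating, hits) => rating -> hits.size }
      .toSeq
      .sortBy(_._1)                  // order by star rating, as in the table

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, for illustration only.
    val sample = Seq("196\t242\t3\t881250949", "186\t302\t3\t891717742", "22\t377\t1\t878887116")
    countByRating(sample).foreach { case (rating, count) => println(s"$rating | $count") }
  }
}
```

With Spark, the same shape would typically be expressed on an RDD of lines read from the source file instead of an in-memory `Seq`.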
age | average number of friends |
---|---|
18 | 343 |
19 | 213 |
26 | 242 |
27 | 228 |
28 | 209 |
34 | 245 |
35 | 211 |
36 | 246 |
37 | 249 |
38 | 193 |
39 | 169 |
67 | 214 |
68 | 269 |
69 | 235 |
See here for full analysis
Source file (500 rows): datasets/friends/fakefriends.csv
Elapsed time: 173 ms
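
As a rough illustration of how the per-age averages above can be derived, the sketch below groups rows by age and divides the total number of friends by the number of people of that age. It uses plain Scala collections rather than Spark so that it runs on its own. The names `FriendsByAgeSketch` and `averageFriendsByAge`, the sample rows, and the `id,name,age,numFriends` column layout are assumptions; integer division is used only because the table shows whole-number averages.

```scala
object FriendsByAgeSketch {
  // Assumption: each row is "id,name,age,numFriends".
  def averageFriendsByAge(rows: Seq[String]): Seq[(Int, Int)] =
    rows
      .map { row =>
        val fields = row.split(",")
        (fields(2).toInt, fields(3).toInt)   // (age, numFriends)
      }
      .groupBy(_._1)                          // age -> all (age, numFriends) pairs
      .map { case (age, pairs) =>
        val totalFriends = pairs.map(_._2).sum
        age -> totalFriends / pairs.size      // integer division truncates the average
      }
      .toSeq
      .sortBy(_._1)                           // order by age, as in the table

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, for illustration only.
    val sample = Seq("0,Will,33,385", "1,Jean-Luc,26,2", "2,Hugh,33,100")
    averageFriendsByAge(sample).foreach { case (age, avg) => println(s"$age | $avg") }
  }
}
```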
stationId | Temperature (Fahrenheit) |
---|---|
EZE00100082 | 7.700001 |
ITE00100554 | 5.3600006 |
stationId | Temperature (Fahrenheit) |
---|---|
EZE00100082 | 16.52 |
ITE00100554 | 18.5 |
See here for full analysis
Source file (1825 rows): datasets/weather/temperatures.csv
Elapsed time: 506 ms
count (occurrence) | word |
---|---|
2 | refer |
3 | compared |
4 | forces |
560 | is |
616 | in |
649 | it |
747 | that |
934 | and |
970 | of |
1191 | a |
1292 | the |
1420 | your |
1828 | to |
1878 | you |
See here for full analysis
Source file (~46,249 words): datasets/book/book.txt
Elapsed time: 377 ms
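
The word counts above can be approximated with a few collection operations: split the text into words, normalise case, count each word, and sort by count. This is a hedged sketch using plain Scala collections instead of Spark. The names `WordCountSketch` and `wordCounts` are hypothetical, and the tokenisation rule (split on non-word characters, lowercase everything) is an assumption that may differ from what the project does.

```scala
object WordCountSketch {
  // Returns (count, word) pairs in ascending order of count, ties broken alphabetically.
  def wordCounts(text: String): Seq[(Int, String)] =
    text
      .split("\\W+")          // assumption: any run of non-word characters separates words
      .filter(_.nonEmpty)     // drop the empty token produced by leading punctuation
      .map(_.toLowerCase)
      .groupBy(identity)      // word -> all of its occurrences
      .toSeq                  // convert before mapping so equal counts do not collapse as Map keys
      .map { case (word, hits) => (hits.length, word) }
      .sorted

  def main(args: Array[String]): Unit =
    wordCounts("To be, or not to be").foreach { case (count, word) => println(s"$count | $word") }
}
```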
amount (spent) | customerId |
---|---|
3309.3804 | 45 |
4316.3 | 47 |
4327.7305 | 77 |
4367.62 | 13 |
4836.86 | 20 |
4851.4795 | 89 |
4876.8394 | 95 |
4898.461 | 38 |
5206.3994 | 87 |
5245.0605 | 52 |
See here for full analysis
Source file (10,000 rows): datasets/spending/customer_orders.csv
Elapsed time: 267 ms
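
The totals above amount to summing the order amounts per customer and sorting by the sum. A minimal sketch under stated assumptions follows; it uses plain Scala collections rather than Spark. The names `CustomerSpendingSketch` and `totalByCustomer`, the sample rows, and the `customerId,itemId,amountSpent` column layout are hypothetical. `Float` is chosen only because the printed totals (for example 3309.3804) look like single-precision sums, which is a guess.

```scala
object CustomerSpendingSketch {
  // Assumption: each row is "customerId,itemId,amountSpent".
  def totalByCustomer(rows: Seq[String]): Seq[(Float, Int)] =
    rows
      .map { row =>
        val fields = row.split(",")
        (fields(0).toInt, fields(2).toFloat)   // (customerId, amount)
      }
      .groupBy(_._1)                            // customerId -> all of that customer's orders
      .toSeq
      .map { case (customerId, orders) => (orders.map(_._2).sum, customerId) }
      .sortWith(_._1 < _._1)                    // ascending by amount spent, as in the table

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, for illustration only.
    val sample = Seq("44,8602,10.5", "35,5368,20.0", "44,3391,2.25")
    totalByCustomer(sample).foreach { case (amount, id) => println(s"$amount | $id") }
  }
}
```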
count | movieId |
---|---|
1 | 1494 |
1 | 1414 |
2 | 1585 |
2 | 907 |
2 | 1547 |
3 | 1361 |
3 | 1391 |
4 | 1223 |
4 | 1423 |
5 | 1489 |
5 | 1333 |
507 | 181 |
508 | 100 |
509 | 258 |
583 | 50 |
See here for full analysis
Source file (100,000 rows): datasets/movielens/ml-100k/u.data
Elapsed time: 219 ms
friendsCount | (id,name) |
---|---|
1933 | (859,CAPTAIN AMERICA) |
friendsCount | (id,name) |
---|---|
0 | (467,BERSERKER II) |
friendsCount | name |
---|---|
106 | RATTLER |
238 | SUPREME INTELLIGENCE |
121 | LEWIS |
84 | UNICORN/MYLOS MASARY |
966 | ICEMAN/ROBERT BOBBY |
147 | EEL II/EDWARD LAVELL |
109 | BLACK KNIGHT IV/PROF |
668 | SILVER SURFER/NORRIN |
198 | STANKOWICZ |
1014 | HERCULES [GREEK GOD] |
See here for full analysis
Source files:
- (6,589 rows): datasets/social/Marvel-graph.txt
- (19,428 rows): datasets/social/Marvel-names.txt
Elapsed time: 1515 ms
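
One way to arrive at friend counts like those above is to count the connections listed on each line of the graph file and add them up per character. The sketch below illustrates this with plain Scala collections instead of Spark. It assumes that each line of Marvel-graph.txt starts with a character id followed by the space-separated ids of its connections, and that one character may span several lines; the names `MostConnectedSketch`, `connectionCounts`, and `mostConnected` are hypothetical. Resolving ids to names via Marvel-names.txt is left out.

```scala
object MostConnectedSketch {
  // Assumption: each line is "characterId connectionId connectionId ...".
  def connectionCounts(graphLines: Seq[String]): Map[Int, Int] =
    graphLines
      .filter(_.trim.nonEmpty)                        // skip blank lines
      .map(_.trim.split("\\s+"))
      .map(ids => (ids.head.toInt, ids.length - 1))   // (characterId, connections on this line)
      .groupBy(_._1)
      .map { case (id, parts) => id -> parts.map(_._2).sum }   // add up lines of the same character

  // Returns (characterId, friendsCount) for the character with the most connections.
  // Throws on empty input, since there is no maximum to report.
  def mostConnected(graphLines: Seq[String]): (Int, Int) =
    connectionCounts(graphLines).maxBy(_._2)

  def main(args: Array[String]): Unit = {
    // Made-up sample lines, for illustration only.
    val sample = Seq("1 2 3", "2 1", "1 4")
    val (id, friends) = mostConnected(sample)
    println(s"$friends | $id")
  }
}
```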
MIT License
Copyright (c) 2018 srmds
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.