-
Notifications
You must be signed in to change notification settings - Fork 0
/
sockets.notes
466 lines (374 loc) · 26.6 KB
/
sockets.notes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
-----------------------------------------------------------
Sockets API in Java
-----------------------------------------------------------
Primary Goal: In order to work towards a project worth fighting for
on a daily basis, I have to start small and work my way up...
Working with sockets will lay the foundation to understanding
interprocess communication, a step to making my own messaging app.
Subgoals:
a. Learn how to use Sockets in Java (previous experience only in C)
b. Introduce myself to concurrent programming in Java (Threads)
-----------------------------------------------------------
Sockets [ REVIEW ]
-----------------------------------------------------------
- Socket: An endpoint of a two-way communication
running on some kind of network. It is bound to a port number
(the #port number# specfies the particular process/application the
connection is meant for, while the #IP address# specifies the machine)
- Client_Server_Model: Essential model of network computing today
that states the fundamental relationship between machines connected
via a network. A #server# provides resources and the #client# requests
those resources utilizing some kind of *protocol* to define how communication
sohuld be handled between both remote parties (e.g. what should the server be
expecting from the client?). Sockets in Java would be operating via the TCP/IP protocol.
[ #GENERAL FRAMEWORK# ]
1. Create a socket
2. Open an input and output stream to the socket
3. Read and write to the stream according to the server's protocol
4. Close the streams
5. Close the socket
[ #SERVER# ]
1. Create socket
2. bind() -> associates the socket to a valid address, necessary for a server to have a static IP
3. listen() -> waits for an incoming request
4. accept() -> establishes connection with request (in Java, the accept() method includes listen())
5. Read/Write
[ #CLIENT# ]
1. Create socket
2. connect() -> sends request to server at given address
3. Read/write
NOTE: The methods needed to establish a connection and the differences between "clients" and "servers"
are a bit more complicated - more details on why only the server binds to a port (i.e. why the client is auto-assigned),
why (in Java) there is no need to explicitly call connect on a listening port and address, etc.
------------------------------------------------------------
Transmission Control and Internet Protocol (TCP/IP) [ OVERVIEW ]
-----------------------------------------------------------
- Without getting into too much detail about TCP/IP Architecture and the different protocol layers at play
here (I will probably summarize my understanding of it into a diagram any way), this section is meant
to give a quick overview of both TCP and IP (and why protocols are important in general)
1. Transmission Control Protocol (Transport Layer): In general, TCP controls the transmission of data
packets - ensuring reliability of sending and the integrity of the data itself. That is, TCP ensures
data *reliably* reaches its destination in a *consistently* streamed fashion to avoid any hiccups for
more sensitive network operations (think VoIP calls, audio streaming, etc.) all in the correct *order*
(data is broken down into packets for ease of transmission, but a scrambled words in an essay you send
to your professor is as good as not doing the assignment).
2. Internet Protocol (Internet Layer): IP is charge of routing information, as well as constructing the addresses
that standardize the locations of every single internet-enabled device. Without getting overwhelmed with information
about the different types of IP addresses, it's mainly important to understand that IP addresses unique,
multi-layered locations that the protocol uses to actually get information from point A to point B.
- Many of the cooperation of these protocol layers (especially with TCP/IP that work hand-in-hand), is done by
specifying a data packet structure with additional bits of information (headers/tags) - TCP can mark the appropriate
order for each packet, IP can effectively locate the destination address and attach the source address for example.
- Overall the famous TCP/IP combo are what define the *rules and standards* for network communication. For the
purposes of this project, it gives insight into the magic behind the ability to send information from machine to
another anywhere else in the world at the touch of a button.
------------------------------------------------------------
#Java I/O# (Simplified)
------------------------------------------------------------
1. #Byte Stream# - Low level I/O stream that can manipulate 8-bit bytes (often avoided)
2. #Character Stream# - Works the same as byte streams with automatic translation to local character sets (internationalization)
3. #Buffered Streams# - Used to minimize the overhead needed for OS-level operations for reading/writing input; accesses, holds and flushes in chunks (buffer)
4. #Scanning + Formatting# - In order to work with information that is actually readable we use *Scanner* (breaks down input into tokens) and *PrintWriter* (formats to more-readable forms)
5. #Standard I/O# - Like C, Java has 3 main fronts for I/O: Standard Input (`System.in`), Standard Output (`System.out`) and Standard Error (`System.err`) - all byte streams
6. #Console# - Character stream and more-feature-rich version of standard I/O and especially useful for secure password entry
-----------------------------------------------------------
!PROJECT OUTLINE!
-----------------------------------------------------------
. Create a basic Java version of netcat - arbitrary client/server communication (2 running programs needed)
. NOTE: In order to get around making them two separate Java projects, we can use a command line argument to differentiate client and server functionality
. With netcat it seems like both people messaging are simultaneously acting as servers and clients (you send + receive messages), but really
the person who starts the program first would be the initial 'server' (if there doesn't already exist a listening connection), and the next person
would simply connect to the address.
. It would also be cool for each person to have an identity + profile (can be wrapped in its own class)
. NOTE: In order to use the shell script, the view settings of sublime have to be changed to Unix
[ #DESIGN CHOICES# ]
. Chose to catch a number of exceptions (like the IOExceptions that crowded the MySocketServer
constructor and methods) due to the common appearance of the IOException and the different possible
places they could be coming from - it made more sense to fail fast instead of propogating values
of a deeper error like failure to bind/instantiate the server socket
. While it really doesn't matter which is teachnically the "client" and which is the "server",
the distinctions for this project were made in order to account for new use cases down the road
(e.g. chat bot that multiple clients can connect to, server that relays messages to other clients, etc.)
. Protocol: 55 byte content length based on a few estimates:
a. Average word length: 5.1 letters (5.1 bytes)
b. Average text length: 7 words (7 bytes)
c. Number of spaces: 7 bytes
} *Overhead*
d. Message delimiter: <<END>> = 7 bytes
. function: a * b + c + d = (5.1)(7) + 7 + 7 ~ 50 bytes
[java]
import java.net.Socket;
public class SocketServer{
// 'Listener' functinality
ServerSocket server = new ServerSocket();
SocketAddress address = new InetSocketAddress("hostname", 8000);
server.bind(address);
// while loop logic?
server.accept();
}
public class SocketClient{
// 'Client' functinality
Socket client = new Socket();
SocketAddress address = new InetSocketAddress("hostname", 8000);
// loop logic?
client.connect(address);
}
[\java]
------------------------------------------------------------
*QUESTIONS*
-----------------------------------------------------------
#What port number should I be using when testing locally?
- Pretty straight forward answer for the most part. Given the
outline for the Internet Protocol suite, designating port numbers
for specific purposes (outlined in the #RFC 1700#). Thus a quick rule
of thumb as to not disrupt the designated, specialized ports is
to use any port >1024 (0-1024 are resreved by the protocol) and
<65535 (largest unassigned port number)
#Why is the exception handling framework in Java useful?
- Why Use it?
1. Separates error-handling code w/ functionality, so as to
increase code readbability
2. Propagate errors up the call stack; ultimately giving the
programmer the agency to specify error handling flow (such as
only catching the exception for relevant methods)
3. Distinct + organized hierarchy > generic errors -- the more
specific the information about what went wrong, the easier the fix
- Best Practices
1. Use finally block to avoid resource leaks (InputStream, Scanner, etc.)
2. Specific Exceptions > General Exceptions (same as above)
3. Effectively document exceptions w/ `@throws` and descriptive messages
4. Catch the most specific exception first
#What does it mean for a thread to be interrupted - e.g. Interrupted Exceptions
. An *interrupt* is an indication to a thread that it should stop what it
is doing and do something else. It's up to the programmer to decide exactly
how a thread responds to an interrupt, but it is very common for the
thread to terminate.
- More on this concept later...
#Why does the accept() method return a socket?
. The initial client and server sockets arent the only endpoints used for
connections in a sense because accept actaully ends up returning a socket
of its own once a connection is established
. According to Beej's *Socket Programming Guide in C*: "It'll return to you "
"a brand new socket file descriptor to use for this single connection..."
"The original one is still listening for more new connections, and the "
"newly created one is finally ready to send() and recv()."
#Why does the port number change after accept()?
. While listening, our server is on a static port - statically waiting for
connections at the specific address + port
. Once accepted, not only is a new socket returned, but because every connection
needs a unique file descriptor, once a connection with a client is established,
the newly returned application-level socket is chosen at random (for the most part)
. An interesting analogy for this would be to think of the listen()-ing server and
connect()-ing client as agreeing upon a meeting point (say port 8080) and migrating
else where after they've met up. Now say multiple connections are trying to meet, in
order for the server to handle the multiple different requests from the incoming
clients, spearating each to their own "channel" makes it easier to compartmentalize
resource requests for the server to handle.
#Don't println to a Socket - JavaRanch Technote 1157
. After constant debugging of BufferedReader and never getting the application to read
a line I stumbled upon this article that help me understand common causes of
client-server deadlocks in Java
. This article primarily refers to the different line-breaking conventions and the
inherent dangers of using *println* with sockets, since underlying operating systems
expect LF, CRLF, CR...etc.
`
#What is a thread, and why do I need them
. In order to fully realize the version of netcat that I am imagining, I need to get
around all of these synchronous, blocking calls - I have to dive deep into learn threading...
#What's the most effective way of planning a software/engineering project
. With the organization of a major side project always comes the issue of organization and
scope - even though I'm starting this for myself, this project isn't just for me and I need
my code, project organization and scope to reflect that sentiment. Unfortauntely, effectively
scoping a project in this sense is not something taught (really at all). I've decided to take
this on in response to a very obvious sense of *feature creep*; it's better if I specifically
re-define my goals and simply keep iterating.
1. Planning: Sketch out ideas, functionality and understand each moving part of the system
a. define very *specific* goals; categorize features/capabilities into must-haves
and nice-to-haves if that helps.
b. Set up clear milestones for the project
c. Modular code, compartmentalized functionality, think in sub-systems
(easier to test + work towards incrementally)
2. Scoping:
a. Break the entire project down into subtasks
b. "Calendar-ize" - make a timeline that includes completing a few tasks at a time
c. Account for the fact that not every task can be foreseen + incroporated into the plan
d. Timebox open-ended parts of the project (especially for things like optimization)
3. Execution and Re-scoping:
a. Calibrate goals + tasks according to expected and actual progress
b. [ REPEAT ]
#Protocol Design
. In order to successfully communicate between client and server, both parties must "agree to a language"
in a sense. While I intially thought the protocol was preset with Java Sockets, I *misunderstood*
the need for protocols that manage the various moving parts (just because TCP is utilized in Java
sockets doesn't mean we don't need to define ****how to**** read or write from our sockets).
. Given some research regarding pros/cons of existing protocols, here are a few key features to think
about (primarily for future iterations, improvements, but also for curiosity)
1. Necessary rountrips (latency)
- A client request -> server response is often referred to as a *roundtrip*. Depending on the
complexity of the communication or request, the protocol may require more than one (unlikely for messaging)
2. End-of-Request Markers
- Put simply, the recipient of a message needs to know how long the message is expected to be in
some shape or form - remember data isn't always sent as a single collection of bytes, but as a
series of data packets that are like individual puzzle pieces to the whole message
- This gives some breathing room for transmission optimzations (perhaps?)
#Best Practices for Exception Handling (continued)
1. Using printStackTrace() should actually be avoided - often better to use a logger framework. This is due
to a number of flexibity advantages gained from a framework:
a. You can log messages go to a configurable location.
b. The end user doesn't see the messages unless you configure the logging so that he/she does.
c. You can use different loggers and *logging levels*, etc to control how much little or much logging is recorded.
d. You can use different appender *formats* to control what the logging looks like.
e. You can easily plug the logging output into a larger monitoring / logging framework.
f. All of the above can be done without changing your code; and rather by editing the deployed application's logging config file.
2. Through some additional quick research, it looks like `Log4j` has the upper hand in terms of performance.
3. In general, debugging only gets more difficult to understand (especially in a multi-threaded environment with a high-level
language like Java) the more complex the application being written is. Utilizing an inherently hierarchical and
customizeable framework that doesn't "swallow" exceptions makes our lives easier.
------------------------------------------------------------
#Threads & Concurrent Java
-----------------------------------------------------------
[ #Processes and Threads# ]
. As it's probably obvious this day and age, concurrent programming, multi-core processors (with
multi-threading within each of those cores) has become the pretty common and incredibly powerful)
are often seen as synonymous with entire applications or programs, and to a certain
extent this notion isn't totally off, however that are a number of applications that actually
incroporate interprocess communcation (IPC) nto only within the same operating system, but even
across different systems.
. NOTE: Processes are are particular instances of a program thathave a
#self-contained execution environment# - that is, a process has, in and of itself, all
the components it needs for execution. Processes have their own memory space, address space, executable
code, and unique process identifier.
. Furthermore, processes are known to be more for "heavyweight" tasks within an operating system, so
processes tend to have more overhead (given it requires dedicated memory/address space, etc.)
. Threads are often referred to as lightweight process, but they aren't as related than at first glance.
They are the smallest sequence of programmed instructions managed by the scheduler of an operating system.
Each process has at least one thread, but what they are from the perspective of the operating system
is very distinct:
. A *process* is more so related to data organization (setting aside memory and address
space; an instance of execution instructions for an entirely new program); A *thread* is a more like
a slice of process and is what the CPU actually runs - sharing resources with the process itself as well
as with other threads (if there are any).
[ #Thread Objects# ]
. In order to take advantage of this concurrency in Java, there are 2 different approaches to using
Thread objects:
1) Using the Runnable interface (Interface)
2) Making the class you want to run asynchronously a *sublclass* of Thread (Inheritance)
As with most inheritance/interface debates, interfaces often allow for more flexibility - and in the
case of threads, are more applicable to some of the higher-level thread management (as opposed
to simple asynchronous tasks/instantiations directly). Here's some sample code:
[java]
public class HelloRunnable implements Runnable {
public void run() {
System.out.println("Hello from a thread!");
}
public static void main(String args[]) {
(new Thread(new HelloRunnable())).start();
}
}
[java]
[ #Important Thread Methods# ]
1) sleep(): Causes the specified thread to explicitly stop execution for a specified period
of time. In more detail, the sleep method interacts with the Operating System's scheduler
to put the current thread in a wait state (then back to runnable) for the CPU to continue.
This means that sleep is !DEPENDENT! on the underlying OS scheduler and may not take more/less time than specified.
2) start(): Starts a thread's execution. For a Thread object, start() actually creates the thread
so that the JVM can call the Runnable object's run() method in parallel. **run is called on a new thread**
3) run(): For a Runnable object, this is the custom method that defines what to do when the object
is executed (run is called on it). By default (that is with no attached Runnable object other than Thread itself)
this method does nothing. **run is called on the current thread**
4) setPriority(): Sets the current thread's priority (ranges between MIN_PRIORITY and MAX_PRIORITY) - more on
*Priority Scheduling* in Java below.
5) interrupt(): An *interrupt* is an indication to a thread that it should stop what its doing and do something else.
Especially when taking into account the context of controlling execution flow that Threads are used for, the use
of interrupts make sense.
a. Numerous methods throw InterruptedException, such that Threads support their own interruptions in
that sense - while the thread is doing whatever it's doing, the exception can be caught.
b. For operations that don't invoke methods with InterruptedException (say the thread is working for a while
or won't run any vaible methods for the remainder of its execution), we have to ensure that we manually check
using the available Thread methods (below)
6) isInterrupted(): Checks if the Thread instance is *currently interrupted*
7) interrupted(): (static) Checks if the current thread *has been interrupted since last time queried* and
subsequently *clears the interruption flag* (such that if called two times in a row, the second call would
be false *even if the first was true*)
8) join(): This method allows for one thread to wait for the completion of another - such that calling
join on a Thread object (let's call it t) causes the current thread to pause until t is finished executing.
[ #Priority Scheduling# ]
- In order to understand scheduling on a deeper level, I've tried to run through some concepts that both build
on top of each other and give insight to adjacent topics.
[ *Fixed Priority Preemptive Scheduling* ]
. A scheduling system handled by the operating system, typically used in *real time systems*, ensuring that
at any given time, the processor *executes the highest priority task* of all tasks ready to be executed.
Priority Scheduling tends to be a common use case for *priority queues* (relating back to past data strucuture coursework)
. `Fixed` in the sense of being set a specific priority rank for every process and subsequently being placed on the queue.
. `Preemptive` in the sense of the process'/thread's ability to get interrupted through no cooperation of its own
[ *Real Time Systems* ]
. The term "real-time" mostly depends on the context to which the system is being used, characterized by specified time "deadlines".
In most cases, this is on the order of milliseconds.
. A real-time software application is thus categorized by the *guarantee of returning a processed result within the specified time*
*frame regardless of system load* - a critical guarantee for time-sensitive operations (anti-lock brakes on a vehicle for ex.)
[ *Preemption* ]
. Preemption is the act of temporarily interrupting a task being carried out, through no cooperation of its own,
with the intention of resuming the task at a later time.
. The detail that *preemption does not involve the process itself* is an important one: preemption tends
to refer to a sort overriding of process/thread functionality - normally carried out by a *privileged* task
(*Protection Rings*) or a preemptive scheduler (*Time Slices*)
. NOTE: The essential feature of storing the current process state and loading another's is more closely related to
the term *context switching*
[ *Protection Rings* ]
. Also called hierarchical proteciton domains, refer to the mechanism of giving certain operations some
inherent priority (or `privilege`) - typically more strongly to those operations closer to hardware and operating
system kernel operations.
[ *Time Slice* ]
. Defined as the period of time for which a process is allowed to run in a preemptive multitasking system.
That is, for preemptive multitasking systems, the aim is to give each process a regular "slice" of operating time,
allowing for more efficient multitasking that can rapidly deal with incoming data, current process, etc.
. Thus in this context, time slices defined by the preemptive scheduler have the power to "preempt" a current process'
operations (unsolicited interrupt essentially) for more efficient use elsewhere.
[ *Resource Starvation* ]
. Starvation refers to the common problem in concurrent computing where a process is perpetually denied the resources
needed for its work. This situation is either caused by errors in scheduling, overly-simplistic scheduling algorithms,
or resource leaks.
. NOTE: Taking advantage of potential resource starvation is what a number of viruses actually take advantage of (ex. DoS)
and the impossibility of starvation known as "starvation-freedom" or "lockout-freedom", is often quite necessary.
------------------------------------------------------------
(Sub)Tasks
-----------------------------------------------------------
[ OPEN-ENDED/OPTIMIZATION TASKS ]
[x] Look into incorporating threads/concurrency into the project -> asynchronous send() + recv()
[x] Research error handling best practices (stderr -> text file or?)
[`1.5 hr`] Optimize I/O streams used - to buffer or not to buffer? input - yes, output - no? buffered stream + use of buffer redundant?
[`2 hrs` ] Learn how to use Gradle + take notes: "http://www.vogella.com/tutorials/Gradle/article.html"
[`30 min`] Look into Log4j best/useful configurations - briefly understand why logging is better with a setting
[`2 hrs`] (AFTER LIBRARIES ADDED) Really understand Maven/Gradle -> Seems like Gradle is the most modern + feature-rich
[`1 hr`] Optimize (for best practice) access modifiers used -> review the larger implications of good encapsulation
[ QUICK TASKS ]
[x] Specify command line argument instructions + error handling (server)
[x] Specify command line argument instructions + error handling (client)
[x] Define NetCat Protocol
[x] Fix extra send() and recv() exception after using "exit" termination
[x] Add short documentation for each methods (will help clarify for both other devs & myself)
[ ] Notify client/server on whether or not the other "person" exited/joined -> maybe prompt for relisten?
[ ] Incorporate Log4j library (print errors use logger NOT std out or std errors)
[ LARGER TASKS ]
[x] Incorporate Commons CLI -> Merge server/client functionality = NetCat.java
[x] Create CLI wrapper object: parses String arr, returns hostname + port = NetCatArgs
[x] Merge NetCat* classes
[x] Debug option/command line objects
[x] Set "getOptionValue" default to localhost + 8080 -> change args to be optional
[x] Clean up code - remove redundant/unecessary methods (NetCat overloaded constructor, requiredArgs())
[x] Implement (async) continuous I/O socket streams: client stdin -> server (read) -> server stdout (+ vice versa)
[ ] Implement small test suite (benchmark speeds, edge case testing, multi-thread heavy traffic, etc.)
[ ] Implement/edit shell script to compile, find the ip address to broadcast (and maybe send as an invitation??)
-----------------------------------------------------------
#Potential Improvements
-----------------------------------------------------------
. Adding *communicator* object that includes some personal information;
can be displayed upon connection, or can be queried by another user
. Multiple client support (Introduction to *threading* perhaps?)
. In order to incorporate some kind of loading screenm I would need to make a
separate thread while the accept() call is *blocking* the initial thread
. Change the port number of my sockets to be inputs to start() and not an
instance variable? Probably not a big difference (instantiate the server
with a port instead of using it for start), so it would depend more so
on potential use cases
. *Multi client support* - User IDs -> very small group chats -> Contact List (linked to IP addresses)