-
Notifications
You must be signed in to change notification settings - Fork 0
/
part2-nstar.tex
2179 lines (1726 loc) · 103 KB
/
part2-nstar.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\begingroup
\usingnamespace{nstar:grammar}
\usingnamespace{nstar:rules}
\part{The back end}\label{part:nstar}
\chapter{Introduction}\label{chap:nstar-abstract}
Assembly languages are the lowest level of humanly-possible programming there exists nowadays. They were used back in the days for very performance-critical tasks, or even just for fun, as there weren't many other programming languages available. Nowadays, with all the existing ones, most people have never used any assembly language.
Despite their apparent simplicity, and the small amount of work you need to put into creating assembly languages, those are in fact very hard to use. At that low level, there is no such thing as Java's exceptions keeping you from doing dumb things, Rust's linear types keeping you from leaking memory nor even Garbage Collectors. The nice things preventing you from having segfaults simply do not exist, and you are expected to either provide all those things yourself (creating a language runtime) or be very careful about what you are doing each time you write a single instruction (even a simple \texttt{mov} can have undesirable side-effects).
Doing dumb things is something that we must prevent directly when using the language. That way, we do not need to rely on external verification tools or debuggers, trying to know why a program segfaults at a specific point.
This is where a type system can become handy. Typed assembly languages are assembly languages augmented with simple yet powerful type system. Among the most famous typed assembly languages are TALx86~\cite{TALx86}, FunTAL~\cite{FunTAL} and DTAL~\cite{DTAL}.
TALx86 is basically NASM with a type system, targetting only the x86 architecture. DTAL is much more complicated and embeds a completely dependent type system. FunTAL is a more complex system than TALx86 (it is based on STAL~\cite{STAL}) and provides a fully-fledged type assembly language, almost entirely integrated inside a functional programming language.
\vspace{\baselineskip}
\nstar's goal is to assist users with a simple but powerful type system, yet still allowing for low-level data manipulation.
But before even being a usable programming language, \nstar\ aims at being a compiler backend (much like for example LLVM), and is used that way in the Zilch project. Differences with other compiler backends are mostly the type-system, allowing the compilation of Zilch source code into type-safe instructions.
\vspace{\baselineskip}
Because \nstar\ supports compiling to multiple architectures, using different grammars, describing \nstar\ will at first be platform-agnostic, treating common aspects between all CPU architectures, and then will be divided into multiple categories, explaining in more details some features on a per-architecture basis\footnote{Note that the target executable format (ELF, PE, \ldots) is also considered as an architecture-specific thing, but should not influence \nstar\ much.}.
\section*{Small notes on the notation used in the next chapters}\label{sec:nstar-abstract-notation}
\begin{itemize}
\item Kind inference
\begin{itemize}
\item $\Gamma$ is the typing context, containing every bindings from names to type variables;
\item $\vdash^{\text{\tiny K}}$ is the kind inference judgement;
\end{itemize}
\item Type inference
\begin{itemize}
\item $\Xi$ is the context containing all bound labels, composed of two other contexts: $\Xi_{c}$ and $\Xi_{d}$. $\Xi_{c}$ contains all the labels defined in \texttt{code} sections, and $\Xi_{d}$ contains all the labels defined in \texttt{data} sections;
\item $\chi$ is the current register-to-type mapping, and contains all known-to-be-bound registers.
A register which is not present in this context means that either it has never been bound, or the value in it has been forgotten at some point;
\item $\sigma$ is the current stack type;
\item $\epsilon$ is the current continuation;
\item $\vdash^{\text{\tiny T}}_{\text{\tiny data}}$ and $\vdash^{\text{\tiny T}}_{\text{\tiny code}}$ are the type inference judgements for expressions in their respective sections (\texttt{data} or \texttt{code}).
$\vdash^{\text{\tiny T}}_{\text{\tiny data}}$ uses no context, whereas $\vdash^{\text{\tiny T}}_{\text{\tiny code}}$ uses all four of the defined contexts above (namely $\Xi$, $\chi$, $\sigma$ and $\epsilon$);
\item $\ISequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{instr}{\chi^\prime}{\sigma^\prime}{\epsilon^\prime}$ denotes that the type inference of the instruction $instr$ in the current typing contexts $\Xi;\Gamma;\chi;\sigma;\epsilon$ yields a new typing context $\chi^\prime;\sigma^\prime;\epsilon^\prime$ (where $\Gamma$ and $\Xi$ cannot be altered);
\item $t_1 \sim t_2$ (and its counterpart $t_1 \nsim t_2$) respectively mean that $t_1$ and $t_2$ can be unified (or not);
\end{itemize}
\item Grammar
\begin{figure}[H]
\centering
\scalebox{.5}{
\includegraphics{grammar-template}
}
\end{figure}
\begin{itemize}
\item Normal rectangles describe other grammar rules (whose name is given in the shape);
\item Rounded rectangles describe terminal nodes, that is invariant string pieces;
\item The name of the grammar rule is given in the top-left corner of the diagram
For a rule to match, it must be possible to go from one end to the other while staying on the black rails;
\end{itemize}
\end{itemize}
\chapter{Non platform-specific features}\label{chap:nstar-common}
\section{Types}\label{sec:nstar-common-ts}
One of the differences between classical assembly languages and \nstar\ is its type system.
Compared to other higher level programming languages like Java, C++, etc, \nstar\ has a very simple yet powerful and expressive enough type system.
In programming, types are used mostly to prove at compile-time that a given program should behave well if it type-checks. While this works for more elaborated programming languages like Haskell, Idris, etc, most type systems aren't expressive enough to absolutely guarantee that everything will work at run-time (in fact, there is no possible way of doing this, because for example a memory allocation may fail, and this cannot be checked at compile-time). However, we can try to guarantee as much as possible.
\nstar\ doesn't try to solve this issue, because it would be really hard to target a dependently typed assembly language from a non-dependently typed programming language. But where all used assembly languages do not even consider types (only numbers, in fact), \nstar\ embeds a powerful type system used to remove the possibility of bugs (like incorrect structure addresses passed as a parameter function, or incoherent types in some instructions).
\subsection{Kinds}\label{subsec:nstar-common-ts-kinds}
Kinds (also known as types of types) mainly serve the purpose of indicating type sizes.
There are three type of kinds in \nstar:
\begin{itemize}
\item Stack kind, denotating stack-like types, which can be safely offsetted
\item Kinds whose size is abstracted away, useful to ask for any sized type
\item Kinds whose size is already known
\item Continuation kinds, for types that can be used as continuations
\end{itemize}
The grammar is given in Figure~\ref{fig:nstar-common-ts-kinds-syntax}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/types/kinds-syntax}
}
}
\caption{Grammar for kinds.}
\label{fig:nstar-common-ts-kinds-syntax}
\end{figure}
\subsection{Integer types}\label{subsec:nstar-common-ts-integer}
Numbers are the building block of any assembly language. Most of data manipulated is manipulated as numbers, e.g.\ addresses, characters, strings, enumerations, etc.
This is not the case in \nstar, where ``integer'' only really means ``number''.
The syntax for the types of integers is given in Figure~\ref{fig:nstar-common-ts-integer-syntax}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/types/integers-syntax}
}
}
\caption{Grammar for integer types.}
\label{fig:nstar-common-ts-integer-syntax}
\end{figure}
Integers have two varying parameters: their sign and sizes.
According to the sign (i.e.\ signed or unsigned), some operations may not perform the same (for example \texttt{mul} does not behave the same).
The size is nothing more than the number of bits occupied by the integer (in \nstar, those are restricted to some powers of 2 between $8$ and $64$ included).
Most operations should perform the same no matter the integer size, however it is recommended to search in the target architecture manual for further reference.
Kinds of integers are written under the form of inference rules in Figure~\ref{fig:nstar-common-ts-integer-kindrules}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tunsigned{64}}{\T8}}
\end{prooftree}
\hspace{3em}
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tsigned{64}}{\T8}}
\end{prooftree}
\hspace{3em}
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tunsigned{32}}{\T4}}
\end{prooftree}
\hspace{3em}
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tsigned{32}}{\T4}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tunsigned{16}}{\T2}}
\end{prooftree}
\hspace{3em}
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tsigned{16}}{\T2}}
\end{prooftree}
\hspace{3em}
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tunsigned{8}}{\T1}}
\end{prooftree}
\hspace{3em}
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tsigned{8}}{\T1}}
\end{prooftree}
}}
\caption{Kind inference rules for integers.}
\label{fig:nstar-common-ts-integer-kindrules}
\end{figure}
\subsection{Other atomic types}\label{subsec:nstar-common-ts-otheratomic}
There are two other atomic types that we did not talk about, but refered to in the introduction of the type system: characters and pointers. Their respective syntax is given in Figure~\ref{fig:nstar-common-ts-atomic-syntax}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\scalebox{.5}{
\includegraphics{nstar/types/ptr-syntax}
} \\
\scalebox{.5}{
\includegraphics{nstar/types/char-syntax}
}
}}
\caption{Grammar for character and pointer types}
\label{fig:nstar-common-ts-atomic-syntax}
\end{figure}
In all assembly languages, characters are merely syntactic sugar for their ASCII code. This is how their are put in the machine code anyway, so it is not a huge problem (it might even not be at all).
Pointers, on the other hand, are unabstracted memory addresses.
In \nstar, there are two types of pointers: data pointers and stack pointers.
Stack pointers are covered in Subsection~\ref{subsec:nstar-common-ts-stack}~``\nameref{subsec:nstar-common-ts-stack}''.
Data pointers simply represent an address where we know (or not, see the Subsection~\ref{subsec:nstar-common-unsafe-derefliteraladdr}~``\nameref{subsec:nstar-common-unsafe-derefliteraladdr}'') that there is a value of the given pointed type.
Kind inference is given in Figure~\ref{fig:nstar-common-ts-atomic-kindrules}.
They support two common operations: offsetting (see the Subsection~\ref{subsec:nstar-common-unsafe-ptroffset}~``\nameref{subsec:nstar-common-unsafe-ptroffset}'') and dereferencing (taking the value pointed by the pointer).
Dereferencing is considered a safe operation, unless trying to on a literal address.
There is no notion of ``null'' pointers, like \texttt{NULL} in C.
However, it is possible to use the literal \texttt{\$0} (which represents a pointer to the address $0$).
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tchar}{\T1}}
\end{prooftree}
\hspace{3em}
\begin{prooftree}
\hypo{\KindSequent{\Gamma}{t}{\Ta}}
\infer1[64-bits pointers]{\KindSequent{\Gamma}{\Tptr{t}}{\T8}}
\end{prooftree}
\hspace{3em}
\begin{prooftree}
\hypo{\KindSequent{\Gamma}{t}{\Ta}}
\infer1[32-bits pointers]{\KindSequent{\Gamma}{\Tptr{t}}{\T4}}
\end{prooftree}
}
\caption{Kind inference rules for other atomic types.}
\label{fig:nstar-common-ts-atomic-kindrules}
\end{figure}
\subsection{The effectful type}\label{subsec:nstar-common-ts-effect}
The effectful type, also known as ``bang type'', is a type used to denote that some computation \textit{may} completely overwrite a value somewhere, potentially changing its type.
Its use is mainly in context types, to indicate that a function may change a register during its computation, therefore that it must be caller-saved.
Grammar and kind inference rules are given in Figure~\ref{fig:nstar-common-ts-effect-syntax} and Figure~\ref{fig:nstar-common-ts-effect-kindrules}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\scalebox{.5}{
\includegraphics{nstar/types/bang-syntax}
}
}}
\caption{Grammar for bang types.}
\label{fig:nstar-common-ts-effect-syntax}
\end{figure}
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\infer0{\KindSequent{\Gamma}{\Tbang}{k}}
\end{prooftree}
}
\caption{Type inference rules for bang types.}
\label{fig:nstar-common-ts-effect-kindrules}
\end{figure}
\subsection{Context types}\label{subsec:nstar-common-ts-records}
Record types (or contexts) are mappings from registers to types, which also hold information about the current stack, as well as the return continuation.
They are used to indicate that a register is bound to a value of a given type at a certain point in the program.
The grammar is described in Figure~\ref{fig:nstar-common-ts-records-syntax}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\scalebox{.5}{
\includegraphics{nstar/types/records-syntax}
}
\\
\scalebox{.5}{
\includegraphics{nstar/types/continuation-syntax}
}
}}
\caption{Grammar for context and continuation types.}
\label{fig:nstar-common-ts-records-syntax}
\end{figure}
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\hypo{\KindSequent{\Gamma}{\hat{\chi}}{}}
\hypo{\KindSequent{\Gamma}{s}{\Ts}}
\hypo{\KindSequent{\Gamma}{e}{\Tc}}
\infer3{\KindSequent{\Gamma}{\Tcontext{\hat{\chi}}{s}{e}}{\Ta}}
\end{prooftree}
}
\caption{Type inference rules for record types.}
\label{fig:nstar-common-ts-records-typerules}
\end{figure}
Context types are used to represent data contexts where any mapping is some sort of a proof that some data of some type is accessible through some register.
\vspace{\baselineskip}
\textbf{Note:} In the context of label types (and more generally code addresses), a context type is augmented by a \texttt{forall} generic type variable binder.
The grammar is described in Figure~\ref{fig:nstar-common-ts-label-types-syntax}.
The type variable binder is used to abstract away some details of the type through an opaque variable specialized at the call site.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/types/label-syntax}
}
}
\caption{Grammar for label context types.}
\label{fig:nstar-common-ts-label-types-syntax}
\end{figure}
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\hypo{\KindSequent{\Gamma,\hat{v}}{\zeta}{\Ta}}
\infer1{\KindSequent{\Gamma}{\Tforall{\hat{v}}{\zeta}}{\T8}}
\end{prooftree}
}
\caption{Type inference rules for label context types.}
\label{fig:nstar-common-ts-label-types-typerules}
\end{figure}
The example code given in Listing~\ref{lst:nstar-common-ts-records-stackmask} shows a use for type variables.
The stack is abstracted away, meaning that we can call this function from anywhere, given any stack (as long as it is \texttt{call}ed, or \texttt{jmp}ed to from a \texttt{call}ed function).
The type variable \texttt{s} is specialized at the call site.
\begin{listing}[htb]
\centering
\inputminted{\nstarlexer}{examples/stack-masking.nst}
\caption{Stack masking using a type variable binder.}
\label{lst:nstar-common-ts-records-stackmask}
\end{listing}
\subsection{Stack types}\label{subsec:nstar-common-ts-stack}
There is one stack type in \nstar: the stack constructor \texttt{::}.
Note that there is no ``empty stack'' type as would be the case with e.g.\ lists in Haskell.
The reason is that it forces the developer to abstract the stack to be able to have a ``stack tail'' (the part of the stack on the right of the stack constructor \texttt{::}) at some point.\footnote{It also serves the purpose to ensure that we cannot construct a stack from nothing, and that it should always be given to us, to e.g.\ the \texttt{main} function.}
An example is given in Listing~\ref{lst:nstar-common-ts-records-stackmask}.
The grammar for stack types is given in Figure~\ref{fig:nstar-common-ts-stack-types-syntax}.
Inference rules for kinds of stack types are written in Figure~\ref{fig:nstar-common-ts-stack-types-kindrules}
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/types/stack-cons-syntax}
}
}
\caption{Grammar for stack types.}
\label{fig:nstar-common-ts-stack-types-syntax}
\end{figure}
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\hypo{n\in\mathbb{N}}
\hypo{\KindSequent{\Gamma}{t}{\T{n}}}
\hypo{\KindSequent{\Gamma}{s}{\Ts}}
\infer3{\KindSequent{\Gamma}{\Tstack{t}{s}}{\Ts}}
\end{prooftree}
}
\caption{Kind inference rules for stack types}
\label{fig:nstar-common-ts-stack-types-kindrules}
\end{figure}
\subsection{Structure types}\label{subsec:nstar-common-ts-structs}
Structure types are packed sets of unnamed types (compared to context types, each field does not have a name, only a type) that can be indexed from a pointer only to a full field (so if you have two \texttt{u64}s, you can only offset to $0$ and $8$, respectively for the first and second field).
Grammar is described in Figure~\ref{fig:nstar-common-ts-structs-syntax} and kind inference rules are given in Figure~\ref{fig:nstar-common-ts-structs-kindrules}.
Structures take as much space as all their fields, thus can mainly be put on the stack (unless being less than 8 bytes, which is the upper limit for register sizes in current architectures).
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/types/struct-syntax}
}
}
\caption{Grammar for structure types.}
\label{fig:nstar-common-ts-structs-syntax}
\end{figure}
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\hypo{n_0, n_1, \ldots, n_p \in \mathbb{N}}
\hypo{\KindSequent{\Gamma}{t_i}{\T{n$_i$}}}
\hypo{m = \sum_{i = 0}^{p}{n_i}}
\infer3{\KindSequent{\Gamma}{\Tstruct{t_0}{t_1}{\ldots}{t_p}}{\T{m}}}
\end{prooftree}
}
\caption{Kind inference rule for structures.}
\label{fig:nstar-common-ts-structs-kindrules}
\end{figure}
\subsection{Union types}\label{subsec:nstar-common-ts-unions}
Union types simply are overlapping data bits, which can be given meaning depending on what type you decide to access.
Grammar for union types is given in Figure~\ref{fig:nstar-common-ts-unions-syntax}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/types/union-syntax}
}
}
\caption{Grammar for union types.}
\label{fig:nstar-common-ts-unions-syntax}
\end{figure}
Unions are sized depending on the types they unite: their sizes will be the maximum of all the sizes of all the components.
This is more clear in the kind inference rule given in Figure~\ref{fig:nstar-common-ts-unions-kindrules}.
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\hypo{n_0, n_1, \ldots, n_p \in \mathbb{N}}
\hypo{\KindSequent{\Gamma}{t_1}{\T{n$_i$}}}
\hypo{m = \max_{i = 0}^{p}{n_i}}
\infer3{\KindSequent{\Gamma}{\Tunion{t_0}{t_1}{\ldots}{t_p}}{\T{m}}}
\end{prooftree}
}
\caption{Kind inference rules for union types.}
\label{fig:nstar-common-ts-unions-kindrules}
\end{figure}
Note that unlike structure types, you will most likely be storing union types in registers, unless working with very big unions.
Also, accessing a union's field is considered an unsafe operation, but this will be more detailed in another section on expressions.
\section{Expressions}\label{sec:nstar-common-expressions}
There are four types of expressions in \nstar: immediate values, data labels, registers and pointer offsets.
All of these can be used as the source operand of the \texttt{mov} instruction.
\subsection{Immediate values}\label{subsec:nstar-common-expressions-immediate}
Immediate values are also known as ``hardcoded values''.
These are values that you can directly read in the source file, and which do not come from e.g.\ registers.
Grammar and inference rules are given respectively in Figure~\ref{fig:nstar-common-expressions-immediate-grammar} and Figure~\ref{fig:nstar-common-expressions-immediate-typerules}.
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\scalebox{.5}{
\includegraphics{nstar/exprs/imm-char-grammar}
}
\\
\scalebox{.5}{
\includegraphics{nstar/exprs/imm-int-grammar}
}
}}
\caption{Grammar for immediate values in \nstar.}
\label{fig:nstar-common-expressions-immediate-grammar}
\end{figure}
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\begin{prooftree}
\hypo{c$ is a character immediate$}
\infer1{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{c}{\Tchar}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{i$ is an unsigned integer immediate$}
\hypo{t \in\{ \Tunsigned{8}, \Tunsigned{16}, \Tunsigned{32}, \Tunsigned{64} \}}
\infer2{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{i}{t}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{i$ is a signed integer immediate$}
\hypo{t \in\{ \Tsigned{8}, \Tsigned{16}, \Tsigned{32}, \Tsigned{64} \}}
\infer2{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{i}{t}}
\end{prooftree}
}}
\caption{Type inference rules for immediate values in \nstar.}
\label{fig:nstar-common-expressions-immediate-typerules}
\end{figure}
\subsection{Labels}\label{subsec:nstar-common-expressions-labels}
\subsubsection{Data labels}\label{subsubsec:nstar-common-expressions-labels-data}
Data labels are pointers to some piece of data (may it even be an array).
They can be dereferenced and offset using the constructs given in \nstar.
Grammar and inference rules are given respectively in Figure~\ref{fig:nstar-common-expressions-labels-data-grammar} and Figure~\ref{fig:nstar-common-expressions-labels-data-typerules}.
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/exprs/label-expr-grammar}
}
}
\caption{Grammar for data labels in \nstar.}
\label{fig:nstar-common-expressions-labels-data-grammar}
\end{figure}
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\hypo{l : t \in\Xi_d}
\infer1{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{l}{\Tptr{t}}}
\end{prooftree}
}
\caption{Type inference rules for data labels in \nstar.}
\label{fig:nstar-common-expressions-labels-data-typerules}
\end{figure}
\subsubsection{Code labels}\label{subsubsec:nstar-common-expressions-labels-code}
Code labels are pointers to some piece of executable instructions.
Unlike data labels, they cannot be dereferenced or offset, but are instead automatically boxed.
Grammar and inference rules are given respectively in Figure~\ref{fig:nstar-common-expressions-labels-code-grammar} and Figure~\ref{fig:nstar-common-expressions-labels-code-typerules}.
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/exprs/label-value-grammar}
}
}
\caption{Grammar for code labels in \nstar.}
\label{fig:nstar-common-expressions-labels-code-grammar}
\end{figure}
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\hypo{l : \Tforall{\vec{v}}{\Tcontext{\chi}{\sigma}{\epsilon}} \in \Xi_c}
\hypo{\KindSequent{\Gamma}{s_i}{k_i}}
\infer[no rule, rule margin=1.0ex]2{\vec{s} = \{ s_1 : k_1, s_2 : k_2, \ldots, s_p : k_p \}\hspace{1.5em}%
\vec{s}\sim\vec{v}\hspace{1.5em}}
\infer1{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{l\Tunion{\vec{s}}}{\Tforall{}{\Tcontext{\chi[\vec{v}\setminus\vec{s}]}{\sigma[\vec{v}\setminus\vec{s}]}{\epsilon[\vec{v}\setminus\vec{s}]}}}}
\end{prooftree}
}
\caption{Type inference rules for code labels in \nstar.}
\label{fig:nstar-common-expressions-labels-code-typerules}
\end{figure}
\subsection{Registers}\label{subsec:nstar-common-expressions-registers}
Registers in \nstar\ are abstracted away from the target architectures.
Instead of manipulating registers \texttt{\%eax} or \texttt{\%rsp} or \texttt{\$a2}, which all are platform-specific, there are 6 general purpose registers in \nstar, named \texttt{r0} to \texttt{r5}.
They can be 4 to 8 bytes large depending on the target architecture mappings.
Grammar for all those registers is given in Figure~\ref{fig:nstar-common-expressions-registers-grammar} and inference rules are given in Figure~\ref{fig:nstar-common-expressions-registers-typerules}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/exprs/register-grammar}
}
}
\caption{Grammar for registers in \nstar.}
\label{fig:nstar-common-expressions-registers-grammar}
\end{figure}
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\begin{prooftree}
\infer0{\CodeTypeSequent{\Xi}{\Gamma}{\chi, r : t}{\sigma}{\epsilon}{r}{t}}
\end{prooftree}
}
\caption{Type inference rules for registers in \nstar}
\label{fig:nstar-common-expressions-registers-typerules}
\end{figure}
\subsection{Pointer offsets}\label{fig:nstar-common-expressions-pointeroffsets}
Pointer offsets allow accessing data that may be located relative to a given memory address.
Grammar for pointer offsets is given in Figure~\ref{fig:nstar-common-expressions-pointeroffsets-grammar} and type inference rules are given in Figure~\ref{fig:nstar-common-expressions-pointeroffsets-typerules}.
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\scalebox{.45}{
\includegraphics{nstar/exprs/pointer-byte-offset-grammar}
}
\\
\scalebox{.5}{
\includegraphics{nstar/exprs/pointer-offset-grammar}
}
}}
\caption{Grammar for pointer offsets.}
\label{fig:nstar-common-expressions-pointeroffsets-grammar}
\end{figure}
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\begin{prooftree}
\hypo{\KindSequent{\Gamma}{t}{k}}
\hypo{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{p}{\Tptr{t}}}
\hypo{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{o}{\Tsigned{64}}}
\infer3[raw pointer byte offset]{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{\Ebyteoff{o}{p}}{\Tptr{t}}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{\KindSequent{\Gamma}{t}{k}}
\hypo{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{p}{\Tptr{t}}}
\hypo{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{o}{\Tsigned{64}}}
\infer3[raw pointer base offset]{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{\Ebaseoff{o}{p}}{\Tptr{t}}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{\KindSequent{\Gamma}{t_i}{\T{n$_i$}}}
\hypo{n_j \in \mathbb{N}}
\hypo{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{p}{\Tptr{\Tstruct{t_0}{t_1}{\ldots}{t_m}}}}
\infer3[structure pointer offset]{\CodeTypeSequent{\Xi}{\Gamma}{\chi}{\sigma}{\epsilon}{\Ebaseoff{p}{i}}{\Tptr{t_i}}}
\end{prooftree}
}}
\caption{Type inference rules for pointer offsets.}
\label{fig:nstar-common-expressions-pointeroffsets-typerules}
\end{figure}
Although offsetting a pointer works with any expression as the offset (as long as it is a signed integer), if the offset is a register, it is considered an unsafe operation\footnote{See Section~\ref{subsec:nstar-common-unsafe-ptroffsetreg}~``\nameref{subsec:nstar-common-unsafe-ptroffsetreg}''}.
\section{File sections}\label{sec:nstar-common-sections}
Sections in \nstar\ serve the exact same purpose as in other assembly languages. They divide a file into multiple parts depending on what the semantics of the current section is supposed to be (code, data, etc).
Section names obviously differ from one target format to another. As an example, the ``\texttt{.rela.dyn}'' section from the ELF format may not exist in the PE format.
\nstar\ tries to unify target formats section names (simplifying targetting \nstar\ as a compiler backend) by having a fixed set of section names, all with different meanings. While you can put anything anywhere in classical assembly languages, this is not the case in \nstar.
Sections in \nstar\ can be named ``\texttt{data}'', ``\texttt{code}'', ``\texttt{rodata}'', ``\texttt{udata}'' or ``\texttt{extern}''. Each of them has defined semantics as described below.
There is also a special \texttt{include} section describing whih file to include in the scope of the current one.
\subsection{The \texttt{code} section}\label{subsec:nstar-common-sections-code}
The \texttt{code} section is the section containing all executable instructions (basically, as the name implies, code).
Its syntax is defined in Figure~\ref{fig:nstar-common-sections-code-grammar}.
Each label is assigned a type, describing the context needed to branch to it.
If a label has the type \texttt{\{reg:s64\}} then there needs to be the register \texttt{reg} bound to a value of type \texttt{s64} in the current context, in order to branch to it.
However, some restrictions apply to labels in order to make the type-checking meaningful (or at least handle some aspects that cannot be handled with types only). See more about that in Section~\ref{sec:nstar-common-bs-restrictions}~``\nameref{sec:nstar-common-bs-restrictions}''.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\scalebox{.5}{
\includegraphics{nstar/sections/code-1}
}
\\
\scalebox{.5}{
\includegraphics{nstar/sections/code-2}
}
\\
\scalebox{.5}{
\includegraphics{nstar/sections/code-3}
}
}}
\caption{Grammar for \texttt{code} sections.}
\label{fig:nstar-common-sections-code-grammar}
\end{figure}
Every assembly instruction makes the current context vary in some way, either by binding registers, forgetting about some bindings, changing register types, or some other way. The current context is just a record keeping track of the currently bound registers, along with the data type they contain.
More on that in sections about instructions\footnote{Instructions are platform-specific, that's why we don't talk about them here.}.
Every label can also be given an optional \texttt{global} qualifier, which is used to indicate that the label can be used outside of the file (it can be thought of as ``exported'').
\subsection{The \texttt{data}, \texttt{rodata} and \texttt{udata} sections}\label{subsec:nstar-common-sections-data}
The \texttt{data} and \texttt{rodata} sections are sections used to reference (read-only) literal data (much like constants in PHP).
Each label in those sections is used as a pointer to the given value, but only the labels in the \texttt{data} section can be written to\footnote{The \texttt{rodata} section (read-only data) is read-only. While it is not necessary to prevent writing to it at compile-time, it is an undefined behavior at runtime.}.
The grammar of those sections is given in Figure~\ref{fig:nstar-common-sections-data-grammar}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\scalebox{.5}{
\includegraphics{nstar/sections/data-1}
}\\
\scalebox{.5}{
\includegraphics{nstar/sections/data-2}
}\\
\scalebox{.5}{
\includegraphics{nstar/sections/data-3}
}\\
\scalebox{.5}{
\includegraphics{nstar/sections/data-4}
}
}}
\caption{Grammar for the \texttt{data}, \texttt{rodata} and \texttt{udata} sections.}
\label{fig:nstar-common-sections-data-grammar}
\end{figure}
\noindent Note that, while the grammar allows labels in the \texttt{data} section to be pointers to context types, this cannot be typed because there is no such constant value referencing a code-space address.
While the \texttt{data} and \texttt{rodata} sections contain initialised values (that is, hardcoded bytes in the executable), the \texttt{udata} section does not contain any byte (it should be only $0$s, to reserve space for each entry).
Each label can also be marked with a \texttt{global} qualifier, which acts the same as in the \texttt{code} section.
\subsection{The \texttt{extern} sections}\label{subsec:nstar-common-sections-extern}
The \texttt{extern} sections contain information related to static and dynamic imports.
They come in 2 flavors: function imports or data imports.
\subsubsection{The \texttt{extern.code} section}\label{subsubsec:nstar-common-sections-extern-code}
The \texttt{extern.code} section indicates that there are functions which, despite not being in the source file, should be accessible with the given context.
Static imports are resolved during linking, preventing from using undefined function symbols at runtime, while dynamic imports are resolved at runtime by the dynamic linker.
The two types of imports are indicated from the absence (or presence) of the \texttt{dyn} qualifier when importing.
The grammar showing this is given in figure~\ref{fig:nstar-common-sections-extern-code-grammar}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.47}{
\includegraphics{nstar/sections/externcode-grammar}
}
}
\caption{Grammar for the \texttt{extern.code} file section.}
\label{fig:nstar-common-sections-extern-code-grammar}
\end{figure}
\subsubsection{The \texttt{extern.data} and \texttt{extern.rodata} sections}\label{subsubsec:nstar-common-sections-extern-data}
The \texttt{extern.data} and \texttt{extern.rodata} sections are used to import data addresses into the current scope.
While it is possible to dynamically import a data address, it shouldn't really be useful (unless in specific cases like \texttt{static} variables).
Same as for the \texttt{extern.code} section, the import type (dynamic or static) is given depending on the absence (or presence) of the \texttt{dyn} qualifier.
The grammar for this section is given in Figure~\ref{fig:nstar-common-sections-extern-data-grammar}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.45}{
\includegraphics{nstar/sections/externdata-grammar}
}
}
\caption{Grammar for the \texttt{extern.data} and \texttt{extern.rodata} sections.}
\label{fig:nstar-common-sections-extern-data-grammar}
\end{figure}
\subsection{The special \texttt{include} section}\label{subsec:nstar-common-sections-include}
The \texttt{include} section, in its simplest form, is used to bring the scope of another file into the current one (the one containing the section).
This is very useful when needing to include interface files (files containing just an \texttt{extern.code} section, providing an interface for a specific library) or even when wanting to split a project into multiple parts.
Operationally, the \texttt{include} section behaves as if it was replaced with the content of every file in it (modulo cyclic includes, which must be forbidden).
The grammar is described in Figure~\ref{fig:nstar-common-sections-include-grammar}.
\begin{figure}[htb]
\centering
\framebox[\textwidth][c]{
\scalebox{.5}{
\includegraphics{nstar/sections/include}
}
}
\caption{Grammar for the special \texttt{include} section.}
\label{fig:nstar-common-sections-include-grammar}
\end{figure}
\section{Constant values}\label{sec:nstar-common-constvalue}
Constant values are some kind of values that can be used in the \texttt{data} and \texttt{rodata} sections\footnote{See Section~\ref{subsec:nstar-common-sections-data}~``\nameref{subsec:nstar-common-sections-data}''.}.
They come in 4 flavors: integers, characters, arrays and strings (which are only arrays of characters).
Note that strings are automatically null-terminated (i.e.\ a \texttt{\textbackslash 0} always lies at the end).
Their grammar is given in Figure~\ref{fig:nstar-common-constvalue-grammar} and the type inference rules are given in Figure~\ref{fig:nstar-common-constvalue-toplevel-typerules} and Figure~\ref{fig:nstar-common-constvalue-embedded-typerules}.
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\centering
\scalebox{.5}{
\includegraphics{nstar/constants/string-character}
}
\\
\scalebox{.5}{
\includegraphics{nstar/constants/integer-grammar}
}
\\
\scalebox{.5}{
\includegraphics{nstar/constants/character-grammar}
}
\\
\scalebox{.5}{
\includegraphics{nstar/constants/array-grammar}
}
\\
\scalebox{.5}{
\includegraphics{nstar/constants/string-grammar}
}
\\
\scalebox{.5}{
\includegraphics{nstar/constants/struct-grammar}
}
}}
\caption{Grammar for constant values.}
\label{fig:nstar-common-constvalue-grammar}
\end{figure}
\underline{Note:} inference rules for constant values are divided into two parts, depending on whether they are ``top-level'' constants or not.
A ``top-level'' constant is a constant that is found directly after the \texttt{=} symbol in a \texttt{data} binding.
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\hspace{1.5em}\underline{Top-level constants:}\\
\begin{center}
\begin{prooftree}
\hypo{d$ is a signed integer constant$}
\hypo{t \in \{ \Tsigned{8}, \Tsigned{16}, \Tsigned{32}, \Tsigned{64} \}}
\infer2{\DataTypeSequent{\Gamma}{d}{\Tptr{t}}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{d$ is an unsigned integer constant$}
\hypo{t \in \{ \Tunsigned{8}, \Tunsigned{16}, \Tunsigned{32}, \Tunsigned{64} \}}
\infer2{\DataTypeSequent{\Gamma}{d}{\Tptr{t}}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{n_i \in \mathbb{N}}
\hypo{\KindSequent{\Gamma}{t_i}{\T{n$_i$}}}
\hypo{\DataTypeSequent{\Gamma}{c_i}{t_i}}
\infer3{\DataTypeSequent{\Gamma}{\Estruct{c_0}{c_1}{\ldots}{c_p}}{\Tptr{\Tstruct{t_0}{t_1}{\ldots}{t_p}}}}
\end{prooftree}\hspace{1.5em}
\begin{prooftree}
\hypo{c$ is a character constant$}
\infer1{\DataTypeSequent{\Gamma}{c}{\Tptr{\Tchar}}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{n \in \mathbb{N}}
\hypo{\KindSequent{\Gamma}{t}{\T{n}}}
\hypo{\DataTypeSequent{\Gamma}{c_i}{t}}
\infer3{\DataTypeSequent{\Gamma}{\Earray{c_0}{c_1}{\ldots}{c_n}}{\Tptr{t}}}
\end{prooftree}\hspace{1.5em}
\begin{prooftree}
\hypo{s$ is a string constant$}
\infer1{\DataTypeSequent{\Gamma}{s}{\Tptr{\Tchar}}}
\end{prooftree}
\end{center}
}}
\caption{Type inference rules for constant values.}
\label{fig:nstar-common-constvalue-toplevel-typerules}
\end{figure}
\begin{figure}[H]
\centering
\framebox[\textwidth][c]{\parbox{\textwidth}{
\hspace{1.5em}\underline{Embedded constants:}\\
\begin{center}
\begin{prooftree}
\hypo{d$ is a signed integer constant$}
\hypo{t \in \{ \Tsigned{8}, \Tsigned{16}, \Tsigned{32}, \Tsigned{64} \}}
\infer2{\DataTypeSequent{\Gamma}{d}{t}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{d$ is an unsigned integer constant$}
\hypo{t \in \{ \Tunsigned{8}, \Tunsigned{16}, \Tunsigned{32}, \Tunsigned{64} \}}
\infer2{\DataTypeSequent{\Gamma}{d}{t}}
\end{prooftree}
\\\vspace{\baselineskip}
\begin{prooftree}
\hypo{c$ is a character constant$}
\infer1{\DataTypeSequent{\Gamma}{c}{\Tchar}}
\end{prooftree}\hspace{1.5em}
\begin{prooftree}
\hypo{n_i \in \mathbb{N}}
\hypo{\KindSequent{\Gamma}{t_i}{\T{n$_i$}}}
\hypo{\DataTypeSequent{\Gamma}{c_i}{t_i}}
\infer3{\DataTypeSequent{\Gamma}{\Estruct{c_0}{c_1}{\ldots}{c_p}}{\Tstruct{t_0}{t_1}{\ldots}{t_p}}}
\end{prooftree}
\end{center}
}}
\caption{Type inference rules for constant values.}
\label{fig:nstar-common-constvalue-embedded-typerules}
\end{figure}
\section{Restrictions applied to branching}\label{sec:nstar-common-bs-restrictions}
Branching ((un-)\ conditional jumps, calls, etc) is restricted in \nstar\ in order to prevent stack leaks, unknown caller address return or even missing return values.
It also is restricted not to integrate some sort of scoping, which would add some (mostly useless) complexity when using/generating \nstar.
\subsection{Returning to a known code-space address}\label{subsec:nstar-common-bs-restrictions-ret}
\nstar\ offers what we call ``continuations'', which are a mean of easily controlling the instruction control flow of the program.
Thanks to that, this is actually easy to check whether we would be returning to some valid piece of code: the continuation part of th context must be bound to something of the form $\Tforall{}{\zeta}$.
When anything else is in the continuation (in fact, it cannot be, because we do not allow overwriting the continuation --- which can be worked around by moving the continuation somewhere else before), it is strictly impossible to either {\Iformat call} or {\Iformat jmp} to a label, or {\Iformat ret} from a piece of code, which would yield a very nasty undefined behavior at runtime in mainstream assembly languages.
\subsection{Missing return values (a.k.a.\ registers not bound before return)}\label{subsec:nstar-common-bs-restrictions-unboundregs}
Along with checking that we are returning to a valid code-space address, we can also, thanks to context types, determine if returning from a function preserves the value in registers, and if after returning we have access to return values of a function.
Consider the example given in Listing~\ref{lst:nstar-common-bs-returnvalues}.
When {\Iformat ret}urning from the \texttt{example} ``function'', we give access to the values of type \texttt{u64} in the \texttt{\%r0} and \texttt{\%r3} registers.
Note that if either of those registers were not bound before (not given as parameters, or bound using {\Iformat mv}), type-checking \textit{must} fail to prevent undefined behavior when later accessing the ``value'' in either register.
We can easily see that the return context contains \texttt{\%r0: u64} and \texttt{\%r3: u64}, which means that both registers must have been bound before, and are accessible for reading later on, after the {\Iformat call} that went until here.
This gives a safe way of using return values, and guaranteeing that there are indeed return values stored in the target registers.
\begin{listing}[htb]
\centering
\inputminted{\nstarlexer}{examples/return-multiple-values.nst}
\caption{An example of returning multiple values from a simple function.}
\label{lst:nstar-common-bs-returnvalues}
\end{listing}
\section{Unsafe operations}\label{sec:nstar-common-unsafe}
Unsafe operations are operations (sequence of zero or more instructions) whose safety cannot be determined from the current context, or whose evaluation cannot be determined as not UB-provoking (that is, the evaluation may put the application in an unknown state).
Those kind of operations are essential to low-level programming, but we want to put an emphasis on them by separating them from the rest of the normal and safe operations using \texttt{unsafe} blocks.
\subsection{Dereferencing literal addresses}\label{subsec:nstar-common-unsafe-derefliteraladdr}
Sometimes, one needs to be able to read from/write to literal addresses, such as \texttt{0xB8000} in kernel development (this is the console text buffer beginning address, used to output text to the basic 80×25 console).
However, because you cannot guarantee that 1- the memory address exists on every machine the program is supposed to run on and 2- there always are some reconstitutable pieces of data of the specified type at the given address, those two operations are considered unsafe.
Be aware that, despite not being restricted, trying to write to a protected memory address yields an undefined behavior.
Also, if used, the code is no-longer considered platform-independendant\footnote{Platform independant code (PIC for short) is a type of code which should be runnable on all machines, no matter their specifications.}.
\subsection{Pointer offsetting}\label{subsec:nstar-common-unsafe-ptroffset}
Stack pointers, because of their type, can be safely offset to point to a valid piece of data.
However, ``normal'' pointers (for example \texttt{*u64}) cannot be safely offset.
While a stack pointers describes the entire structure of the stack (or at least a piece of it), a regular pointer only describes the piece of data it points to.
This is a common idiom in C to use pointers to represent arrays in a contiguous memory (so an array of 6 integers would basically be a $6 * sizeof(int)$ bytes wide chunk of memory, where each index points to a different integer).
\texttt{Vec<T>} in Rust, \texttt{std::vector<T>} in C++, \texttt{ArrayList<T>} in Java, and many other vector types are also implemented in terms of a pointer to a chunk of memory, but with an added container size, allowing to safely access elements of the vector without going out of bounds.
But in Rust, we still need to have some \texttt{unsafe} blocks in your code, in order to use the container.
There is the same problem in \nstar.
Because there is no built-in array type, we have to rely on pointers to be able to achieve such thing, therefore needing a way to offset a pointer to access the various elements in the array.
It would also be really hard to manipulate plain array types (because, for example, of the size to store with the type).
As offsetting a pointer can lead to an invalid address dereferencing (or at least dereferencing an address which doesn't belong to the application memory, i.e.\ allocated by another process), it is considered an unsafe operation and therefore needs to be wrapped in an \texttt{unsafe} block.
\subsection{Pointer offsetting using a register}\label{subsec:nstar-common-unsafe-ptroffsetreg}