Notes Week 1-12
what is an app?
computer software or a program, most commonly a small, specific one used for mobile devices. The term app originally referred to
any mobile or desktop application, but as more app stores have emerged to sell mobile apps to smartphone and tablet users, the
term app has evolved to refer to small programs that can be downloaded and installed at once - techopedia
a program package to solve a problem
example - firefox, instagram, vs code, amazon, chrome, safari, twitter, word, terminal emulator etc
desktop applications -
usually standalone, editors, word processors, web browser, mail etc
often work offline, local data storage, possible network connection
software development kits (SDK) - custom frameworks, OS specific
mobile apps -
targeted at mobile platforms - phones, tablets
constraints -
limited screen space
user interaction (touch, audio, camera)
memory, processing power
battery
framework -
OS specific (android studio, xcode)
cross-platform (flutter, react native)
Network -
usually network oriented
cocoa touch - apple specific framework
web apps -
the platform
works across OS device, create a common base
heavily network oriented, mostly can't work without network, but possible
workarounds for offline processing
main focus of this course
components of an app
storage
computation
presentation
compute
presentation
platforms:
desktop
touch screen
voice, tilt, camera interfaces
small self-contained apps
web-based
embedded
architectures
client server
peer to peer
client server
server
stores data
provides data on demand
may perform computations
clients
end users
request data
user interaction, display
network
explicit servers
explicit clients
local systems -
both client and server are on same machine - local network communication
conceptually still a networked system
machine clients -
eg software / antivirus updaters
need not have user interaction
variants-
multiple servers, single queue, multiple queues, load balancing frontends, etc
examples:
email
databases
whatsapp/messaging
web browsing
Notes
CS model may be bottle-necked by high traffic at server
P2P harder to choke
P2P can be fast if P is near to us
CS can fail if Server fails, P2P survives some failure
P2P is expensive to maintain
P2P is good for redundancy and data safety
Separation of Concerns -
In computer science, separation of concerns is a design principle for separating a computer program into distinct sections.
Each section addresses a separate concern, a set of information that affects the code of a computer program.
fundamental structure of a software and rule of creating such structures and systems such that separation of concerns divides up
the software into logical parts which are independent and provide an interface to each other, making development, testing,
debugging, etc easier
the layers are loosely coupled, they dont heavily depend on each other
layered architecture
The layered architecture style is one of the most common architectural styles. The idea behind Layered Architecture is that modules or
components with similar functionalities are organized into horizontal layers. As a result, each layer performs a specific role within the
application.
design pattern -
a general reusable solution to a commonly occurring problem within a given context in software design
Model - core data to be stored for the application - stores and models the data
databases, indexing for easy searching, manipulation
View - user facing side of application - the UI and UX
interfaces for finding information, manipulating
Controller - business logic - how to manipulate data - brain of program, connects model and view
User uses controller -> to manipulate the model -> that updates the view -> that user sees
Important Notes:
view layer also called presentation layer in MVP
controller layer controls business logic of code, so can be called business layer
in MVC, one model can have multiple views
view may not be visual
The controller can interact with the model and view
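A minimal sketch of the Model / View / Controller split described above. All class and method names here are hypothetical and chosen only for illustration; real frameworks organise this differently.

```python
# Minimal MVC sketch; every name below is an illustrative assumption.

class StudentModel:
    """Model: stores the core data (here, an in-memory list)."""

    def __init__(self):
        self._names = []

    def add(self, name):
        self._names.append(name)

    def all(self):
        return list(self._names)


class TextView:
    """View: turns model data into something the user sees."""

    def render(self, names):
        return "\n".join(f"{i}. {name}" for i, name in enumerate(names, start=1))


class StudentController:
    """Controller: business logic that connects model and view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def add_student(self, name):
        cleaned = name.strip().title()              # business rule: normalise names
        self.model.add(cleaned)                     # manipulate the model ...
        return self.view.render(self.model.all())   # ... then refresh the view


controller = StudentController(StudentModel(), TextView())
controller.add_student("  asha ")
output = controller.add_student("ravi")
print(output)
```

Because the view only receives a list of names, it could be swapped for an HTML or JSON view without touching the model.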
history
telephones are circuit switched - allow A to talk to B by having a physical connection between them (complex switching network)
physical wires tied up for duration of call even if nothing is said
so packet switching invented - here msg/data is broken into small packets and each packet contains its metadata (src, dest, etc)
and is routed through common communication channels
wire occupied only when data to be sent
data instead of analog voice
usage of hub-and-spoke model instead of mesh network, data multiplexed through one or more central wires, wires across all
nodes not needed
network is neutral to type of data
IBM SNA, Digital DECNet, Xerox Ethernet, ARPANET (Internet) etc
As so many standards are there, we need protocols for intercommunications
protocols
how to format packets; place them on wires; headers/checksums etc
each network had its own protocol
can we create inter-network?
how to communicate between different network protocols ?
or replace with a single internet protocol?
IP: internet protocol - 1983
define headers, packet types, interpretation
can be carried over different underlying networks: Ethernet, DECnet, PPP, SLIP
TCP - Transmission Control Protocol - 1983
establish a reliable communication - retry, error control, etc
automatically scale and adjust to network limits
it kind of creates a 'circuit switch' on top of a packet switch network
it moderates send speed etc according to link capacity
Thus TCP/IP is used in internet
Domain Names - 1985
use names instead of IP addresses
easy to remember - .com revolution still in the future
HyperText - 1989
Text documents to be served
formatting hints inside document to link to other documents (hypertext)
by tim berners lee at CERN (switzerland)
present
original web was limited
static pages
complicated executable interfaces
limited styling
browser compatibility issues
NOW:
Notes
TCP is connection oriented (a connection (virtual) is needed before communication)
UDP is connectionless, it just sends the packets, doesn't care about reliability
UDP can result in loss of data
TCP requires acknowledgement after receiving data
TCP is session oriented: a connection is set up (handshake) before any data is exchanged
ARPANET - advanced research projects agency network
protocol - a set of rules that defines how the data packets are formed and placed on wires is called protocol
IP bridges different network protocols and defines a standardized header for all network protocols
internet is network of networks that connects all devices on earth to each other
WWW uses internet to showcase webpages (https etc) to users. WWW is collection of webpages
hypertext transfer protocol - largely text based - client sends request, server responds with hypertext document. nowadays used for lot
more than just sending HT documents.
Notes
FTP - file transfer protocol
HTTP is stateless protocol, it sends request and server responds as per given state
FTP is a stateful protocol - the client sends a request to the server and expects some response; if it doesn't get a response, it re-sends the
request
Stateless - HTTP, UDP, DNS
Stateful - FTP, Telnet
in stateless the C and S are loosely coupled, in stateful the server and client are tightly bound
stateless is easier to design the server and is faster than stateful
HTTP uses port 80
FTP uses port 21
Examples of web server - Apache Web Server, Nginx, Boa Webserver, FoxServ, Lighttpd, Microsoft Web Server IIS, Savant,
mongoose
Internet is interconnection of networks, connecting devices to each other.
WWW is collection of resources on the internet, like webpages etc.
WWW is browsed using the internet, but internet can also be used for other tasks, like IoT, FTP, etc
a blank line (two consecutive line breaks, CRLF CRLF) is how HTTP 1.1 header and data are separated
date is just a content of the http server
netcat listens on localhost 1500 and sends the 200 OK to the port
to send request we use curl
curl http://localhost:1500
Note: use open-bsd netcat, not gnu-netcat. gnu-netcat doesn't produce expected behaviour.
The server is listening on a fixed port 1500
On incoming request, run some code and return result
Standard headers to be sent as part of result
Output can be text or other format - MIME (Multipurpose Internet Mail Extensions)
Typical Request
GET / HTTP/1.1
Host: localhost:1500
User-Agent: curl/7.64.1
Accept: */*
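The request above is plain text, so it can be taken apart with ordinary string handling. The sketch below is not a real server; parse_request and build_response are hypothetical helper names, and the response headers are just a minimal plausible set.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into request line, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")   # the blank line ends the headers
    lines = head.decode("ascii").split("\r\n")
    method, path, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")     # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


def build_response(text: str) -> bytes:
    """Build a minimal 200 OK response carrying plain text."""
    body = text.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


raw_request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: localhost:1500\r\n"
    b"User-Agent: curl/7.64.1\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)
method, path, version, headers, body = parse_request(raw_request)
response = build_response("hello")
```

Header names are lower-cased on purpose, since HTTP header names are case insensitive.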
Notes
Accept: */* means client is willing to accept any form of data (MIME type)
Loopback Devices: a special, virtual network interface that your computer uses to communicate with itself, it is used mainly for
diagnostics and troubleshooting and to connect to servers running on the local machine
all IPs in 127.0.0.0/8 subnet are loopback devices
that means, 127.0.0.1 to 127.255.255.254 all represent your computer
mostly 127.0.0.1 is used, and has the hostname of localhost mapped to it
127.0.0.1 is represented as ::1 in IPv6
0.0.0.0 is a non-routable address. The computer doesn't try to route that address to anywhere, indicates an invalid, unknown, or
inapplicable end-user address
it is represented in ipv6 as :: or ::0 or ::/0
CGI - Common Gateway Interface - an interface specification that enables web servers to execute an external program, typically
to process user requests. Such programs are often written in a scripting language and are commonly referred to as CGI scripts,
but they may include compiled programs.
what is protocol
Both sides agree on how to talk
Server expects requests - nature of requests, nature of clients, types of results clients can deal with etc
Client expects responses - ask server for something, convey what you can accept, read result and process
HTTP
HTTP is a type of protocol, primarily text based
requests specified as GET POST PUT etc
headers can be used to convey acceptable response types, languages, encoding, etc
which host to connect to if multiple hosts on single server
response headers also in text, conveys message type, data, cache information, status codes example 200 OK, 404 Not Found, etc
Status code classes -
3xx - redirection, not errors
4xx - client (user) errors, wrong url etc
5xx - server errors - example server crashes
HTTP Actions-
GET - simple requests, queries
POST- more complex form data, large text blocks, file uploads, etc
PUT / DELETE - rarely used in web 1.0, extensively used in web 2.0, basis of most APIs - REST, CRUD
Notes
Performance
how fast can a site be?
what limits performance
basic observations
Latency
Speed of light is 3e8 m/s in vacuum, 2e8 m/s in cable
Therefore min possible latency is 5 ns / m = 5 ms/1000 km
If data center is 2000km away, one way request takes 10 milliseconds, round trip takes 20ms
So a client sending requests one after another is limited to 50 requests/second
Response Size
Response = 1KB of text (headers, html, css, js, etc)
If network connection -> 100 Mb/s = 100/8 MBytes/s = 12.5 MB/s
Then the limit is roughly 10,000 requests/second (12,500 before any protocol overhead)
Google homepage is approx 150 KB
Memory
simple HTTP server (python) consumes ~ 6 MB
multiple parallel connections can take lots of memory
2016 presidential debate had 2 million views on youtube; at ~6 MB per connection, approx 12 TB RAM needed
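The three back-of-the-envelope limits above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes 1 KB = 1000 bytes and 1 TB = 1,000,000 MB, and the variable names are made up for illustration.

```python
SPEED_IN_CABLE_M_PER_S = 2e8   # signal speed in cable, from the notes


def round_trip_seconds(distance_km):
    """Minimum round-trip time to a server distance_km away."""
    one_way = distance_km * 1000 / SPEED_IN_CABLE_M_PER_S
    return 2 * one_way


# Latency limit: strictly sequential requests to a data center 2000 km away
rtt = round_trip_seconds(2000)                 # 0.02 s = 20 ms
sequential_requests_per_second = 1 / rtt       # about 50

# Bandwidth limit: 1 KB responses over a 100 Mb/s link
link_bytes_per_second = 100e6 / 8              # 12.5 MB/s
response_bytes = 1000
bandwidth_limit = link_bytes_per_second / response_bytes   # 12,500 per second

# Memory: ~6 MB per connection for 2 million simultaneous viewers
memory_per_connection_mb = 6
viewers = 2_000_000
total_memory_tb = memory_per_connection_mb * viewers / 1e6  # 12 TB

print(rtt, sequential_requests_per_second, bandwidth_limit, total_memory_tb)
```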
Notes
RTT - Round Trip Time - Time taken for round trip of request and response
Internet Protocol
IP
has versions
example IPv4 (32 bits)
IPv6 (128 bits)
IPv4
IPv4 is stored in dotted-decimal format, where each octet is represented in its decimal form, and octets are separated by a dot.
example:
192.168.0.1
etc.
Each octet can have only numerical values in the range [0-255]
IPv6
IPv6 has 128 bits. It is represented in hexadecimal. Each hexadecimal digit represents 4 binary digits.
IPv6 has groups of 4 hexadecimal digits (4 ∗ 4 = 16bits). There are 8 such groups.
∴ 4 ∗ 4 ∗ 8 = 128bits
each 16-bit group is informally called a hextet (also: hexadectet, quibble, quad-nibble)
Shortening of IPv6
For convenience and clarity, the representation of an IPv6 address may be shortened with the following rules.
One or more leading zeros from any group of hexadecimal digits are removed, which is usually done to all of the leading zeros. For
example, the group 0042 is converted to 42.
Consecutive sections of zeros are replaced with two colons (::). This may only be used once in an address, as multiple use would
render the address indeterminate. RFC 5952 requires that a double colon not be used to denote an omitted single section of
zeros.
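Python's standard ipaddress module applies these shortening rules, which makes it a convenient way to check a hand-shortened address. The addresses below are arbitrary examples, not taken from the course.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:ff00:0042:8329")

short = addr.compressed   # leading zeros dropped, longest zero run becomes ::
full = addr.exploded      # back to eight groups of four hex digits

# A single all-zero group is not replaced by "::"
single_zero = ipaddress.IPv6Address("2001:0db8:0000:0001:0001:0001:0001:0001").compressed

print(short)
print(full)
print(single_zero)
```

Both spellings parse to the same 128-bit value, so comparing IPv6Address objects is safer than comparing strings.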
Port Numbers
In computer networking, a port is a communication endpoint. At the software level, within an operating system, a port is a
logical construct that identifies a specific process or a type of network service. A port is identified for each transport
protocol and address combination by a 16-bit unsigned number, known as the port number. The most common transport protocols
that use port numbers are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).
A port number is always associated with an IP address of a host and the type of transport protocol used for communication. It completes
the destination or origination network address of a message. Specific port numbers are reserved to identify specific services so that an
arriving packet can be easily forwarded to a running application. For this purpose, port numbers lower than 1024 identify the historically
most commonly used services and are called the well-known port numbers. Higher-numbered ports are available for general use by
applications and are known as ephemeral ports.
Number of ports
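port numbers are 16 bits, so they range from 0 to 65535 (65536 values); port 0 is reserved, leaving 65535 usable ports per transport protocol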
HTML structure
HTML is a markup language with an XML-like tag syntax (strictly speaking only XHTML is XML). We declare that a document is HTML using the <!DOCTYPE> declaration.
<!DOCTYPE HTML>
Example:
<!DOCTYPE HTML>
<html>
<head>
<title>Test Document</title>
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>
Markup Tags
HTML is made with tags, some tags give information about the document, while some tags are helpful for marking up the document. Some
examples are:
u for underline
b or strong for bold
i or em for italics
a or anchor tag for hyperlinks
sub for subscript
sup for superscript
div a division tag - no visual value but used to group parts of documents
p paragraph tag, creates a new paragraph
anchor tags
target="_blank" to open page in a new tab or window
_self same frame
_parent parent frame
_top topmost frame of this page
framename in the provided name of the frame
We can also link to parts inside the document using id attribute to any tag and then putting the id in the href of the a tag with a prefixed #
sign.
img tag
<img src="" />
Information Representation
Raw data vs semantics
logical structure vs styling
html5 and CSS
Information Representation
Computer works with only bits - 0 or 1 (binary digits)
Numbers - binary
Binary Numbers -> 6 = 0110
Two's Complement for negative numbers, eg -> -6 = 1010
Letters -> A Letter to number correlation is pre-decided upon and then numbers are used
Representing Text
ASCII
Unicode
UTF-8
Encoding - converting text / data into a stream of bits following some predefined conventions which can be used to decode the bits into
the actual data again.
01000001 can be :
string of bits
number 65 in decimal
character A
It depends on the context and interpretation.
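The same idea in code: one bit string, read three ways, plus the 4-bit two's complement example from above. twos_complement is a hypothetical helper written for this note.

```python
bits = "01000001"

as_number = int(bits, 2)     # read the bits as an unsigned binary number
as_char = chr(as_number)     # read that number as a character code


def twos_complement(value, width):
    """Return the two's-complement bit pattern of value using width bits."""
    mask = (1 << width) - 1              # e.g. 0b1111 for width 4
    return format(value & mask, f"0{width}b")


six = twos_complement(6, 4)
minus_six = twos_complement(-6, 4)

print(bits, as_number, as_char, six, minus_six)
```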
ASCII
American Standard Code for Information Interchange
7 bits code - 128 entities
a-z, A-Z, 0-9, special characters
Only latin characters so 7 bits enough
Didn't have any other language scripts or symbols
Unicode
16 bit code intended to hold the symbols of all the languages in the world. This had 65,536 entities (UCS-2) (2 bytes)
32 bits (4 bytes) code called UCS-4 that has 4 Billion+ characters, out of this only 100,000 are defined as of now
Notes
Ascii decimal value of space = 32
Ascii decimal value of capital letters -> letter number + 64
Ascii decimal value of small letters -> letter number + 64 + 32 (n + 96)
Efficiency
Most common language on web -> English
Should all characters be represented with same number of bits?
Example:
text document with 1000 words (5000 characters approx)
UCS-4 encoding -> 4bytes × 5000characters = 20kB
ASCII encoding -> 1byte × 5000characters = 5kB
Original 7 bit ASCII -> 7bits × 5000characters = 4.375kB
Optimal Coding based on frequency of occurrence
'e' is most common, then 't', 'a', 'o' , etc
Huffman Tree coding or similar encoding -> 1-2 kB, possibly less
In general?
Prefix Coding
big codepoints are stored in utf-8 using prefix-coding
UTF-8
Use 8 bits for most common characters (ASCII)
All other characters can be encoded based on prefix encoding
More difficult for text processor-
First check prefix
Linked List through chain of prefixes possible
still more efficient for majority of documents
Most common encoding used today
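A quick way to see the variable-length idea is to encode a few characters and count the bytes. The sample characters are arbitrary choices; the last lines redo the 5000-character size comparison for plain ASCII text.

```python
# Sample characters written as escapes so the source file stays ASCII
samples = {
    "A": "A",                       # basic Latin letter
    "e-acute": "\u00e9",            # Latin letter with accent
    "euro": "\u20ac",               # currency sign
    "emoji": "\U0001F600",          # outside the 16-bit range
}

utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# The first byte's leading bits act as the prefix that tells the decoder
# how many bytes belong to this character (1110xxxx means 3 bytes).
euro_first_byte = format(samples["euro"].encode("utf-8")[0], "08b")

text = "e" * 5000                   # 5000 ASCII characters
utf8_size = len(text.encode("utf-8"))
ucs4_size = len(text.encode("utf-32-le"))   # fixed 4 bytes per character

print(utf8_lengths, euro_first_byte, utf8_size, ucs4_size)
```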
Markup
A way to specify how to render the document. The style of the document, not the content.
Content vs Meaning
Types of Markup
XHTML
Content
Markup is a way of using cues or codes in the regular flow of text to indicate how text should be displayed.
Markup is very useful to make the display of text clear and easy to understand.
Types of Markup
WYSIWYG - what you see is what you get - directly format output and display
embed codes not part of regular text, specific to editor
Procedural
Details on how to display
eg-> change font to large, bold, skip 2 lines, etc
Descriptive - focus on what content means instead of how it looks
eg-> This is a <title>, this is a <heading>, this is a <paragraph>
Examples
MS Word, Google Docs, etc
user interface focus on appearance and not meaning
WYSIWYG - direct control over styling
often leads to complex formatting and loss of inherent meaning
LaTeX, HTML, nroff, groff, troff
focus on meaning
more complex to write and edit
not WYSIWYG
Semantic Markup
Content vs Presentation
Semantics :
Meaning of the text
structure or logic of document
Notes
TeX, Nroff, Troff, Groff, PostScript -> Procedural Markup as you have to mention what to do
HTML, Markdown -> descriptive markup as you tell what the content is
HTML
first used by Tim Berners Lee at CERN
SGML -> Standard Generalized Markup Language
Strict definitions on syntax, structure, validity
HTML meant for browser interpretation
very forgiving -> loose validity checks
best effort to display
Tags
paired tags
< > are used for tags
closing tags have / before name
Location specific tag: <!DOCTYPE> only at top of doc
Case insensitive
Some self-closing tags, they have format: <tagname/>
Presentation vs Semantics
strong vs b
strong is logical markup
b is presentational markup
History of HTML
SGML based
1989 HTML original
1995 HTML 2
1997 HTML 3, 4
XML (extensible markup language) based
XHTML -> 1997 to 2010
HTML5
first release 2008
W3C recommendation -> 2014
HTML5
block elements <div>
Inline elements <span>
Logical Elements <nav>, <footer>
Media: <audio>, <video>
Remove 'presentation only' tags like
<center>
<font>
Notes
List (like this) formed using ul / ol tags
ul -> unordered (bulleted lists)
ol -> ordered (numbered lists)
the style of the list (which bullet / numbering system to use) can be changed
Regardless of style of list, each list item is marked using li tag
Checkbox -> <input type="checkbox" />
Text field -> <input type="text" />
Alt attribute of img tag is shown when image can't be loaded, or for screen readers
Horizontal line (rule) can be created using hr tag
controls attribute in audio or video tag is used to show UI controls to play/pause/ change volume etc
&copy; is used to show the copyright symbol (©)
A reflow on an element recomputes the dimensions and position of the element, and it also triggers further reflows on that
element's children, ancestors and elements that appear after it in the DOM. Then it calls a final repaint.
Markup vs Style
Markup tells the logical structure of the document
Style tells how the document should look
Separation of Styling
Style hints in separate blocks
separate files included
Themes possible
Style sheets - specify presentation information
Cascading Style Sheets (CSS) -> allow multiple definitions, latest takes precedence
Notes
universal selector in CSS, *, selects all tags
color can be provided by:
color name eg white, black, tomato
hexcode eg, #ffffff
rgb or rgba functions
comments in CSS use /* ... */ syntax
Responsive Website -> Adapt to changes in screen sizes
CSS precedence -> Inline > Internal > External
Inline CSS
Directly add style to a tag (scoped)
Example:
Internal CSS
Embed inside <head> tag
example:
<style>
  body {
    background-color: linen;
  }
  h1 {
    color: maroon;
    margin-left: 40px;
  }
</style>
External CSS
Extract common content for reuse
Multiple CSS files can be included
Latest definition of style takes precedence
Responsive Design
Mobile and Tablets have smaller screens
different form factors
adapt to screen - respond
CSS controls styling - HTML controls content
Bootstrap
CSS framework, originated from twitter
standard styles for various components
buttons
forms
icons
mobile first: highly responsive layout
Javascript
interpreted language brought into the browser
not really related to java in any way - formally standardized as ECMAScript
programming ability inside website
not part of the core presentation requirements
Notes
CSS shorthand properties are properties which combine multiple properties into one. The value of a shorthand property takes multiple
space separated values, one for each of the underlying properties.
example -> border, margin, padding
For conflicting styles, both the order in which the CSS files are loaded and the order in which the styles are defined are important
HTML attribute order is not important
The <thead> element is used in conjunction with the <tbody> and <tfoot> elements to specify each part of a table (header, body, footer).
Browsers can use these elements to enable scrolling of the table body independently of the header and footer. Also, when printing a large
table that spans multiple pages, these elements can enable the table header and footer to be printed at the top and bottom of each page.
Note: The <thead> element must have one or more <tr> tags inside.
The <thead> tag must be used in the following context: As a child of a <table> element, after any <caption> and <colgroup> elements, and
before any <tbody>, <tfoot>, and <tr> elements.
Tip: The <thead>, <tbody>, and <tfoot> elements will not affect the layout of the table by default. However, you can use CSS to style these
elements.
Style
nth-child(even) - only even child elements
example:
tr:nth-child(even){
background-color: lightgray;
}
Relational Selectors
children (direct child)
descendant (child of child of ....)
Descendant
form input {
...
}
Child
Example:
form p::first-letter{}
input[x="y"]{ ... }
applies the style to all input tags that have attribute x with value y, e.g.
<input x="y">
References
Pseudo-Classes
Pseudo-Elements
https://getbootstrap.com/
student list
course list
student-course marks
Views:
Controller:
Notes:
Smalltalk-80 is a dynamically typed, object-oriented programming language
View responsible for user interaction with application
Views
User interface
User interaction
User interface
screen
audio
vibration (haptics)
User interaction
keyboard / mouse
touchscreen
spoken voice
custom buttons
Types of Views
Fully static
Partly Dynamic (Wikipedia)
Mostly Dynamic (Amazon)
Output
HTML - most commonly used - direct rendering
Dynamic images
JSON/XML - machine readable
Systematic Process
functionality requirement gathering - what is needed
User and Task Analysis - user preference, tasks needs
Prototyping - wireframes, mockups
testing - user acceptance, usability, accessibility
Guidelines / Heuristics
Jakob Nielsen's Heuristics for design
10 Guidelines:
There are two types of errors: slips and mistakes. Slips are unconscious errors caused by inattention. Mistakes are
conscious errors based on a mismatch between the user’s mental model and the design.
General Principles:
Consistency
Simple and minimal steps
Simple language
minimal and aesthetically pleasing
Tools
Wireframes
Visual guides to represent structure of web page
information design
navigation design
user interface design
lorem ipsum:
placeholder pseudo-Latin text, used to show how text content will look in a layout without distracting the viewer with readable
content.
LucidChart
Adobe XD
Figma
Example of pyhtml:
import pyhtml as h

t = h.html(
    h.head(
        h.title('Test Page')
    ),
    h.body(
        h.h1('This is a title'),
        h.div('This is some text'),
        h.div(h.h2('inside title'),
              h.p('some text in a paragraph'))
    )
)
print(t.render())
def f_table(ctx):
    # ctx['table'] is a list of rows; each row becomes a <tr> of <td> cells
    return (h.tr(
        h.td(cell) for cell in row
    ) for row in ctx['table'])
Templates:
Standard template text
Placeholders / variables
basic /very limited programmability
examples:
python inbuilt string templates - good for simple tasks
jinja2 - used by flask
genshi
mako
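As a small illustration of "standard text plus placeholders", here is a sketch using Python's built-in string.Template, the simplest option in the list above. The template text and variable names are made up for this example.

```python
from string import Template

# Standard template text with two placeholders, $name and $numbers
page = Template("Hello $name!\nMy favourite numbers: $numbers")

rendered = page.substitute(
    name="World",
    numbers=" ".join(str(n) for n in range(1, 10)),
)
print(rendered)
```

string.Template has no loops or conditionals, so the list of numbers has to be joined in Python before substitution; template engines such as Jinja move that kind of logic into the template itself.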
Jinja
ties in closely with flask
template functionality with detailed API
templates can generate any output, not just HTML
Example:
will print:
Hello World!
My favourite numbers: 1 2 3 4 5 6 7 8 9
Accessibility
Various forms of disability or impairment
Vision
Speech
Touch
Sensor-Motor
Can a page be accessed by people with impairments?
How can the accessibility of a page be improved?
guidelines
Standards
Interplay between many components of a page
Principle - Perceivable
Provide text alternatives for non-text content
Provide captions and other alternatives for multimedia
Create content that can be presented in different ways, including by assistive technologies, without losing meaning.
Make it easier for users to see and hear content.
Principle - Operable
Make all functionality available from the keyboard
give users enough time to read and use content
do not use content that causes seizures or physical reactions
help users navigate and find content
make it easier to use inputs other than keyboard
Principle - Understandable
Make text readable and understandable
Make content appear and operate in predictable ways
Help users avoid and correct mistakes
Principle - Robust
Maximize compatibility with current and future user tools.
Examples:
Use aria-describedby attribute
Notes
{{ }} are variable interpolation
{% %} are blocks
{# #} are comments in jinja2
Jinja2
if __name__ == '__main__':
    main()
specifiers:
+ for showing sign of number always
a = "this is {p:+}"
d for decimal value
x for hexadecimal value
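The three specifiers listed above, tried out with Python's str.format. The values 42, 255 and -7 are arbitrary.

```python
p = 42

signed = "this is {p:+}".format(p=p)   # + always shows the sign
negative = "{:+}".format(-7)           # negative numbers keep their minus
decimal = "{:d}".format(p)             # d formats an integer in base 10
hexadecimal = "{:x}".format(255)       # x formats an integer in base 16

print(signed, negative, decimal, hexadecimal)
```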
Persistent Storage
Example: Gradebook
Spreadsheets
arbitrary data organized into rows and columns
operations defined on cells or ranges
multiple inter-linked sheets within single spreadsheet
Relationships?
student - course ?
separate entry with full details - student name, id, address, course id, name, department, etc ? NO
redundant
Create another table joining students and courses
only ID required
relation specified with keys
KEYS
used to uniquely identify elements
OBJECTS
class Student:
    idnext = 0  # Class variable shared by all instances

    def __init__(self, name):
        self.name = name
        self.id = Student.idnext  # take the next unused ID
        Student.idnext = Student.idnext + 1
auto initialise ID to ensure unique
functions to set/get values
PERSISTENT STORAGE?
in memory data structures lost when server shut down or restarted
save to disk? structured data?
python pickle module (serialising data)
csv - comma separated values
tsv - tab separated values
Essentially same as spreadsheets - limited flexibility
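A short sketch of both options. It writes to in-memory buffers instead of real files so it can run anywhere; with a real file you would use open(..., newline="") for CSV. The sample rows are invented.

```python
import csv
import io
import pickle

rows = [{"id": 0, "name": "Asha"}, {"id": 1, "name": "Ravi"}]

# CSV: human readable, but every value comes back as a string
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
loaded_from_csv = [dict(row) for row in csv.DictReader(buffer)]

# pickle: keeps Python types, but the bytes are only meaningful to Python
blob = pickle.dumps(rows)
loaded_from_pickle = pickle.loads(blob)

print(loaded_from_csv)
print(loaded_from_pickle)
```

The CSV round trip turns the integer ids into strings, which is one concrete form of the limited flexibility noted above.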
ADVANTAGES:
DISADVANTAGES
RELATIONAL DATABASES
Data stored in tabular format
columns of tables: fields(name, address, departments, etc)
row of tables: individual entries (student1, student2, etc)
Relationships
joining two tables together using their unique ids -> expresses relationships between them
Types of Relationships
One to one
one student has one roll number
one roll number uniquely identifies one student
example: assign unique message-ID to each email in inbox
One to many ( many to one)
one student stays in only one hostel
one hostel has many students
example: save emails in folders, one email is in only one folder, but one folder has multiple emails
Many to many
one student can register for many courses
one course can have many students
example: assign labels to emails, one email can have multiple labels, and vice versa
Diagrams
Entity Relationship (ER) diagram
Unified Modeling Language (UML)
Class relation diagram
SQL
KEY
Primary Key - important for fast access on large databases, unique attribute which cannot be null
Foreign Key - connect to different table - Relationships
Queries
Retrieve data from database
eg- find all students with name beginning with A
find all courses offered in 2021
select s.name
from Students s
join StudentsCourses sc ON s.IDNumber = sc.studentID
join Courses c ON c.ID = sc.courseID
where c.name = 'Calculus'
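The query above can be tried with Python's built-in sqlite3 module. The table and column names follow the query; the column types and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
CREATE TABLE Students (IDNumber INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Courses (ID INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- join table: one row per (student, course) registration
CREATE TABLE StudentsCourses (
    studentID INTEGER REFERENCES Students(IDNumber),
    courseID  INTEGER REFERENCES Courses(ID)
);
INSERT INTO Students VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO Courses VALUES (10, 'Calculus'), (20, 'Physics');
INSERT INTO StudentsCourses VALUES (1, 10), (2, 20), (3, 10);
""")

query = """
SELECT s.name
FROM Students s
JOIN StudentsCourses sc ON s.IDNumber = sc.studentID
JOIN Courses c ON c.ID = sc.courseID
WHERE c.name = 'Calculus'
ORDER BY s.name
"""
calculus_students = [row[0] for row in conn.execute(query)]
print(calculus_students)
```

The ORDER BY clause is an addition so that the result order is predictable.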
Notes
Single line comment in SQL: -- this is a comment
TRUNCATE command deletes all data from table, but schema is preserved
DROP command drops the entire table along with data and schema
DROP cannot be rolled back
NOSQL databases dont have to adhere to ACID properties
LIKE keyword used to check likeness of strings in SQL, example:
here % matches any sequence of zero or more characters, _ matches exactly one character
to find highest or lowest of something, sort by attribute, then limit 1
VALID JOINS-
INNER JOIN
FULL JOIN
LEFT JOIN
RIGHT JOIN
GROUP BY command used to group results using one (or more) attributes
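Several of the notes above (LIKE, sort then limit 1, GROUP BY) in one runnable sketch using sqlite3. The Marks table and its rows are invented for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Marks (student TEXT, course TEXT, marks INTEGER)")
db.executemany(
    "INSERT INTO Marks VALUES (?, ?, ?)",
    [
        ("Asha", "Calculus", 91),
        ("Anil", "Physics", 78),
        ("Ravi", "Calculus", 85),
        ("Ravi", "Physics", 64),
    ],
)

# LIKE: names beginning with A
starts_with_a = [row[0] for row in db.execute(
    "SELECT DISTINCT student FROM Marks WHERE student LIKE 'A%' ORDER BY student"
)]

# Highest mark: sort by the attribute, then keep only the first row
top = db.execute(
    "SELECT student, marks FROM Marks ORDER BY marks DESC LIMIT 1"
).fetchone()

# GROUP BY: one result row per course
average_by_course = dict(db.execute(
    "SELECT course, AVG(marks) FROM Marks GROUP BY course"
))

print(starts_with_a, top, average_by_course)
```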
MVC origins
collection of design patterns
originally introduced in GUI design of smalltalk
many variations
requests and responses
example: dynamic web page
links: clickable to select various options
clicking a link triggers different behaviours
CRUD
CREATE
create a new entry
must not already exist
check within database to avoid conflicts
mention mandatory vs optional fields (name, address, mobile number,...)
READ
get a list of students
summarise number of students, age distribution, geographic locations
plot histograms of marks
etc
UPDATE
change of address
update marks
change start date of course
DELETE
remove graduated students
delete mistaken entries
unenroll student from course
lifecycle of data
originally in context of database operations, nothing to do with the web
reflects cycle of data models
databases optimized for various combinations of operations
read-heavy: lots of reading, very little writing or creating
write-heavy: security archive logs
Summary:
actions : interactions between view and model
controller: group actions together logically
api: complex set of capabilities of server
interaction through http requests
http verbs used to convey meanings
Rules of thumb:
should be possible to change views without the model ever knowing
should be possible to change underlying storage of model without views ever knowing
controllers / actions should generally NEVER talk to a database directly
In practice:
views and controllers tend to be more closely interlinked than with models
more about a way of thinking than a specific rule of design
Notes
URL is subset of URI
URN is subset of URI
Routing -
mapping urls to actions
Python decorators
add extra functionality on top of a function
"@" - decorators before function name
effectively function of a function that returns a function
take the inner function as an argument
return a function that does something before calling the inner function
from flask import Flask
app = Flask(__name__)

@app.route('/')
def home():
    return "hello world!"
@app.route('/') is a decorator that makes Flask route the path '/' to that function
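To show what happens behind the "@" syntax, here is a plain-Python sketch with no Flask involved. The routes dictionary, route and logged are hypothetical names; this imitates the idea of a routing decorator and is not how Flask is implemented.

```python
import functools

routes = {}   # hypothetical routing table: URL path -> function
calls = []    # record of which functions were called


def route(path):
    """Return a decorator that registers the function under path."""
    def decorator(func):
        routes[path] = func
        return func
    return decorator


def logged(func):
    """Return a function that does something before calling func."""
    @functools.wraps(func)               # keep the original name and docstring
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)      # extra behaviour first ...
        return func(*args, **kwargs)     # ... then call the inner function
    return wrapper


@route("/")
@logged
def home():
    return "hello world!"


response = routes["/"]()   # look up the action for a URL and call it
print(response, calls)
```

Decorators are applied bottom-up, so route registers the already wrapped version of home.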
Notes
WSGI - web server gateway interface
The Web Server Gateway Interface is a simple calling convention for web servers to forward requests to web applications or
frameworks written in the Python programming language.
Werkzeug is a WSGI toolkit that implements requests, response objects, and utility functions.
Flask default port is 5000
while routing, functions need to have unique name, else flask throws assertion error
enctype is an attribute in html form elements, which specifies how to encode form data. only works if method is POST
API Design
Distributed Software Architecture
server and clients
standard protocols needed for communication
assumptions:
server always on?
server knows what client is doing?
client authentication
network latency
The web
client and server mostly are apart
different networks, latencies, quality
authentication not core part of protocol
state -
server does not know state of client
client does not know state of server
REST
roy fielding phd thesis
representational state transfer
takes into account the limitations of the web
provide guidelines or constraints
Constraints of REST:
client server architecture
stateless - server cannot assume state of client and vice versa
layered system
traffic goes through network to load balancer, auth server, backends, etc. server does not know how many layers and
what they are
response can be cached at any of those layers
cacheability - response can be sent from cache
uniform interface -
client and server interact in a uniform and predictable manner
server exposes resources
(optional) code on demand
server can extend client functionality using javascript / java applets
REST
state information between client and server explicitly transferred with every communication
Sequence
client accesses a resource identifier from server
usually URI - superset of URL
typically start from home page of app
no initial state assumed
resource operation specified as part of access
if http then get, post, etc
not fundamentally tied to protocol
server responds with new resource identifier
new state of system, new links to follow, etc
HTTP
one possible protocol to carry REST messages
use the http verbs to indicate actions
standardize some types of functionality
GET: retrieve a representation of the target resource's state
POST: enclose data in request: target resource 'processes' it
PUT: create or replace the target resource with the data enclosed
DELETE: delete target resource
idempotent operations
repeated application of the operation has the same effect as applying it once
example: GET, as it is read only
PUT is also idempotent: repeating the same PUT leaves the resource in the same state as doing it once
DELETE (with id) is idempotent
POST is not idempotent
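A small sketch of why PUT and DELETE are idempotent while POST is not, using an in-memory dictionary in place of a real server. The class ResourceStore and its method names are hypothetical and only mirror the HTTP verbs.

```python
class ResourceStore:
    """Toy in-memory stand-in for a server exposing resources by id."""

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def put(self, resource_id, data):
        # Create or replace: the end state depends only on the arguments.
        self.items[resource_id] = data

    def post(self, data):
        # The server picks a new id each time, so every call adds a resource.
        resource_id = self.next_id
        self.next_id += 1
        self.items[resource_id] = data
        return resource_id

    def delete(self, resource_id):
        # Deleting something already gone leaves the state unchanged.
        self.items.pop(resource_id, None)

store = ResourceStore()
store.put(100, {"name": "notes"})
store.put(100, {"name": "notes"})   # repeat: still exactly one resource
count_after_puts = len(store.items)

store.post({"name": "draft"})
store.post({"name": "draft"})       # repeat: a second resource appears
count_after_posts = len(store.items)

store.delete(100)
store.delete(100)                   # repeat: no error, same end state
count_after_deletes = len(store.items)
```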
CRUD
CRUD (create, read, update, delete) is the basic set of database operations
typically a common set of operations needed in most web apps
good candidate for REST based functionality
REST != CRUD but they do work well together
Data encoding
Basic HTML - for simple responses
XML - structured data response
JSON - simpler form of structured data
data serialisation for transferring complex data types over text based format
JSON
javascript object notation
nested arrays and objects
serialize complex data structures like dictionaries, arrays, etc
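A short sketch of serialising a nested Python structure to JSON text and back, using the standard json module. The record contents are made-up sample data.

```python
import json

# Nested dictionaries and lists map directly to JSON objects and arrays.
record = {
    "id": 7,
    "name": "Asha",
    "tags": ["student", "admin"],
    "address": {"city": "Chennai", "pin": None},
}

text = json.dumps(record)     # Python object -> text that can be sent over HTTP
restored = json.loads(text)   # text -> Python object on the receiving side
```

Note that None becomes null in the JSON text and comes back as None.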
YAML
originally 'yet another markup language', now 'YAML ain't markup language' - common alternative, especially for documentation and configuration
OpenAPI
way of formalizing/standardizing API documentation
API
purpose: information hiding - neither server nor client should know details of implementation on other side
unbreakable contract - should not change - standardized
version may update with breaking changes
Documentation
highly subjective - some are better than others at documenting
incomplete - what one finds enough, other may find insufficient
outdated
human language specific
Description Files
machine readable - has very specific structure
enable automated processing
boilerplate code
mock servers
Example: assembly language is a version of the programming language of computers that is both machine and human readable
structured so it can be compiled
versus: english language specification which needs someone to write code
BACKEND SYSTEMS
Memory Hierarchy
Type of storage elements:
on chip registers: 10s-100s of bytes, superfast
(static ram) SRAM (cache): 0.1 - 1MB , very fast
DRAM (dynamic ram): 0.1 - 10GB or much more , fast
Solid State disk (SSD) - flash : 1-100GB, non-volatile, medium
Magnetic Disk (HDD) : 0.1 - 10TB, non volatile, slow
optical, magnetic, holographic etc
Storage Parameters
Latency: time to read first value from a storage location (lower is better)
Register < SRAM < DRAM < SSD < HDD
Throughput: number of bytes/second that can be read (higher is better)
DRAM > SSD > HDD (registers and SRAM have limited capacity)
Density: number of bits stored per unit area / cost (higher is better)
HDD > SSD > DRAM > SRAM > registers
Computer Organization
cpu has as many registers as possible
backed by l1,l2,l3 cache (sram)
backed by several GB of dram working memory
backed by ssd for high throughput
backed by hdd for high capacity
backed by long-term storage, backup
Cold Storage
backups and archives
huge amount of data
not read very often
can tolerate high read latency
Amazon Glacier, Google, Azure cold/archive storage classes
high latency of retrieval : up to 48 hrs
very high durability
very low cost
developer must be aware of choices and what kind of database to choose for a given application
Data Search
big oh O() notation
used in study of algorithm complexity
rough approximation: order of magnitude, approximately, etc
O(1): constant, not depend on n
O(log n): logarithmic in input size
O(N): linear in input size
O(Nⁱ) - polynomial
O(iⁿ) - exponential
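To make the difference between O(N) and O(log N) concrete, here is a sketch that counts the steps taken by a linear search and a binary search over the same sorted list. The function names and the step counters are added for illustration only.

```python
def linear_search(items, target):
    """O(N): look at each element in turn."""
    steps = 0
    for index, value in enumerate(items):
        steps += 1
        if value == target:
            return index, steps
    return -1, steps

def binary_search(items, target):
    """O(log N): halve the search range each step (items must be sorted)."""
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

items = list(range(1024))
linear_result = linear_search(items, 1023)   # worst case: last element
binary_result = binary_search(items, 1023)
```

For 1024 elements the linear search needs 1024 steps in the worst case, while the binary search needs at most 11.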
Alternatives:
Binary search tree -> maintaining sorted order is easier: growth of tree
Self balancing binary trees -> BST can easily tilt to one side and grow downwards, Red-black, AVL trees, B trees, more complex
but reasonable
Hash tables
compute an index for an element O(1)
hope the index for each element is unique, difficult but doable using collision control techniques
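One common collision control technique is chaining, where each bucket holds a small list of entries. The sketch below is a teaching version with a fixed number of buckets (the class name ChainedHashTable is hypothetical); Python's built-in dict uses a different scheme internally.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by keeping a list per bucket."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # Compute the bucket index from the key's hash.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))        # new key, possibly a collision

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

# Only two buckets for ten keys, so collisions are guaranteed.
table = ChainedHashTable(n_buckets=2)
for number in range(10):
    table.put(number, number * number)
```

Lookups stay close to O(1) only while the chains are short, which is why real implementations grow the bucket array as the table fills.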
Database Search
databases are mostly tabular
tables with many columns
want to search quickly on some columns
maintain INDEX of columns to search on
store a sorted version of column
needs columns to be comparable: integer, short string, date/time, etc
long text fields are not good for index
binary data not good
example: mysql database uses b-trees and hash indexes
Hash index:
only used in in-memory tables
only for equality comparisons, not range or comparison
does not help with ORDER BY
partial key prefixes cannot be used
but VERY fast
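A sketch of what an index changes, using Python's built-in sqlite3 module rather than MySQL so that it runs without a server. The table, column and index names are made up. EXPLAIN QUERY PLAN is SQLite-specific; other databases have their own EXPLAIN variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO users (email, bio) VALUES (?, ?)",
    [(f"user{i}@example.com", "long text " * 20) for i in range(1000)],
)

query = "SELECT bio FROM users WHERE email = ?"

def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user500@example.com",)).fetchall()
    return " ".join(row[-1] for row in rows)   # last column is the detail text

plan_before = plan(query)   # no index: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = plan(query)    # index on the searched column is now used
```

The index is built on the short, comparable email column; indexing the long bio text would be far less useful.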
Query optimization
database specific
SQL vs NoSQL
SQL - structured query language
used to query databases that have structure
could also be used for CSV files, spreadsheets, etc
closely tied to RDBMS- relational databases
columns/ fields
tables of data hold relationships
all entries in a table MUST have same set of columns
tabular databases
efficient indexing possible - use specified columns
storage efficiency - prior knowledge of data size
NoSQL
started out as alternative to SQL
but SQL is just a query language, can be adapted for any data store, including a document store or graph
not-only-sql
additional query patterns for other types of data stores
ACID
transactions - core principle of databases
ACID :
atomic
consistent
isolated
durable
Many NoSQL databases sacrifice some part of ACID (example: eventual consistency instead of consistency) for performance
but there can be ACID compliant NoSQL databases as well
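A sketch of atomicity using sqlite3 from the standard library: either both updates of a transfer are applied or neither is. The accounts table, the CHECK constraint and the transfer function are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    # "with conn" commits if the block succeeds and rolls back on an exception.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))

def balances():
    return dict(conn.execute("SELECT name, balance FROM accounts"))

transfer("alice", "bob", 30)          # succeeds: both updates are committed
try:
    transfer("alice", "bob", 500)     # debit violates the CHECK constraint
except sqlite3.IntegrityError:
    pass                              # the earlier credit to bob is rolled back too
```

After the failed transfer the balances are the same as before it started, even though the credit statement had already run.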
data stored
in memory - fast, does not scale across machines
disk - different data structures, organization needed
Scaling
Replication and Redundancy
redundancy:
multiple copies of same data
often used in connection with backups, even if one fails, other survive
one copy is still the master
replication:
usually in context of performance
may not be for purpose of backup
multiple sources of same data, less chance of server overload
live replication requires careful design
BASE vs ACID
BASE : basically available, soft state, eventually consistent
eventual consistency instead of consistency - replicas can take time to reach consistent state
stress on high availability of data
Application specific
financial transactions - cannot afford even slightest inconsistency, only scale-up possible
typical web application - social networks, media, eventual consistency is acceptable
ecommerce - only the financial part needs to go to ACID DB
Security
non mvc app - can have direct SQL queries anywhere
MVC- only in controller, but any controller can trigger a DB query
dangers of queries:
sql injections :
parameters from html taken without validation
validation: are they valid text data, no special characters, other symbols, no punctuations or other invalid input,
are they the right kind of input (text, number, email, date)
validation must be done just before the database query - even if you have validation in HTML or javascript, not enough
direct http requests can be made with junk data
buffer overflows, input overflows - length of inputs, queries
server level issues, protocol implementations - use known servers with good track record of security, update all patches
possible outcomes:
loss of data - deletion
exposure of data (sensitive)
manipulation of data
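A sketch of an SQL injection and the usual defence, a parameterised query, using sqlite3. The users table and both lookup functions are made up for illustration; find_unsafe is shown only to demonstrate the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

def find_unsafe(name):
    # DANGEROUS: user input is pasted straight into the SQL text.
    sql = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(sql).fetchall()

def find_safe(name):
    # The ? placeholder passes the input as data, never as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()

malicious = "nobody' OR '1'='1"
leaked = find_unsafe(malicious)   # condition becomes always true: every row returned
blocked = find_safe(malicious)    # treated as an odd-looking name: no rows
```

Placeholders do not replace validation of input type and length, but they remove the most common route for injected SQL.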
HTTPS
secure sockets - secure communication between client and server
server certificate:
based on DNS, has been verified by some trusted third party
difficult to spoof
based on mathematical properties - ensures a very low probability of a mistaken match
however:
only secures link for data transfer, does not perform validation or safety check
negative impact on caching of resources like static files
some overhead on performance
Frontend
mechanisms
user facing interface
general GUI application on desktop
browser based client
custom embedded interface
device/ OS specific controls and interfaces
web browser standardizations
common conventions among browsers on how to render, what to render
browser vs native
look and feel
API, interfaces, interactions
UI/UX
web applications
browser based applications: HTML + CSS +JS
html - what to show
css - how to show
js - interaction
frontend mechanisms:
how to generate html,css,js?
functional reuse, common frameworks,
server/ client load implications
security implications
client load
typical web browser: issue req, wait for resp, render HTML, wait for user input, most time spent here
let client do more, also allow more fancy interactions
client-side-scripting -
javascript the defacto standard
component frameworks allow reuse, complex interactions
serverside javascript -> nodejs
server side rendering is very flexible, easier to develop, less security issues on client, but
load on server is more, more security issues on server
tradeoff of clientside:
can combine well with static pages and less load on server but still dynamic but
more resources needed on client, potential security issues, data leakage
async updates
original web:
client sends request, server responds, client displays
for any update of page- new req sent from client to server, server has to respond with complete page html styling etc, client
renders that page again
potential issues -> server load -> lots of redundant data to be sent each time, server-rendering -> more work, slow updates->
load full page, rerender
async
update only part of the page -> load extra data in the background after the main page has been loaded and rendered
quick response on main page -> better user experience
request for update can ask for just minimal data to refresh part of a page
originally seen as AJAX, now many variants
core idea: refresh part of the document on async queries to server
DOM
document object model
programming interface for web documents
dom is an abstract model (tree structure) of the document
object oriented allows manipulation like known objects
tightly coupled with javascript in most cases -> can also be manipulated from other languages
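Since the DOM can also be manipulated from languages other than JavaScript, here is a sketch using Python's xml.dom.minidom, which implements a subset of the same interface. It works on a well-formed XML string, not on a live browser page, and the sample markup is made up.

```python
from xml.dom.minidom import parseString

# Parse a small, well-formed document into a tree of node objects.
doc = parseString("<html><body><h1 id='title'>Old</h1><ul></ul></body></html>")

# Find an element and change its text node.
heading = doc.getElementsByTagName("h1")[0]
heading.firstChild.data = "New title"

# Create a new element and attach it to the tree.
item = doc.createElement("li")
item.appendChild(doc.createTextNode("first item"))
doc.getElementsByTagName("ul")[0].appendChild(item)

html = doc.documentElement.toxml()   # serialise the modified tree back to text
```

The method names (getElementsByTagName, createElement, appendChild) are the same ones JavaScript uses on the document object in a browser.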
page styling
CSS most popular
difficult in text-only and accessible browsers - but CSS has many features to help even with those
proper separation of HTML and styling gives best freedom to browser, user
interactivity
some form of client-side programmability needed
js most popular
can interact with basic HTML elements
can also be used independently to create more complex forms
performance of js depends on browser and choice of scripting engine
javascript engines:
chrome, chromium,brave,edge : v8
firefox: spidermonkey
safari, older IE use their own
impact:
performance : v8 generally best at present
js standardization means difference in engine is less important
client load
js engines also use client cpu power, complex page layout require computation
can also use GPU
potential to load cpu
machine clients
client may not always be human
machine end points - typically access APIs
embedded devices post sensor information to data collection sites
typically cannot handle js, only http endpoints
alternatives to js on web
runs python on web
WASM
web assembly
binary instruction format
targets a stack based VM
sandboxed with controlled access to APIs
executable format for web
handles high performance execution can translate graphics to OPENGL etc
Emscripten
compiler framework to compile C or C++ to WASM
potential for creating high performance code that runs inside browser
native mode:
file system
phone sms
camera object detection etc
web payments
using API of js
CAPTCHA
problem - scripts that try to automate web pages
can generate large number of requests in short time - server load
railway tatkal, cowin etc
solution:
Sandboxing
secure area that JS engine has access to , cannot access files, network resources, local storage
similar to VM
Security
Access Control
access -> being able to read/write/modify information
not all parts of application for public access like personal data, financial, etc
type of access -> readonly, read-write, modify but not create, etc
examples:
linux file system -> owner, group -> access your own files, cannot modify or even read others
can be changed by owner
root or admin or superuser has power to change permissions
email -> you can read your own email
can forward an email to someone else, that is also access
ecommerce login -> shopping cart etc visible to only user, financial information
discretionary vs mandatory
discretionary -> you have control over who you share with, forwarding mails, changing file access modes etc possible
mandatory -> decisions made by centralized management - users cannot even share information without permission, typically in
high security (like military) systems
Role is like a class, applicable to multiple users having separate unique ids
eg. -> student details access should be given to HOD role, and current HOD will be given HOD role, and removed when changed
single user can have multiple roles -> hod,teacher, cultural advisor, etc
hierarchies or superset :
rules of roles:
policies vs permissions
permissions: static rules usually based on simple checks, example group based
policies: more complex conditions possible, combine multiple policies
example:
bank employee can view ledger entries
ledger access only after 8am on working days
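The bank example above can be sketched as a static permission check combined with a time-based policy. The role name bank_employee and the assumption that working days are Monday to Friday are illustrative choices, not part of the original notes.

```python
from datetime import datetime

def has_permission(roles):
    """Permission: a simple static check on role membership."""
    return "bank_employee" in roles

def within_working_hours(now):
    """Policy condition: after 8am on a working day (assumed Monday to Friday)."""
    return now.weekday() < 5 and now.hour >= 8

def can_view_ledger(roles, now):
    # The policy combines the permission with the extra condition.
    return has_permission(roles) and within_working_hours(now)

monday_morning = datetime(2024, 1, 1, 9, 0)   # 1 Jan 2024 was a Monday
monday_early = datetime(2024, 1, 1, 7, 0)
saturday = datetime(2024, 1, 6, 9, 0)
```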
privilege escalation
change user or gain an attribute -> sudo or su
usually combined with explicit logging, extra safety measures, etc
recommended -> do not sudo unless absolutely necessary
never operate as root in a linux/unix environment unless absolutely necessary
avoid su (change user); if privilege is needed, use sudo (same user, more privilege)
web apps
admin dashboards, user access, etc
enforcing
hardware level -> security key, hardware token for access, locked doors, etc
operating system -> filesystem access, memory segmentation
application level -> db server can restrict access to specific db
web application -> controllers enforce restrictions, decorators in python used in frameworks like flask
notes
discretionary
security mechanisms
obscurity (bad idea)
application listens on non-standard port known only to specific people
address
where are you coming from? host based access/deny controls
login
username/password provided to each person needing access (never store password directly)
tokens
access tokens that are difficult/impossible to duplicate
can be used for machine-to-machine authentication without passwords
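A sketch of the two mechanisms above using only the standard library: storing a salted hash instead of the password itself, and generating a hard-to-guess access token. The function names and the iteration count are illustrative assumptions; production systems often use a dedicated password hashing library.

```python
import hashlib
import hmac
import secrets

def hash_password(password, salt=None):
    """Return (salt, digest); only these are stored, never the password."""
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected_digest):
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(digest, expected_digest)

salt, stored = hash_password("correct horse")

# Random token suitable for machine-to-machine access without a password.
api_token = secrets.token_urlsafe(32)
```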
HTTP authentication
basic http authentication
- enforced by server
- server returns 401 (unauthorized) code to client, not 404 (not found) or 403 (forbidden, no option to authenticate)
- client must respond with credentials in an extra header (Authorization) in the next request
client certificates
cryptographically secure certificates provided to each client
client does handshake with server to exchange information, prove knowledge
keep cert secure on client end -> computationally infeasible to reverse and find the key
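For basic http authentication, the header the client sends after a 401 is the username and password joined by a colon and base64 encoded. A sketch of building and parsing that header (the helper names are hypothetical):

```python
import base64

def basic_auth_header(username, password):
    """Build the header a client sends after receiving a 401."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}

def parse_basic_auth(header_value):
    """Server side: recover username and password from the header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password

headers = basic_auth_header("alice", "secret")
```

Base64 is an encoding, not encryption: anyone who sees the header can decode it, which is why basic auth is only reasonable over HTTPS.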
form input
username and password entered into form
transmitted over link to server (link must be secure HTTPS)
GET requests: URL encoded data : very insecure, open to spoofing
POST requests: form multipart data: slightly more secure, still needs secure link to avoid data leakage
cookies
server checks some client credentials, then 'sets a cookie' ( random number, not possible to guess )
header ->
API security
cookies etc require interactive use (browser)
basic auth-pop up window not possible in API
API typically accessed by machine clients or other applications
commandline etc possible, but not used
use token or API key for access
subject to same restrictions: HTTPS, not part of URL, etc
sessions
session management
client sends multiple requests to server
save some state information -> logged in, choice of bg color, etc
server customizes responses based on client session information
storage: client-side session (stored in cookie) and server-side session (stored on server, looked up from cookie)
cookies
set by server with set-cookie header
must be returned by client with each request
can be used to store information: theme, bg color, font size (no security issues), user permissions, username can also be set in
cookie but must not be possible to alter
security issues:
user can modify cookie
if someone else gets cookie they can login -> remedy: timeout, source IP, etc
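One way to make a cookie value impossible to alter undetected is to attach a signature computed with a key that only the server knows. A minimal sketch with the standard hmac module; SECRET_KEY and the cookie format are assumptions for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"   # hypothetical; kept only on the server

def sign(value):
    """Return the cookie text: the value plus its signature."""
    mac = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}.{mac}"

def verify(cookie):
    """Return the value if the signature matches, else None."""
    value, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(mac, expected) else None

cookie = sign("user=alice")
tampered = cookie.replace("alice", "admin")   # user edits the cookie
```

Signing stops modification but not theft: a copied cookie is still valid, so timeouts and similar remedies are still needed.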
cross-site requests
attacker can create page to automatically submit request to another site, if user is logged in on other site when they visit
attack page, it will automatically invoke the action. verify on server that request came from legitimate start point
enforce authentication -
some parts of site must be protected
enforce existence of specific token for access to those views
views: determined by controller
protect access to controller: flask controller -> python function
protect function -> add wrapper around it to check auth status (decorator)
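A framework-independent sketch of the wrapper idea: a decorator that checks auth status before the view function runs. Here the session is passed in as a plain dictionary and Unauthorized is a made-up exception; in Flask the session would come from the framework and the failure would typically be a redirect or a 401 response.

```python
import functools

class Unauthorized(Exception):
    """Raised when a protected view is called without a logged-in user."""

def login_required(view):
    @functools.wraps(view)
    def wrapper(session, *args, **kwargs):
        # Check auth status first; only then call the real view.
        if not session.get("user_id"):
            raise Unauthorized("login required")
        return view(session, *args, **kwargs)
    return wrapper

@login_required
def dashboard(session):
    return f"dashboard for user {session['user_id']}"
```

The view itself contains no auth logic, so the same decorator can protect every controller that needs it.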
Notes
SESSION COOKIES VS PERSISTENT COOKIES:
Session cookies do not retain any information on your device or send information from your device. These cookies are
deleted when the session expires or is terminated when the browser window is closed. Persistent cookies remain on the
device until you erase them or they expire.
HTTPS
normal HTTP process
open connection to server on fixed network port (80)
transmit HTTP request
receive HTTP response
can be tapped
can be altered
secure sockets
set up an 'encrypted' channel between client and server
need a shared secret -> eg long binary string (key)
XOR all input data with key to generate new binary data
attacker without key cannot derive actual data
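A toy sketch of the XOR idea: with a random shared key as long as the message, XOR turns the data into bytes that are meaningless without the key, and applying the same key again recovers it. This illustrates the principle only; real HTTPS connections negotiate keys and use standard ciphers through TLS.

```python
import secrets

def xor_bytes(data, key):
    """XOR each data byte with the matching key byte."""
    return bytes(d ^ k for d, k in zip(data, key))

message = b"GET /account HTTP/1.1"
key = secrets.token_bytes(len(message))   # shared secret, same length as message

ciphertext = xor_bytes(message, key)      # what an eavesdropper would see
recovered = xor_bytes(ciphertext, key)    # XOR with the same key undoes it
```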
how to set up shared secret?
must assume anything on the wire can be tapped
what about pre-existing key?
secure side channel - send a token by post, SMS, etc
types of security
channel (wire) security -> ensure that no one can tap the channel, most basic need for other auth mechanisms, etc
server authentication -> how do we know that we are actually connecting to correct server and not some other server, DNS
hijacking possible, server certificates help. common root of trust needed - someone who 'vouches for' that server's authenticity
client certificate -> rare but useful - server can require client certificate. used especially in corporate intranets.
https certificates
chain of trust - A issues to B, B issues to C, etc. if you trust a node, you trust all its descendants
potential problems:
old browsers not updated with new chains of trust
stolen certificates at root of trust : certificate revocation, invalidation possible, need to ensure OS can update trust stores
DNS hijacking -> give false IP for server as well as entries along chain of trust, but certificate in OS will fail against
eventual root of trust
impacts of HTTPs
security against wiretaps
better in public wifi networks
negatives:
logging
Record all accesses to app to:
record bugs
number of visits, usage patterns
most popular links
site optimization
security checks
done by:
server logging
built into apache, nginx, etc
just accesses and URL accessed
can indicate possible security attacks -> large number of requests in short duration, requests with malformed URLs, repeated
requests to unused endpoints
log rotation
high volume logs -> mostly written, less analysis
cannot store indefinitely -> delete old entries
rotation -> keep last N files, delete oldest file, rename log.i to log.i+1. fixed space used on server.
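Python's standard logging module implements this scheme in RotatingFileHandler: when the current file reaches a size limit it is renamed to .1, older files shift up by one, and the oldest is deleted. A sketch with a deliberately tiny limit; the logger name, file name and sizes are arbitrary.

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")

# Keep the current file plus at most 3 older ones, each about 200 bytes.
handler = logging.handlers.RotatingFileHandler(log_path, maxBytes=200, backupCount=3)
logger = logging.getLogger("rotation_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False   # keep the demo output out of the root logger

for i in range(100):
    logger.info("request %d to /home", i)

handler.close()
log_files = sorted(os.listdir(log_dir))
```

However many entries are written, the directory never holds more than backupCount + 1 files, so the space used on the server stays bounded.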
application testing
why testing?
does something work as intended
requirements - specifications
responds correctly to inputs
respond within reasonable time
installation and environment
usability and correctness
white-box testing
detailed knowledge of implementation
can examine internal variables, counters
tests can be created based on knowledge of internal structure
pros:
more detailed information available -> better tests
cons:
can lead to focusing on less important parts because code is known
does not encourage clean abstraction (separation of concerns)
too much information
regression testing
maintain series of tests starting from basic development of code, each test is for some specific feature or set of features
regression - loss of functionality introduced by some change in code.
future modifications to code should not break existing code
sometimes necessary -> update tests, update API versions, etc
better to automate tests
coverage
how much of the code is covered by tests
Notes
testing has two parts, verification and validation
verification: verify that code does what its supposed to do
validation: validate that application is aligned to requirements
levels of testing
who are stakeholders? -> client, etc
functionality -> each stakeholders have different needs
non-functional requirements -> page color, font, etc
requirement gathering
extensive discussions with end-users required
avoid language ambiguity
capture use cases and examples
start thinking about test-cases and how the requirements will be validated
units of implementation
break functional requirements down to small, implementable units
each one may become a single controller
unit testing
test each individual unit of implementation
may be single controllers -> may even be part of a controller
clearly define inputs and expected outputs
testable in isolation? -> can each unit be tested without the entire system?
create artificial data set to check whether a single update works
integration testing
application consists of multiple modules, each module (unit) works as verified by unit tests
do the units work together? that is integration testing
continuous integration (CI)
CI is combined with version control systems
each commit to main branch triggers a re-run of the integration tests
multiple times a day possible
test generation
API based testing
api -> abstraction for system design
standard representation for APIs, openAPI, swagger
they can also generate testcases like swagger inspector
use cases
import api definitions from standard like openapi
generate tests for specific endpoints, scenarios
record API traffic
inject possible problem cases based on known techniques
data validation tests
abstract tests
semiformal verbal description (example:)
make a request to '/' endpoint
ensure that result contains the text 'hello world'
executable test:
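The abstract test above could be turned into an executable one roughly as follows. To stay self-contained, the sketch defines a tiny WSGI application and calls it directly using the standard wsgiref helpers; the app and the get helper are stand-ins, and with Flask one would normally use its built-in test client instead.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """Minimal WSGI application standing in for the real web app."""
    if environ["PATH_INFO"] == "/":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello world!"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

def get(path):
    """Make a fake GET request to the app and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)   # fill in a valid default WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")

def test_home():
    status, body = get("/")           # make a request to the '/' endpoint
    assert status.startswith("200")
    assert "hello world" in body      # ensure the result contains the text
```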
UI testing
user interface -> visual output
usually GUI -> even for web-based system
but specific details of graphical display may be different in web-based systems
tests:
are specific elements present on page
are navigation links present
what happens on random click on some part of the page
browser automation
some tests cannot be directly run programmatically
browser is required, just requests not sufficient
request generation -> python requests library, capybara (ruby)
direct browser automation -> selenium framework -> actually instantiate a browser
examples -> selenium, katalon, cucumber
security testing
generate invalid inputs to test app behaviour
try to crash server -> overload, injections, etc
black-box or whitebox approaches
fuzzing or fuzz-testing -> generate large number of random/semi-random inputs
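A very small fuzzing sketch: feed many semi-random strings to an input-handling function and record any input that causes an unexpected exception. parse_age is a made-up target, and treating ValueError as the only acceptable failure is an assumption of this example; real fuzzers are far more sophisticated about generating inputs.

```python
import random
import string

def parse_age(text):
    """Function under test: should reject bad input with ValueError only."""
    value = int(text)                 # raises ValueError for non-numeric text
    if not 0 <= value <= 150:
        raise ValueError("age out of range")
    return value

def fuzz(target, runs=1000, seed=0):
    """Call target with random printable strings; collect unexpected crashes."""
    rng = random.Random(seed)         # fixed seed makes failures reproducible
    crashes = []
    for _ in range(runs):
        length = rng.randint(0, 12)
        candidate = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            target(candidate)
        except ValueError:
            pass                      # the documented way of rejecting input
        except Exception as exc:      # anything else counts as a bug
            crashes.append((candidate, exc))
    return crashes

crashes = fuzz(parse_age)
```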
pytest
opinionated -> provides several defaults to make it easier to write tests
helpful features -> can automatically set up env, tear down, test fixtures, monkeypatching, etc
python standard library includes unittest
pytest is an alternative with some more features
test fixtures
set up some data before test
remove after test
example -> initialise dummy database, create dummy users, files
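A sketch of the fixture idea using unittest from the standard library, where setUp and tearDown play the role that fixtures play in pytest. The dummy users table and the test names are made up for illustration.

```python
import sqlite3
import unittest

class UserTableTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test: initialise a dummy database with dummy users.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (name TEXT)")
        self.conn.executemany(
            "INSERT INTO users VALUES (?)", [("dummy1",), ("dummy2",)]
        )

    def tearDown(self):
        # Runs after every test: remove the data again.
        self.conn.close()

    def test_dummy_users_exist(self):
        count = self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
        self.assertEqual(count, 2)

    def test_update_single_user(self):
        self.conn.execute("UPDATE users SET name = 'renamed' WHERE name = 'dummy1'")
        names = {row[0] for row in self.conn.execute("SELECT name FROM users")}
        self.assertEqual(names, {"renamed", "dummy2"})

# Run the tests programmatically; a test runner would normally discover them.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTableTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because each test gets a fresh database, the update in one test cannot affect the count checked in the other.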
beyond HTML
-- subjective
HTML evolution
markup languages originate from the late 60s, mostly used for typesetting and document management systems
lack of standardization, target audience was not sure
target output was different
machine readability
SGML
standard generalized markup language
meant to be a base from which any ML could be designed
basic postulates -> declarative (specify structure and attributes, not how to process them) and rigorous (strict definition of
structure, like databases)
DTD - Document Type Definition -> used to specify different family within this umbrella each could have its own tags,
interpretations
SGML Applications
SGML was too complex
HTML
originally intended to be an application of SGML
very lenient with parsing, meant to be forgiving of errors (not SGML)
HTML 2.0 attempt to become SGML compliant
legacy support -> not truly SGML compliant
HTML4 official definition -> true SGML application (limited usage)
HTML5 -> not an SGML application -> defines its own parsing rules
XML
extensible markup language
based on SGML
custom tags - multiple applications defined
focus on simplicity, generality, usability
both human and machine readable
well structured -> can be used to represent complex data relationships, data structures, etc
examples -> mathML, RSS, Atom, SVG
XHTML
based on XML - not directly SGML
reformulation of HTML4 as application of XML
main goal is to clean up HTML specification -> modular and more extensible
XML Namespaces -> allow interoperability with other XML applications
HTML5
add support for latest features -> multimedia support, canvases, etc
remain easily readable and understandable to both human and machine
remain backwards compatible
break away from SGML -> not SGML or XML
define its own parser
HTML5 is last version of HTML
HTML Living Standard maintained by WHATWG split away from W3C
Extensions
how to add new features, new tags
software defined -> allow new tags to be added through javascript
custom elements -> api supported by browsers
very powerful mechanism -> arbitrary functionality possible -> no new tags need to be brought into standard
potential problems -> anyone can define a tag? semantics of tags may not be well thought of
requirements -> javascript
javascript
high level programming language -> dynamic typing, object orientation (prototype based)
multi paradigm
event driven
functional -> composition of functions, functions as objects
imperative -> direct computation through procedures and functions
relatively easy to learn -> similar to python, c, java
most web browsers have a dedicated JS engine
APIs ->
text, dates, regex
standard data structures (dictionaries)
document object model - manipulate the page
no native IO (no file access etc) but provided through APIs
most power when used for DOM manipulation
custom elements
custom elements API (read more online documentation)
use JS classes, inheritance and overriding to define custom tag behaviour
web components
custom elements is JS API to create custom element tags
shadow DOM -> API to keep styling of components separate from rest of page
HTML Templates -> <template> and <slot> tags to write markup templates
frameworks
purpose of frameworks
basic functionality already available
python can create network listeners, manipulate strings, etc
js can extend elements, use API to manipulate DOM etc
problem:
lots of code repetition - boilerplate
reinventing of the wheel - different coding styles, techniques
Solution:
standard techniques for common problems - design patterns
frameworks: flask for python, react for js etc
SPA : single page applications -> many JS front end frameworks focus on enabling this
React
library for building UI
declarative -> opposed to imperative, specify what is needed, not how to do it
components ->
different from WebComponents - similar ideas, different techniques
webcomponents are imperative: functions that specify behaviour
react is declarative -> focus on UI but allow composing views
deployment
app components
developing an app
idea
local development
file system
editors, desktop, documents, file management
single computer
multiple services
web server
database server
permanent deployment
dedicated servers
always on internet
uninterrupted power
infrastructure needed -> data centers
cloud (iaas, paas, saas)
scaling
more infrastructure
easy to scale up if using cloud services
https, load balancer
logging server
many frontends
many backends
CDNs
service approach
SaaS
IaaS
PaaS
what is it
specialisation
data center operators specialise in infra
developers focus on app dev
standard software deployments
software as a service
google docs, spreadsheets, office 365, drupal, wordpress, trello, redmine, etc
hosted solutions -> all software is installed and maintained by someone else
infra as service
raw machines or VM taken care of
power, networking taken care of
install your own OS
VPS
eg-> AWS, google compute engine, azure, digital ocean, linode
platform as service
combination of hardware and software
specific hardware req
specific software req
custom application code (flask, ROR, laravel, etc)
provider take care of power, network, infra, OS, security, base application platform, security updates, databases
developer needs to manage application code and specify requirements on server sizing, database, connectivity
scaling -> combined inputs from developer and provider
example: replit, glitch, GAE, heroku
deployment
version control
manage changes to code
retain backup of old code
develop new features
fix bugs
types
centralised -> central server, many clients. push changes to server each time, multiple editors, lock files, merge
distributed -> can have central server but not needed. changes managed using 'patches' - email, merge requests etc
github, gitlab, etc
centralised on top of distributed
friendly interfaces
worth learning commandline
continuous integration
integrate with version control
multiple authors contribute to different parts of code
central build server automatically compiles/builds code
best practices
test driven development -> write tests before code
code review -> pull and merge requests, enabled by web interfaces. review code for correctness, cleanliness, etc
integration pipeline optimization -> tests run on each push to server, can be several times a day. fast runs, optimised based on
changes ,etc
Continuous Delivery/Deployment
CI/CD - part of DevOps pipeline
CI = continuous integration
CD could be Continuous Delivery or Deployment
continuous delivery :
once CI has passed, package files for release
automated delivery of release package on each successful test
nightly builds, beta testing, up to date code version
continuous deployment:
extend beyond delivery - deploy to production
passed tests -> deployed to users
users see latest version that has passed tests
no installing new versions/updating code on servers
benefits
immediate fixes, upgrades
latest features deployed immediately
drawbacks:
tests may not catch all problems
containers
what
self contained env with OS and min libraries
primarily used with linux kernel namespaces, others like chroot possible
why
full OS impossible to version control - too much software
create self-contained images that can be version controlled
sandboxing - image cannot affect other processes on system
how
kernel level support needed
all communication through inter-container networking
history
chroot - custom filesystem for part of the code. no real process isolation
freeBSD jails, linux VServer, OpenVZ -> containers in linux, same kernel. different filesystems
control groups (cgroups) and kernel namespaces -> linux kernel, around 2008 -> resource limits through cgroups, process isolation through namespaces
docker -> mechanism for managing images, popularized containers, problems: bad practices, version control hard
orchestration
app consists of multiple processes not just one
start in some specific order (dependencies)
communicate between processes that are isolated (network)
mechanisms to build and orchestrate, automate
docker-compose
kubernetes
key to understanding and managing large scale deployments