HTTP is the protocol that runs the web, it’s the way most devices talk to each other these days. It’s also the protocol we are going to use, and as such we need to get everything setup for the http protocol. For this series we are taking an object-oriented aproach, so we will want to create the
Response objects to house our info. This will help us model out the HTTP protocol, so we can do everything else we need to later.
Terms & Concepts
To start there are some terms we need to understand. Here are some of the basics:
|IP Address||This is what computers use to communicate with one another, it is similar to an actual address|
|Port||A computer can have multiple servers running on it. Each of these “servers” runs on a different port so you can communicate with them sparately||If you are running a server on your own machine and have it on port 8080 you can access it in a browser at |
|Proxy/alias||In our context we’re using this to mean something that stands-in or is replaced by something else||For |
|Host||The name of a computer (local/on the machine)||Let’s say you gave your PC the name |
Anatomy of a URL
Some of the most important terms we need to understand are what makes up a URL. A URL is what we type into a browser bar to get to a site (i.e. https://schulichignite.com). Here is a diagram breaking apart the different peices:
For those of you that can’t read the image, this is the basic anatomy of a URL:
For example here are some URL’s:
http://127.0.0.1:80/ http://localhost:8000/ http://thoth:32400/web/index http://schulichignite.com/contact http://example-site.co.uk/about-us
|Protocol||The protocol is the rules and procedures that the browser should follow to communicate with the underlying server|
|Domain||The name of a computer (remote/on a server accessible to the internet)||When you go to |
|Domain Name||The domain name is the main portion of the domain, before the ||google, netflix, schulichignite|
|TLD (top-level domain)||These groupings of domains used to serve more of a roll before, but now are mostly all the same. They are the letters that come after a |
|Slug/URI||The part that comes after a domain to indicate which specific resource you want from the specified server on the specified port|
How HTTP Works
Let’s look at the theory of how HTTP works, and the steps made in the protocol.
Request <–> response
HTTP is a request to response protocol. This means the client will create a request that is sent to the host server which is then processed and a response is sent. You can think of this like mailing a letter to where you add in the contact details, send a letter and recieve a response:
Requests must have:
- A method
- A hostname
- A slug
- Content (optional)
Responses must have:
- A status code
- Content (optional)
Anatomy of a request/response
All requests/responses follow the same format, they are plaintext and include a starting line (contain different info for requests/reponses), headers, and then an empty line and the content:
STARTING LINE HERE HEADERS CONTENT
The starting line for requests contains the Method, HTTP version and a slug. The starting line for a response includes the status code.
Headers are key-value pairs of data that can be passed on both requests and responses.
- Change the way data is processed
- Include additional data relevant to be processed
- Include additional data about the client/host to help with debugging and identification/authentication
So for example let’s say you’re sending a response and
A MIME type acts similarly to the way that extensions work on files. MIME types tell let the client/host know what type of content is being included in the request/response.
For example let’s say you’re sending a
.css file across the network. You would include the header
text/css to tell the browser to interpret the file as a CSS file that should be processed by the browser, if you changed the MIME type to
text/plain then the browser will just show you the plain text of the CSS file. Essentially the extension is irrelevent most of the time, and the MIME type will take precidence over the extension. If no MIME type is provided some browsers will interpret everything as text, others will interpret them as a file to download, and others will try to just process certain files based on extension.
Requests are used for a client to send a message to a host, typically this is the initialization of the HTTP communication, though some features of HTTP will require a HTTP Host to send requests to other servers to complete.
As mentioned earlier the starting line of an HTTP request includes the method and HTTP version. So for example these are all valid HTTP requests:
GET / HTTP/1.1 HOST: google.ca
With more headers and still no content
GET /about HTTP/1.1 Host: kieranwood.ca Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7 Accept-Encoding:gzip, deflate, br Accept-Language:en-US,en;q=0.9 Cache-Control:max-age=0 Sec-Ch-Ua-Platform:"Windows" User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/220.127.116.11 Safari/537.36 Edg/114.0.1823.58
With content and a POST request
POST /contact HTTP/1.1 Host: schulichignite.com Content-Type: application/x-www-form-urlencoded Content-Length: 27 [email protected]&name=Kieran
I talked about methods in the last section. There are several Methods with HTTP requests, these methods tell the server what you’re intending to do with your request:
|GET||Used to get a resource||You request google.com to GET the webpage associated with that URL|
|POST||Used to send information to a server for creation||You POST the results of a form you filled out to a submission URL|
|PUT||Used to update existing information on a server (often POST is used in place of PUT for creation and updating)||You PUT an updated record of your favourite movies on a site that had an old list already on it|
|DELETE||You’re requesting the resource at the URL is deleted||You request to DELETE an image off your instagram account|
Responses are typically generated by the host as a response to a request, though sometimes the host will request some additional information from the client which will then send a response. HTTP responses are some of the hardest to parse because oficially the only thing they have to have is a status code, they don’t need headers or content to be considered “valid”.
As mentioned earlier the starting line of an HTTP request includes the status code. So for example these are all valid HTTP responses:
Response with no content or headers
Response with no content, headers or status description
Response with a status code and headers
200 OK Host: schulichignite.com Server: HHTTPP
Response with a status code, content and headers
200 OK Host: schulichignite.com Server: HHTTPP Content-Type: text/html <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Example Site</title> </head> <body> <h1>Hello World!</h1> </body> </html>
Status codes are used to include information about how the processing of a request went. These are broken up into groups by the 100 up to 599. Each group’s starting number indicates what “type” of code it is. Additionally all official codes have a “description”, this is optional to be returned:
|Range||Description of group||Example|
|100-199||Informational responses; Gives you some early information before a response is fully ready|
|200-299||Successful responses; The request had a successful result|
|300-399||Redirection Messages; The content you want is somewhere else, this response will tell you where to look|
|400-499||Client Error Message; Something went wrong on the client end|
|500-599||Server Error Responses; Something went wrong on the hosts end|
People do create their own custom status codes (ahem), and also change the status descriptions often. It is not recommended to do this, if your custom code gets used for something down the road then your service will be broken!
Let’s look at existing sites
We can see how this works on real websites. Requests and responses are sent publicly so they can be inspected, and you can even manually construct requests.
The easiest way to do this is right in your browser. If you are using a chromium based browser (Chrome, opera, edge etc. and also firefox has this functionality) you can use the developer tools to see the headers.
For me I need to open the developer tools and then head to the network tab.
Now that I have it open I can go to https://schulichignite.com and will see the network requests come in. There are a bunch of requests and responses. The one we care about for now is one that has the type of
document (this is the initial request response):
We can now inspect this request directly by double clicking on it:
- Select method and URL
- Modify headers and Body
- Send the request
- See the raw form of your request
- See your response
There is also a CLI version of httpie, but a much more robust CLI alternative is
Curl is a common command line tool included in most linux distributions. This tool will allow you to do waaay more than just create requests and see responses, and it’s worth looking into yourself. But for the absolute basics if you want to send a simple GET request you can do it with:
You can then add headers by adding a
-H flag with the text:
curl https://schulichignite.com -H "Host: schulichignite.com"
You can then change the method using
curl -X DELETE https://schulichignite.com/image/my-image.png
How we’re going to implement all this
As mentioned earlier our process will be to create 3 classes whichwill represent all of our required HTTP interaction, these will be added to in later posts to add additional functionality. Here are the required features for the 3 classes:
Servers must be able to:
- Parse a request into a
Requestobjects and generate
Responseobjects into plaintext/binary responses
Requests to genera into plaintext/binary requests to send to other servers (we won’t use this, but it’s handy)
Requests must have:
- A method (
- A hostname
- A slug
- Additional Headers (optional)
- Content (optional)
Responses must have:
- A status code
- Headers (optional)
- Content (optional)
On top of these there are a few “helper” objects that will be stored inside these objects:
- Used to store status codes and their corresponding descriptions with some data validation
- A class used to store the MIME type and path to the resource
- Has a function that takes the path to a resource and generates a valid MIMEType for that file
Now that we’ve know the features we need to go through and write tests so we know they work properly and then write the features to make the tests work. We’re going to do this using pytest. This is a testing framework that will help us check if our code is working correctly. Specifically we are going to write tests for how we want the system to work, and then edit the code until the tests pass.
This is called test driven development (TDD), and it’s a common practice in software development. This doesn’t work well for every type of application, but it’s handy for protocols where there are clear rules that the system should abide by. Since we know what should happen we know what tests to write, usually the biggest problem of TDD is we don’t know exactly what functionality we need, which makes writing tests hard. Since we know the protocol well enough we know what it should do and so the tests can accurately tells us if we make a mistake in our code.
The tests can all be found inside the
/tests folder for each post.
To get started run:
pip install -e .; Should be run in the main directory (where
setup.pyis). This installs
hhttppas an editable package, meaning changes to the folder are automatically updated without needing to be reinstalled
pytest; Once you have the project installed you can run the command in the main directory (where
setup.pyis) to run the tests
From there within the source code our structure is basically to have 1 long test function per class, which tests that classes functionality.
Our TDD Approach
We are going to create these objects by first defining some tests that say how we want users to use our code when it’s done. For example we can write some tests to make sure we can create status codes, that it works at all edge cases, and that is errors if we go out of the range of allowed status codes (100-599):
from pytest import raises from hhttpp.classes import * def test_status_codes(): # Correct cases StatusCode(101,"Switching Protocols") StatusCode(200, "Ok") StatusCode(301, "Moved Permenantly") StatusCode(404, "Not Found") StatusCode(500, "Internal Server Error") # Edge cases StatusCode(100, "Continue") StatusCode(599, "") # Error case with raises(ValueError): StatusCode(-1, "Broken") with raises(ValueError): StatusCode(99, "Switcheroonied") with raises(ValueError): StatusCode(600, "Uknown Browser")
So we can create these sorts of tests for each of these classes. So for example if we wanted to have a method on the
Server object called
Server.parse_request(request:str)->Request, we can pre-create a test and put it in the file that will fail until we create the correct method:
@dataclass class Server: ... # More attributes and methods here removed for simplicity def parse_request(request:str)->Request: ...
from pytest import raises from hhttpp.classes import * def test_server(): s = Server() raw_request = "GET / HTTP/2\nHost: schulichignite.com" req = s.parse_request(raw_request) assert type(req) == Request # Will fail until a request object is returned
From here then when we eventually fill out the function (in this case we’re going to hardcode a result and we’ll do the parsing in the next post):
@dataclass class Server: ... # More attributes and methods here removed for simplicity def parse_request(request:str) -> Request: # TODO: Parse input text to generate Request object result = Request("schulichignite.com", "/", "GET") if len(self.logs) >= self.log_limit: print("500 or more, popping value") self.logs.pop() self.logs.append(result) return result
This approach of creating tests that expect the correct result, and then making our project around these tests is how I will build out the rest of this system.
Essentially code wise we will have a single
Server object. This server object will process incoming requests into
Request objects, parse them, then generate a
Response object and send back the correctly formatted response:
Some of the objects have additional features beyond the required HTTP features
- Error on 4xx/5xx optional
- Logs (storing requests and responses sent/recieved)
- Ability to specify which directory to proxy
- is_binary flag for responses that contain binary content (i.e. images, pdf’s, exe’s etc.)
- A server header that defaults to
is_errorflag for responses that are errors (4xx/5xx)