A nightmare with input and output tokens

What are they, how to find and use them, especially for AWS Bedrock 🕵️

April 24, 20243 min read

What are tokens? 😵

Tokens are basically the language that Large Language Models (LLMs) understand and speak. The input we make, e.g. a word like "tomato" with 6 characters will be translated into 2 tokens for OpenAi:

The number of tokens can vary across different LLMs. 😵‍💫

Input and output tokens? 👀

Input tokens on the one hand are what is sent to the LLM - your prompt message.

Output tokens on the other hand are what you get back from the LLM - the LLM response to your prompt.

You can influence the length of the LLM response by setting an upper limit for the output tokens, called max tokens - e.g the shorter the LLM repsonse should be, the less amount of max tokens should be set.

It's important to note that input and max tokens for the output together should not exceed the context length of an LLM - the LLM memory, which varies by LLM, e.g. 4096 tokens for newer LLMs

How to find those tokens? 👀

It's quite LLM-specific and also depends on the endpoints that are triggered 😵‍💫 - either chat or text or else.

The input tokens and output tokens can usually be found in the response payload, like so:

With AWS Bedrock, one would get the information about input and output tokens in the response header, like so:

However, for some LLMs like for Jurassic from AI21 Labs, the information about input and output tokens and only be found in the logs. 🤪

Logging would need to be activated, then triggered by API Requests or by Software Development Kit (SDK) calls, and then the right logs can be found for the corresponding API Request or SDK call made by looking up the log event that belongs to the Request ID.

How to use those tokens?

Whenever LLM calls are made, it would be a good practice to track how much cost they have accrued.

The calculation can be done like so:

Now there is lots of traffic in the head - time to call it a day 💀