Building an AI generated game with Stable Diffusion and data from Wikipedia


Last week I released a game called Doodle:ai.

In the game you're shown AI-generated images and you have to guess the Wikipedia topic the AI used to create them.

Doodle:ai game interface

Doodle:ai has had thousands of players since launch and I thought I would share how I built it.

The idea

There has been incredible progress in generative AI models over the last 3 months and I wanted to build something with it.

I had this idea to automatically create Google Doodles based on Wikipedia topics.

As I was building this Google Doodle generator, I thought it might be fun to randomise the output a little and see if people could guess which topic the AI had chosen for a given day.

It worked as a game, so I took it in that direction.

The Architecture


The game runs off pre-generated puzzle data.

The backend is a NestJS server which exposes endpoints to queue up puzzle generation jobs and store the output in a Digital Ocean Spaces blob store.

The frontend is a React app which uses the Digital Ocean Spaces store to get the daily puzzle.

The high-level process for generating a puzzle:

  1. An administrator (me) queues up a puzzle generation job for a date range. These jobs are put in a Bull queue by a NestJS controller.
  2. The Bull queue workers retrieve and process daily topics from Wikipedia.
  3. The topics are filtered and cleaned up to remove topics that are not suitable for a game.
  4. A random topic is chosen and the worker requests the summaries for Wikipedia links and articles relevant to the topic.
  5. The summaries are combined into an essay and sent to Azure NLP for analysis.
  6. The NLP analysis is filtered and mapped to useful combinations for prompt generation.
  7. Lots of prompts are generated and 4 random prompts are chosen.
  8. The prompts are sent to Stable Diffusion to generate images.
  9. The images and puzzle data are packaged and sent to Digital Ocean Spaces.
  10. A customer opens the game and the puzzle data is retrieved from Digital Ocean Spaces.
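Step 1 queues work for a whole date range. A minimal sketch of expanding that range into one job payload per day follows. The `PuzzleJob` shape and `jobsForDateRange` name are hypothetical; each day is pinned to 12:00 UTC only because that is the time that appears in the game's `?gamedate=` links.

```typescript
// Hypothetical job payload; the real Doodle:ai job shape is not shown in the post.
interface PuzzleJob {
  gameDate: string; // ISO timestamp such as "2022-11-22T12:00:00.000Z"
}

// Normalise any Date to 12:00 UTC on the same UTC calendar day.
function noonUtc(d: Date): Date {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), 12));
}

// Expand an inclusive date range into one job per UTC calendar day.
// A reversed range (end before start) produces no jobs.
function jobsForDateRange(start: Date, end: Date): PuzzleJob[] {
  const jobs: PuzzleJob[] = [];
  const last = noonUtc(end).getTime();
  const day = noonUtc(start);
  while (day.getTime() <= last) {
    jobs.push({ gameDate: day.toISOString() });
    day.setUTCDate(day.getUTCDate() + 1); // rolls over month and year boundaries
  }
  return jobs;
}
```

Each payload could then be handed to Bull, for example with `queue.add(job)` inside the controller. One job per day means a single failed day can be retried without regenerating the whole range.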

Topic selection


The first step is to select a topic for the game. Wikipedia is a relatively high-quality source of well-organised trivia.

I use the Wikipedia API to get a list of topics for a given day. I then filter out topics that are not suitable for a game.

I originally accessed the Wikimedia API directly. The first few games used this data.

But wikitext is extremely difficult to parse, so I switched to the wtfWikipedia Node library. It returns a structured JavaScript object that is much easier to work with.

Below is an example of wikitext (the start of the Toronto article). It's very hard to parse; thankfully we have open-source parsers like wtfWikipedia.

```text
{{Short description|Capital city of Ontario, Canada}}
{{About|the city in Ontario}}
{{Redirect|City of Toronto|the city's government|Municipal government of Toronto}}
{{Pp-move-indef}}
{{Pp-vandalism|small=yes}}
{{Use Canadian English|date=July 2014}}
{{Use mdy dates|date=April 2021}}
{{Infobox settlement
| name                     = Toronto
| official_name            = City of Toronto
| settlement_type          = [[List of cities in Ontario|City]] ([[List of municipalities in Ontario#Single-tier municipalities|single-tier]])<!-- Consensus see: [[Wikipedia talk:WikiProject Ontario/Archive 1#City infoboxes: "tier" or "conventional" municipal statuses (or both)?]] -->
```

Once the Wikipedia pages are parsed, I filter out topics that are not suitable for a game. These are generally topics that I feel would produce images that are too visceral or offensive, such as mass shootings or terrorist attacks.
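The post doesn't say how unsuitable topics are detected. One simple approach is a keyword blocklist, sketched below; the `Topic` shape and the `BLOCKED_TERMS` list are illustrative assumptions, not the real filter.

```typescript
// Hypothetical topic shape: the post does not show how topics are represented.
interface Topic {
  title: string;
  summary: string;
}

// Illustrative blocklist only; the real list of excluded subjects is not published.
const BLOCKED_TERMS: readonly string[] = ["shooting", "terrorist", "massacre", "bombing"];

// Keep a topic only if no blocked term appears in its title or summary (case-insensitive).
function isSuitableTopic(topic: Topic): boolean {
  const text = `${topic.title} ${topic.summary}`.toLowerCase();
  return !BLOCKED_TERMS.some((term) => text.includes(term));
}

function filterTopics(topics: Topic[]): Topic[] {
  return topics.filter(isSuitableTopic);
}
```

Substring matching is crude: "shooting" would also drop a harmless topic like a shooting star. It errs on the side of excluding too much, which is usually the right trade-off for a public game.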

The final step is simply choosing a random topic as the correct answer for the game.
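Choosing the answer is a uniform random pick from the filtered list. A minimal sketch follows; the `pickRandom` helper is hypothetical, and its random source is a parameter so tests (or a seeded generator, if you wanted reproducible puzzles) can replace `Math.random`.

```typescript
// Pick one element uniformly at random.
// Math.random() returns a value in [0, 1), so the index is always in range.
function pickRandom<T>(items: readonly T[], random: () => number = Math.random): T {
  if (items.length === 0) {
    throw new Error("Cannot pick from an empty list");
  }
  const index = Math.floor(random() * items.length);
  return items[index];
}
```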

Entity Categorisation


The game requires keywords for guesses and keywords for image generation prompts. To create these keywords I build an "essay" from the topic content and any adjacent Wikipedia topics.

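```typescript
import * as util from "util";
import wtf from "wtf_wikipedia";

// Method on the puzzle-generation service (the surrounding NestJS class is omitted)
public async getSummariesOfPages(pageNames: string[]): Promise<string[]> {
    // Fetch all pages in one call, following redirects to the target articles
    const documents = (await wtf.fetch(pageNames, {
        lang: "en",
        // eslint-disable-next-line @typescript-eslint/naming-convention
        follow_redirects: true,
    })) as WtfWikiDocument[];

    // Convert each parsed document to plain JSON, defensively skipping
    // any pages that came back empty
    const parsedWikiPages = (documents ?? [])
        .filter((doc) => doc != null)
        .map((doc) => doc.json());

    this.logger.log(
        "summaries",
        util.inspect(parsedWikiPages[0], false, 8, true)
    );
    // continue to parse and filter the data
    // ...
    // ...
}
```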
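The step that turns these summaries into a single essay is elided above. A minimal sketch of one way to do it follows; the `PageSummary` shape, the `buildEssay` name and the `maxChars` default are all hypothetical. Text-analysis services cap document size, so check the current limit for the service you call rather than trusting the default here.

```typescript
// Hypothetical page summary shape; in the post these come from wtfWikipedia.
interface PageSummary {
  title: string;
  text: string;
}

// Combine the main topic and adjacent pages into one "essay" string.
function buildEssay(main: PageSummary, adjacent: PageSummary[], maxChars = 5000): string {
  const seen = new Set<string>();
  const paragraphs: string[] = [];
  for (const page of [main, ...adjacent]) {
    const text = page.text.trim();
    // Skip empty pages and pages already included under the same title
    if (text.length === 0 || seen.has(page.title)) continue;
    seen.add(page.title);
    paragraphs.push(text);
  }
  // The main topic goes first, so truncation cuts adjacent content before the main summary.
  return paragraphs.join("\n\n").slice(0, maxChars);
}
```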

This essay is sent to Azure NLP (via the Text Analytics client in the Azure SDK), which returns a list of detected entities. This entity list is passed to the next stage.

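```typescript
import { AzureKeyCredential, TextAnalyticsClient } from "@azure/ai-text-analytics";

// Authenticate against the Azure Text Analytics endpoint with an API key
const client = new TextAnalyticsClient(
  this.config.url,
  new AzureKeyCredential(this.config.apiKey)
);

// Send the essay as a single document; the result array has one entry per document
const results = await client.recognizeEntities([text]);
this.logger.log("Text analysis results", results);

// map the results to a list of entities
// ...
```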
Prompt Creation

The hardest part of using AI image generation tools is writing prompts.

I have gone through four major refactors of my prompt generation process. I have to provide spoilers for these games to explain the prompts. Sorry about that!

If you prefer, play all the games before reading on.

1. Simply add all categorised entity keywords to 4 fixed prompts

This method produced some very interesting images, but only about a third were usable for the game, and they were all quite similar.

Example: "watercolor, 1995 gulf of aqaba earthquake, mercalli intensity scale, gulf, of aqaba earthquake, sinai peninsula, saudi arabia, tsunami"

game image

Play this game here: https://doodleai.darraghoriordan.com/?gamedate=2022-11-22T12:00:00.000Z
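A minimal sketch of method 1, assuming each fixed prompt is a style prefix followed by every keyword, as in the example above. Only "watercolor" comes from the post; the other three styles are placeholders.

```typescript
// Illustrative style prefixes: only "watercolor" appears in the post's example.
const FIXED_STYLES = ["watercolor", "oil painting", "pencil sketch", "digital art"];

// Method 1: every keyword goes into every prompt; only the style prefix differs.
function fixedPrompts(keywords: string[], styles: string[] = FIXED_STYLES): string[] {
  return styles.map((style) => [style, ...keywords].join(", "));
}
```

Because all four prompts share the same keyword list, the images can only differ by style, which helps explain why they came out so similar.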

2. Split entities randomly across a few random prompts

I tried shuffling the keywords, putting the first half into one prompt, the second half into another, and so on. This was actually OK, but styles suited to portraits were sometimes applied to landscapes and vice versa.

Example: "denny party, seattle,by Josef Thomas, matte painting, trending on artstation HQ"

game image (perfect, because the tower is Seattle)

Play this game here: https://doodleai.darraghoriordan.com/?gamedate=2022-11-13T12:00:00.000Z
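A minimal sketch of method 2: shuffle the keywords with a Fisher-Yates shuffle, split them into roughly equal chunks, and append a randomly chosen style to each chunk (the example above puts keywords first and the style last). The function names and the injectable `random` parameter are assumptions made for testability.

```typescript
// Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const copy = [...items];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Method 2: split shuffled keywords across promptCount prompts.
// The style is picked without looking at the keywords, which is how a
// portrait style can end up on a landscape subject.
function splitPrompts(
  keywords: string[],
  styles: string[],
  promptCount: number,
  random: () => number = Math.random
): string[] {
  const shuffled = shuffle(keywords, random);
  const size = Math.ceil(shuffled.length / promptCount);
  const prompts: string[] = [];
  for (let i = 0; i < promptCount; i++) {
    const chunk = shuffled.slice(i * size, (i + 1) * size);
    if (chunk.length === 0) break; // fewer keywords than prompts
    const style = styles[Math.floor(random() * styles.length)];
    prompts.push(`${chunk.join(", ")}, ${style}`);
  }
  return prompts;
}
```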

3. Select the entities with best confidence scores and randomly join them

I tried selecting only the 4 entities with the highest confidence scores. These were OK, but categorisation confidence is not the same as topic relevance, so this often missed good, relevant keywords.

Example: "Donald Johanson, Australopithecus afarensis, Lucy (Australopithecus), The Beatles, depth of field, photograph, Polaroid 600 Instant Film"

game image (creeeeeeepy!)

Play this game here: https://doodleai.darraghoriordan.com/?gamedate=2022-11-24T12:00:00.000Z
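A minimal sketch of the selection step in method 3, with a hypothetical `Keyword` shape and helper names. A high confidence score means the service is sure what *kind* of entity a piece of text is, not that the entity matters to the topic, so the top four can easily include an off-topic name.

```typescript
interface Keyword {
  text: string;
  confidence: number;
}

// Method 3: keep only the n keywords with the highest confidence scores.
function topByConfidence(keywords: Keyword[], n = 4): string[] {
  return [...keywords]
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, n)
    .map((k) => k.text);
}

// Join the selected keywords and finish with a style suffix.
function confidencePrompt(keywords: Keyword[], style: string, n = 4): string {
  return [...topByConfidence(keywords, n), style].join(", ");
}
```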

4. Split entities in set patterns, use specific lists of prompts suitable for the category

This is the method I currently use. It produces mostly relevant images with good variety.

Example: "a hyper realistic professional photographic view picture of Richard Seddon, New Zealand photographic filter unreal engine 5 realistic hyperdetailed 8k ultradetail cinematic concept art volumetric lighting, fantasy artwork, very beautiful scenery, very realistic painting effect, hd, hdr, cinematic 4k wallpaper, 8k, ultra detailed, high resolution, artstation trending on artstation in the style of Albert Dros glowing rich colors powerful imagery nasa footage drone footage drone photography"
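A minimal sketch of the idea behind method 4: fixed patterns of entity categories (such as person plus location) are filled from the keywords, and each pattern uses templates written for its primary category. The patterns, category names and templates are assumptions; only the opening of the person template is taken from the example above, and its long style tail is omitted.

```typescript
type Category = "Person" | "Location" | "Event";

interface Keyword {
  text: string;
  category: string;
}

// Hypothetical template lists; "{subject}" is replaced by the joined keywords.
const TEMPLATES: Record<Category, string[]> = {
  Person: ["a hyper realistic professional photographic view picture of {subject}"],
  Location: ["a landscape view of {subject}, matte painting", "drone photography of {subject}"],
  Event: ["a watercolor illustration of {subject}"],
};

// Hypothetical set patterns: the first category decides which template list is used.
const PATTERNS: Category[][] = [["Person", "Location"], ["Event", "Location"], ["Location"]];

// Build one prompt per pattern that this topic's keywords can satisfy.
function buildCategoryPrompts(keywords: Keyword[], random: () => number = Math.random): string[] {
  const prompts: string[] = [];
  for (const pattern of PATTERNS) {
    const parts = pattern.map((cat) => keywords.find((k) => k.category === cat)?.text);
    if (parts.some((p) => p === undefined)) continue; // skip patterns the topic can't fill
    const templates = TEMPLATES[pattern[0]];
    const template = templates[Math.floor(random() * templates.length)];
    prompts.push(template.replace("{subject}", parts.join(", ")));
  }
  return prompts;
}
```

Tying the template to the category is what keeps portrait styles on people and landscape styles on places, the problem method 2 ran into.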