SDXL Benchmark

I figure from the related PR that you have to use --no-half-vae (it would be nice to mention this in the changelog!).

Let's dive into the details. There are slight discrepancies between the output of SDXL-VAE-FP16-Fix and SDXL-VAE, but the decoded images should be close. Stable Diffusion XL (SDXL) was proposed in "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis" by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.

Stable Diffusion 1.5 has developed to a quite mature stage, and a significant performance improvement there is unlikely. You can also fine-tune some settings in the Nvidia control panel: make sure everything is set to maximum-performance mode. As a rough data point, generating 10 images in series takes about 10 seconds. You can use Stable Diffusion locally with a smaller VRAM budget, but you have to set the output resolution pretty small (around 400 x 400 px) and use additional command-line parameters to counter the low VRAM. We have merged the highly anticipated Diffusers pipeline, including support for the SDXL model, into SD.Next.

On 4090 performance with Stable Diffusion (AUTOMATIC1111): after a reinstall of AUTOMATIC1111's branch I was only getting 4-5 it/s at the base settings (Euler a, 20 steps, 512x512) on a batch of 5, about a third of what a 3080 Ti can reach with --xformers. For a while AUTOMATIC1111 deserved its reputation, but its performance regressed badly in a recent release, and I believe Vlad's SD.Next is now the better alternative. On clip-skip: to stay compatible with other implementations we use the same numbering, where 1 is the default behaviour and 2 skips one layer.
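A figure like "10 in series: about 10 seconds" can be sanity-checked by backing total wall time out of a reported iteration rate and step count. A minimal sketch (the numbers below are illustrative, not measured):

```python
def batch_time_seconds(num_images: int, steps_per_image: int, it_per_s: float) -> float:
    """Estimate back-to-back (series) generation time.

    Assumes the sampler runs `steps_per_image` denoising iterations per image
    at a steady `it_per_s` iterations/second, ignoring model-load and
    VAE-decode overhead.
    """
    return num_images * steps_per_image / it_per_s

# Illustrative: 10 images at 20 steps each on a card doing 20 it/s -> ~10 s total.
print(batch_time_seconds(10, 20, 20.0))  # -> 10.0
```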
In addition to this, with the release of SDXL, StabilityAI have confirmed that they expect LoRAs to be the most popular way of enhancing images on top of the SDXL 1.0 base model. Following up from our Whisper-large-v2 benchmark, we recently benchmarked Stable Diffusion XL (SDXL) on consumer GPUs.

Note that train_instruct_pix2pix_sdxl.py implements the InstructPix2Pix training procedure while being faithful to the original implementation; we have only tested it on a small scale. One headline claim from the tooling side: nearly 40% faster than Easy Diffusion v2.5. The newly released Intel Extension for TensorFlow also lets TensorFlow Stable Diffusion workloads run on Intel Arc GPUs.

SD.Next supports two main backends, Original and Diffusers, which can be switched on the fly. Original is based on the LDM reference implementation and significantly expanded on by A1111. Here is one 1024x1024 benchmark; hopefully it will be of some use. I'd recommend 8+ GB of VRAM; if you have less than that, you can lower the performance settings.

At 769 SDXL images per dollar, consumer GPUs on Salad's distributed cloud are still the best bang for your buck for AI image generation, even when enabling no optimizations on Salad and all optimizations on AWS. At 1440p resolution, the RTX 4090 is 145% faster than the GTX 1080 Ti. Performance benchmarks have already shown that the NVIDIA TensorRT-optimized model outperforms the baseline (non-optimized) model on A10 and A100 GPUs, among others. That said, and as the title says, training a LoRA for SDXL on a 4090 is still painfully slow, even after the initial generation.
I'm aware we're still on 0.9, but I'm seeing insanely low performance on an RTX 4080; I was expecting it to be poorer than a 4090, but not by this much. Description: SDXL is a latent diffusion model for text-to-image synthesis. Download the stable release. For our tests, we'll use an RTX 4060 Ti 16 GB, an RTX 3080 10 GB, and an RTX 3060 12 GB graphics card.

This suggests the need for additional quantitative performance scores, specifically for text-to-image foundation models. In this SDXL benchmark, we generated over 60k images. Further optimizations, such as the introduction of 8-bit precision, are expected to further boost both speed and accessibility. The chart above evaluates user preference for SDXL (with and without refinement) over Stable Diffusion 1.5. Note that when you increase SDXL's training resolution to 1024 px, it consumes around 74 GiB of VRAM. The latest result of this work was the release of SDXL, a very advanced latent diffusion model designed for text-to-image synthesis.

SDXL runs slower than 1.5. The performance data was collected using the benchmark branch of the Diffusers app; the Swift code is not fully optimized, introducing up to ~10% overhead unrelated to Core ML model execution. See also PugetBench for Stable Diffusion. The way the other cards scale in price and performance against the last-gen 3xxx cards makes those owners really question their upgrades. Currently ROCm is just a little bit faster than CPU on SDXL, but it will save you more RAM, especially with the --lowvram flag. More detailed instructions for installation and use are available. With upgrades like dual text encoders and a separate refiner model, SDXL achieves significantly higher image quality and resolution.
If you have a custom SDXL model, keep .safetensors at the end of the filename for auto-detection. With this data we can also build a fuller comparison of AI image-generation performance across different graphics cards and workloads. First, let's start with a simple art composition using default parameters to give our GPUs a good workout.

On the Collective Reliability Factor: the chance of landing tails for 1 coin is 50%, for 2 coins 25%, and it keeps halving from there; the more independent nodes you spread work across, the less likely they all fail at once.

Updates [08/02/2023]: we released the PyPI package. See also HPS v2: Benchmarking Text-to-Image Generative Models. All tests used the latest Nvidia drivers at the time of writing. It's a bit slower, yes, but the "win rate" (with refiner) increased from about 24%. On my desktop 3090 I get about 3 it/s. Starting today, Stable Diffusion XL 1.0 is available, and SDXL 0.9 is now available on the Clipdrop platform by Stability AI. Horns, claws, intimidating physiques, angry faces, and many other traits are very common in these generations, but there's a lot of variation within them all.

For repeatable GPU tuning tests: turn on Cyberpunk 2077's built-in benchmark with unlocked framerate and no V-Sync, run it, screenshot and label the result file, change ONLY the memory clock settings, then rinse and repeat. Despite its powerful output and advanced model architecture, SDXL 0.9 is still a research release. Single image: under 1 second at an average speed of about 27 it/s. Details: A1111 can use Intel OpenVINO to accelerate generation speed (around 3 seconds for 1 image), but it needs time for preparation and warming up.
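The coin analogy above is just independent-failure probability. A quick sketch (the 50% per-node failure rate is the illustrative number from the analogy, not a measured one):

```python
def all_fail_probability(num_nodes: int, per_node_failure: float = 0.5) -> float:
    """Probability that every one of `num_nodes` independent nodes fails,
    assuming each fails with probability `per_node_failure` (illustrative)."""
    return per_node_failure ** num_nodes

print(all_fail_probability(1))  # 0.5   -> 50%, one coin landing tails
print(all_fail_probability(2))  # 0.25  -> 25%, two coins both tails
print(all_fail_probability(3))  # 0.125 -> 12.5%
```

The same expression is why a distributed pool of flaky consumer nodes can still be collectively reliable: the all-fail probability shrinks exponentially with node count.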
Example SDXL prompt: "Stunning sunset over a futuristic city, with towering skyscrapers and flying vehicles, golden hour lighting and dramatic clouds, high detail, moody atmosphere." Serving SDXL with JAX on Cloud TPU v5e with high performance and cost-efficiency is possible thanks to the combination of purpose-built TPU hardware and a software stack optimized for performance. There are also builds optimized for maximum performance to run SDXL on the free Colab tier, and AMD cards like the Radeon 5700 XT can run it too.

With SD 1.5 examples added into the comparison, the way I see it so far: SDXL is superior at fantasy/artistic and digital illustrated images. The SDXL 0.9 weights are available and subject to a research license. Let's create our own SDXL LoRA! For the purpose of this guide, I am going to create a LoRA on Liam Gallagher from the band Oasis; the first step is collecting training images. Yeah, 8 GB is too little for SDXL outside of ComfyUI. (For another dataset, I selected 26 images of this cat from Instagram, used the automatic tagging utility, and further edited captions to universally include "uni-cat" and "cat" using BooruDatasetTagManager.)

It is important to note that while this result is statistically significant, we must also take into account the biases introduced by the human element and the inherent randomness of generative models. On pricing: the RTX 2080 Ti released at $1,199, the RTX 3090 at $1,499, and now the RTX 4090 is $1,599. The Fooocus web UI is a simple web interface that supports image-to-image and ControlNet while also being compatible with SDXL. Meanwhile, I'm getting really low iterations per second on my RTX 4080 16 GB.
(6) Hands are a big issue, albeit a different one than in earlier SD; SDXL might be able to do them a lot better, but it won't be a fully fixed issue. IP-Adapter can be generalized not only to other custom models fine-tuned from the same base model, but also to controllable generation using existing controllable tools. Output resolution is higher, but at a close look there are still a lot of artifacts. The mid-range price/performance of PCs hasn't improved much since I built mine.

Some useful community resources: the SD WebUI Benchmark Data spreadsheet (app: stable-diffusion-webui), with the best of 10 generations chosen for each model/prompt, and the MASSIVE SDXL ARTIST COMPARISON, in which I tried out 208 different artist names with the same subject prompt for SDXL. If you have custom models, put them in a models/ directory next to the script. I am torn between cloud computing and running locally; for obvious reasons I would prefer the local option, as it can be budgeted for, and I will devote my main energy to the development of HelloWorld SDXL.

Translated from the Japanese: our latest model is based on StabilityAI's SDXL model but, as always, we've added a large helping of our own secret sauce and pushed it further; for example, it's much easier to generate darker scenes than with vanilla SDXL.

When fps are not CPU-bottlenecked at all, such as during GPU benchmarks, the 4090 is around 75% faster than the 3090 and 60% faster than the 3090 Ti; these figures are approximate upper bounds for in-game fps improvements. Be aware that updating could break your Civitai LoRAs, which has happened with LoRAs when updating to SD 2.x. On the plus side, SDXL is easier to tune. For the Salad benchmark, the images generated were of salads in the style of famous artists/painters. In general, SDXL seems to deliver more accurate and higher-quality results, especially in the area of photorealism.
Using torch.compile will make overall inference faster, at the cost of some first-run compilation overhead. Over the benchmark period, we generated more than 60k images, uploading more than 90 GB of content to our S3 bucket, incurring only $79 in charges from Salad, which is far less expensive than using an A10G on AWS, and orders of magnitude cheaper than fully managed services like the Stability API. Available now on GitHub.

SDXL generation now works on low-VRAM cards, around 5 GB, swapping the refiner too; use the --medvram-sdxl flag when starting. Dubbed SDXL v0.9 at its preview release, the model has seen gradual uptake: as predicted a while back, adoption of SDXL will be neither immediate nor complete.

tl;dr: we use various formatting information from rich text, including font size, color, style, and footnotes, to increase control of text-to-image generation. This architectural finesse and optimized training parameters also position SSD-1B as a cutting-edge model in text-to-image generation. Stable Diffusion XL (SDXL 1.0) stands at the forefront of this evolution. I tried SDXL in A1111, but even after updating the UI, the images take a very long time and don't finish; they stop at 99% every time. Note that SD.Next needs to be in Diffusers mode, not Original; select it from the Backend radio buttons. For this testing, we use an RTX 4060 Ti 16 GB, an RTX 3080 10 GB, and an RTX 3060 12 GB. (This is the Stable Diffusion web UI wiki.)
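The cost claim reduces to simple arithmetic. A sketch of the images-per-dollar metric using the round numbers quoted above (the exact benchmark totals differ slightly, which is why the headline figure elsewhere in this article is 769):

```python
def images_per_dollar(num_images: int, total_cost_usd: float) -> float:
    """Throughput-per-cost metric used to compare cloud providers."""
    return num_images / total_cost_usd

# With the round numbers above: 60,000 images for $79.
print(round(images_per_dollar(60_000, 79.0)))  # ~759 images per dollar
```

The same function applied to an on-demand A10G price and its measured throughput gives the AWS side of the comparison.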
(5) I don't think you need such an expensive Mac; a Studio M2 Max or a Studio M1 Max should have the same performance in generation times. Human anatomy, which even Midjourney struggled with for a long time, is also handled much better by SDXL, although the finger problem is not fully solved. We covered it a bit earlier, but the pricing of this current Ada Lovelace generation requires some digging into.

We're still on the 0.9 alpha, but I'm figuring we will have comparable performance in 1.0. Apple has released optimizations to Core ML for Stable Diffusion in macOS 13. The SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance. Finally, AUTOMATIC1111 has fixed the high-VRAM issue in the pre-release 1.6 version. (PS: I noticed that the units of performance echoed change between s/it and it/s depending on the speed.) For the 0.9 weights you can apply for either of the two links, and if you are granted access, you can access both.

Test configuration: git 2023-08-31, hash 5ef669de. Then select Stable Diffusion XL from the Pipeline dropdown. The SDXL 1.0 model was developed using a highly optimized training approach that benefits from a 3.5-billion-parameter base model. The RTX 4090 is based on Nvidia's Ada Lovelace architecture. Frontends with SDXL support include SD.Next, ComfyUI, and AUTOMATIC1111. As the "Stable Diffusion Benchmarked: Which GPU Runs AI Fastest" roundup put it: VRAM is king.

Below we highlight two key factors: JAX just-in-time (jit) compilation and XLA compiler-driven parallelism with JAX pmap. The path of your directory should replace /path_to_sdxl. As for AMD: it works, but yeah, it's not great compared to Nvidia.
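Those two progress-bar units are just reciprocals of each other, which is worth a tiny helper when collating logs. A sketch (the function name is mine, not from any UI):

```python
def to_it_per_s(value: float, unit: str) -> float:
    """Normalize a progress-bar speed reading to iterations/second.

    A1111-style progress bars flip between "it/s" (when fast) and "s/it"
    (when slow); the two readings are reciprocals.
    """
    if unit == "it/s":
        return value
    if unit == "s/it":
        return 1.0 / value
    raise ValueError(f"unknown unit: {unit}")

print(to_it_per_s(4.0, "it/s"))  # 4.0
print(to_it_per_s(2.0, "s/it"))  # 0.5, i.e. one iteration every 2 seconds
```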
Stable Diffusion XL (SDXL) Benchmark: 769 images per dollar on Salad. Or drop $4k on a 4090 build now. Inputs in all runs are the prompt plus positive and negative terms.

Training T2I-Adapter-SDXL involved using 3 million high-resolution image-text pairs from LAION-Aesthetics V2, with training settings specifying 20,000-35,000 steps, a batch size of 128 (data parallel, with a single-GPU batch size of 16), a constant learning rate of 1e-5, and mixed precision (fp16). In our runs, SDXL didn't even go above 5 GB of VRAM at 1024x1024. Thankfully, u/rkiga recommended that I downgrade my Nvidia graphics drivers to version 531.

SDXL 1.0 outshines its predecessors and is a frontrunner among the current state-of-the-art image generators. (Also consider the *do-not-batch-cond-uncond option.) LoRA is a type of parameter-efficient fine-tuning, or PEFT, that is much cheaper to accomplish than full model fine-tuning. Thanks to specific command-line arguments, I can handle larger resolutions, like 1024x1024, and still use ControlNet smoothly. This mode supports all SDXL-based models, including SDXL 0.9. To get started, install Python and Git. Stability AI is positioning SDXL as a solid base model to build on, so if you're looking to upgrade to a new card that'll significantly improve performance without breaking the bank, read on.
The chart above evaluates user preference for SDXL (with and without refinement) over SDXL 0.9. Base workflow options: inputs are only the prompt and negative words. Use the optimized version, or edit the code a little to use your own model; the optimized versions give substantial improvements in speed and efficiency. Along with our usual professional tests, we've added Stable Diffusion benchmarks on the various GPUs. I'm using a 2016-built PC with a GTX 1070 and 16 GB of RAM.

This checkpoint recommends a VAE; download it and place it in the VAE folder, or you may hit "NansException: A tensor with all NaNs was produced in Unet." A typical refiner recipe: run the base model for 50 inference steps, offload the base pipeline to CPU, load the refiner pipeline on the GPU, and refine the image at 1024x1024.

We are proud to host the TensorRT versions of SDXL and make the open ONNX weights available to users of SDXL globally. For video, the most you can do is limit the diffusion to strict img2img outputs and post-process to enforce as much coherency as possible, which works like a filter on a pre-existing video. In the past I was training 1.5 models without this trouble. We also ran the SDXL benchmark with batch sizes of 1, 2, and 4 (it/s) against SD 1.5.

Building upon the foundation of Stable Diffusion, SDXL represents a quantum leap in performance, achieving results that rival state-of-the-art image generators while promoting openness. It is noticeably slower, though, and SDXL basically uses two separate checkpoints to do what 1.5 did with just one. For direct comparison, every element should be in the right place, which makes it easier to compare.
For users with GPUs that have less than 3 GB of VRAM, ComfyUI offers a low-VRAM mode. The SDXL model will be made available through the new DreamStudio; details about the new model are not yet announced, but they are sharing a couple of generations to showcase what it can do. I found a Google Spreadsheet (not mine) with more data and a survey to fill in. LoRAs are going to be very popular and will be what's most applicable for most people in most use cases. The generation time increases by about a factor of 10.

We're excited to announce the release of Stable Diffusion XL v0.9. They could have provided us with more information on the model, but anyone who wants to may try it out. Even with AUTOMATIC1111, the 4090 performance thread is still open. All runs used standardized txt2img settings.

Quick tip: find the 'webui.bat' file, make a shortcut, and drag it to your desktop if you want to start it without opening folders.

SDXL GPU benchmarks for GeForce graphics cards. Test setup: cuDNN 8800, driver 537. SD 1.5 is slower than SDXL at 1024 pixels, and in general it is better to use SDXL there. Results report VRAM size (GB) and speed (sec). 🧨 Diffusers. SDXL-VAE-FP16-Fix was created by finetuning the SDXL-VAE to: 1. keep the final output the same, but 2. make the internal activation values smaller, by 3. scaling down weights and biases within the network.
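Percentage figures like "356% faster" are the ratio of throughputs minus one. A quick helper (the fps values in the example are made up for illustration, not measured):

```python
def percent_faster(new_rate: float, old_rate: float) -> float:
    """How much faster `new_rate` is than `old_rate`, in percent.

    A result of 356.0 means 356% faster, i.e. 4.56x the throughput.
    Works equally for fps in game benchmarks and it/s in SD benchmarks.
    """
    return (new_rate / old_rate - 1.0) * 100.0

# Illustrative fps values only: 114 fps vs 25 fps -> 356% faster.
print(percent_faster(114.0, 25.0))  # 356.0
```

Keeping this convention straight matters when comparing claims: "356% faster" is 4.56x, not 3.56x, the old card's throughput.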
Stable Diffusion XL (SDXL) is a powerful text-to-image generation model that iterates on the previous Stable Diffusion models in three key ways: the UNet is 3x larger, and SDXL combines a second text encoder (OpenCLIP ViT-bigG/14) with the original text encoder to significantly increase the number of parameters. Core clock speed will barely make any difference in performance. SDXL is now available via ClipDrop, GitHub, or the Stability AI Platform, which seems like a good starting point.

Generating with SDXL is significantly slower and will continue to be significantly slower for the foreseeable future. SDXL consists of a two-step pipeline for latent diffusion: first, a base model (SDXL-0.9) generates latents of the desired output size, and then the refiner model (SDXL-refiner-0.9) polishes them. This setup is a small amount slower than ComfyUI, especially since it doesn't switch to the refiner model anywhere near as quickly, but it's been working just fine. (Originally posted to Hugging Face and shared here with permission from Stability AI.)

Get up and running with the most cost-effective SDXL infra in a matter of minutes; read the full benchmark for details. One reader's test rig: GPU: AMD 7900 XTX; CPU: 7950X3D (with the iGPU disabled in BIOS); OS: Windows 11; SDXL 1.0. Your card should obviously do better than older hardware, and a free Kaggle cloud notebook is another option. Unless there is a breakthrough technology for SD 1.5, though, 1.5 will likely continue to be the standard, with this new SDXL being an equal or slightly lesser alternative for now. To use SDXL with SD.Next, switch the backend to Diffusers and select the model there.
Stable Diffusion XL has brought significant advancements to text-to-image and generative AI images in general, outperforming or matching Midjourney in many aspects. The RTX 4090 costs 33% more than the RTX 4080, but its overall specs far exceed that 33%. Recall, too, the drama around the SD 1.5 release on Hugging Face and the opposition to its release.

SDXL 0.9 is able to be run on a fairly standard PC, needing only a Windows 10 or 11 or Linux operating system, 16 GB of RAM, and an Nvidia GeForce RTX 20-series graphics card (equivalent or higher) with a minimum of 8 GB of VRAM. For AI/ML inference at scale, the consumer-grade GPUs on community clouds outperformed the high-end GPUs on major cloud providers. This class of GPU handles SDXL very well when generating 1024x1024 images, but these improvements do come at a cost in speed and memory.

SD 1.5 fared really badly here: most dogs had multiple heads or six legs, or were cropped poorly, like the example chosen. Translated from the Chinese: because SDXL is not yet mature enough, the number of models and plugins supporting it is relatively small, and its hardware requirements are higher still. A well-trained style model can produce outputs very similar to the source content (Arcane) when you prompt "Arcane Style", yet flawlessly outputs normal images when you leave off that prompt text, with no model burning at all.

The 4080 is about 70% as fast as the 4090 at 4K for 75% of the price. With SDXL 1.0, Stability AI once again reaffirms its commitment to pushing the boundaries of AI-powered image generation, establishing a new benchmark for competitors while continuing to innovate and refine its models. One way to make major improvements would be to push tokenization (and prompt use) of specific hand poses, as they have more fixed morphology than hands in general.
SDXL, the best open-source image model: the Stability AI team takes great pride in introducing SDXL 1.0, an open model representing the next evolutionary step in text-to-image generation models. Translated from the Japanese: SDXL 1.0 has now been officially released; this article explains what SDXL is, what it can do, whether you should use it, and whether you can even run it, picking up from the pre-release 0.9. In your copy of Stable Diffusion, find the file called txt2img.py. Example test prompt: "close-up editorial photo of 20 yo woman, ginger hair, slim American…".

There are a lot of awesome new features coming out, and I'd love to hear your feedback! The benchmark's iteration count can be set to -1 in order to run it indefinitely. Some comparisons are telling: in #21, for example, SDXL is the only model showing the fireflies. Stability AI claims that the new model is "a leap" over previous versions.

This is a benchmark parser I wrote a few months ago to parse through the benchmark submissions and produce a whiskers-and-bar plot for the different GPUs, filtered by the different settings. (I was trying to find out which settings and packages were most impactful for GPU performance; that was when I found the impact of running at half precision with xformers.) However, compilation will add some overhead to the first run, i.e. the first generation after loading is slower. Right: visualization of the two-stage pipeline, in which we generate initial latents with the base model and then refine them.
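A minimal shape for such a benchmark loop, with the same -1 convention for running indefinitely. The generate step below is a stand-in sleep, not a real sampler call:

```python
import itertools
import time

def run_benchmark(generate, num_runs: int = 3) -> list[float]:
    """Time `generate()` repeatedly and return per-run wall times in seconds.

    num_runs=-1 loops indefinitely (matching the benchmark's convention);
    the demo below uses a finite count so it terminates.
    """
    runs = itertools.count() if num_runs == -1 else range(num_runs)
    timings = []
    for _ in runs:
        start = time.perf_counter()
        generate()
        timings.append(time.perf_counter() - start)
    return timings

# Stand-in "model" that just sleeps ~10 ms per "image".
timings = run_benchmark(lambda: time.sleep(0.01), num_runs=3)
print(len(timings))  # 3
```

In a real harness, `generate` would be a txt2img call at fixed settings, and the per-run timings feed the whiskers-and-bar plot described above.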