Even_Adder

joined 1 year ago

Is this an attempt to beat those monopoly allegations?

[–] Even_Adder@lemmy.dbzer0.com 30 points 6 days ago

They tried to make video game rentals illegal in the US. They've always been a shitty, anti-consumer company.

[–] Even_Adder@lemmy.dbzer0.com 1 points 1 week ago (1 children)

Let me know if this kind of post isn't allowed here. It looked alright to post per the rules.

 

Abstract

We propose the first video diffusion framework for reference-based lineart video colorization. Unlike previous works that rely solely on image generative models to colorize lineart frame by frame, our approach leverages a large-scale pretrained video diffusion model to generate colorized animation videos. This approach leads to more temporally consistent results and is better equipped to handle large motions. Firstly, we introduce Sketch-guided ControlNet which provides additional control to finetune an image-to-video diffusion model for controllable video synthesis, enabling the generation of animation videos conditioned on lineart. We then propose Reference Attention to facilitate the transfer of colors from the reference frame to other frames containing fast and expansive motions. Finally, we present a novel scheme for sequential sampling, incorporating the Overlapped Blending Module and Prev-Reference Attention, to extend the video diffusion model beyond its original fixed-length limitation for long video colorization. Both qualitative and quantitative results demonstrate that our method significantly outperforms state-of-the-art techniques in terms of frame and video quality, as well as temporal consistency. Moreover, our method is capable of generating high-quality, long temporal-consistent animation videos with large motions, which is not achievable in previous works. Our code and model are available at this https URL.

Paper: https://arxiv.org/abs/2409.12960

Project Page: https://luckyhzt.github.io/lvcd

Code: (coming soon)

Supplementary Demo clips: https://luckyhzt.github.io/lvcd/supplementary/supplementary.html

[–] Even_Adder@lemmy.dbzer0.com 44 points 1 week ago (2 children)

Nintendo has always been an underhanded bully. This isn't new.

[–] Even_Adder@lemmy.dbzer0.com 1 points 1 week ago

Damn, that's all we get.

[–] Even_Adder@lemmy.dbzer0.com 0 points 1 week ago* (last edited 1 week ago)

I wanna believe you, but the JPEG artifacts on an image that small make it extremely difficult to even notice the distortions you're referring to, especially at a glance. You've made it obvious you're replying in bad faith, so I'm gonna leave it here. Have a good one.

[–] Even_Adder@lemmy.dbzer0.com 0 points 1 week ago (2 children)

That's a lot of things to infer off of just scrolling past a 512×768 JPEG. If the image was in another context and the text had been different, no one would have batted an eye.

[–] Even_Adder@lemmy.dbzer0.com 0 points 1 week ago (4 children)

That isn't extremely obvious though, especially with the JPEG compression. If you didn't know to look, you wouldn't have noticed it. No one scrutinizes Jeopardy text.

[–] Even_Adder@lemmy.dbzer0.com 1 points 1 week ago* (last edited 1 week ago) (6 children)

Yeah, but the point isn't to look like a legit Jeopardy clue, it just has to not look generated. You can respect the height limit if you want, or break it.

Your reply also wasn't in the form of a question. No points.

[–] Even_Adder@lemmy.dbzer0.com -2 points 1 week ago

I think something innocuous or inoffensive enough to most people qualifies as "good looking". I mean, that's how marketing works.

[–] Even_Adder@lemmy.dbzer0.com -2 points 1 week ago (8 children)

A generated image could be so good you'd never be able to tell. Like this one:

[–] Even_Adder@lemmy.dbzer0.com 1 points 1 week ago (12 children)

I think the real complaint here is about bad looking art. Not a lot of people have an eye for picking out good-looking images. Or this person is just a huge snob.

20
submitted 6 months ago* (last edited 6 months ago) by Even_Adder@lemmy.dbzer0.com to c/streetmoe@ani.social
 

(b74#0 - nya~) on Discord (2024)

Image description: A young woman with vibrant yellow hair sitting on a set of stairs at night. She is dressed in a black hoodie. Her gaze is directed towards the camera, and she wears a warm smile on her face. The stairs are illuminated by the soft glow of streetlights, casting a gentle light on the scene. The background is dark, suggesting a quiet, urban setting. The overall mood of the image is serene and inviting, capturing a moment of tranquility in the midst of the city's hustle and bustle.

Full Generation Parameters:

(ijichi-nijika photo-background) dutch-angle scenery real-world-location backlighting lens-flare evening dimly-lit night asymmetrical asymmetry Metallic-luster full-body head-rest hand-on-own-chin black-jacket hoodie streetwear 1girl side-ponytail solo blonde-hair bangs smile looking-at-viewer long-hair long-sleeves sidelocks looking-at-viewer brown-eyes street sitting-on-stair from-below builds lamppost city best-quality newest high-quality late detailed-background [:arm_support shooting-star black-legwear:10] <lora:ROCK3(strong):1:lbw=XLALL:stop=20> [:(half-closed-eyes:1.2):15]

Negative prompt: frown symmetrical symmetry low-quality worst-quality old oldest normal-quality bad-anatomy bad-hands nsfw [:(closed-eyes:1.2):15] zettai_ryouiki

Steps: 25, Sampler: Euler a, CFG scale: 7, Seed: 2600022596, Size: 896x1344, Model hash: 80da973b09, Model: umbra_mecha, VAE hash: 6a3d57b525, VAE: sdxl_vae.safetensors, Denoising strength: 0.4, ENSD: 31337,

TagSep Prompt: "(ijichi nijika,photo background),dutch angle, scenery,real world location,backlighting,lens flare,evening, dimly lit, night, asymmetrical, asymmetry,Metallic luster,full body,head rest,hand on own chin ,black jacket, hoodie,streetwear,1girl, side ponytail, solo, blonde hair, bangs, smile, looking at viewer, long hair, long sleeves, sidelocks, looking at viewer, brown eyes,street,sitting on stair, from below,builds,lamppost, city,best quality, newest, high quality, late,detailed background,[:arm_support,shooting star,black legwear:10] <lora:ROCK3(strong):1:lbw=XLALL:stop=20>,[:(half-closed eyes:1.2):15]", TagSep Negative: "frown,symmetrical, symmetry,low quality, worst quality, old, oldest, normal quality, bad anatomy, bad hands, nsfw,[:(closed eyes:1.2):15],zettai_ryouiki", Hires upscale: 1.5, Hires upscaler: RealESRGAN_x4plus_anime_6B, Lora hashes: "ROCK3(strong): e7c0ccd059f6", Pad conds: True, Version: v1.8.0

 

(turkey910) (2024)

Image description: A girl is seated on a bench against the backdrop of a cityscape at night. She is wearing a white hoodie adorned with a dark blue stripe and white floral patterns, and black leggings. Illuminated by the gentle glow of nearby lights, buildings stand tall but silent, their windows aglow with warmth.

Full Generation Parameters:

(masterpiece), high quality, (detailed background:1.3), 1girl, solo, <lora:NoburaKagisaki-v3-07:0.5>, ChopioNoburaKugisaki, short hair, brown hair, asymmetrical bangs, brown eyes, looking up, medium breasts, hair behind ear, outfit_2, multicolored clothes, hoodie, long sleeves, floral print, black undershirt, black leggings, pink sneakers, akihabara \(tokyo\), city, buildings, (night, dark:1.2), park, grass, stone path, bench, sitting,

Negative prompt: (FastNegativeEmbedding:1.2), (Bad_Hands:1.5),

Steps: 30, ENSD: 31337, Size: 512x768, Seed: 2117614637, Version: v1.5.1, Sampler: Euler a, CFG scale: 7, Clip skip: 2, "Bad_Hands: aa7651be154c", Hires steps: 14, Hires upscale: 1.4, Hires upscaler: 4x-AnimeSharp, Denoising strength: 0.5, "NoburaKagisaki-v3-07: 8127ea161f19", "FastNegativeEmbedding: 687b669d8234"

 

(ToriBirdZ) (2023)

Image description: A woman wearing an eye mask looks down at the camera with mild surprise. She is wearing a red jacket with white fur lining, a white graphic top, a pendant, and black ripped jeans.

Full Generation Parameters:

(masterpiece, best quality:1.2), 1girl, <lora:frima-nikke-richy-v1:1>, frimadef, grey hair, white shirt, red jacket, fur trim, open clothes, off shoulder, black pants, torn pants, necklace, sleep mask, garden, from below

Negative prompt: (worst quality, low quality:1.4), signature, artist name, patreon usename, web address, watermark, twitter username, dated

Steps: 20, Size: 512x768, Seed: 1476359437, Model: toriultraanimemix_v10, Version: v1.7.0, Sampler: Euler a, CFG scale: 7, Clip skip: 2, Model hash: f53da70b24, Hires steps: 10, Hires upscale: 2, Hires upscaler: R-ESRGAN 4x+ Anime6B, Denoising strength: 0.5, "frima-nikke-richy-v1: 31e94cb8ed3c"
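For anyone who wants to reuse these settings, here is a rough Python sketch of how a settings line like the ones in this post could be split into key/value pairs, and how the final resolution follows from `Size` and `Hires upscale`. The function names `parse_settings` and `hires_size` are made up for illustration, the sample line is a shortened copy of the last one above, and the rounding is an assumption: the actual UI may snap dimensions differently.

```python
import re


def parse_settings(line: str) -> dict[str, str]:
    """Split a 'Key: value, Key: value' settings line into a dict."""
    settings = {}
    # Split on commas that sit outside double quotes, so quoted
    # entries such as "name: hash" stay in one piece.
    for field in re.split(r',\s*(?=(?:[^"]*"[^"]*")*[^"]*$)', line):
        field = field.strip().strip('"')
        if ": " not in field:
            continue  # skip empty or malformed fields
        key, value = field.split(": ", 1)
        settings[key.strip()] = value.strip()
    return settings


def hires_size(settings: dict[str, str]) -> tuple[int, int]:
    """Base size times the hires upscale factor (plain rounding, an assumption)."""
    width, height = (int(n) for n in settings["Size"].split("x"))
    scale = float(settings.get("Hires upscale", 1))
    return round(width * scale), round(height * scale)


# Shortened copy of the settings line above.
line = (
    "Steps: 20, Size: 512x768, Seed: 1476359437, Sampler: Euler a, "
    'CFG scale: 7, Hires upscale: 2, "frima-nikke-richy-v1: 31e94cb8ed3c"'
)
settings = parse_settings(line)
print(settings["Sampler"])
print(hires_size(settings))
```

With a 512x768 base and a hires upscale of 2, the sketch reports a 1024x1536 output.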

2
City Steps Casual (image.civitai.com)
submitted 11 months ago* (last edited 11 months ago) by Even_Adder@lemmy.dbzer0.com to c/streetmoe@ani.social
 

(Manityro) (2023)

Image Caption: An illustration of Izumi Noa from Patlabor sitting on a set of concrete steps. She is wearing a white hoodie with a small yellow logo on the arm and black shoes. The background shows a city skyline with tall buildings behind a green hill.

Full Generation Parameters:

(masterpiece, best quality), (anime, 1990s \(style\)), outdoors, futuristic city, park, cowboy shot, 1girl, solo, IzumiNoa, , sitting on stairs, looking at viewer, denim jeans, white hoodie, black footwear, earbuds,

Negative prompt: (worst quality, low quality:1.4), lowres, loli, child, bokeh, text, signature, sketch, watermark, artist name, speech bubble, blurry, pubic hair, pubes, (mole, mole under eye, mole on breast),

Steps: 25, Size: 512x768, Seed: 592778849, Model: based65_v20, Version: v1.5.1, Sampler: DPM++ 2M SDE Karras, CFG scale: 7.5, Clip skip: 2, Model hash: e5f3fe53a9, Hires steps: 15, Hires upscale: 2.5, Hires upscaler: 4x-AnimeSharp, ADetailer model: face_yolov8n.pt, ADetailer version: 23.7.11, Denoising strength: 0.35, ADetailer mask blur: 4, ADetailer confidence: 0.3, ADetailer dilate/erode: 4, ADetailer inpaint padding: 32, "IzumiNoa_V1-Manityro-Dadapt: 5543e4e5b0b0", ADetailer denoising strength: 0.4, ADetailer inpaint only masked: True

 

(.v.i.l.) on Discord (2023)

Image Caption: The face of a vampire is illuminated by the moonlight. She has white hair and pointed ears. She wears a dark cloak and a white blouse with a black corset with red laces. She is surrounded by red flowers.

 

Abstract:

Significant advancements have been achieved in the realm of large-scale pre-trained text-to-video Diffusion Models (VDMs). However, previous methods either rely solely on pixel-based VDMs, which come with high computational costs, or on latent-based VDMs, which often struggle with precise text-video alignment. In this paper, we are the first to propose a hybrid model, dubbed as Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation. Our model first uses pixel-based VDMs to produce a low-resolution video of strong text-video correlation. After that, we propose a novel expert translation method that employs the latent-based VDMs to further upsample the low-resolution video to high resolution. Compared to latent VDMs, Show-1 can produce high-quality videos of precise text-video alignment; Compared to pixel VDMs, Show-1 is much more efficient (GPU memory usage during inference is 15G vs 72G). We also validate our model on standard video generation benchmarks. Our code and model weights are publicly available at https://github.com/showlab/Show-1.

17
submitted 11 months ago* (last edited 11 months ago) by Even_Adder@lemmy.dbzer0.com to c/foss@beehaw.org
 

