1. Introduction
Sketch representation and interpretation remain an open challenge, particularly for complex and casually constructed drawings. Yet the ability to classify, search, and manipulate sketched content is increasingly attractive as gesture and touch interfaces reach ubiquity. Advances in recurrent network architectures for language processing have recently inspired sequence modeling approaches to sketch (e.g. SketchRNN [1]) that encode a sketch as a variable-length sequence of strokes, rather than in a rasterized or ‘pixel’ form. In particular, long short-term memory (LSTM) networks have shown significant promise in learning search embeddings [2], [3] due to their ability to model higher-level structure and temporal order, in contrast to convolutional neural networks (CNNs) operating on rasterized sketches [4, 5, 6, 7]. Yet the limited temporal extent of LSTMs restricts the structural complexity of sketches that can be accommodated in sequence embeddings. In the language modeling domain, this shortcoming has been addressed through the emergence of Transformer networks [8, 9, 10], in which slot masking enhances the ability to learn complex structures represented by longer sequences.
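For concreteness, the stroke-sequence encoding referenced above can be illustrated with the ‘stroke-5’ format introduced with SketchRNN [1], in which each step records a pen offset plus a one-hot pen state. The snippet below is a minimal sketch in Python, assuming that format; the specific coordinate values and the helper function are illustrative only and not drawn from this paper.

    import numpy as np

    # Illustrative example of the stroke-5 format [1]: each row is
    # (dx, dy, p1, p2, p3), where p1 = pen touching paper, p2 = pen lifted
    # after this point, p3 = end of sketch. Values here are made up.
    stroke5 = np.array([
        [ 5.0,  0.0, 1, 0, 0],   # draw to the right
        [ 0.0,  5.0, 1, 0, 0],   # draw downward
        [-5.0,  0.0, 0, 1, 0],   # draw left, then lift the pen
        [10.0, 10.0, 1, 0, 0],   # reposition and start a new stroke
        [ 0.0,  0.0, 0, 0, 1],   # end of sketch
    ], dtype=np.float32)

    def to_absolute(seq):
        # Recover absolute pen coordinates from the relative offsets.
        return np.cumsum(seq[:, :2], axis=0)

    print(to_absolute(stroke5))

Because the number of rows varies per drawing, such sequences are naturally consumed by recurrent or Transformer sequence models, unlike fixed-size raster inputs to a CNN.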