Transformers (depthcharge.transformers)
SpectrumTransformerEncoder(d_model=128, nhead=8, dim_feedforward=1024, n_layers=1, dropout=0.0, peak_encoder=True)
Bases: Module, ModelMixin, TransformerMixin
A Transformer encoder for input mass spectra.
Use this PyTorch module to embed mass spectra. By default, nothing other than the m/z and intensity arrays of each mass spectrum is considered. However, arbitrary information can be integrated into the spectrum representation by subclassing this class and overwriting the global_token_hook() method.
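For example, a minimal sketch of embedding a toy batch of spectra (all shapes and values below are illustrative, not part of the API):

```python
import torch

from depthcharge.transformers import SpectrumTransformerEncoder

# Two spectra, zero-padded to three peaks each.
mz_array = torch.tensor([[114.09, 147.11, 258.16], [228.13, 317.19, 0.0]])
intensity_array = torch.tensor([[0.5, 1.0, 0.2], [1.0, 0.3, 0.0]])

model = SpectrumTransformerEncoder(d_model=128, nhead=8, n_layers=1)
latent, mem_mask = model(mz_array, intensity_array)

# latent has shape (2, 4, 128): a global token plus one row per peak.
print(latent.shape, mem_mask.shape)
```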
| PARAMETER | DESCRIPTION |
| --- | --- |
| d_model | The latent dimensionality to represent peaks in the mass spectrum. TYPE: int |
| nhead | The number of attention heads in each layer. TYPE: int |
| dim_feedforward | The dimensionality of the fully connected layers in the Transformer layers of the model. TYPE: int |
| n_layers | The number of Transformer layers. TYPE: int |
| dropout | The dropout probability for all layers. TYPE: float |
| peak_encoder | The function to encode the (m/z, intensity) tuples of each mass spectrum. TYPE: PeakEncoder or bool |
| ATTRIBUTE | DESCRIPTION |
| --- | --- |
| d_model | The latent dimensionality of the model. TYPE: int |
| nhead | The number of attention heads. TYPE: int |
| dim_feedforward | The dimensionality of the Transformer feedforward layers. TYPE: int |
| n_layers | The number of Transformer layers. TYPE: int |
| dropout | The dropout probability for all layers. TYPE: float |
| peak_encoder | The function to encode the (m/z, intensity) tuples of each mass spectrum. TYPE: PeakEncoder |
| transformer_encoder | The Transformer encoder layers. TYPE: torch.nn.TransformerEncoder |
Attributes
d_model: int
property
The latent dimensionality of the model.
device: torch.device
property
The current device of the first parameter of the model.
dim_feedforward: int
property
The dimensionality of the Transformer feedforward layers.
dropout: float
property
The dropout probability for the Transformer layers.
n_layers: int
property
The number of Transformer layers.
nhead: int
property
The number of attention heads.
Functions
forward(mz_array, intensity_array, *args, mask=None, **kwargs)
Embed a batch of mass spectra.
| PARAMETER | DESCRIPTION |
| --- | --- |
| mz_array | The zero-padded m/z dimension for a batch of mass spectra. TYPE: torch.Tensor of shape (n_spectra, n_peaks) |
| intensity_array | The zero-padded intensity dimension for a batch of mass spectra. TYPE: torch.Tensor of shape (n_spectra, n_peaks) |
| *args | Additional data. These may be used by overwriting the global_token_hook() method. TYPE: torch.Tensor |
| mask | Passed to torch.nn.TransformerEncoder.forward(). TYPE: torch.Tensor or None |
| **kwargs | Additional data fields. These may be used by overwriting the global_token_hook() method. TYPE: torch.Tensor |
| RETURNS | DESCRIPTION |
| --- | --- |
| latent | The latent representations for the spectrum and each of its peaks. TYPE: torch.Tensor of shape (n_spectra, n_peaks + 1, d_model) |
| mem_mask | The memory mask specifying which elements of the input were padding. TYPE: torch.Tensor |
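Continuing the sketch above, the first element of the returned sequence is the spectrum-level embedding and the remaining elements embed individual peaks:

```python
# Split the latent representations from the sketch above.
spectrum_emb = latent[:, 0, :]  # (n_spectra, d_model), one embedding per spectrum
peak_emb = latent[:, 1:, :]     # (n_spectra, n_peaks, d_model), one per peak
```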
global_token_hook(mz_array, intensity_array, *args, **kwargs)
Define how additional information in the batch may be used.
Overwrite this method to define custom functionality dependent on information in the batch. Examples would be to incorporate any combination of the mass, charge, retention time, or ion mobility of a precursor ion.
The representation returned by this method is prepended to the peak representations that are fed into the Transformer encoder, and it ultimately contributes to the spectrum representation that is the first element of the sequence in the model output.
By default, this method returns a tensor of zeros.
| PARAMETER | DESCRIPTION |
| --- | --- |
| mz_array | The zero-padded m/z dimension for a batch of mass spectra. TYPE: torch.Tensor of shape (n_spectra, n_peaks) |
| intensity_array | The zero-padded intensity dimension for a batch of mass spectra. TYPE: torch.Tensor of shape (n_spectra, n_peaks) |
| *args | Additional data passed with the batch. TYPE: torch.Tensor |
| **kwargs | Additional data passed with the batch. TYPE: torch.Tensor |
| RETURNS | DESCRIPTION |
| --- | --- |
| torch.Tensor of shape (batch_size, d_model) | The precursor representations. |
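For illustration, here is a hedged sketch of a subclass whose global token encodes the precursor charge. The charge keyword argument and the embedding layer are assumptions introduced for this example, not part of the base API:

```python
import torch

from depthcharge.transformers import SpectrumTransformerEncoder


class ChargeAwareEncoder(SpectrumTransformerEncoder):
    """A sketch: encode the precursor charge in the global token."""

    def __init__(self, max_charge=10, **kwargs):
        super().__init__(**kwargs)
        # Hypothetical learned embedding for integer charge states.
        self.charge_encoder = torch.nn.Embedding(max_charge + 1, self.d_model)

    def global_token_hook(self, mz_array, intensity_array, *args, **kwargs):
        # Replace the default zeros with a charge embedding; "charge" is a
        # hypothetical kwarg passed alongside the batch to forward().
        return self.charge_encoder(kwargs["charge"].long())


model = ChargeAwareEncoder(d_model=128)
mz_array = torch.rand(2, 5) * 2000.0
intensity_array = torch.rand(2, 5)
latent, mem_mask = model(mz_array, intensity_array, charge=torch.tensor([2, 3]))
```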
AnalyteTransformerEncoder(n_tokens, d_model=128, nhead=8, dim_feedforward=1024, n_layers=1, dropout=0, positional_encoder=True, padding_int=None)
Bases: _AnalyteTransformer
A transformer encoder for peptide and small molecule analytes.
| PARAMETER | DESCRIPTION |
| --- | --- |
| n_tokens | The number of tokens used to tokenize molecular sequences. TYPE: int |
| d_model | The latent dimensionality to represent each element in the molecular sequence. TYPE: int |
| nhead | The number of attention heads in each layer. TYPE: int |
| dim_feedforward | The dimensionality of the fully connected layers in the Transformer layers of the model. TYPE: int |
| n_layers | The number of Transformer layers. TYPE: int |
| dropout | The dropout probability for all layers. TYPE: float |
| positional_encoder | The positional encodings to use for the elements of the sequence. If True, the default positional encoder is used. TYPE: PositionalEncoder or bool |
| padding_int | The index that represents padding in the input sequence. Required only if n_tokens is given as an integer. TYPE: int or None |
Attributes
d_model: int
property
The latent dimensionality of the model.
device: torch.device
property
The current device of the first parameter of the model.
dim_feedforward: int
property
The dimensionality of the Transformer feedforward layers.
dropout: float
property
The dropout probability for the Transformer layers.
n_layers: int
property
The number of Transformer layers.
nhead: int
property
The number of attention heads.
Functions
forward(tokens, *args, mask=None, **kwargs)
Encode a collection of sequences.
| PARAMETER | DESCRIPTION |
| --- | --- |
| tokens | The integer tokens describing each analyte sequence, padded to the maximum analyte length in the batch with 0s. TYPE: torch.Tensor of shape (batch_size, len_sequence) |
| *args | Additional data. These may be used by overwriting the global_token_hook() method. TYPE: torch.Tensor |
| mask | Passed to torch.nn.TransformerEncoder.forward(). TYPE: torch.Tensor or None |
| **kwargs | Additional data fields. These may be used by overwriting the global_token_hook() method. TYPE: torch.Tensor |
| RETURNS | DESCRIPTION |
| --- | --- |
| latent | The latent representations for the analyte and each element of its sequence. TYPE: torch.Tensor of shape (batch_size, len_sequence + 1, d_model) |
| mem_mask | The memory mask specifying which elements of the input were padding. TYPE: torch.Tensor |
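A minimal usage sketch, assuming an arbitrary vocabulary of 20 tokens with 0 as the padding index:

```python
import torch

from depthcharge.transformers import AnalyteTransformerEncoder

encoder = AnalyteTransformerEncoder(n_tokens=20, d_model=128, padding_int=0)

# Two token sequences, zero-padded to the longest sequence in the batch.
tokens = torch.tensor([[3, 7, 12, 5], [9, 2, 0, 0]])
latent, mem_mask = encoder(tokens)

# latent has shape (2, 5, 128): a global token plus one row per token.
```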
global_token_hook(tokens, *args, **kwargs)
Define how additional information in the batch may be used.
Overwrite this method to define custom functionality dependent on information in the batch. Examples would be to incorporate any combination of the mass, charge, retention time, or ion mobility of an analyte.
The representation returned by this method is prepended to the token representations that are fed into the Transformer, and it ultimately contributes to the analyte representation that is the first element of the sequence in the model output.
By default, this method returns a tensor of zeros.
| PARAMETER | DESCRIPTION |
| --- | --- |
| tokens | The partial molecular sequences for which to predict the next token. Optionally, these may be the token indices instead of a string. TYPE: list of str or torch.Tensor |
| *args | Additional data passed with the batch. TYPE: torch.Tensor |
| **kwargs | Additional data passed with the batch. TYPE: torch.Tensor |
| RETURNS | DESCRIPTION |
| --- | --- |
| torch.Tensor of shape (batch_size, d_model) | The global token representations. |
AnalyteTransformerDecoder(n_tokens, d_model=128, nhead=8, dim_feedforward=1024, n_layers=1, dropout=0, positional_encoder=True, padding_int=None)
Bases: _AnalyteTransformer
A transformer decoder for peptide or small molecule sequences.
| PARAMETER | DESCRIPTION |
| --- | --- |
| n_tokens | The number of tokens used to tokenize molecular sequences. TYPE: int |
| d_model | The latent dimensionality to represent elements of the sequence. TYPE: int |
| nhead | The number of attention heads in each layer. TYPE: int |
| dim_feedforward | The dimensionality of the fully connected layers in the Transformer layers of the model. TYPE: int |
| n_layers | The number of Transformer layers. TYPE: int |
| dropout | The dropout probability for all layers. TYPE: float |
| positional_encoder | The positional encodings to use for the molecular sequence. If True, the default positional encoder is used. TYPE: PositionalEncoder or bool |
| padding_int | The index that represents padding in the input sequence. Required only if n_tokens is given as an integer. TYPE: int or None |
Attributes
d_model: int
property
The latent dimensionality of the model.
device: torch.device
property
The current device of the first parameter of the model.
dim_feedforward: int
property
The dimensionality of the Transformer feedforward layers.
dropout: float
property
The dropout probability for the Transformer layers.
n_layers: int
property
The number of Transformer layers.
nhead: int
property
The number of attention heads.
Functions
embed(tokens, *args, memory, memory_key_padding_mask=None, memory_mask=None, tgt_mask=None, **kwargs)
Embed a collection of sequences.
| PARAMETER | DESCRIPTION |
| --- | --- |
| tokens | The partial molecular sequences for which to predict the next token. Optionally, these may be the token indices instead of a string. TYPE: list of str or torch.Tensor |
| *args | Additional data. These may be used by overwriting the global_token_hook() method. TYPE: torch.Tensor |
| memory | The representations from a Transformer encoder, such as a SpectrumTransformerEncoder. TYPE: torch.Tensor |
| memory_key_padding_mask | Passed to torch.nn.TransformerDecoder.forward(). TYPE: torch.Tensor or None |
| memory_mask | Passed to torch.nn.TransformerDecoder.forward(). TYPE: torch.Tensor or None |
| tgt_mask | Passed to torch.nn.TransformerDecoder.forward(). TYPE: torch.Tensor or None |
| **kwargs | Additional data fields. These may be used by overwriting the global_token_hook() method. TYPE: torch.Tensor |
| RETURNS | DESCRIPTION |
| --- | --- |
| embeddings | The output of the Transformer layer containing the embeddings of the tokens in the sequence. These may be transformed to yield scores for token predictions using the score_embeddings() method. TYPE: torch.Tensor |
forward(tokens, *args, memory, memory_key_padding_mask=None, memory_mask=None, tgt_mask=None, **kwargs)
Decode a collection of sequences.
PARAMETER | DESCRIPTION |
---|---|
tokens |
The partial molecular sequences for which to predict the next token. Optionally, these may be the token indices instead of a string.
TYPE:
|
*args |
Additional data. These may be used by overwriting the
TYPE:
|
memory |
The representations from a
TYPE:
|
memory_key_padding_mask |
Passed to
TYPE:
|
memory_mask |
Passed to
TYPE:
|
tgt_mask |
Passed to
TYPE:
|
**kwargs |
Additional data fields. These may be used by overwriting
the
TYPE:
|
| RETURNS | DESCRIPTION |
| --- | --- |
| scores | The raw output of the final linear layer. These can be softmax-transformed to yield the probability of each token for the prediction. TYPE: torch.Tensor |
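For example, a hedged sketch of pairing the decoder with a SpectrumTransformerEncoder, as in a de novo sequencing model. The vocabulary size, the shapes, and the reuse of the encoder's mem_mask as the memory key padding mask are assumptions for this illustration:

```python
import torch

from depthcharge.transformers import (
    AnalyteTransformerDecoder,
    SpectrumTransformerEncoder,
)

spectrum_encoder = SpectrumTransformerEncoder(d_model=128)
decoder = AnalyteTransformerDecoder(n_tokens=20, d_model=128, padding_int=0)

# Embed a toy batch of spectra; the embeddings serve as decoder memory.
mz_array = torch.rand(2, 6) * 2000.0
intensity_array = torch.rand(2, 6)
memory, mem_mask = spectrum_encoder(mz_array, intensity_array)

# Partial sequences for which to predict the next token (0 = padding).
tokens = torch.tensor([[3, 7, 12], [9, 2, 0]])
scores = decoder(tokens, memory=memory, memory_key_padding_mask=mem_mask)

# A softmax over the last dimension yields per-token probabilities.
probs = torch.softmax(scores, dim=-1)
```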
global_token_hook(tokens, *args, **kwargs)
Define how additional information in the batch may be used.
Overwrite this method to define custom functionality dependent on information in the batch. Examples would be to incorporate any combination of the mass, charge, retention time, or ion mobility of an analyte.
The representation returned by this method is prepended to the token representations that are fed into the Transformer, and it ultimately contributes to the analyte representation that is the first element of the sequence in the model output.
By default, this method returns a tensor of zeros.
| PARAMETER | DESCRIPTION |
| --- | --- |
| tokens | The partial molecular sequences for which to predict the next token. Optionally, these may be the token indices instead of a string. TYPE: list of str or torch.Tensor |
| *args | Additional data passed with the batch. TYPE: torch.Tensor |
| **kwargs | Additional data passed with the batch. TYPE: torch.Tensor |
| RETURNS | DESCRIPTION |
| --- | --- |
| torch.Tensor of shape (batch_size, d_model) | The global token representations. |
score_embeddings(embeddings)
Score the embeddings to find the most confident tokens.
| PARAMETER | DESCRIPTION |
| --- | --- |
| embeddings | The embeddings from the Transformer layer. TYPE: torch.Tensor |
| RETURNS | DESCRIPTION |
| --- | --- |
| scores | The raw output of the final linear layer. These can be softmax-transformed to yield the probability of each token for the prediction. TYPE: torch.Tensor |
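Continuing the decoder sketch above, embed() and score_embeddings() together factor forward() into two steps, which is useful when the intermediate token embeddings are needed:

```python
# Same inputs as the forward() sketch above.
embeddings = decoder.embed(tokens, memory=memory, memory_key_padding_mask=mem_mask)
scores = decoder.score_embeddings(embeddings)
```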